Difference between revisions of "Open Problems:84"
(Created page with "{{Header |source=focs17 |who=Alon Orlitsky |title=Efficient Pattern Maximum Likelihood Computation }} Given a sequence of samples $\mathbf{s}=(s_1,\dots,s_n)\in\mathbb{N}^n$,...") |
m (Krzysztof Onak moved page Waiting:Efficient Pattern Maximum Likelihood Computation to Open Problems:84 without leaving a redirect: The problem is ready for publication) |
||
(4 intermediate revisions by 2 users not shown) | |||
Line 2: | Line 2: | ||
|source=focs17 | |source=focs17 | ||
|who=Alon Orlitsky | |who=Alon Orlitsky | ||
− | |||
}} | }} | ||
− | Given a sequence of samples $\mathbf{s}=(s_1,\dots,s_n)\in\mathbb{N}^n$, the ''sequence maximum likelihood'' estimator is the probability distribution over $\mathbb{N}$ | + | Given a sequence of samples $\mathbf{s}=(s_1,\dots,s_n)\in\mathbb{N}^n$, the ''sequence maximum likelihood'' (SML) estimator is the probability distribution over $\mathbb{N}$ that maximizes the probability of this sequence, i.e., |
$$ | $$ | ||
p^{\rm SML} \stackrel{\rm def}{=} \arg\!\!\!\max_{p\in\Delta(\mathbb{N})} p^{\otimes n}(\mathbf{s}) | p^{\rm SML} \stackrel{\rm def}{=} \arg\!\!\!\max_{p\in\Delta(\mathbb{N})} p^{\otimes n}(\mathbf{s}) | ||
=\arg\!\!\!\max_{p\in\Delta(\mathbb{N})} \prod_{i=1}^n p(s_i)\,. | =\arg\!\!\!\max_{p\in\Delta(\mathbb{N})} \prod_{i=1}^n p(s_i)\,. | ||
$$ | $$ | ||
− | It is not hard to show that the SML corresponds to the empirical distribution obtained from $(s_1,\dots,s_n)$. | + | It is not hard to show that the SML corresponds to the empirical distribution obtained from $(s_1,\dots,s_n)$, which can be computed in linear time. |
− | In contrast, the ''profile maximum likelihood'' estimator is the estimator which, given $\mathbf{s}$, only considers the ''profile'' $\Phi(\mathbf{s})$ defined as the multi-set of counts: e.g., | + | In contrast, the ''profile maximum likelihood'' (PML) estimator is the estimator which, given $\mathbf{s}$, only considers the ''profile'' $\Phi(\mathbf{s})$ defined as the multi-set of counts: e.g., |
$$ | $$ | ||
\Phi((c,a,b,b,c,d)) = \Phi((b,a, d,c,b,c)) = \{1,2,2,1\} | \Phi((c,a,b,b,c,d)) = \Phi((b,a, d,c,b,c)) = \{1,2,2,1\} | ||
Line 20: | Line 19: | ||
p^{\rm PML} \stackrel{\rm def}{=} \arg\!\!\!\max_{p\in\Delta(\mathbb{N})} \sum_{\mathbf{s'}: \Phi(\mathbf{s'})=\Phi(\mathbf{s})} p^{\otimes n}(\mathbf{s'})\,. | p^{\rm PML} \stackrel{\rm def}{=} \arg\!\!\!\max_{p\in\Delta(\mathbb{N})} \sum_{\mathbf{s'}: \Phi(\mathbf{s'})=\Phi(\mathbf{s})} p^{\otimes n}(\mathbf{s'})\,. | ||
$$ | $$ | ||
− | The PML is particularly well-suited to dealing with symmetric properties and functionals of distributions (i.e., those invariant | + | The PML is particularly well-suited to dealing with symmetric properties and functionals of distributions (i.e., those invariant to relabeling of the domain), as shown in {{cite|AcharyaDOS-17}}. In particular, in the sublinear sample regime, it provably outperforms the SML. However, from a computational point of view, it is unclear whether one can compute it efficiently. |
− | Is there a polynomial | + | Is there a polynomial (or even strongly subexponential) time algorithm to compute or (multiplicatively) approximate the PML? |
Latest revision as of 15:01, 8 November 2017
Suggested by | Alon Orlitsky |
---|---|
Source | FOCS 2017 |
Short link | https://sublinear.info/84 |
Given a sequence of samples $\mathbf{s}=(s_1,\dots,s_n)\in\mathbb{N}^n$, the sequence maximum likelihood (SML) estimator is the probability distribution over $\mathbb{N}$ that maximizes the probability of this sequence, i.e., $$ p^{\rm SML} \stackrel{\rm def}{=} \arg\!\!\!\max_{p\in\Delta(\mathbb{N})} p^{\otimes n}(\mathbf{s}) =\arg\!\!\!\max_{p\in\Delta(\mathbb{N})} \prod_{i=1}^n p(s_i)\,. $$ It is not hard to show that the SML corresponds to the empirical distribution obtained from $(s_1,\dots,s_n)$, which can be computed in linear time.
In contrast, the profile maximum likelihood (PML) estimator is the estimator which, given $\mathbf{s}$, only considers the profile $\Phi(\mathbf{s})$ defined as the multi-set of counts: e.g., $$ \Phi((c,a,b,b,c,d)) = \Phi((b,a, d,c,b,c)) = \{1,2,2,1\} $$ and maximizes the likelihood of getting the profile $\Phi(\mathbf{s})$: $$ p^{\rm PML} \stackrel{\rm def}{=} \arg\!\!\!\max_{p\in\Delta(\mathbb{N})} \sum_{\mathbf{s'}: \Phi(\mathbf{s'})=\Phi(\mathbf{s})} p^{\otimes n}(\mathbf{s'})\,. $$ The PML is particularly well-suited to dealing with symmetric properties and functionals of distributions (i.e., those invariant to relabeling of the domain), as shown in [AcharyaDOS-17]. In particular, in the sublinear sample regime, it provably outperforms the SML. However, from a computational point of view, it is unclear whether one can compute it efficiently.
Is there a polynomial (or even strongly subexponential) time algorithm to compute or (multiplicatively) approximate the PML?