# Problem 84: Efficient Profile Maximum Likelihood Computation

Suggested by | Alon Orlitsky |
---|---|

Source | FOCS 2017 |

Short link | https://sublinear.info/84 |

Given a sequence of samples $\mathbf{s}=(s_1,\dots,s_n)\in\mathbb{N}^n$, the *sequence maximum likelihood* (SML) estimator is the probability distribution over $\mathbb{N}$ that maximizes the probability of this sequence, i.e.,
$$
p^{\rm SML} \stackrel{\rm def}{=} \arg\!\!\!\max_{p\in\Delta(\mathbb{N})} p^{\otimes n}(\mathbf{s})
=\arg\!\!\!\max_{p\in\Delta(\mathbb{N})} \prod_{i=1}^n p(s_i)\,.
$$
It is not hard to show that the SML corresponds to the empirical distribution obtained from $(s_1,\dots,s_n)$, which can be computed in linear time.

In contrast, the *profile maximum likelihood* (PML) estimator is the estimator which, given $\mathbf{s}$, only considers the *profile* $\Phi(\mathbf{s})$ defined as the multi-set of counts: e.g.,
$$
\Phi((c,a,b,b,c,d)) = \Phi((b,a, d,c,b,c)) = \{1,2,2,1\}
$$
and maximizes the likelihood of getting the profile $\Phi(\mathbf{s})$:
$$
p^{\rm PML} \stackrel{\rm def}{=} \arg\!\!\!\max_{p\in\Delta(\mathbb{N})} \sum_{\mathbf{s'}: \Phi(\mathbf{s'})=\Phi(\mathbf{s})} p^{\otimes n}(\mathbf{s'})\,.
$$
The PML is particularly well-suited to dealing with symmetric properties and functionals of distributions (i.e., those invariant to relabeling of the domain), as shown in [AcharyaDOS-17]. In particular, in the sublinear sample regime, it provably outperforms the SML. However, from a computational point of view, it is unclear whether one can compute it efficiently.

Is there a polynomial (or even strongly subexponential) time algorithm to compute or (multiplicatively) approximate the PML?