Problem 100: Effective Support Size Estimation in the Dual Model

From Open Problems in Sublinear Algorithms
Jump to: navigation, search
Suggested by Oded Goldreich
Source WOLA 2019
Short link https://sublinear.info/100

For a probability distribution $p$ over a discrete domain $\Omega$, and a parameter $\varepsilon\in[0,1]$, denote by \[ \operatorname{ess}_\varepsilon(p) \stackrel{\rm def}{=} \min\{ \operatorname{supp}(q) : \operatorname{d}_{\rm TV}(p,q) \leq \varepsilon \} \] the $\varepsilon$-effective suport size of $p$, i.e., the smallest possible support size of any distribution $\varepsilon$-close to $p$. This turns out to be a more robust and interesting measure in general than the support of $p$, which is $\operatorname{ess}_0(p) = \operatorname{supp}(p)$. In recent work, Goldreich [Goldreich-19b] focused on the query complexity of approximating the effective support size of a discrete distribution provided via two oracles: sampling ($\textsf{samp}_p$), and query access (to the probability mass function), $\textsf{eval}_p$. In particular, the goal is, given parameters $\varepsilon$ and $\beta>1$, to output an $f(\varepsilon,\beta,n)$-factor approximation of $\operatorname{ess}_{\varepsilon'}(p)$, for some $\varepsilon' \in [\varepsilon,\beta\varepsilon]$.

In the aforementioned work, algorithms are obtained achieving (for constant $\beta>1$)

  • query complexity $\operatorname{poly}(1/\varepsilon)$ and approximation factor $f = O(\log\log\log\log(n/\varepsilon))$, that is, any constant number of iterated logarithms;
  • query complexity $\operatorname{poly}(\log^\ast n, 1/\varepsilon)$ even for approximation factor $f = O(1)$;

where $n \stackrel{\rm def}{=} \operatorname{ess}_\varepsilon(p)$. (As well as several other results interpolating between the two extremes.)

Question: Can one get the best of both worlds, and get rid of the $\log^\ast n$ to obtain query complexity $\operatorname{poly}(1/\varepsilon)$ and constant approximation factor?