Difference between revisions of "Open Problems:83"
(Created page with "{{Header |title=Instance-optimal Hellinger testing |source=focs17 |who=Clément Canonne }} Given the full description of a fixed distribution $q$ over a discrete domain (say $...") |
|||
Line 6: | Line 6: | ||
Given the full description of a fixed distribution $q$ over a discrete domain (say $[n]=\{1,\dots,n\}$), as well as access to i.i.d. samples from an unknown probability distributions $p$ over $[n]$ and distance parameter $\varepsilon\in(0,1]$, the identity testing problem asks to distinguish w.h.p. between (i) $p=q$ and (ii) $\operatorname{d}_{\rm TV}(p,q)>\varepsilon$. | Given the full description of a fixed distribution $q$ over a discrete domain (say $[n]=\{1,\dots,n\}$), as well as access to i.i.d. samples from an unknown probability distributions $p$ over $[n]$ and distance parameter $\varepsilon\in(0,1]$, the identity testing problem asks to distinguish w.h.p. between (i) $p=q$ and (ii) $\operatorname{d}_{\rm TV}(p,q)>\varepsilon$. | ||
− | The sample complexity of this question as a function of $n$ and $\varepsilon$ is fully understood by now: $\Theta(\sqrt{n}/\varepsilon^2)$ are necessary and sufficient, the worst-case lower bound following from taking $q$ to be the uniform distribution on $[n]$. Valiant and Valiant {{cite|ValiantV-14}} shown an ''instance-optimal'' bound on this problem, where the sample complexity | + | The sample complexity of this question as a function of $n$ and $\varepsilon$ is fully understood by now: $\Theta(\sqrt{n}/\varepsilon^2)$ are necessary and sufficient, the worst-case lower bound following from taking $q$ to be the uniform distribution on $[n]$. Valiant and Valiant {{cite|ValiantV-14}} shown an ''instance-optimal'' bound on this problem, where the sample complexity $\Psi_{\rm TV}$ now only depends on $n$ and the (massive) parameter $q$ instead of $n$: namely, that |
$$\Psi_{\rm TV}(q,\varepsilon) = \Theta\left(\max\left( \frac{\Phi(q,\Theta(\varepsilon))}{\varepsilon^2}, \frac{1}{\varepsilon}\right)\right)$$ | $$\Psi_{\rm TV}(q,\varepsilon) = \Theta\left(\max\left( \frac{\Phi(q,\Theta(\varepsilon))}{\varepsilon^2}, \frac{1}{\varepsilon}\right)\right)$$ | ||
samples were necessary and sufficient, where $\Phi$ is the functional defined by taking the $2/3$-pseudonorm of the vector of probabilities of $q$, once both the biggest element and $\varepsilon$ total mass of the smallest elements had been removed: | samples were necessary and sufficient, where $\Phi$ is the functional defined by taking the $2/3$-pseudonorm of the vector of probabilities of $q$, once both the biggest element and $\varepsilon$ total mass of the smallest elements had been removed: | ||
Line 20: | Line 20: | ||
\operatorname{d}_{\rm H}(p,q) = \frac{1}{\sqrt{2}}\lVert\sqrt{p}-\sqrt{q}\rVert_2\,. | \operatorname{d}_{\rm H}(p,q) = \frac{1}{\sqrt{2}}\lVert\sqrt{p}-\sqrt{q}\rVert_2\,. | ||
$$ | $$ | ||
− | Results of Daskalakis, Kamath, and Wright {{cite|DaskalakisKW-18}} show that the ''worst-case'' sample complexity remains $\Theta(\sqrt{n}/\varepsilon^2)$. Moreover, due to the quadratic dependence between Hellinger and total variation distances, both instance-optimal bounds mentioned above apply, yet with possibly a quadratic gap between upper and lower bounds in terms of $\varepsilon$: | + | Results of Daskalakis, Kamath, and Wright {{cite|DaskalakisKW-18}} show that the ''worst-case'' sample complexity remains $\Theta(\sqrt{n}/\varepsilon^2)$. Moreover, due to the quadratic dependence between Hellinger and total variation distances, both instance-optimal bounds mentioned above apply, yet with possibly a quadratic gap between upper and lower bounds in terms of $\varepsilon$: leading to bounds on the instance-optimal sample complexity $\Psi_{\rm H}$ of Hellinger identity testing of |
$$\Psi_{\rm TV}(q,\varepsilon) \leq \Psi_{\rm H}(q,\varepsilon) \leq \Psi_{\rm TV}(q,\varepsilon^2)$$. | $$\Psi_{\rm TV}(q,\varepsilon) \leq \Psi_{\rm H}(q,\varepsilon) \leq \Psi_{\rm TV}(q,\varepsilon^2)$$. | ||
− | What is the | + | What is the right dependence on $\varepsilon$ of $\Psi_{\rm H}$? |
''Note that in both instance-optimal bounds obtained for $\Psi_{\rm TV}$, there exist (simple) examples of $q$ where $\varepsilon$ ends up ''in the exponent,'' so this quadratic gap is not innocuous even for constant $\varepsilon$.'' | ''Note that in both instance-optimal bounds obtained for $\Psi_{\rm TV}$, there exist (simple) examples of $q$ where $\varepsilon$ ends up ''in the exponent,'' so this quadratic gap is not innocuous even for constant $\varepsilon$.'' |
Revision as of 18:48, 20 October 2017
Suggested by | Clément Canonne |
---|---|
Source | FOCS 2017 |
Short link | https://sublinear.info/83 |
Given the full description of a fixed distribution $q$ over a discrete domain (say $[n]=\{1,\dots,n\}$), as well as access to i.i.d. samples from an unknown probability distributions $p$ over $[n]$ and distance parameter $\varepsilon\in(0,1]$, the identity testing problem asks to distinguish w.h.p. between (i) $p=q$ and (ii) $\operatorname{d}_{\rm TV}(p,q)>\varepsilon$.
The sample complexity of this question as a function of $n$ and $\varepsilon$ is fully understood by now: $\Theta(\sqrt{n}/\varepsilon^2)$ are necessary and sufficient, the worst-case lower bound following from taking $q$ to be the uniform distribution on $[n]$. Valiant and Valiant [ValiantV-14] shown an instance-optimal bound on this problem, where the sample complexity $\Psi_{\rm TV}$ now only depends on $n$ and the (massive) parameter $q$ instead of $n$: namely, that $$\Psi_{\rm TV}(q,\varepsilon) = \Theta\left(\max\left( \frac{\Phi(q,\Theta(\varepsilon))}{\varepsilon^2}, \frac{1}{\varepsilon}\right)\right)$$ samples were necessary and sufficient, where $\Phi$ is the functional defined by taking the $2/3$-pseudonorm of the vector of probabilities of $q$, once both the biggest element and $\varepsilon$ total mass of the smallest elements had been removed: $ \Phi(q,\varepsilon) = \lVert q^{-\max}_{-\varepsilon} \rVert_{2/3} $. Using different techniques, Blais, Canonne, and Gur [BlaisCG-17] then established a similar instance-optimal bound, with regard to a different functional, the "K-functional $\kappa$ between $\ell_1$ and $\ell_2$ spaces:" $ \Psi_{\rm TV}(q,\varepsilon)=\Omega\left({\kappa_p(1-\Theta(\varepsilon))}/{\varepsilon}\right), \Psi_{\rm TV}(q,\varepsilon)=O\left({\kappa_p(1-\Theta(\varepsilon))}/{\varepsilon^2}\right) $.
Now, consider the exact same problem, but replacing the total variation $\operatorname{d}_{\rm TV}(p,q)$ by the Hellinger distance $$ \operatorname{d}_{\rm H}(p,q) = \frac{1}{\sqrt{2}}\lVert\sqrt{p}-\sqrt{q}\rVert_2\,. $$ Results of Daskalakis, Kamath, and Wright [DaskalakisKW-18] show that the worst-case sample complexity remains $\Theta(\sqrt{n}/\varepsilon^2)$. Moreover, due to the quadratic dependence between Hellinger and total variation distances, both instance-optimal bounds mentioned above apply, yet with possibly a quadratic gap between upper and lower bounds in terms of $\varepsilon$: leading to bounds on the instance-optimal sample complexity $\Psi_{\rm H}$ of Hellinger identity testing of $$\Psi_{\rm TV}(q,\varepsilon) \leq \Psi_{\rm H}(q,\varepsilon) \leq \Psi_{\rm TV}(q,\varepsilon^2)$$.
What is the right dependence on $\varepsilon$ of $\Psi_{\rm H}$?
Note that in both instance-optimal bounds obtained for $\Psi_{\rm TV}$, there exist (simple) examples of $q$ where $\varepsilon$ ends up in the exponent, so this quadratic gap is not innocuous even for constant $\varepsilon$.