Problem 66: Distinguishing Distributions with Conditional Samples

Suggested by Eldar Fischer
Source Bertinoro 2014
Short link https://sublinear.info/66

Suppose we are given access to two distributions $P$ and $Q$ over $\{1,2, \ldots, n\}$ and wish to test if they are the same or are at least $\epsilon$ apart under the $\ell_1$ distance. Assume that we have access to conditional samples: a query consists of a set $S \subseteq \{1,2, \ldots, n\}$ and the output is a sample drawn from the conditional distribution on $S$ [ChakrabortyFGM-13,CanonneRS-14]. In other words, if $p_j$ is the probability of drawing an element $j$ from $P$, a conditional sample from $P$ restricted to $S$ is drawn from the distribution where
$$ \text{Pr}(j) = \begin{cases} \frac{p_j}{\sum_{i \in S} p_i} & \mbox{if } j \in S, \\ 0 & \mbox{otherwise.}\end{cases} $$

It is known that if one of the distributions is fixed, then the testing problem requires at most $\tilde O(1/\epsilon^4)$ queries, which is independent of $n$ [CanonneRS-14].
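
To make the query model concrete, here is a minimal Python sketch of such a conditional-sampling oracle. It is purely illustrative and not taken from the cited papers; the name query_cond, the 0-based indexing, and the fallback behaviour on sets of zero probability mass are assumptions.

import random

def query_cond(p, S):
    """Illustrative conditional-sampling oracle (a sketch, not from the source).

    p : list of probabilities p[0], ..., p[n-1], assumed to sum to 1.
    S : collection of indices defining the conditioning set.
    Returns j in S with probability p[j] / sum_{i in S} p[i].
    """
    S = list(S)
    mass = sum(p[i] for i in S)
    if mass == 0:
        # How the oracle behaves on zero-mass sets is model-dependent;
        # falling back to a uniform element of S is only an assumption here.
        return random.choice(S)
    r = random.random() * mass
    acc = 0.0
    for j in S:
        acc += p[j]
        if r <= acc:
            return j
    return S[-1]  # guard against floating-point rounding

# Example: for p = [0.5, 0.3, 0.2], query_cond(p, {1, 2}) returns 1 with
# probability 0.6 and 2 with probability 0.4; S = range(len(p)) recovers an
# ordinary sample from p.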

What can we say if both distributions are unknown? The best known upper bound is $\tilde O\left( \frac{\log^5 n}{\epsilon^4} \right)$ [CanonneRS-14].

Updates

Acharya, Canonne, and Kamath [AcharyaCK-14] showed that $\Omega(\sqrt{\log \log n})$ conditional queries are needed in this case for some constant $\epsilon > 0$. In contrast to the case where only one distribution is unknown, when both distributions are unknown the required number of queries must grow with $n$. Falahatgar, Jafarpour, Orlitsky, Pichapati, and Suresh [FalahatgarJOPS-15] showed that $O\left(\frac{\log{\log{n}}}{\epsilon^5}\right)$ queries are sufficient. This determines the query complexity of the problem up to a factor of $\sqrt{\log \log n}$.
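
For constant $\epsilon$, the $\sqrt{\log \log n}$ gap follows directly by comparing the two bounds:
$$ \frac{O(\log \log n)}{\Omega\left(\sqrt{\log \log n}\right)} = O\left(\sqrt{\log \log n}\right). $$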

In the non-adaptive model, Kamath and Tzamos [KamathT-19] showed that $\operatorname{poly} \log n$ conditional queries are sufficient for equivalence testing. Together with the $\Omega(\log n)$ lower bound of Acharya, Canonne, and Kamath [AcharyaCK-14] for non-adaptive uniformity testing, this shows that in the non-adaptive model the complexities of identity and equivalence testing are related by polynomial factors, in contrast to the adaptive model, where identity testing requires no dependence on $n$ but equivalence testing does.