Problem 66: Distinguishing Distributions with Conditional Samples

From Open Problems in Sublinear Algorithms
(Redirected from Waiting:Eldar Fischer)
Jump to: navigation, search
Suggested by Eldar Fischer
Source Bertinoro 2014
Short link https://sublinear.info/66

Suppose we are given access to two distributions P and Q over \{1,2, \ldots, n\} and wish to test if they are the same or are at least \epsilon apart under the \ell_1 distance. Assume that we have access to conditional samples: a query consists of a set S \subseteq \{1,2, \ldots, n\} and the output is a sample drawn from the conditional distribution on S [ChakrabortyFGM-13,CanonneRS-14]. In other words, if p_j is the probability of drawing an element j from P, a conditional sample from P restricted to S is drawn from the distribution where \text{Pr}(j) = \begin{cases} \frac{p_j}{\sum_{i \in S} p_i} & \mbox{if }j \in S, \\ 0 & \mbox{otherwise.}\end{cases} It is known that if one of the distributions is fixed, then the testing problem requires at most \tilde O(1/\epsilon^4) queries, which is independent of n [CanonneRS-14].

What can we say if both distributions are unknown? The best known upper bound is \tilde O\left( \frac{\log^5 n}{\epsilon^4} \right) [CanonneRS-14].

Updates[edit]

Acharya, Canonne, and Kamath [AcharyaCK-14] showed that \Omega(\sqrt{\log \log n}) conditional queries are needed in this case for some constant \epsilon > 0. Contrary to the case of only one distribution unknown, if both distributions are unknown, the required number of queries is a function of n. Falahatgar, Jafarpour, Orlitsky, Pichapathi, and Suresh [FalahatgarJOPS-15] showed that O\left(\frac{\log{\log{n}}}{\epsilon^5}\right) queries are sufficient. This determines the query complexity of the problem up to a factor of \sqrt{\log \log n}.

In the non-adaptive model, Kamath and Tzamos [KamathT-19] showed that \operatorname{poly} \log n conditional queries are sufficient for equivalence testing. A lower bound of \Omega(\log n) by Acharya, Canonne, and Kamath [AcharyaCK-14] for uniformity testing shows that identity and equivalence testing have complexities related by polynomial factors in the non-adaptive model, compared to the gap in the adaptive model.