Problem 66: Distinguishing Distributions with Conditional Samples

From Open Problems in Sublinear Algorithms
Revision as of 14:03, 29 May 2014 by Geomblog (talk | contribs)
Jump to: navigation, search
Suggested by Eldar Fischer
Source Bertinoro 2014
Short link https://sublinear.info/66

Suppose we're given access to two distributions $P$ and $Q$ over $[1 \ldots n]$ and wish to test if they are the same or are at least $\epsilon$ apart under the $\ell_1$ distance. Assume that we have access to conditional samples: in other words, a query consists of a set $S \subset [1..n]$ and the output is a sample drawn from the conditional distribution on $A$. In other words, a conditional sample on $P$ given $S$ is drawn from the distribution where

$$ \text{Pr}(j) = \begin{array} p_j / \sum_{i \in A} p_i & j \in A \\ 0 & \text{otherwise} \end{array} $$

It is known that if one of the distributions is fixed, then a constant ($f(1/\epsilon)$) number of queries suffice to test the distributions.

What can we say if both distributions are unknown ?