Open Problems:82
{{Header
|title=Beyond identity testing
|source=focs17
|who=Clément Canonne
}}
Given access to i.i.d. samples from two unknown probability distributions $p,q$ over a discrete domain (say $[n]=\{1,\dots,n\}$) and a distance parameter $\varepsilon\in(0,1]$, the ''closeness'' problem asks to distinguish with high probability between (i) $p=q$ and (ii) $\operatorname{d}_{\rm TV}(p,q)>\varepsilon$. (The ''identity'' problem is the analogue when $q$ is fixed and explicitly known beforehand.)
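As an illustration of what a closeness tester computes (this ingredient is not stated on this page and is included only as an assumption), one standard statistic under Poissonized sampling is $Z=\sum_i\big((X_i-Y_i)^2-X_i-Y_i\big)$, where $X_i,Y_i$ count the occurrences of $i$ among the samples from $p$ and from $q$. If the number of samples from each distribution is drawn as $\mathrm{Poi}(k)$, then $X_i\sim\mathrm{Poi}(kp_i)$ and $Y_i\sim\mathrm{Poi}(kq_i)$ independently, and $\mathbb{E}[Z]=k^2\|p-q\|_2^2$, which vanishes when $p=q$. The sketch below only computes $Z$; the function name is hypothetical, and the threshold a tester would compare $Z$ against is not shown.

```python
from collections import Counter
from typing import Iterable


def closeness_statistic(samples_p: Iterable[int], samples_q: Iterable[int]) -> int:
    """Compute Z = sum_i ((X_i - Y_i)^2 - X_i - Y_i).

    X_i and Y_i are the number of times element i appears in the samples
    from p and from q. If both sample sizes are drawn as Poisson(k), then
    X_i ~ Poi(k p_i) and Y_i ~ Poi(k q_i) independently, and
    E[Z] = k^2 * ||p - q||_2^2 (in particular E[Z] = 0 when p = q).
    """
    x = Counter(samples_p)
    y = Counter(samples_q)
    z = 0
    # Elements seen in neither sample contribute 0, so only the union matters.
    for i in set(x) | set(y):
        diff = x[i] - y[i]
        z += diff * diff - x[i] - y[i]
    return z
```

For example, `closeness_statistic([1, 1, 1], [2, 2, 2])` evaluates to 12, while `closeness_statistic([1, 1, 2], [1, 2, 2])` evaluates to -4; on a single draw $Z$ can be negative even though its expectation is nonnegative.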
Closeness up to a permutation would then be the variant where one must test whether $p,q$ are equal ''up to relabeling of the elements'':
(i) $\exists \pi\in\mathcal{S}_n$ s.t. $p=q\circ\pi$ vs. (ii) $\forall \pi\in\mathcal{S}_n$, $\operatorname{d}_{\rm TV}(p,q\circ \pi)>\varepsilon$. Results of Valiant and Valiant {{cite|ValiantV-11}} imply that this question has sample complexity $\Theta(\frac{n}{\varepsilon^2\log n})$.
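When $p$ and $q$ are known explicitly, the distance appearing in (ii) can be computed without searching over all $n!$ permutations: $\min_{\pi\in\mathcal{S}_n}\operatorname{d}_{\rm TV}(p,q\circ\pi)$ equals half the $\ell_1$ distance between the two probability vectors sorted in the same order, because pairing entries in sorted order minimizes $\sum_i |p_i-q_{\pi(i)}|$. The sketch below (function name hypothetical) only illustrates the quantity being tested; it is not a sample-based tester.

```python
from typing import Sequence


def tv_up_to_relabeling(p: Sequence[float], q: Sequence[float]) -> float:
    """Return min over permutations pi of d_TV(p, q o pi) for explicit p, q on [n].

    Pairing the entries of p and q in sorted order minimizes
    sum_i |p_i - q_{pi(i)}|, so no search over the n! permutations is needed.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same domain [n]")
    return 0.5 * sum(abs(a - b) for a, b in zip(sorted(p), sorted(q)))
```

For instance, $p=(0.5,0.25,0.25)$ and $q=(0.25,0.25,0.5)$ are at distance $0$ up to relabeling, whereas $p=(0.5,0.5,0)$ and the same $q$ are at distance $0.25$.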
The most general question then is the following:
: Given a fixed class $\mathcal{F}$ of functions from $[n]$ to $[m]$, distinguish between (i) $\exists \pi\in\mathcal{F}$ s.t. $p=q\circ\pi$ vs. (ii) $\forall \pi\in\mathcal{F}$, $\operatorname{d}_{\rm TV}(p,q\circ \pi)>\varepsilon$.
In particular, one can study how the sample complexity depends on $\mathcal{F}$, or what it is for some classes of interest (e.g., $n=m$ for $\mathcal{F}$ a subgroup of the symmetric group $\mathcal{S}_n$; or $m\ll n$ and $\mathcal{F}$ being a class of "coarsenings," capturing whether $p,q$ are the same distribution but with a different discretization/binning).
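To make the dependence on $\mathcal{F}$ concrete in the case $n=m$, the sketch below computes $\min_{\pi\in\mathcal{F}}\operatorname{d}_{\rm TV}(p,q\circ\pi)$ by brute force for an explicitly listed class, reading $(q\circ\pi)(i)$ as $q(\pi(i))$. It assumes every map in $\mathcal{F}$ is a permutation of $[n]$ (so that $q\circ\pi$ is again a distribution) and uses the cyclic shifts as a hypothetical example of a subgroup of $\mathcal{S}_n$; the function names are illustrative only.

```python
from typing import Iterable, List, Sequence


def tv(p: Sequence[float], r: Sequence[float]) -> float:
    """Total variation distance between two distributions on the same finite domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, r))


def distance_to_class(p: Sequence[float], q: Sequence[float],
                      maps: Iterable[Sequence[int]]) -> float:
    """Return min over pi in maps of d_TV(p, q o pi), where (q o pi)(i) = q[pi[i]].

    Each map pi is a list of n indices into q. This sketch assumes every pi is a
    permutation of range(n), so that q o pi is again a probability distribution,
    and that the class is non-empty.
    """
    return min(tv(p, [q[j] for j in pi]) for pi in maps)


def cyclic_shifts(n: int) -> List[List[int]]:
    """The cyclic subgroup of S_n generated by i -> i + 1 (mod n), as index lists."""
    return [[(i + s) % n for i in range(n)] for s in range(n)]
```

With $p=(0.5,0.3,0.2,0)$, the distribution $q=(0,0.5,0.3,0.2)$ is a cyclic shift of $p$ and is at distance $0$, while $q'=(0.3,0.5,0,0.2)$ is at distance $0.3$ from the cyclic class even though it equals $p$ up to an arbitrary relabeling: restricting $\mathcal{F}$ to a smaller subgroup can only increase the distance in (ii).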