Difference between revisions of "Open Problems:69"
(Created page with "{{Header |title=Title of the problem |source=baltimore16 |who=Clément Canonne }} The open problem will appear here.") |
|||
Line 1: | Line 1: | ||
{{Header | {{Header | ||
− | |title= | + | |title=Correcting Independence of Distributions |
|source=baltimore16 | |source=baltimore16 | ||
|who=Clément Canonne | |who=Clément Canonne | ||
}} | }} | ||
− | The | + | |
+ | Let $p$ be an unknown (discrete) probability distribution over product space $[n]\times [n]$, and $\varepsilon\in(0,1]$. Suppose $p$ is ''$\varepsilon$-close to independent'' (in total variation distance), i.e. there exists a product distribution $q=q_1\times q_2$ on $[n]\times [n]$ such that | ||
+ | $$ | ||
+ | d_{\rm TV}(p,q) = \max_{S\subseteq [n]\times[n]} \left( p(S) - q(S)\right) \leq \varepsilon. | ||
+ | $$ | ||
+ | |||
+ | Given access to independent samples drawn from $p$, the goal is to ''correct'' $p$, that is to provide access to independent samples from a distribution $\tilde{p}$ satisfying (with high probability): | ||
+ | # $d_{\rm TV}(p,\tilde{p}) = O(\varepsilon)$ [the corrected distribution is faithful to the original one] | ||
+ | # $\tilde{p} = p_1\times p_2$ [the corrected distribution is an actual product distribution] | ||
+ | with a ''rate'' as good as possible, where the rate is the number of samples from $p$ required to provide a single sample from $\tilde{p}$. | ||
+ | |||
+ | Achieving a rate of $2$ is simple: drawing $(x_1,y_1)$ and $(x_2,y_2)$ from $p$ and outputting $(x_1,y_2)$ provides sample access to $\tilde{p} = p_1\times p_2$, which can be shown to be $3\varepsilon$-close to $p$. {{cite|BatuFFKRW-01}} | ||
+ | |||
+ | '''Question:''' is a rate $r < 2$ achievable? What about an amortized rate (to provide $q=o(n^2)$ samples from the same distribution $\tilde{p}$)<ref>The restriction $o(n^2)$ comes from the fact that, after $n^2$ samples, one can actually learn the distribution $p$, and then compute a good corrected definition $\tilde{p}$ offline.</ref>? | ||
+ | |||
+ | What about the same question, when relaxing the second item to only ask that $p$ be ''improved'': that is, to provide sample access to a distribution $\tilde{p}$ guaranteed to be $\frac{\varepsilon}{2}$-close to a product distribution? | ||
+ | |||
+ | <references /> |
Revision as of 22:08, 10 January 2016
Suggested by | Clément Canonne |
---|---|
Source | Baltimore 2016 |
Short link | https://sublinear.info/69 |
Let $p$ be an unknown (discrete) probability distribution over product space $[n]\times [n]$, and $\varepsilon\in(0,1]$. Suppose $p$ is $\varepsilon$-close to independent (in total variation distance), i.e. there exists a product distribution $q=q_1\times q_2$ on $[n]\times [n]$ such that $$ d_{\rm TV}(p,q) = \max_{S\subseteq [n]\times[n]} \left( p(S) - q(S)\right) \leq \varepsilon. $$
Given access to independent samples drawn from $p$, the goal is to correct $p$, that is to provide access to independent samples from a distribution $\tilde{p}$ satisfying (with high probability):
- $d_{\rm TV}(p,\tilde{p}) = O(\varepsilon)$ [the corrected distribution is faithful to the original one]
- $\tilde{p} = p_1\times p_2$ [the corrected distribution is an actual product distribution]
with a rate as good as possible, where the rate is the number of samples from $p$ required to provide a single sample from $\tilde{p}$.
Achieving a rate of $2$ is simple: drawing $(x_1,y_1)$ and $(x_2,y_2)$ from $p$ and outputting $(x_1,y_2)$ provides sample access to $\tilde{p} = p_1\times p_2$, which can be shown to be $3\varepsilon$-close to $p$. [BatuFFKRW-01]
Question: is a rate $r < 2$ achievable? What about an amortized rate (to provide $q=o(n^2)$ samples from the same distribution $\tilde{p}$)[1]?
What about the same question, when relaxing the second item to only ask that $p$ be improved: that is, to provide sample access to a distribution $\tilde{p}$ guaranteed to be $\frac{\varepsilon}{2}$-close to a product distribution?
- ↑ The restriction $o(n^2)$ comes from the fact that, after $n^2$ samples, one can actually learn the distribution $p$, and then compute a good corrected definition $\tilde{p}$ offline.