Difference between revisions of "Open Problems:69"
Line 17: | Line 17: | ||
Achieving a rate of is simple: drawing (x_1,y_1) and (x_2,y_2) from p and outputting (x_1,y_2) provides sample access to \tilde{p} = p_1\times p_2, which can be shown to be 3\varepsilon-close to p. {{cite|BatuFFKRW-01}} | Achieving a rate of 2 is simple: drawing (x_1,y_1) and (x_2,y_2) from p and outputting (x_1,y_2) provides sample access to \tilde{p} = p_1\times p_2, which can be shown to be 3\varepsilon-close to p. {{cite|BatuFFKRW-01}} | ||
− | '''Question:''' is a rate r < 2 achievable? What about an amortized rate (to provide q=o(n^2) samples from the same distribution \tilde{p})<ref>The restriction o(n^2) comes from the fact that, after n^2 samples, one can actually learn the distribution p, and then compute a good corrected definition \tilde{p} offline. Hence, the range of interest is when having to provide a number of samples negligible in front of what learning p would require.</ref>? | + | '''Question 1:''' is a rate r < 2 achievable? What about an amortized rate (to provide q=o(n^2) samples from the same distribution \tilde{p})<ref>The restriction o(n^2) comes from the fact that, after n^2 samples, one can actually learn the distribution p, and then compute a good corrected definition \tilde{p} offline. Hence, the range of interest is when having to provide a number of samples negligible in front of what learning p would require.</ref>? |
− | What about the same question, when relaxing the second item to only ask that p be ''improved'': that is, to provide sample access to a distribution \tilde{p} guaranteed to be \frac{\varepsilon}{2}-close to a product distribution? | + | '''Question 2:'''What about the same question, when relaxing the second item to only ask that p be ''improved'': that is, to provide sample access to a distribution \tilde{p} guaranteed to be \frac{\varepsilon}{2}-close to a product distribution? |
− | '''Note:''' this question fits within the framework of "sampling improvers," as introduced in | + | '''Note:''' this question fits within the framework of "sampling improvers," as introduced in {{cite|CanonneGR-16}}: where, given access to a probability distribution only close to having a desired property, one aims at providing access to corrected samples from a nearby distribution that exhibits this property. |
<references /> | <references /> |
Revision as of 22:13, 10 January 2016
Suggested by | Clément Canonne |
---|---|
Source | Baltimore 2016 |
Short link | https://sublinear.info/69 |
Let p be an unknown (discrete) probability distribution over product space [n]\times [n], and \varepsilon\in(0,1]. Suppose p is \varepsilon-close to independent (in total variation distance), i.e. there exists a product distribution q=q_1\times q_2 on [n]\times [n] such that d_{\rm TV}(p,q) = \max_{S\subseteq [n]\times[n]} \left( p(S) - q(S)\right) \leq \varepsilon.
Given access to independent samples drawn from p, the goal is to correct p, that is to provide access to independent samples from a distribution \tilde{p} satisfying (with high probability):
- d_{\rm TV}(p,\tilde{p}) = O(\varepsilon) [the corrected distribution is faithful to the original one]
- \tilde{p} = p_1\times p_2 [the corrected distribution is an actual product distribution]
with a rate as good as possible, where the rate is the number of samples from p required to provide a single sample from \tilde{p}.
Achieving a rate of 2 is simple: drawing (x_1,y_1) and (x_2,y_2) from p and outputting (x_1,y_2) provides sample access to \tilde{p} = p_1\times p_2, which can be shown to be 3\varepsilon-close to p. [BatuFFKRW-01]
Question 1: is a rate r < 2 achievable? What about an amortized rate (to provide q=o(n^2) samples from the same distribution \tilde{p})[1]?
Question 2:What about the same question, when relaxing the second item to only ask that p be improved: that is, to provide sample access to a distribution \tilde{p} guaranteed to be \frac{\varepsilon}{2}-close to a product distribution?
Note: this question fits within the framework of "sampling improvers," as introduced in [CanonneGR-16]: where, given access to a probability distribution only close to having a desired property, one aims at providing access to corrected samples from a nearby distribution that exhibits this property.
- Jump up ↑ The restriction o(n^2) comes from the fact that, after n^2 samples, one can actually learn the distribution p, and then compute a good corrected definition \tilde{p} offline. Hence, the range of interest is when having to provide a number of samples negligible in front of what learning p would require.