Difference between revisions of "Open Problems:69"

From Open Problems in Sublinear Algorithms
Jump to: navigation, search
(Created page with "{{Header |title=Title of the problem |source=baltimore16 |who=Clément Canonne }} The open problem will appear here.")
 
(Updating header)
 
(6 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
{{Header
 
{{Header
|title=Title of the problem
 
 
|source=baltimore16
 
|source=baltimore16
 
|who=Clément Canonne
 
|who=Clément Canonne
 
}}
 
}}
The open problem will appear here.
+
 
 +
Let be an unknown (discrete) probability distribution over product space [n]\times [n], and \varepsilon\in(0,1]. Suppose p is ''\varepsilon-close to independent'' (in total variation distance), i.e., there exists a product distribution q=q_1\times q_2 on [n]\times [n] such that
 +
$$
 +
d_{\rm TV}(p,q) = \max_{S\subseteq [n]\times[n]} \left( p(S) - q(S)\right) \leq \varepsilon.
 +
$$
 +
 
 +
Given access to independent samples drawn from p, the goal is to ''correct'' p, that is, to provide access to independent samples from a distribution \tilde{p} satisfying (with high probability):
 +
* d_{\rm TV}(p,\tilde{p}) = O(\varepsilon) (i.e., the corrected distribution is faithful to the original one),
 +
* \tilde{p} = \tilde{p}_1\times \tilde{p}_2  (i.e., the corrected distribution is a product distribution),
 +
with a ''rate'' as good as possible, where the rate is the number of samples from p required to provide a single sample from \tilde{p}.
 +
 
 +
Achieving a rate of 2 is simple: drawing (x_1,y_1) and (x_2,y_2) from p and outputting (x_1,y_2) provides sample access to \tilde{p} = p_1\times p_2, which can be shown to be 3\varepsilon-close to p {{cite|BatuFFKRW-01}}.
 +
 
 +
'''Question 1:''' Is a rate r < 2 achievable? What about an amortized rate (to provide q=o(n^2) samples from the same distribution \tilde{p}<ref>The restriction o(n^2) comes from the fact that, after n^2 samples, one can actually learn the distribution p, and then compute a good corrected definition \tilde{p} offline. Hence, the range of interest is when having to provide a number of samples negligible in front of what learning p would require.</ref>)?
 +
 
 +
'''Question 2:''' What about the same question, when relaxing the second item to only ask that p be ''improved'': that is, to provide sample access to a distribution \tilde{p} guaranteed to be \frac{\varepsilon}{2}-close to a product distribution?
 +
 
 +
'''Note:''' This question fits within the framework of &ldquo;sampling improvers,&rdquo; introduced in {{cite|CanonneGR-16}}. In this framework, given access to a probability distribution only close to having a desired property, one aims at providing access to corrected samples from a nearby distribution that exhibits this property.
 +
 
 +
<references />

Latest revision as of 18:36, 18 January 2016

Suggested by Clément Canonne
Source Baltimore 2016
Short link https://sublinear.info/69

Let p be an unknown (discrete) probability distribution over product space [n]\times [n], and \varepsilon\in(0,1]. Suppose p is \varepsilon-close to independent (in total variation distance), i.e., there exists a product distribution q=q_1\times q_2 on [n]\times [n] such that d_{\rm TV}(p,q) = \max_{S\subseteq [n]\times[n]} \left( p(S) - q(S)\right) \leq \varepsilon.

Given access to independent samples drawn from p, the goal is to correct p, that is, to provide access to independent samples from a distribution \tilde{p} satisfying (with high probability):

  • d_{\rm TV}(p,\tilde{p}) = O(\varepsilon) (i.e., the corrected distribution is faithful to the original one),
  • \tilde{p} = \tilde{p}_1\times \tilde{p}_2 (i.e., the corrected distribution is a product distribution),

with a rate as good as possible, where the rate is the number of samples from p required to provide a single sample from \tilde{p}.

Achieving a rate of 2 is simple: drawing (x_1,y_1) and (x_2,y_2) from p and outputting (x_1,y_2) provides sample access to \tilde{p} = p_1\times p_2, which can be shown to be 3\varepsilon-close to p [BatuFFKRW-01].

Question 1: Is a rate r < 2 achievable? What about an amortized rate (to provide q=o(n^2) samples from the same distribution \tilde{p}[1])?

Question 2: What about the same question, when relaxing the second item to only ask that p be improved: that is, to provide sample access to a distribution \tilde{p} guaranteed to be \frac{\varepsilon}{2}-close to a product distribution?

Note: This question fits within the framework of “sampling improvers,” introduced in [CanonneGR-16]. In this framework, given access to a probability distribution only close to having a desired property, one aims at providing access to corrected samples from a nearby distribution that exhibits this property.

  1. Jump up The restriction o(n^2) comes from the fact that, after n^2 samples, one can actually learn the distribution p, and then compute a good corrected definition \tilde{p} offline. Hence, the range of interest is when having to provide a number of samples negligible in front of what learning p would require.