Difference between revisions of "Open Problems:69"

Suggested by	Clément Canonne
Source	Baltimore 2016
Short link	https://sublinear.info/69

Latest revision as of 18:36, 18 January 2016

Let $p$ be an unknown (discrete) probability distribution over product space $[n]\times [n]$ , and $\varepsilon\in(0,1]$ . Suppose $p$ is $\varepsilon$ -close to independent (in total variation distance), i.e., there exists a product distribution $q=q_1\times q_2$ on $[n]\times [n]$ such that $d_{\rm TV}(p,q) = \max_{S\subseteq [n]\times[n]} \left( p(S) - q(S)\right) \leq \varepsilon.$

Given access to independent samples drawn from $p$ , the goal is to correct $p$ , that is, to provide access to independent samples from a distribution $\tilde{p}$ satisfying (with high probability):

$d_{\rm TV}(p,\tilde{p}) = O(\varepsilon)$ (i.e., the corrected distribution is faithful to the original one),
$\tilde{p} = \tilde{p}_1\times \tilde{p}_2$ (i.e., the corrected distribution is a product distribution),

with a rate as good as possible, where the rate is the number of samples from $p$ required to provide a single sample from $\tilde{p}$ .

Achieving a rate of $2$ is simple: drawing $(x_1,y_1)$ and $(x_2,y_2)$ from $p$ and outputting $(x_1,y_2)$ provides sample access to $\tilde{p} = p_1\times p_2$ , which can be shown to be $3\varepsilon$ -close to $p$ [BatuFFKRW-01].

Question 1: Is a rate $r < 2$ achievable? What about an amortized rate (to provide $q=o(n^2)$ samples from the same distribution $\tilde{p}$ ^[1])?

Question 2: What about the same question, when relaxing the second item to only ask that $p$ be improved: that is, to provide sample access to a distribution $\tilde{p}$ guaranteed to be $\frac{\varepsilon}{2}$ -close to a product distribution?

Note: This question fits within the framework of “sampling improvers,” introduced in [CanonneGR-16]. In this framework, given access to a probability distribution only close to having a desired property, one aims at providing access to corrected samples from a nearby distribution that exhibits this property.

Jump up ↑ The restriction $o(n^2)$ comes from the fact that, after $n^2$ samples, one can actually learn the distribution $p$ , and then compute a good corrected definition $\tilde{p}$ offline. Hence, the range of interest is when having to provide a number of samples negligible in front of what learning $p$ would require.

[1] Jump up ↑ The restriction $o(n^2)$ comes from the fact that, after $n^2$ samples, one can actually learn the distribution $p$ , and then compute a good corrected definition $\tilde{p}$ offline. Hence, the range of interest is when having to provide a number of samples negligible in front of what learning $p$ would require.

[1]

@@ Line 1: / Line 1: @@
 {{Header
-|title=Title of the problem
 |source=baltimore16
 |who=Clément Canonne
 }}
-The open problem will appear here.
+Let  $p$  be an unknown (discrete) probability distribution over product space  $[n]\times [n]$ , and  $\varepsilon\in(0,1]$ . Suppose  $p$  is '' $\varepsilon$ -close to independent'' (in total variation distance), i.e., there exists a product distribution  $q=q_1\times q_2$  on  $[n]\times [n]$  such that
+$$
+d_{\rm TV}(p,q) = \max_{S\subseteq [n]\times[n]} \left( p(S) - q(S)\right) \leq \varepsilon.
+$$
+Given access to independent samples drawn from  $p$ , the goal is to ''correct''  $p$ , that is, to provide access to independent samples from a distribution  $\tilde{p}$  satisfying (with high probability):
+*  $d_{\rm TV}(p,\tilde{p}) = O(\varepsilon)$  (i.e., the corrected distribution is faithful to the original one),
+*  $\tilde{p} = \tilde{p}_1\times \tilde{p}_2$   (i.e., the corrected distribution is a product distribution),
+with a ''rate'' as good as possible, where the rate is the number of samples from  $p$  required to provide a single sample from  $\tilde{p}$ .
+Achieving a rate of  $2$  is simple: drawing  $(x_1,y_1)$  and  $(x_2,y_2)$  from  $p$  and outputting  $(x_1,y_2)$  provides sample access to  $\tilde{p} = p_1\times p_2$ , which can be shown to be  $3\varepsilon$ -close to  $p$  {{cite|BatuFFKRW-01}}.
+'''Question 1:''' Is a rate  $r < 2$  achievable? What about an amortized rate (to provide  $q=o(n^2)$  samples from the same distribution  $\tilde{p}$ <ref>The restriction  $o(n^2)$  comes from the fact that, after  $n^2$  samples, one can actually learn the distribution  $p$ , and then compute a good corrected definition  $\tilde{p}$  offline. Hence, the range of interest is when having to provide a number of samples negligible in front of what learning  $p$  would require.</ref>)?
+'''Question 2:''' What about the same question, when relaxing the second item to only ask that  $p$  be ''improved'': that is, to provide sample access to a distribution  $\tilde{p}$  guaranteed to be  $\frac{\varepsilon}{2}$ -close to a product distribution?
+'''Note:''' This question fits within the framework of &ldquo;sampling improvers,&rdquo; introduced in {{cite|CanonneGR-16}}. In this framework, given access to a probability distribution only close to having a desired property, one aims at providing access to corrected samples from a nearby distribution that exhibits this property.
+<references />

Difference between revisions of "Open Problems:69"

Latest revision as of 18:36, 18 January 2016

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools