Problem 88: Separating PDF and CDF Query Models

From Open Problems in Sublinear Algorithms
Jump to: navigation, search
Suggested by Clément Canonne
Source FOCS 2017
Short link https://sublinear.info/88

Recall that in the dual and cumulative dual models of distribution testing [CanonneR-14], the testing is provided with two types of access to the unknown probability distribution $p$ (for the sake of the cumulative dual, over a totally ordered domain $[n]=\{1,\dots,n\}$): the first is the usual sampling oracle, which returns i.i.d. samples from $p$; and the second is an evaluation oracle, which on query $i\in[n]$ returns $p(i)$ (or, for the cumulative dual model, returns $p([1,i])=\sum_{j=1}^i p(j)$). In other words, the algorithm has sample access to $p$ and query access to its probability mass function (resp. cumulative distribution function).

It is immediate to see that the cumulative dual model is at least as powerful as the dual model, since any query to the pmf of $p$ can be simulated from 2 queries to its cdf. However, currently there is no strict separation known between dual and cumulative dual models for natural properties, i.e., a property $\mathcal{P}$ (or functional $\Phi$) of distributions for which testing (resp. estimation) requires asymptotically significantly more samples in the dual model than in the cumulative dual.

Does there exist such a separation?

Canonne and Rubinfeld show in [CanonneR-14] a somewhat contrived separation, for estimating the entropy of distributions promised to be close to monotone. A natural conjecture would be that such a separation would exist for a property of functional where the total order (and thus the cumulative access) is essential, e.g., testing monotonicity or estimating distance to it. (The best currently known upper bounds for testing monotonicity are respectively $O((\log n)/\varepsilon)$ in the dual model, and $\tilde{O}(1/\varepsilon^4)$ in the cumulative dual [Canonne-15].)