Difference between revisions of "Open Problems:5"

From Open Problems in Sublinear Algorithms
Jump to: navigation, search
Line 1: Line 1:
 
{{Header
 
{{Header
|title=Characterizing Sketchable Distances
 
 
|source=kanpur06
 
|source=kanpur06
 
|who=Sudipto Guha and Piotr Indyk
 
|who=Sudipto Guha and Piotr Indyk

Revision as of 01:38, 7 March 2013

Suggested by Sudipto Guha and Piotr Indyk
Source Kanpur 2006
Short link https://sublinear.info/5

Some of the early successes in developing algorithms for the data stream model related to estimating $L_p$ norms [FeigenbaumKSV-02,Indyk-00,AlonMS-99] and the “Hamming norm” $L_0$ [CormodeDIM-03]. What other distances, or more generally “measures of dissimilarity,” can be approximated in the data stream model? Do all sketchable distances essentially arise as norms, specially, if deletions are allowed? Note that the set similarity distance (symmetric difference over union) can be estimated in the streaming model in the absence of deletions [BroderCFM-00].

Recent work provides some preliminary results [GuhaIM-07]. Let $f=(f_1, \ldots, f_n)$ and $g=(g_1, \ldots, g_n)$ be two frequency vectors defined by a stream in the usual way. Consider a distance $d(f,g)= \sum_i \phi(f_i,g_i)$ where $\phi:\mathbb N \times \mathbb N \rightarrow \mathbb R^+$ and $\phi(x,x)=0$. If there exist $a,b,c\in \mathbb N$ such that \[ \max \left ( \frac{\phi(a+c,a)}{\phi(b+c,b)} , \frac{\phi(a,a+c)}{\phi(b,b+c)} \right )> \alpha^2\] then it can be shown that any one-pass $\alpha$-approximation of $d(f,g)$ requires $\Omega(n)$ space where the stream defining $f$ and $g$ has length $O(n(a+b+c))$. Similar results hold for multiple-pass algorithms and for probabilistic divergences of the form $d(f,g)= \sum_i \phi(p_i,q_i)$ where $p_i=f_i/L_1(f)$ and $q_i=g_i/L_1(g)$. These results suggest that for a distance $d$ to be sketchable, $d(x,y)$ needs to be some function of $x-y$. In particular, they show that multiplicative approximation of all $f$-divergences and Bregman divergences, such as Kullback-Leibler and Hellinger, requires $\Omega(n)$ space with $L_1$ and $L_2^2$ being notable exceptions.