Problem 5: Characterizing Sketchable Distances

From Open Problems in Sublinear Algorithms

Latest revision as of 21:20, 11 November 2014

Suggested by Sudipto Guha and Piotr Indyk
Source Kanpur 2006
Short link https://sublinear.info/5

Some of the early successes in developing algorithms for the data stream model related to estimating $L_p$ norms [FeigenbaumKSV-02,Indyk-00,AlonMS-99] and the “Hamming norm” $L_0$ [CormodeDIM-03]. What other distances, or more generally “measures of dissimilarity,” can be approximated in the data stream model? Do all sketchable distances essentially arise as norms, especially if deletions are allowed? Note that the set similarity distance (symmetric difference over union) can be estimated in the streaming model in the absence of deletions [BroderCFM-00].
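The insertion-only set-similarity estimate mentioned above is typically realized with a min-hash style sketch: the minimum hash value of a set is equal across two sets with probability equal to their Jaccard similarity. The following is a minimal illustrative sketch in that spirit; the class name, parameters, and the simple linear hash family are our choices for exposition, not details from [BroderCFM-00].

```python
import random

class MinHashSketch:
    """Sketch of a set under stream insertions; two sketches estimate
    Jaccard similarity |A∩B|/|A∪B| (set similarity distance is 1 minus this).
    Illustrative only: the linear hash family below is a common stand-in
    for a truly min-wise independent family."""

    def __init__(self, num_hashes=128, seed=0):
        rng = random.Random(seed)
        self.p = (1 << 61) - 1  # large prime modulus
        # Random hash functions h(x) = (a*x + b) mod p.
        self.coeffs = [(rng.randrange(1, self.p), rng.randrange(self.p))
                       for _ in range(num_hashes)]
        self.mins = [self.p] * num_hashes

    def insert(self, x):
        # Insertions only: each coordinate keeps the minimum hash seen so far.
        for i, (a, b) in enumerate(self.coeffs):
            h = (a * x + b) % self.p
            if h < self.mins[i]:
                self.mins[i] = h

    def jaccard(self, other):
        # Fraction of coordinates on which the sketches agree.
        agree = sum(m1 == m2 for m1, m2 in zip(self.mins, other.mins))
        return agree / len(self.mins)

s1, s2 = MinHashSketch(), MinHashSketch()
for x in range(100):        # A = {0, ..., 99}
    s1.insert(x)
for x in range(50, 150):    # B = {50, ..., 149}
    s2.insert(x)
est = s1.jaccard(s2)        # true Jaccard similarity is 50/150 ≈ 0.333
```

Note that the sketch is insertion-only by construction: a deletion cannot be supported, since there is no way to recover the second-smallest hash value once the minimum is removed, which is consistent with the caveat about deletions above.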

Recent work provides some preliminary results [GuhaIM-07]. Let $f=(f_1, \ldots, f_n)$ and $g=(g_1, \ldots, g_n)$ be two frequency vectors defined by a stream in the usual way. Consider a distance $d(f,g)= \sum_i \phi(f_i,g_i)$ where $\phi:\mathbb N \times \mathbb N \rightarrow \mathbb R^+$ and $\phi(x,x)=0$. If there exist $a,b,c\in \mathbb N$ such that \[ \max \left ( \frac{\phi(a+c,a)}{\phi(b+c,b)} , \frac{\phi(a,a+c)}{\phi(b,b+c)} \right )> \alpha^2\] then it can be shown that any one-pass $\alpha$-approximation of $d(f,g)$ requires $\Omega(n)$ space where the stream defining $f$ and $g$ has length $O(n(a+b+c))$. Similar results hold for multiple-pass algorithms and for probabilistic divergences of the form $d(f,g)= \sum_i \phi(p_i,q_i)$ where $p_i=f_i/L_1(f)$ and $q_i=g_i/L_1(g)$.

These results suggest that for a distance $d$ to be sketchable, $d(x,y)$ needs to be some function of $x-y$. In particular, they show that multiplicative approximation of all $f$-divergences and Bregman divergences, such as Kullback-Leibler and Hellinger, requires $\Omega(n)$ space with $L_1$ and $L_2^2$ being notable exceptions.
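The criterion can be checked numerically for a given $\phi$. The snippet below (our own illustration, not code from [GuhaIM-07]) evaluates the ratio in the displayed condition for two choices of $\phi$: the squared difference $\phi(x,y)=(x-y)^2$, which depends only on $x-y$ and yields ratio $1$ for every $a,b,c$, and a KL-type divergence restricted to positive arguments, for which the ratio grows without bound, so some $(a,b,c)$ certifies the $\Omega(n)$ bound for every $\alpha$.

```python
import math

def max_ratio(phi, a, b, c):
    """The quantity max( phi(a+c,a)/phi(b+c,b), phi(a,a+c)/phi(b,b+c) )
    appearing in the lower-bound criterion."""
    return max(phi(a + c, a) / phi(b + c, b),
               phi(a, a + c) / phi(b, b + c))

def sq_diff(x, y):
    # phi for L_2^2: a function of x - y only, so shifting both
    # arguments by the same amount leaves phi unchanged.
    return (x - y) ** 2

def kl_like(x, y):
    # Unnormalized KL-type divergence x log(x/y) - x + y, defined for
    # positive integer arguments; note kl_like(x, x) = 0.
    return x * math.log(x / y) - x + y

# Squared difference: both ratios are c^2 / c^2, so the maximum is 1
# and no choice of (a, b, c) triggers the Omega(n) bound.
print(max_ratio(sq_diff, 1, 1000, 10))   # 1.0

# KL-type divergence: with a small and b, c large the ratio is huge,
# so every approximation factor alpha is ruled out.
print(max_ratio(kl_like, 1, 10**6, 10**3))
```

This matches the dichotomy stated above: divergences whose local behavior depends on the base point (KL, Hellinger, Bregman divergences in general) fail the shift-invariance needed for small sketches, while $L_1$ and $L_2^2$ pass.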

Update

It was shown that for norms, small sketches are equivalent to good embeddings into $\ell_{1 - \varepsilon}$ [AndoniKR-14].