Difference between revisions of "Open Problems:4"

From Open Problems in Sublinear Algorithms
Jump to: navigation, search
m (1 revision)
Line 1: Line 1:
{{DISPLAYTITLE:Problem 4: Deterministic Summary Structures}}
+
{{Header
{{Infobox
+
|number=4
|label1 = Proposed by
+
|title=Deterministic Summary Structures
|data1 = Sumit Ganguly
+
|source=kanpur06
|label2 = Source
+
|who=Sumit Ganguly
|data2 = [[Workshops:Kanpur_2006|Kanpur 2006]]
 
|label3 = Short link
 
|data3 = http://sublinear.info/4
 
 
}}
 
}}
 
Given a stream of elements of the form $(i,\delta)$ where $i\in [n]$ and $\delta\in \{-1,1\}$ define the frequency of an element to be $f_i=\sum_{(i,\delta)} \delta$. We wish to find estimates $\hat{f}_i$ for each $f_i$ such that  
 
Given a stream of elements of the form $(i,\delta)$ where $i\in [n]$ and $\delta\in \{-1,1\}$ define the frequency of an element to be $f_i=\sum_{(i,\delta)} \delta$. We wish to find estimates $\hat{f}_i$ for each $f_i$ such that  

Revision as of 02:08, 16 November 2012

Suggested by Sumit Ganguly
Source Kanpur 2006
Short link https://sublinear.info/4

Given a stream of elements of the form $(i,\delta)$ where $i\in [n]$ and $\delta\in \{-1,1\}$ define the frequency of an element to be $f_i=\sum_{(i,\delta)} \delta$. We wish to find estimates $\hat{f}_i$ for each $f_i$ such that \[|\hat{f}_i-f_i|\leq \epsilon L_1\] where $L_1=\sum_i |f_i|$. The Count-Min algorithm is a randomized $O(\epsilon^{-1} \log (mn) \log \delta^{-1})$-space algorithm that returns such estimates with probability $1-\delta$ [CormodeM-05]. This is nearly optimal as the space lower bound is $O(\epsilon^{-1} \log (m) \log \epsilon n)$ [Ganguly-06].

However, in practice it is desirable to have deterministic algorithms rather than randomized algorithms. Using a deterministic collection of primes [Muthukrishnan-06], Ganguly [Ganguly-06] devised a deterministic $O(\phi^{-2}\epsilon^{-1} \log^2(mn))$-space algorithm that returned all items $i$ with $|f_i|\geq \phi L_1$ and no $j$ satisfying $|f_j|\leq (1-\epsilon)\phi L_1$. While this algorithm has the advantage of being deterministic, it uses more space than the Count-Min algorithm. Does there exist a deterministic algorithm that uses the same amount of space as Count-Min? Such an algorithm would lead to space-efficient algorithms for a range of problems including hierarchical heavy hitters, estimating inner product sizes, approximately optimal $B$-bucket histograms etc. Unfortunately, we conjecture that no such algorithm exists. Either an algorithm or lower bound would be very interesting.