Difference between revisions of "Open Problems:4"
m (1 revision) |
|||
Line 1: | Line 1: | ||
− | {{ | + | {{Header |
− | + | |number=4 | |
− | | | + | |title=Deterministic Summary Structures |
− | | | + | |source=kanpur06 |
− | + | |who=Sumit Ganguly | |
− | |||
− | |||
− | |||
}} | }} | ||
Given a stream of elements of the form $(i,\delta)$ where $i\in [n]$ and $\delta\in \{-1,1\}$ define the frequency of an element to be $f_i=\sum_{(i,\delta)} \delta$. We wish to find estimates $\hat{f}_i$ for each $f_i$ such that | Given a stream of elements of the form $(i,\delta)$ where $i\in [n]$ and $\delta\in \{-1,1\}$ define the frequency of an element to be $f_i=\sum_{(i,\delta)} \delta$. We wish to find estimates $\hat{f}_i$ for each $f_i$ such that |
Revision as of 02:08, 16 November 2012
Suggested by | Sumit Ganguly |
---|---|
Source | Kanpur 2006 |
Short link | https://sublinear.info/4 |
Given a stream of elements of the form $(i,\delta)$ where $i\in [n]$ and $\delta\in \{-1,1\}$ define the frequency of an element to be $f_i=\sum_{(i,\delta)} \delta$. We wish to find estimates $\hat{f}_i$ for each $f_i$ such that \[|\hat{f}_i-f_i|\leq \epsilon L_1\] where $L_1=\sum_i |f_i|$. The Count-Min algorithm is a randomized $O(\epsilon^{-1} \log (mn) \log \delta^{-1})$-space algorithm that returns such estimates with probability $1-\delta$ [CormodeM-05]. This is nearly optimal as the space lower bound is $O(\epsilon^{-1} \log (m) \log \epsilon n)$ [Ganguly-06].
However, in practice it is desirable to have deterministic algorithms rather than randomized algorithms. Using a deterministic collection of primes [Muthukrishnan-06], Ganguly [Ganguly-06] devised a deterministic $O(\phi^{-2}\epsilon^{-1} \log^2(mn))$-space algorithm that returned all items $i$ with $|f_i|\geq \phi L_1$ and no $j$ satisfying $|f_j|\leq (1-\epsilon)\phi L_1$. While this algorithm has the advantage of being deterministic, it uses more space than the Count-Min algorithm. Does there exist a deterministic algorithm that uses the same amount of space as Count-Min? Such an algorithm would lead to space-efficient algorithms for a range of problems including hierarchical heavy hitters, estimating inner product sizes, approximately optimal $B$-bucket histograms etc. Unfortunately, we conjecture that no such algorithm exists. Either an algorithm or lower bound would be very interesting.