Problem 20: Relations between Streaming Models

From Open Problems in Sublinear Algorithms
Revision as of 01:42, 7 March 2013 by Krzysztof Onak (talk | contribs) (updating header)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search
Suggested by Christian Sohler
Source Kanpur 2006
Short link https://sublinear.info/20

There are many different models for data streaming. For example, in geometry we have the insertion-only model, insertion/deletion model, and the sliding window model. In the insertion-only model we are given a stream of points $p_1,\dots, p_n$. In the insertion/deletion model the stream consists of Insert$(p)$ and Delete$(p)$ operations and is assumed to be valid in the sense that no point is deleted that has not been previously inserted and no point is inserted twice. In the sliding window model we get a stream $p_1,\ldots,p_m$ but we are only interested in the $n$ most recent points.

How do these models relate to each other? Obviously, any algorithm for the insertion/deletion model is also an algorithm in the insertion-only model. Under which assumptions is the opposite true as well? Is there any relation between the insertion/deletion model and the sliding window model? The models are not equivalent since one can obtain an exact algorithm for the sum of points (centroid) under the insertion/deletion model but only a $(1+\epsilon)$-approximation in the sliding window model. Is it possible to prove that (under reasonable assumptions) these two models are equivalent within a certain approximation factor, i.e., if there is an $\alpha$-approximation algorithm in one model then there is a $(c \alpha)$-approximation algorithm in the other model?

Since the above questions are quite general and may be difficult to answer, here is one that may be easier to solve: Can you prove that the reset model [HoffmannMR-04] is equivalent to uniform sampling?

In the reset model we have a stream of updates $(i,p)$ telling that the new position of point number $i$ is $p$. The conjecture is that the reset model is equivalent to uniform random sampling. Since one direction is immediate, one has to prove that any algorithm in the reset model can be turned into a streaming algorithm that initially chooses a set of points (indices) uniformly at random and tracks the positions of these points. Then the algorithm computes its output based on the position of these points.