Difference between revisions of "Open Problems:13"

From Open Problems in Sublinear Algorithms
Jump to: navigation, search
Line 3: Line 3:
 
|source=kanpur06
 
|source=kanpur06
 
|who=Yossi Matias
 
|who=Yossi Matias
}}
 
 
 
{{DISPLAYTITLE:Problem 13: Effects of Subsampling}}
 
{{Infobox
 
|label1 = Proposed by
 
|data1 = Yossi Matias
 
|label2 = Source
 
|data2 = [[Workshops:Kanpur_2006|Kanpur 2006]]
 
|label3 = Short link
 
|data3 = http://sublinear.info/13
 
 
}}
 
}}
 
When processing very fast streams, it is not feasible to run a streaming algorithm on the entire stream, even one that can process each element in $O(1)$ time. Rather it is necessary to sample from the stream and to process the sub-stream using a streaming algorithm. For standard problems such as estimating $F_0$, how does the sub-sampling affect that the accuracy of the streaming algorithms? How should the sampling rate and the per-element time-complexity of a streaming algorithm be traded-off to achieve optimal results?  
 
When processing very fast streams, it is not feasible to run a streaming algorithm on the entire stream, even one that can process each element in $O(1)$ time. Rather it is necessary to sample from the stream and to process the sub-stream using a streaming algorithm. For standard problems such as estimating $F_0$, how does the sub-sampling affect that the accuracy of the streaming algorithms? How should the sampling rate and the per-element time-complexity of a streaming algorithm be traded-off to achieve optimal results?  

Revision as of 05:08, 16 November 2012

Suggested by Yossi Matias
Source Kanpur 2006
Short link https://sublinear.info/13

When processing very fast streams, it is not feasible to run a streaming algorithm on the entire stream, even one that can process each element in $O(1)$ time. Rather it is necessary to sample from the stream and to process the sub-stream using a streaming algorithm. For standard problems such as estimating $F_0$, how does the sub-sampling affect that the accuracy of the streaming algorithms? How should the sampling rate and the per-element time-complexity of a streaming algorithm be traded-off to achieve optimal results?

Another way to formalize this question, suggested by Muthukrishnan, is in terms of what part of the stream to skip and which to stream. A formal definition of the model and algorithms for estimating $F_2$ and others can be found in [BhattacharyyaMMY-07].