Problem 26: Equivalence of Two MapReduce Models

Suggested by: Paul Beame
Source: Kanpur 2009
Short link: https://sublinear.info/26

The original MapReduce paper [DeanG-04] gives two distributed models. The first says only that intermediate key/value pairs with the same key are combined and sent as batch jobs to workers. In Section 4.2, the paper additionally guarantees that the batch jobs received by a single worker are sorted according to the corresponding key values, and some algorithms rely on this additional feature of MapReduce. Are these two models equivalent? An analogous gap appears in the complexity world: strong time-space trade-offs are known for sorting, but no similar lower bounds are known for the corresponding decision problem, element distinctness.
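
To make the distinction concrete, here is a minimal in-memory Python sketch of the two shuffle guarantees (not taken from [DeanG-04]; the function run_mapreduce, its sorted_keys flag, and the hash-based partitioning are illustrative assumptions): in the weaker model a worker receives its batch jobs in arbitrary order, while in the stronger model they arrive sorted by key.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer, num_workers=4, sorted_keys=False):
    """Hypothetical single-machine simulation of the two MapReduce models."""
    # Map phase: emit intermediate (key, value) pairs.
    intermediate = chain.from_iterable(mapper(r) for r in records)

    # Shuffle phase: combine all pairs sharing a key into one batch job.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Assign batch jobs to workers (here, by a hash of the key).
    partitions = defaultdict(list)
    for key, values in groups.items():
        partitions[hash(key) % num_workers].append((key, values))

    # Reduce phase: the stronger model (sorted_keys=True) delivers each
    # worker's batch jobs in sorted key order; the weaker model gives no
    # ordering guarantee.
    output = []
    for worker_id in range(num_workers):
        jobs = partitions[worker_id]
        if sorted_keys:
            jobs.sort(key=lambda kv: kv[0])
        for key, values in jobs:
            output.append(reducer(key, values))
    return output


if __name__ == "__main__":
    # Word count: the reducer never inspects the order in which keys arrive,
    # so the result is the same under either model.
    docs = ["a b a", "b c"]
    counts = run_mapreduce(
        docs,
        mapper=lambda doc: [(w, 1) for w in doc.split()],
        reducer=lambda key, values: (key, sum(values)),
        sorted_keys=True,
    )
    print(sorted(counts))  # [('a', 2), ('b', 2), ('c', 1)]
```

Word count behaves identically under either setting because its reducer ignores the arrival order of keys; the algorithms that depend on the stronger guarantee are precisely those that stream over a worker's batch jobs in sorted key order.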