Problem 49: Sketching Earth Mover Distance

From Open Problems in Sublinear Algorithms
Jump to: navigation, search
Suggested by Piotr Indyk
Source Bertinoro 2011
Short link

For any two subsets $A,B$ of the planar grid $[n]^2$, $|A|=|B|$, define \[ \operatorname{EMD}(A,B)=\min_{\pi: A \to B} \sum_{a \in A} \|a-\pi(a)\|_1, \] where $\pi$ ranges over one-to-one mapping from $A$ to $B$.

Question: What is the sketching complexity of $c$-approximating EMD? That is, consider a distribution over mappings $L_c$ that maps subset of $[n]^2$ to $\{0,1\}^s$, such that for any sets $A,B$ with $|A|=|B|$, given $L_c(A), L_c(B)$, one can estimate $\operatorname{EMD}(A,B)$ up to a factor of $c$, with probability $\ge 2/3$. Is it possible to construct such a distribution for constant $c$ and $s=\operatorname{polylog} n$?

Background: It is known that one can achieve $s=O(\log n)$ for $c=O(\log n)$ by embedding EMD into $\ell_1$ [IndykT-03,Charikar-02], and $s=n^{O(1/c)} \operatorname{polylog} n$ for any $c \ge 1$ [AndoniDIW-09].