Problem 49: Sketching Earth Mover Distance
Suggested by | Piotr Indyk |
---|---|
Source | Bertinoro 2011 |
Short link | https://sublinear.info/49 |
For any two subsets A,B of the planar grid [n]^2, |A|=|B|, define \operatorname{EMD}(A,B)=\min_{\pi: A \to B} \sum_{a \in A} \|a-\pi(a)\|_1, where \pi ranges over one-to-one mapping from A to B.
Question: What is the sketching complexity of c-approximating EMD? That is, consider a distribution over mappings L_c that maps subset of [n]^2 to \{0,1\}^s, such that for any sets A,B with |A|=|B|, given L_c(A), L_c(B), one can estimate \operatorname{EMD}(A,B) up to a factor of c, with probability \ge 2/3. Is it possible to construct such a distribution for constant c and s=\operatorname{polylog} n?
Background: It is known that one can achieve s=O(\log n) for c=O(\log n) by embedding EMD into \ell_1 [IndykT-03,Charikar-02], and s=n^{O(1/c)} \operatorname{polylog} n for any c \ge 1 [AndoniDIW-09].