Problem 7: Estimating Earth-Mover Distance

From Open Problems in Sublinear Algorithms
Jump to: navigation, search
Suggested by Piotr Indyk
Source Kanpur 2006
Short link

Consider a stream of red points $R$ and blue points $B$ from a 2-dimensional grid $[\Delta]^2$, in an arbitrary order. We assume $|R|=|B|=n$. The Earth-Mover Distance (EMD) between $R$ and $B$ is the value of the min-cost matching between $R$ and $B$, i.e., \[ \mbox{EMD}(R,B)= \min_{\pi: R \to B} \sum_{p \in R} \|p-\pi(p)\|, \] where $\pi$ is a one-to-one mapping, and $\|\cdot\|$ is (say) the $L_1$ norm.

What are the space vs. approximation tradeoffs achievable by streaming algorithms for this problem? In particular, is there an $O(1)$-approximation algorithm using $(\log n+\log \Delta)^{O(1)}$ space?

It is known that that there is an $O(\log \Delta)$-approximation algorithm using that much space [Indyk-04]. That algorithm proceeds essentially by embedding EMD into $L_1$ [Charikar-02,IndykT-03]. However, any such embedding must incur at least $\Omega( \sqrt{\log \Delta})$ distortion [NaorS-06]. So one would need to do something else to get $O(1)$-approximation.


It was shown that the decision version of the problem (distinguish the cases, when the EMD distance is at most $1$ vs. at least $D$ for $D = O(1)$) does not allow sketches of the constant size [AndoniKR-14].