Problem 31: Gap-Hamming Information Cost

From Open Problems in Sublinear Algorithms
Jump to: navigation, search
Suggested by Amit Chakrabarti
Source Kanpur 2009
Short link

In the Gap-Hamming problem, two players Alice and Bob have vectors $x,y\in \{0,1\}^n$ respectively and wish to compute the function $f$ \[ f(x,y)= \begin{cases} 0 & \mbox{ if } \Delta(x,y) \leq n/2-\sqrt{n} \\ 1 & \mbox{ if } \Delta(x,y) \geq n/2+\sqrt{n} \end{cases} \] where $\Delta(x,y)=|\{i:x_i\neq y_i\}|$ is the Hamming distance between the vectors and we are promised that $|\Delta(x,y)-n/2|\geq \sqrt{n}$. The problem became interesting in the streaming community because a lower bound on the communication complexity of evaluating $f$ yields a lower bound on the space required by a streaming algorithm to estimate the number of distinct elements or the entropy of a stream. After a series of papers, it is know that evaluating $f$ requires $\Omega(n)$ communication [IndykW-03,Woodruff-04,BrodyC-09,BrodyCRVW-10,ChakrabartiR-11] even if an unlimited number of rounds of communication are used.

An increasingly popular technique in communication complexity is to prove bounds by bounding the information cost [ChakrabartiSWY-01,BarYossefJKS-04]. Here we consider random input $(X,Y)$ and consider the mutual information between the input and the random transcript of the protocol $\Pi(X,Y)$: \[ I(XY;\Pi(X,Y))=H(XY)-H(XY|\Pi(X,Y)) \ . \] It would be interesting to prove a lower bound on the information cost for the Gap-Hamming problem for some natural input distribution.


A linear lower bound on the information cost for a certain distribution was shown by [KerenidisLLRX-12]. Later, [BravermanGPW-13] showed that a linear lower bound also holds for the uniform distribution.