# Problem 31: Gap-Hamming Information Cost

In the Gap-Hamming problem, two players Alice and Bob have vectors $x,y\in \{0,1\}^n$ respectively and wish to compute the function $f$ $f(x,y)= \begin{cases} 0 & \mbox{ if } \Delta(x,y) \leq n/2-\sqrt{n} \\ 1 & \mbox{ if } \Delta(x,y) \geq n/2+\sqrt{n} \end{cases}$ where $\Delta(x,y)=|\{i:x_i\neq y_i\}|$ is the Hamming distance between the vectors and we are promised that $|\Delta(x,y)-n/2|\geq \sqrt{n}$. The problem became interesting in the streaming community because a lower bound on the communication complexity of evaluating $f$ yields a lower bound on the space required by a streaming algorithm to estimate the number of distinct elements or the entropy of a stream. After a series of papers, it is know that evaluating $f$ requires $\Omega(n)$ communication [IndykW-03,Woodruff-04,BrodyC-09,BrodyCRVW-10,ChakrabartiR-11] even if an unlimited number of rounds of communication are used.
An increasingly popular technique in communication complexity is to prove bounds by bounding the information cost [ChakrabartiSWY-01,BarYossefJKS-04]. Here we consider random input $(X,Y)$ and consider the mutual information between the input and the random transcript of the protocol $\Pi(X,Y)$: $I(XY;\Pi(X,Y))=H(XY)-H(XY|\Pi(X,Y)) \ .$ It would be interesting to prove a lower bound on the information cost for the Gap-Hamming problem for some natural input distribution.