# Problem 57: Coding Theory in the Streaming Model

Suggested by Atri Rudra (Dortmund 2012). https://sublinear.info/57

Consider the problem of “codeword testing” in the data stream model. In particular, consider a code $C:\Sigma^k\rightarrow\Sigma^n$ with distance[1] $d$. The specific problem is the following:

The input to the problem is a vector $\mathbb{y}\in\Sigma^n$ and integer parameters $0\le \tau_1<\tau_2\le n$. The algorithm has to decide whether

$\Delta(\mathbb{y},C)\le \tau_1 \quad\text{or}\quad \Delta(\mathbb{y},C)\ge \tau_2,$

where $\Delta(\mathbb{y},C)$ is the Hamming distance of $\mathbb{y}$ from the closest codeword in $C$.
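To make the definition concrete, the following is a minimal brute-force sketch of the gap decision. It is not a streaming algorithm: it stores all of $\mathbb{y}$ and enumerates every codeword. The function names and the toy repetition code are illustrative assumptions, not part of the problem statement.

```python
def hamming(u, v):
    """Number of positions where the equal-length sequences u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def distance_to_code(y, codewords):
    """Delta(y, C): Hamming distance from y to the closest codeword."""
    return min(hamming(y, c) for c in codewords)

def gap_decision(y, codewords, tau1, tau2):
    """Return 'close' if Delta(y, C) <= tau1 and 'far' if Delta(y, C) >= tau2.

    Returns None when the promise is violated (tau1 < Delta(y, C) < tau2),
    in which case any answer would be acceptable.
    """
    d = distance_to_code(y, codewords)
    if d <= tau1:
        return "close"
    if d >= tau2:
        return "far"
    return None

# Toy example (assumed for illustration): binary repetition code, k = 1, n = 5.
repetition_code = [(0,) * 5, (1,) * 5]
```

For instance, with `tau1 = 1` and `tau2 = 2`, the word `(1, 0, 0, 0, 0)` is at distance 1 from the all-zeros codeword and is classified as close, while `(1, 1, 0, 0, 0)` is at distance 2 and is classified as far.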

Ideally, we want a one-pass, $\log^{O(1)}{n}$-space algorithm that solves the problem above for some good code $C$ (that is, a code with $k=\Omega(n)$ and $d=\Omega(n)$). Failing that, one would like a hardness result that holds for every good code $C$. (For the sake of simplicity, assume that the algorithm has access to some succinct description of the code $C$.)

The main technical motivation comes from the case when $\tau_1=0$ and $\tau_2\ge \epsilon n$ for any fixed $\epsilon>0$, but with a constant number of queries to $\mathbb{y}$ (i.e., in the property testing world). This question is perhaps *the* open question in the codeword testing literature. The case of $\tau_1>0$ also makes sense in the property testing world and has been studied [GuruswamiR-05]. (See the paper for some potential practical motivations.)

One of the original motivations (in [RudraU-10]) for studying the data-streaming version of the question was the possibility of using communication complexity results to prove the impossibility of good locally testable codes.

It was shown in [RudraU-10] that for the well-known Reed-Solomon codes, the data stream version of the problem can be solved for $\tau_1=0$ and $\tau_2=1$ with one pass and logarithmic space. It can also be shown that the classical Berlekamp-Massey algorithm for decoding Reed-Solomon codes implies a solution for the case $\tau_2=\tau_1+1$ with one pass and space $\tilde{O}(\tau_1)$[2]. Finally, [McGregorRU-11] showed how to solve this problem in one pass and $O(k\log{n})$ space. The following question is wide open:

Solve the problem above with one pass and $\tilde{O}(\min(k,\tau_1))$ space.

In fact, even the very special case $k=\tau_1=\sqrt{n}$ with one pass and space $o(\sqrt{n})$ is open. This remains open even for the special case of Reed-Solomon codes.
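As an illustration of why the $\tau_1=0$, $\tau_2=1$ case for Reed-Solomon codes fits in small space, here is a hedged sketch of a one-pass membership test based on fingerprinting the syndrome. It is one way such a test could work and is not claimed to be the exact construction of [RudraU-10]. The assumptions are a prime field $\mathbb{F}_p$ with $n=p-1$, evaluation points $g^0,\dots,g^{n-1}$ for a generator $g$, and toy values $p=17$, $g=3$; all names are hypothetical. A word $\mathbb{y}$ is a codeword exactly when the syndromes $S_j=\sum_i y_i g^{ij}$ vanish for $j=1,\dots,n-k$, and the test evaluates $\sum_j \beta^j S_j$ at random nonzero $\beta$, using a closed-form geometric sum so that each stream symbol is processed with $O(1)$ stored field elements per $\beta$.

```python
import random

P = 17       # toy prime field size (assumption for illustration)
G = 3        # a generator of the multiplicative group of F_17
N = P - 1    # block length: evaluation points are G^0, ..., G^(N-1)

def rs_encode(msg):
    """Evaluate the polynomial with coefficient list msg at G^0, ..., G^(N-1)."""
    return [sum(c * pow(G, i * m, P) for m, c in enumerate(msg)) % P
            for i in range(N)]

def geometric_sum(x, r):
    """Return x + x^2 + ... + x^r modulo P via the closed form."""
    x %= P
    if x == 1:
        return r % P
    # x * (x^r - 1) / (x - 1), with the division done by Fermat inversion.
    return x * (pow(x, r, P) - 1) * pow(x - 1, P - 2, P) % P

def rs_membership_one_pass(stream, k, betas):
    """Single pass over the symbols of y; one accumulator per value in betas.

    Accumulator t ends up holding sum_j betas[t]^j * S_j, which is zero for
    every codeword. A non-codeword is wrongly accepted only if every beta
    happens to be a root of that nonzero polynomial.
    """
    r = N - k
    accs = [0] * len(betas)
    point = 1  # current evaluation point G^i
    for y_i in stream:
        for t, beta in enumerate(betas):
            accs[t] = (accs[t] + y_i * geometric_sum(beta * point, r)) % P
        point = point * G % P
    return all(a == 0 for a in accs)

def random_betas(trials, rng=random):
    """Draw independent nonzero field elements to use as fingerprint points."""
    return [rng.randrange(1, P) for _ in range(trials)]
```

The test has one-sided error: codewords are always accepted. Over this small field, a single nonzero $\beta$ can wrongly accept a non-codeword with probability up to $(n-k-1)/(p-1)$, which is why the sketch repeats with several values; a more careful variant would presumably draw $\beta$ from a larger extension field instead.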

## Notes

1. The distance of a code $C$ is the minimum Hamming distance between any two distinct codewords, i.e., $\min_{\mathbb{x}\neq \mathbb{y}\in\Sigma^k} |\{i\in [n] \mid C(\mathbb{x})_i\neq C(\mathbb{y})_i\}|$.
2. There is a small catch: the algorithm actually computes the locations of the errors if the number of errors is at most $\tau_1$. However, results in [RudraU-10] can be used to verify whether the returned error locations are indeed correct.