Problem 57: Coding Theory in the Streaming Model
Suggested by | Atri Rudra |
---|---|
Source | Dortmund 2012 |
Short link | https://sublinear.info/57 |
Consider the problem of “codeword testing” in the data stream model. In particular, consider a code with distance[1] d. The specific problem is the following:
The input to the problem is a vector \mathbb{y}\in\Sigma^n and integer parameters 0\le \tau_1<\tau_2\le n. The algorithm has to decide whether
\Delta(\mathbb{y},C)\le \tau_1~ \mathrm{ or }~ \Delta(\mathbb{y},C)\geq \tau_2,
where \Delta(\mathbb{y},C) is the Hamming distance of \mathbb{y} from the closest codeword in C.
Ideally, we want a one-pass, \log^{O(1)}{n} space algorithm to solve the problem above for some good code C (that is, we have k\ge \Omega(n) and d\ge \Omega(n)). Or if we prove a hardness result, one would like a hardness result for every good code C. (For the sake of simplicity, assume that the algorithm has access to some succinct description of the code C.)
The main technical motivation comes from the case when \tau_1=0 and \tau_2\ge \epsilon n for any fixed \epsilon>0 but with constant number of queries to \mathbb{y} (i.e. in the property testing world). This question is perhaps the open question in the codeword testing literature. The case of \tau_1>0 also makes sense in the property testing world and has been studied [GuruswamiR-05]. (See the paper for some potential practical motivations.)
One of the original motivation (in [RudraU-10]) for the study of the data-streaming version of the question was possibly to use communication complexity results to prove the impossibility of good locally testable codes.
It was shown in [RudraU-10] that for the well-known Reed-Solomon codes, the data stream version of the problem can be solved for \tau_1=0 and \tau_2=1 with one pass and logarithmic space. It can also be shown that the classical Berlekamp-Massey algorithm for decoding Reed-Solomon codes implies a solution for the case \tau_2=\tau_1+1 with one pass and space \tilde{O}(\tau_1)[2]. Finally, [McGregorRU-11] showed how to solve this problem in one pass and O(k\log{n}) space. This question is wide open:
Solve the problem above with one pass and \tilde{O}(\min(k,\tau_1)) space.
In fact the very special case of the problem above for k=\tau_1=\sqrt{n} with one pass and space o(\sqrt{n}) is also open. This is open even for the special case of Reed-Solomon codes.
Notes[edit]
- Jump up ↑ The distance of a code C is the minimum Hamming distance between any two codewords, i.e., \min_{\mathbb{x}\neq \mathbb{y}\in\Sigma^k} |\{i\in [n]| C(\mathbb{x})_i\neq C(\mathbb{y})_i\}|.
- Jump up ↑ There is a small catch: the algorithm actually computes the location of errors if the number of errors is at most \tau_1. However, results in [RudraU-10] can be used to verify if the returned error locations are indeed correct.