Open Problems:57

From Open Problems in Sublinear Algorithms

Suggested by Atri Rudra
Source Dortmund 2012
Short link https://sublinear.info/57

Consider the problem of “codeword testing” in the data stream model. In particular, consider a code $C:\Sigma^k\rightarrow\Sigma^n$ with distance[1] $d$. The specific problem is the following:

The input to the problem is a vector $\mathbb{y}\in\Sigma^n$ and integer parameters $0\le \tau_1<\tau_2\le n$. The algorithm has to decide whether

\[\Delta(\mathbb{y},C)\le \tau_1~ \mathrm{ or }~ \Delta(\mathbb{y},C)\geq \tau_2,\]

where $\Delta(\mathbb{y},C)$ is the Hamming distance of $\mathbb{y}$ from the closest codeword in $C$.
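
To make the definitions above concrete, here is a toy brute-force illustration of $\Delta(\mathbb{y},C)$ and of the gap decision problem, written in Python for the 3-fold repetition code. It simply enumerates all $|\Sigma|^k$ messages, so it only pins down what is being asked and is nothing like a one-pass, small-space algorithm; all names in it are ad hoc.

from itertools import product

def hamming(u, v):
    """Hamming distance between two equal-length tuples."""
    return sum(a != b for a, b in zip(u, v))

def delta(y, encode, k, alphabet):
    """Delta(y, C): distance from y to the closest codeword, by brute force."""
    return min(hamming(y, encode(x)) for x in product(alphabet, repeat=k))

def gap_codeword_test(y, encode, k, alphabet, tau1, tau2):
    """Report whether Delta(y, C) <= tau1 or Delta(y, C) >= tau2.
    Inputs in the gap tau1 < Delta(y, C) < tau2 may be answered either way."""
    d = delta(y, encode, k, alphabet)
    return "close" if d <= tau1 else ("far" if d >= tau2 else "either answer is acceptable")

# Example: the 3-fold repetition code over {0, 1}, with k = 1, n = 3, distance d = 3.
def encode(x):
    return x * 3                                      # (b,) -> (b, b, b)

y = (0, 1, 0)                                         # one flip away from (0, 0, 0)
print(delta(y, encode, 1, (0, 1)))                    # -> 1
print(gap_codeword_test(y, encode, 1, (0, 1), 1, 3))  # -> close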

Ideally, we want a one-pass, $\log^{O(1)}{n}$-space algorithm that solves the problem above for some good code $C$ (that is, $k\ge \Omega(n)$ and $d\ge \Omega(n)$). Alternatively, if one proves a hardness result, it should hold for every good code $C$. (For the sake of simplicity, assume that the algorithm has access to some succinct description of the code $C$.)

The main technical motivation comes from the case where $\tau_1=0$ and $\tau_2\ge \epsilon n$ for any fixed $\epsilon>0$, but with only a constant number of queries to $\mathbb{y}$ (i.e., in the property testing world). This question is perhaps the central open question in the codeword testing literature. The case $\tau_1>0$ also makes sense in the property testing world and has been studied in [GuruswamiR-05]. (See that paper for some potential practical motivations.)

One of the original motivations (in [RudraU-10]) for studying the data-streaming version of the question was the possibility of using communication complexity results to prove the impossibility of good locally testable codes.

It was shown in [RudraU-10] that for the well-known Reed-Solomon codes, the data stream version of the problem can be solved for $\tau_1=0$ and $\tau_2=1$ with one pass and logarithmic space. It can also be shown that the classical Berlekamp-Massey algorithm for decoding Reed-Solomon codes implies a solution for the case $\tau_2=\tau_1+1$ with one pass and space $\tilde{O}(\tau_1)$[2]. Finally, [McGregorRU-11] showed how to solve this problem in one pass and $O(k\log{n})$ space.
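
To give a feel for how a one-pass, logarithmic-space test is possible in the easiest regime, here is a minimal Python sketch of a syndrome-fingerprinting test for exact membership (the $\tau_1=0$, $\tau_2=1$ case), written for the special case of a full-length Reed-Solomon code over a prime field $\mathbb{F}_p$ with evaluation points $0,1,\dots,p-1$ (so $n=p$). It relies on the classical fact that the dual of this code is again a Reed-Solomon code, so $\mathbb{y}$ is a codeword iff the syndromes $s_j=\sum_i i^j y_i$ vanish for all $j=0,\dots,n-k-1$; instead of storing all $n-k$ syndromes, the test streams the random combination $\sum_j r^j s_j$, which is identically zero on codewords and, on a non-codeword, is a nonzero polynomial in $r$ of degree at most $n-k-1$ and hence rarely vanishes at a random $r$. This is only an illustration under these simplifying assumptions and is not claimed to be the algorithm of [RudraU-10]; all names in it are ad hoc.

import random

def geometric_sum(x, m, p):
    """1 + x + ... + x^(m-1) mod p, in O(log m) field operations."""
    if x % p == 1:
        return m % p
    return (pow(x, m, p) - 1) * pow((x - 1) % p, -1, p) % p

def rs_membership_fingerprint(stream, n, k, p, trials=3):
    """One pass over stream = y_0, ..., y_(n-1) for the full-length Reed-Solomon
    code {(f(0), f(1), ..., f(p-1)) : deg f < k} over F_p (so n = p).
    Returns True for every codeword; for a non-codeword it returns False with
    probability at least 1 - ((n - k - 1) / p) ** trials.
    Working space: O(trials) field elements plus the stream position."""
    rvals = [random.randrange(p) for _ in range(trials)]  # random fingerprint points
    sums = [0] * trials
    for i, y_i in enumerate(stream):                      # the single pass
        for t, r in enumerate(rvals):
            # contribution of y_i to sum_j r^j * s_j, where s_j = sum_i i^j * y_i
            sums[t] = (sums[t] + y_i * geometric_sum(r * i % p, n - k, p)) % p
    return all(s == 0 for s in sums)

# Tiny usage example over F_7 with k = 3 and f(x) = 1 + 2x + 3x^2:
p, k = 7, 3
codeword = [(1 + 2 * x + 3 * x * x) % p for x in range(p)]
corrupted = list(codeword)
corrupted[4] = (corrupted[4] + 1) % p
print(rs_membership_fingerprint(iter(codeword), p, k, p))   # -> True
print(rs_membership_fingerprint(iter(corrupted), p, k, p))  # -> False with probability >= 1 - (3/7)^3

Each trial stores $O(1)$ field elements (about $\log n$ bits here, since $p=n$) and errs on non-codewords with probability at most $(n-k-1)/p$, so a few independent trials, or a single $r$ drawn from a larger extension field, suffice for small constant error while keeping the space logarithmic.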

This question is wide open:

Solve the problem above with one pass and $\tilde{O}(\min(k,\tau_1))$ space.

In fact, even the very special case of the problem above with $k=\tau_1=\sqrt{n}$, one pass, and space $o(\sqrt{n})$ is open. It remains open even for the special case of Reed-Solomon codes.

Notes

  1. The distance of a code $C$ is the minimum Hamming distance between any two codewords, i.e., $\min_{\mathbb{x}\neq \mathbb{x}'\in\Sigma^k} |\{i\in [n] : C(\mathbb{x})_i\neq C(\mathbb{x}')_i\}|$.
  2. There is a small catch: the algorithm actually computes the location of errors if the number of errors is at most $\tau_1$. However, results in [RudraU-10] can be used to verify if the returned error locations are indeed correct.