Low Expansion Encoding of Edit Distance
| Suggested by | Hossein Jowhari |
| --- | --- |
| Source | Dortmund 2012 |
| Short link | https://sublinear.info/59 |
Let $T = \bigcup_{i=1}^{n} \{0,1\}^i$. For a pair of strings $(x,y) \in T \times T$, let $\operatorname{ed}(x,y)$ denote the edit distance between $x$ and $y$, defined as the minimum number of character insertions, deletions, and substitutions needed to convert $x$ into $y$.
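As a concrete illustration of this definition, here is a minimal sketch of the standard dynamic program over prefixes that computes $\operatorname{ed}(x,y)$ (all three operations have unit cost):

```python
def ed(x: str, y: str) -> int:
    n, m = len(x), len(y)
    # dp[i][j] = edit distance between x[:i] and y[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i          # delete all of x[:i]
    for j in range(m + 1):
        dp[0][j] = j          # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete x[i-1]
                           dp[i][j - 1] + 1,         # insert y[j-1]
                           dp[i - 1][j - 1] + cost)  # substitute / match
    return dp[n][m]

assert ed("0110", "110") == 1   # one deletion suffices
```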
**Question:** Is there a mapping $f:T \rightarrow \{0,1\}^{m}$ satisfying the following conditions?
- $f$ is injective, i.e., it does not map two different inputs to the same point.
- $m=O(n^c)$ for some constant $c \geq 1$.
- For strings with $\operatorname{ed}(x,y)=1$, we have $\mathcal{H}(f(x),f(y)) \le C$ for some $C=o(\log n)$, where $\mathcal{H}$ denotes the Hamming distance.
The same question holds for randomized mappings, as long as they map different $x$ and $y$ to different points with high probability. Currently, the best known upper bound on $C$ is $O(\log n\log^*n)$, achieved by a randomized mapping that deploys the Locally Consistent Parsing method [CormodePSV-00]. For non-repetitive strings (the Ulam distance), there is a deterministic mapping with $C\leq 6$ and $c=2$. Preferably, we would like mappings that are efficiently computable and equipped with polynomial-time decoding algorithms (i.e., $x$ can be recovered from $f(x)$ efficiently). See [Jowhari-12] for the motivation behind the problem.
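For very small $n$, the conditions above can also be explored by brute force. The sketch below is purely illustrative (the helper names `hamming`, `max_expansion`, and the baseline `f_pad` are introduced here and are not from the source; it reuses the `ed` sketch above): it verifies injectivity, checks that the image lies in $\{0,1\}^m$, and reports the largest Hamming distance over pairs at edit distance one. For the naive padding baseline this maximum grows linearly with $n$, which is exactly the behaviour the open problem asks to avoid.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    # Hamming distance between two strings of equal length
    assert len(a) == len(b)
    return sum(u != v for u, v in zip(a, b))

def max_expansion(f, n: int, m: int) -> int:
    # All nonempty binary strings of length at most n (the set T above)
    strings = [''.join(s) for length in range(1, n + 1)
               for s in product('01', repeat=length)]
    images = {s: f(s) for s in strings}
    assert all(len(v) == m for v in images.values())      # image in {0,1}^m
    assert len(set(images.values())) == len(strings)      # injectivity
    # Largest Hamming distance over pairs at edit distance one
    return max(hamming(images[x], images[y])
               for x in strings for y in strings
               if x != y and ed(x, y) == 1)

def f_pad(s: str, n: int = 6) -> str:
    # Hypothetical baseline: append a '1' marker and zero-pad to length n + 1
    return s + '1' + '0' * (n - len(s))

# The reported maximum for this baseline grows linearly with n.
print(max_expansion(f_pad, n=6, m=7))
```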