Difference between revisions of "Open Problems:61"

From Open Problems in Sublinear Algorithms
Jump to: navigation, search
 
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
{{Header
 
{{Header
|title=RNA Folding
 
 
|source=bertinoro14
 
|source=bertinoro14
 
|who=Qin Zhang
 
|who=Qin Zhang
 
}}
 
}}
 +
An RNA sequence is a string of letters from the alphabet $\{\mbox{A}, \mbox{C}, \mbox{G}, \mbox{U}\}$, where $\mbox{A---U}$ and $\mbox{C---G}$ form pairings. A set of pairings in such a string is said to be non-crossing if there are no pairs of the form $(i, j)$ and $(k, l)$, where $i < k < j < l$.
  
An RNA sequence is a string of letters from the alphabet {A, C, G, U}, where A-U and C-G form pairings. A set of pairings in such a string is said to be non-crossing if there are no pairs of the form $(i, j)$ and $(k, l)$, where $i < k < j < l$.  
+
A ''maximum non-crossing matching'' is a pairing of $\mbox{A---U}$ and $\mbox{C---G}$ of maximum cardinality that is non-crossing. Given a string of length $n$, such a matching can be computed in $O(n^2)$ space and $O(n^3)$ time via dynamic programming {{cite|Eddy-04}}.
  
A maximum non-crossing matching is a pairing of A-U and C-G of maximum cardinality that is non-crossing. Given a string, such a matching can be computed in $n^2$ space and $n^3$ time via dynamic programming {{cite|Eddy-04}}.
+
Note that there is a trivial 2-approximation to the optimal matching. Find the optimal matchings on the $\{\mbox{A}, \mbox{U}\}$ and $\{\mbox{C}, \mbox{G}\}$ subsequences, and take the larger one. In particular, this implies that a 2-approximation to the ''size'' of the optimal non-crossing matching can be computed in $O(1)$ space.
  
Note that there is a trivial 2-approximation to the optimal matching. Find the optimal matchings on the A, U and C, G subsequences, and take the larger one. This can be computed in a stream.
+
How well can the size of the optimal non-crossing matching be approximated in $\operatorname{polylog}(n)$ space and a small number of passes?
  
Is there a streaming algorithm that yields a factor better than 2 (in a small number of passes)?
+
(Additional question from Alex Andoni: Can the problem be solved in the RAM model with running time better than the trivial dynamic programming?)

Latest revision as of 22:55, 13 June 2014

Suggested by Qin Zhang
Source Bertinoro 2014
Short link https://sublinear.info/61

An RNA sequence is a string of letters from the alphabet $\{\mbox{A}, \mbox{C}, \mbox{G}, \mbox{U}\}$, where $\mbox{A---U}$ and $\mbox{C---G}$ form pairings. A set of pairings in such a string is said to be non-crossing if there are no pairs of the form $(i, j)$ and $(k, l)$, where $i < k < j < l$.

A maximum non-crossing matching is a pairing of $\mbox{A---U}$ and $\mbox{C---G}$ of maximum cardinality that is non-crossing. Given a string of length $n$, such a matching can be computed in $O(n^2)$ space and $O(n^3)$ time via dynamic programming [Eddy-04].

Note that there is a trivial 2-approximation to the optimal matching. Find the optimal matchings on the $\{\mbox{A}, \mbox{U}\}$ and $\{\mbox{C}, \mbox{G}\}$ subsequences, and take the larger one. In particular, this implies that a 2-approximation to the size of the optimal non-crossing matching can be computed in $O(1)$ space.

How well can the size of the optimal non-crossing matching be approximated in $\operatorname{polylog}(n)$ space and a small number of passes?

(Additional question from Alex Andoni: Can the problem be solved in the RAM model with running time better than the trivial dynamic programming?)