Problem 61: RNA Folding
Suggested by | Qin Zhang |
---|---|
Source | Bertinoro 2014 |
Short link | https://sublinear.info/61 |
An RNA sequence is a string of letters from the alphabet $\{\mbox{A}, \mbox{C}, \mbox{G}, \mbox{U}\}$, where $\mbox{A---U}$ and $\mbox{C---G}$ form pairings. A set of pairings in such a string is said to be non-crossing if there are no pairs of the form $(i, j)$ and $(k, l)$, where $i < k < j < l$.
A maximum non-crossing matching is a pairing of $\mbox{A---U}$ and $\mbox{C---G}$ of maximum cardinality that is non-crossing. Given a string of length $n$, such a matching can be computed in $O(n^2)$ space and $O(n^3)$ time via dynamic programming [Eddy-04].
Note that there is a trivial 2-approximation to the optimal matching. Find the optimal matchings on the $\{\mbox{A}, \mbox{U}\}$ and $\{\mbox{C}, \mbox{G}\}$ subsequences, and take the larger one. In particular, this implies that a 2-approximation to the size of the optimal non-crossing matching can be computed in $O(1)$ space.
How well can the size of the optimal non-crossing matching be approximated in $\operatorname{polylog}(n)$ space and a small number of passes?