# Problem 50: Sparse Recovery for Tree Models

Suggested by | Piotr Indyk |
---|---|

Source | Bertinoro 2011 |

Short link | https://sublinear.info/50 |

For any $n=2^{h}-1$, we can identify the coordinates of a vector $v \in \mathbb R^n$ with the nodes of a full binary tree $B_h$ of height $h$ and root $1$. We define a *$k$-sparse tree model* ${\mathcal T}_k$ to be a set of all $T \subset [n]$ of size $k$ which form a connected subtree in $B_h$ rooted at $1$.

We want to design an $m \times n$ matrix $A$ such that for any $x \in \mathbb R^n$, one can recover from $Ax$ a vector $x^* \in \mathbb R^n$ such that:

\[ \left\|x^*-x\right\|_1 \leq \min_{x'\in\mathbb R^n,\ \operatorname{supp}(x') \subset T \mbox{ for some } T \in {\mathcal T}_k } C \left\|x'-x\right\|_1, \] where $\operatorname{supp}(x')$ is the set of non-zero coefficients of $x'$, and $C>0$ is a constant.

**Question:** Is it possible to achieve $m=O(k)$ for some constant $C>0$?

**Background:** It is possible to achieve a weaker bound of $m=O(k \log(n/k))$ even if ${\mathcal T}_k$ is replaced by the set of all $k$-subsets of $[n]$ [CandesRT-06a]. However, since $|{\mathcal T}_k | =\exp(O(k))$, one can expect a better bound for ${\mathcal T}_k$. The best $C$ we know how to achieve for $m = O(k)$ is $O(\sqrt{\log n})$ [IndykP-11] (building on [BaraniukCDH-10]).

## Update

Indyk and Razenshteyn [IndykR-13] improved the bound on $m$ to $O(k \log(n / k) / \log \log (n / k))$ for constant $C$. In a follow-up paper, Bah et al. [BahBC-14] presented an *efficient* recovery algorithm for this number of measurements.