# Problem 12: Deterministic $CUR$-Type Decompositions

Suggested by | Michael Mahoney |
---|---|

Source | Kanpur 2006 |

Short link | https://sublinear.info/12 |

A $CUR$-decomposition of $A$ expresses $A$ as a product of three matrices, $C$, $U$, and $R$, where $C$ consists of a small number of actual columns of $A$, $R$ consists of a small number of actual rows of $A$, and $U$ is a small, carefully constructed matrix that guarantees that the product $CUR$ is “close” to $A$. Recent work [DrineasMM-06,DrineasMM-06a,DrineasMM-06b] proved the existence and provided efficient randomized algorithms for $CUR$ decompositions that are nearly as good as the best rank-$k$ approximation to $A$ that is obtained by truncating the SVD. Hence, the columns of $A$ that are included in $C$, as well as the rows of $A$ that are included in $R$, can be used in place of the eigencolumns and eigenrows, with the added benefit of improved interpretability in terms of the original data. Note the structural simplicity of a CUR matrix decomposition: $$ \underbrace{\left( \begin{array}{ccccc} &&&& \\ &&&& \\ &&A&& \\ &&&& \\ &&&& \end{array} \right)}_{m \times n} \approx \underbrace{\left( \begin{array}{ccc} && \\ &&\\ &C& \\ && \\ && \end{array} \right)}_{m \times c} \underbrace{ \left( \begin{array}{ccc} && \\ &U& \\ && \end{array} \right) }_{c \times r} \underbrace{\left( \begin{array}{ccccc} &&&& \\ &&R&& \\ &&&& \end{array} \right)}_{r \times n} . $$

We briefly expand on the latter point. In many cases, an important
step in data analysis is to construct a compressed representation
of $A$ that may be easier to analyze and interpret. The most
common such representation is obtained by truncating the SVD at
some number $k \ll \min \{m,n\}$ terms, in large part because this
provides the “best” rank-$k$ approximation to $A$ when measured
with respect to any unitarily invariant matrix norm.
Unfortunately, the basis vectors (the so-called eigencolumns and
eigenrows) provided by this approximation (and with respect to
which every column and row of the original data matrix is
expressed) are notoriously difficult to interpret in terms of the
underlying data and processes generating that data. Gould, in
the “Mismeasure of Man” [Gould-96], provides examples where
such reification of the singular vectors (or principal components
or “factors”) resulted in social policy with potentially
devastating consequences for large groups. For example, the vector $[(1/2)\,\mbox{age} - (1/\sqrt{2})\,\mbox{height} + (1/2)\,\mbox{income}]$, being one of the significant uncorrelated “factors”
from a dataset of people's features, is not particularly
informative. From an analyst's point of view, it would be highly
preferable to have a low-rank approximation that is nearly as good
as that provided by the SVD but that is expressed in terms of a
small number of *actual columns* and/or *actual rows* of a
matrix, rather than linear combinations of those columns and rows.
Our $CUR$ matrix decomposition is a direct formulation of this
problem. For example, the $CUR$ matrix decomposition was recently
applied to hyperspectrally-resolved medical imaging data
[MahoneyMD-06]. In this application, a column corresponds to an
image at a single physical frequency and a row corresponds to a
single spectrally-resolved pixel, and it was shown that data
reconstruction and classification tasks can be performed with
little loss in quality even after substantial data compression.

The main existing result for $CUR$ matrix decompositions is the following.

**Theorem**(Drineas et al. [DrineasMM-06b])**.***Given a matrix $A \in \mathbb{R}^{m \times n}$ and an integer $k \ll \min \{m,n\}$, there exist randomized algorithms such that if $c = O(\epsilon^{-2} k \log k \log (1/\delta))$ columns of $A$ (in expectation) are chosen to construct $C$, and then $r = O(\epsilon^{-2}c \log c \log (1/\delta))$ rows of $A$ (in expectation) are chosen to construct $R$, then with probability at least $1-\delta$,*- \begin{equation*}\newcommand{\FNorm}[1]{\|#1\|_{\rm F}}\FNorm{A - CUR} \leq (1+\epsilon) \FNorm{A-A_k}.\end{equation*}
*Here, the matrix $U$ is a weighted Moore-Penrose inverse of the intersection between $C$ and $R$, and $A_k$ is the best rank-$k$ approximation to $A$. The randomized algorithm runs in time $O(\operatorname{SVD}(A_k))$, which is the time required to compute the best rank-$k$ approximation to the SVD [GolubV-89].*

Many important questions remain open within the context of
$CUR$-type decompositions. The most important one is to devise
*deterministic* algorithms. Whereas, from a theoretical
viewpoint, the randomized algorithms are satisfactory,
deterministic algorithms would be much preferable. Results of Gu and Eisenstat [GuE-96] and Stewart [Stewart-99,Stewart-04] may help towards this goal.
Also relevant is work by Goreinov,
Tyrtyshnikov, and Zamarashkin [GoreinovTZ-97,GoreinovT-01] that was motivated
by applications such as scattering, in which large coefficient
matrices have blocks that can be easily approximated by low-rank
matrices. They showed that if the matrix $A$ is approximated by a
rank-$k$ matrix to within an accuracy $\epsilon$ then *there*
exists* a choice of $k$ columns and $k$ rows, i.e., $C$ and $R$,*
and a low-dimensional $k \times k$ matrix $U$ constructed from the
elements of $C$ and $R$, such that $A \approx CUR$ in the sense
that $\newcommand{\TNorm}[1]{\|#1\|_2}\TNorm{A-CUR} \le \epsilon f(m,n,k)$, where $ f(m,n,k) = 1 + 2\sqrt{km} + 2\sqrt{kn} $. In [GoreinovTZ-97], the choice for these
matrices is related to the problem of determining the minimum
singular value $\sigma_k$ of $k \times k$ submatrices of $n \times
k$ orthogonal matrices. In addition, in [GoreinovT-01] the choice for
$C$ and $R$ is interpreted in terms of the maximum volume concept
from interpolation theory, in the sense that columns and rows
should be chosen such that their intersection $W$ defines a
parallelepiped of maximum volume among all $k \times k$
submatrices of $A$.

A second research topic is to improve the error bounds of previous results, and improve the dependency of the number of sampled columns and rows on $k$ and $\epsilon$. Again, the aforementioned results from the numerical linear-algebra community will serve as starting points.