# Sample Complexity for Streaming Multi-Armed Bandit and Coin Tossing

Suggested by Chen Wang WALDO 2021

Consider the following pure exploration problem in streaming coin tossing (and multi-armed bandits): given a stream of $n$ coins with unknown biases, find the most biased coin using 1). a minimal number of coin tosses (sample complexity) and 2). a small number of stored coins at any point of the stream (space complexity).

A handful of remarks about the streaming coin tossing model are in order. As the name suggests, the coins arrive one after another in a stream under the model. For each arriving coin, we can toss it multiple times before deciding to store or discard it. In addition, we are allowed to toss a stored coin and compare the empirical biases, and a stored coin can be discarded at any point. However, once a coin is discarded, it is lost forever and cannot be further tossed or even retrieved. The sample complexity is defined as the total number of coin tosses, and the space complexity is defined as the maximum number of coins an algorithm ever stores.

Denote $\boldsymbol{H}_{1}:=\frac{n}{\Delta^2_{[2]}}$. It is known that given the parameter $\Delta_{[2]}$ (the gap between the most and second-most biased coins), in a single pass over the stream, one can (with high constant probability) find the most biased coin with a sample complexity of $O(\boldsymbol{H}_{1})$ and a space complexity of a single coin (Assadi and Wang [STOC'20]). The sample complexity matches the worst-case lower bound for any algorithm even under the RAM setting (i.e., with a memory of $n$ coins) (Mannor and Tsitsiklis [COLT'04]). That is to say, for the $\boldsymbol{H}_{1}$ sample complexity, there is no sample-space tradeoff.

An interesting variant of the problem is the `instance-sensitive' sample complexity, defined as $$\boldsymbol{H}_{2}:= \sum_{i=2}^{n}\frac{1}{\Delta^2_{[i]}} \log\log(\frac{1}{\Delta_{[i]}}),$$ where $\Delta_{[i]}$ is the gap between the most biased and the $i$-th most biased coins. It is known that the $O(\boldsymbol{H}_{2})$ sample complexity can be achieved under the RAM setting even without the knowledge of $\Delta_{[2]}$. Note that this does not contradict the worst-case lower bound by (Mannor and Tsitsiklis [COLT'04]), as $O(\boldsymbol{H}_{2})$ becomes close to $O(\frac{n}{\Delta^2_{[2]}})$ when all the sub-optimal coins share the same bias (i.e., $\Delta_{[i]}=\Delta_{[2]}$, $\forall i\geq 2$).

For the O($\boldsymbol{H}_{2}$) sample complexity, the sample-space tradeoff begins to emerge under the streaming coin tossing model. The following hardness results are known for single-pass streaming algorithms (to find the most biased coin with high constant probability). Consider the following assumptions: a). the knowledge of $\Delta_{[2]}$; b). the random arrival of coins; and c). an estimation of $\boldsymbol{H}_{2}$. No algorithm with a space of $o(n)$ coins can achieve the O($\boldsymbol{H}_{2}$) sample complexity given (a). and b).) or ( a). and c). ).

On the other hand, if we are allowed to make multiple passes over the stream, the $O(\boldsymbol{H}_{2})$ sample complexity can be achieved. In particular, if we can make $O(\log(\frac{1}{\Delta_{[2]}}))$ passes, it is able to find the most biased coin with the $O(\boldsymbol{H}_{2})$ sample complexity and a space of a single coin (Jin et al. [ICML'21]) even without the knowledge of $\Delta_{[2]}$. Note that this is a very strong dichotomy between the single-pass scenario, as one can show that if $\Delta_{[2]}$ is not given in a single pass, the sample complexity goes unbounded for an algorithm with $o(n)$-coin space.

Based on the above context, the interesting open question is as follows: what is the tight number of passes (i.e. algorithms and lower bounds) for a streaming algorithm with $o(n)$-coin space to achieve the O($\boldsymbol{H}_{2}$) sample complexity?