Problem 56: Efficient Measures of “Surprisingness” of Sequences

From Open Problems in Sublinear Algorithms
Suggested by Rina Panigrahy
Source Dortmund 2012
Short link https://sublinear.info/56

Consider a sequence of iid random bits $S\in\{0,1\}^n$.

Question: Find efficient measures of how surprising/unbelievable $S$ appears to be, i.e., a good heuristic for measuring how probable or improbable a string is.

For example, if we see $0,0,0,\ldots$, we will not believe it is random (i.e., it is surprising).

One existing measure is the $k$-th order Shannon entropy $H_k$ (where $H_0$ corresponds to the entropy of the empirical frequencies of 0s and 1s). However, it fails to flag a string like $(0,0,\ldots,0,1,1,\ldots,1)$ as surprising (from the point of view of densities, the string looks quite random).
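For concreteness, here is a minimal sketch of the $k$-th order empirical entropy; the function name and the plain (unsmoothed) estimator are illustrative choices, not part of the problem statement:

<syntaxhighlight lang="python">
import math
from collections import Counter

def empirical_entropy(bits, k=0):
    """k-th order empirical Shannon entropy (bits per symbol) of a 0/1 string.

    For k = 0 this is the entropy of the empirical frequencies of 0s and 1s;
    for k >= 1 each bit is conditioned on the k bits preceding it.
    """
    n = len(bits)
    if n <= k:
        return 0.0
    ctx_counts = Counter()    # how often each length-k context occurs
    pair_counts = Counter()   # how often each (context, next bit) pair occurs
    for i in range(k, n):
        ctx = bits[i - k:i]
        ctx_counts[ctx] += 1
        pair_counts[(ctx, bits[i])] += 1
    h = 0.0
    for (ctx, b), c in pair_counts.items():
        p_joint = c / (n - k)            # empirical P(context, bit)
        p_cond = c / ctx_counts[ctx]     # empirical P(bit | context)
        h -= p_joint * math.log2(p_cond)
    return h

# The blocky string below has H_0 = 1 bit per symbol, exactly like a
# uniformly random string, so H_0 alone does not flag it as surprising.
blocky = "0" * 500 + "1" * 500
print(empirical_entropy(blocky, k=0))   # 1.0
</syntaxhighlight>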

The ideal solution would be to use Kolmogorov complexity, but it is uncomputable.
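A standard computable stand-in (not part of the original problem statement) is the output length of an off-the-shelf compressor, which upper-bounds Kolmogorov complexity up to the compressor's overhead. A rough sketch, assuming zlib and a one-byte-per-bit ASCII encoding:

<syntaxhighlight lang="python">
import zlib

def compression_surprise(bits):
    """Heuristic surprise score: n minus the bit-length of the zlib-compressed
    string. Compressed length is a computable stand-in for Kolmogorov
    complexity (it upper-bounds it, up to the compressor's overhead)."""
    data = bits.encode("ascii")  # one byte per bit; packing 8 bits per byte would tighten the numbers
    compressed_bits = 8 * len(zlib.compress(data, 9))
    # A large positive score means the string is highly compressible, hence
    # surprising as a sample of iid fair coin flips.
    return len(bits) - compressed_bits

print(compression_surprise("0" * 1000))   # large positive score
</syntaxhighlight>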

A particular setting of strings to consider: suppose each bit is generated by an independent biased coin, but the bias of the coin changes (slowly?) over time. Is there a good compression scheme in this setting?
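As a purely illustrative starting point (the random-walk drift model, step size, and window length below are assumptions, not part of the problem), one can simulate such sequences and check how well a simple sliding-window predictor compresses them:

<syntaxhighlight lang="python">
import math
import random

def drifting_bias_sequence(n, step=0.01, seed=0):
    """Generate n bits where bit i is 1 with probability p_i and p_i performs
    a small bounded random walk (one way to model a slowly changing bias)."""
    rng = random.Random(seed)
    p, bits = 0.5, []
    for _ in range(n):
        bits.append("1" if rng.random() < p else "0")
        p = min(0.99, max(0.01, p + rng.uniform(-step, step)))
    return "".join(bits)

def sliding_window_code_length(bits, w=100):
    """Bits needed by an adaptive coder that predicts each bit from the
    (Laplace-smoothed) 1-frequency of the previous w bits. A total noticeably
    below len(bits) suggests the drifting bias is exploitable."""
    total = 0.0
    for i, b in enumerate(bits):
        window = bits[max(0, i - w):i]
        p1 = (window.count("1") + 1) / (len(window) + 2)  # add-one smoothing
        total -= math.log2(p1 if b == "1" else 1.0 - p1)
    return total

s = drifting_bias_sequence(10000)
print(len(s), sliding_window_code_length(s))
</syntaxhighlight>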