# Estimating a Graph's Degree Distribution

| Suggested by | C. Seshadhri |
|---|---|
| Source | wola19 |

The *degree distribution* of a graph $G=(V,E)$ is the histogram of the degree frequencies: i.e., letting $n(d)$ denote the number of degree-$d$ vertices, the histogram $(n(d))_{d\geq 0}$. Define the (complementary) cumulative distribution function as
\[
N(d) \stackrel{\rm def}{=} \sum_{d'\geq d} n(d'), \qquad d\geq 0\,.
\]
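
For concreteness, here is a minimal Python sketch of these two definitions, computed from a full list of vertex degrees (so it assumes the whole graph is known, unlike the query setting considered on this page). The function names `degree_histogram` and `ccdf` are hypothetical.

```python
from collections import Counter


def degree_histogram(degrees):
    """Return n as a Counter: n[d] = number of vertices of degree d."""
    return Counter(degrees)


def ccdf(degrees):
    """Return a list N with N[d] = number of vertices of degree >= d.

    The list covers d = 0, ..., max_degree + 1, so its last entry is 0.
    """
    n = degree_histogram(degrees)
    max_d = max(degrees, default=0)
    N = [0] * (max_d + 2)
    # Accumulate from the largest degree downwards: N(d) = N(d+1) + n(d).
    for d in range(max_d, -1, -1):
        N[d] = N[d + 1] + n[d]
    return N


# Star on 5 vertices: the center has degree 4, each of the 4 leaves degree 1.
star_degrees = [4, 1, 1, 1, 1]
print(ccdf(star_degrees))  # [5, 5, 1, 1, 1, 0]
```
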
Assume one has access to the graph $G$ via the following three standard types of queries:

- sampling a u.a.r. vertex

- querying the degree of a given vertex

- sampling a u.a.r. neighbor of a given vertex

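The goal is to obtain the following $(1\pm \varepsilon)$-"bicriteria" approximation $\hat{N}$ of the degree distribution: for all $d$,
\[
(1-\varepsilon)N( (1+\varepsilon)d) \leq \hat{N}(d) \leq (1+\varepsilon) N((1-\varepsilon)d)\,.
\]
Since $N$ is non-increasing, the lower bound is evaluated at the larger argument $(1+\varepsilon)d$ and the upper bound at the smaller argument $(1-\varepsilon)d$, so that $\hat{N}=N$ itself satisfies the guarantee. Previous work of Eden, Jain, Pinar, Ron, and Seshadhri [EdenJPRS-18] shows an upper bound of
\[
\frac{n}{h} + \frac{m}{\min_d d\cdot N(d)}
\]
queries, where $n=|V|$, $m=|E|$, and $h$ is the value such that $N(h)=h$ (i.e., the point where the complementary cdf intersects the diagonal).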
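
As a rough illustration of the query model and the guarantee, the following Python sketch wraps an adjacency-list graph in an oracle exposing the three queries, builds a naive estimate of $N$ from uniform vertex samples alone, and checks the bicriteria condition by brute force. The names `GraphOracle`, `naive_ccdf_estimate`, and `is_bicriteria_approx` are hypothetical, and the naive estimator is only a baseline, not the algorithm of [EdenJPRS-18]. The check takes the lower bound at $(1+\varepsilon)d$ and the upper bound at $(1-\varepsilon)d$, which is the orientation under which the exact $N$ passes (because $N$ is non-increasing).

```python
import random


class GraphOracle:
    """Query access to a graph stored as adjacency lists (illustrative helper)."""

    def __init__(self, adj, seed=0):
        self.adj = adj                    # adj[v] = list of neighbors of v
        self.rng = random.Random(seed)
        self.queries = 0                  # total number of queries issued

    def random_vertex(self):
        """Query type 1: a uniformly random vertex."""
        self.queries += 1
        return self.rng.randrange(len(self.adj))

    def degree(self, v):
        """Query type 2: the degree of v."""
        self.queries += 1
        return len(self.adj[v])

    def random_neighbor(self, v):
        """Query type 3: a uniformly random neighbor of v (assumes deg(v) >= 1)."""
        self.queries += 1
        return self.rng.choice(self.adj[v])


def naive_ccdf_estimate(oracle, n, samples):
    """Baseline estimate of N(d): n times the fraction of sampled degrees >= d."""
    degs = [oracle.degree(oracle.random_vertex()) for _ in range(samples)]
    return lambda d: n * sum(1 for x in degs if x >= d) / samples


def is_bicriteria_approx(degrees, N_hat, eps):
    """Brute-force check of the bicriteria guarantee for d = 0..max_degree + 1."""
    def N(t):
        # Exact complementary cdf, extended to real-valued thresholds t.
        return sum(1 for x in degrees if x >= t)

    return all(
        (1 - eps) * N((1 + eps) * d) <= N_hat(d) <= (1 + eps) * N((1 - eps) * d)
        for d in range(max(degrees) + 2)
    )


# Star on 5 vertices: vertex 0 is the center.
adj = [[1, 2, 3, 4], [0], [0], [0], [0]]
degrees = [len(nbrs) for nbrs in adj]

oracle = GraphOracle(adj)
N_hat = naive_ccdf_estimate(oracle, n=len(adj), samples=200)
print(oracle.queries)  # 400: one vertex query and one degree query per sample
print(is_bicriteria_approx(degrees, N_hat, eps=0.5))  # depends on the samples
```

The baseline makes the difficulty visible: vertices in the tail are rare under uniform sampling, so estimating $N(d)$ for large $d$ this way needs many samples, which is where neighbor queries become useful.
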

**Question:** Can this upper bound be improved? Can one establish matching lower bounds?
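
To get a feel for the quantity in question, the sketch below evaluates $n/h + m/\min_d d\cdot N(d)$ on small graphs given by their degree lists. Two choices here are assumptions rather than part of the statement above: $h$ is taken to be the largest integer $d$ with $N(d)\geq d$ (an exact solution of $N(h)=h$ need not exist), and the minimum ranges only over $1\leq d\leq d_{\max}$, since $d\cdot N(d)$ vanishes at $d=0$ and beyond the maximum degree. The graph is assumed to have at least one edge, and any constant or logarithmic factors in the actual bound are ignored.

```python
def ccdf_at(degrees, d):
    """N(d): number of vertices of degree at least d."""
    return sum(1 for x in degrees if x >= d)


def h_index(degrees):
    """Largest integer d with N(d) >= d (assumed stand-in for N(h) = h)."""
    h = 0
    while ccdf_at(degrees, h + 1) >= h + 1:
        h += 1
    return h


def query_bound(degrees):
    """Evaluate n/h + m / min_d d*N(d), with d restricted to 1..max degree."""
    n = len(degrees)
    m = sum(degrees) // 2          # handshake lemma: sum of degrees = 2m
    h = h_index(degrees)
    z_sq = min(d * ccdf_at(degrees, d) for d in range(1, max(degrees) + 1))
    return n / h + m / z_sq


# Star on 5 vertices: n = 5, m = 4, h = 1, min_d d*N(d) = 2 (attained at d = 2).
print(query_bound([4, 1, 1, 1, 1]))   # 7.0

# Complete graph K5: n = 5, m = 10, h = 4, min_d d*N(d) = 5 (attained at d = 1).
print(query_bound([4, 4, 4, 4, 4]))   # 3.25
```
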

And also, slightly less well-defined:

**Question:** Can one obtain better upper bounds when relaxing the goal to only learn the *high-degree* (tail) part of the distribution? What about testing properties of the degree distribution (e.g., "power-law-ness") in this setting? And what about the first type of queries — can one relax it, or work with a different type of sampling than uniform (for instance, via random walks)?