Difference between revisions of "Open Problems:98"
(Created page with "{{Header |title=Estimating a Graph's Degree Distribution |source=wola19 |who=C. Seshadhri }} The ''degree distribution'' of a graph $G=(V,E)$ is the histogram of the degree fr...") |
m (Krzysztof Onak moved page Waiting:Estimating a Graph's Degree Distribution to Open Problems:98 without leaving a redirect) |
||
(4 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
{{Header | {{Header | ||
− | |||
|source=wola19 | |source=wola19 | ||
|who=C. Seshadhri | |who=C. Seshadhri | ||
Line 6: | Line 5: | ||
The ''degree distribution'' of a graph $G=(V,E)$ is the histogram of the degree frequencies: i.e., letting $n(d)$ denote the number of degree-$d$ vertices, the histogram $(n(d))_{d\geq 0}$. Define the (complementary) cumulative distribution function as | The ''degree distribution'' of a graph $G=(V,E)$ is the histogram of the degree frequencies: i.e., letting $n(d)$ denote the number of degree-$d$ vertices, the histogram $(n(d))_{d\geq 0}$. Define the (complementary) cumulative distribution function as | ||
\[ | \[ | ||
− | N(d) \ | + | N(d) \stackrel{\rm def}{=} \sum_{d'\geq d} n(d'), \qquad d\geq 0\,. |
\] | \] | ||
Assume one has access to the graph $G$ via the following (standard) three types of queries: | Assume one has access to the graph $G$ via the following (standard) three types of queries: | ||
− | + | ||
− | + | * sampling a vertex uniformly at random | |
− | + | ||
− | and the goal is to obtain the following $(1\pm \varepsilon)$- | + | * querying the degree of a given vertex |
+ | |||
+ | * sample a neighbor of a given vertex uniformly at random | ||
+ | |||
+ | and the goal is to obtain the following $(1\pm \varepsilon)$-“bicriteria” approximation $\hat{N}$ of the degree distribution: for all $d$, | ||
\[ | \[ | ||
(1-\varepsilon)N( (1-\varepsilon)d) \leq \hat{N}(d) \leq (1+\varepsilon) N((1+\varepsilon)d)\,. | (1-\varepsilon)N( (1-\varepsilon)d) \leq \hat{N}(d) \leq (1+\varepsilon) N((1+\varepsilon)d)\,. | ||
Line 26: | Line 29: | ||
And also, slightly less well-defined: | And also, slightly less well-defined: | ||
− | '''Question:''' Can one obtain better upper bounds when relaxing the goal to only learn the ''high-degree'' (tail) part of the distribution? What about testing properties of the degree distribution (e.g., | + | '''Question:''' Can one obtain better upper bounds when relaxing the goal to only learn the ''high-degree'' (tail) part of the distribution? What about testing properties of the degree distribution (e.g., “power-law-ness”) in this setting? And what about the first type of queries — can one relax it, or work with a different type of sampling than uniform (for instance, via random walks)? |
Latest revision as of 18:25, 26 August 2019
Suggested by | C. Seshadhri |
---|---|
Source | WOLA 2019 |
Short link | https://sublinear.info/98 |
The degree distribution of a graph $G=(V,E)$ is the histogram of the degree frequencies: i.e., letting $n(d)$ denote the number of degree-$d$ vertices, the histogram $(n(d))_{d\geq 0}$. Define the (complementary) cumulative distribution function as \[ N(d) \stackrel{\rm def}{=} \sum_{d'\geq d} n(d'), \qquad d\geq 0\,. \] Assume one has access to the graph $G$ via the following (standard) three types of queries:
- sampling a vertex uniformly at random
- querying the degree of a given vertex
- sample a neighbor of a given vertex uniformly at random
and the goal is to obtain the following $(1\pm \varepsilon)$-“bicriteria” approximation $\hat{N}$ of the degree distribution: for all $d$, \[ (1-\varepsilon)N( (1-\varepsilon)d) \leq \hat{N}(d) \leq (1+\varepsilon) N((1+\varepsilon)d)\,. \] Previous work of Eden, Jain, Pinar, Ron, and Seshadhri [EdenJPRS-18] shows an upper bound of \[ \frac{n}{h} + \frac{m}{\min_d d\cdot N(d)} \] queries, where $h$ is the value s.t. $N(h)=h$ (where the complementary cdf intersects the diagonal).
Question: Can this upper bound be improved? Can one establish matching lower bounds?
And also, slightly less well-defined:
Question: Can one obtain better upper bounds when relaxing the goal to only learn the high-degree (tail) part of the distribution? What about testing properties of the degree distribution (e.g., “power-law-ness”) in this setting? And what about the first type of queries — can one relax it, or work with a different type of sampling than uniform (for instance, via random walks)?