Open Problems:94 - Revision history

Krzysztof Onak at 03:12, 25 August 2019

2019-08-25T03:12:41Z

Krzysztof Onak: Krzysztof Onak moved page Waiting:Ads Impressions and Statistics to Open Problems:94 without leaving a redirect

2019-08-20T03:53:24Z

Krzysztof Onak moved page Waiting:Ads Impressions and Statistics to Open Problems:94 without leaving a redirect

Krzysztof Onak: Updating the header

2019-08-20T03:53:10Z

Updating the header

Ccanonne at 22:13, 7 August 2019

2019-08-07T22:13:25Z

Ccanonne: Created page with "{{Header |title=Ads, Impressions, and Statistics&action=submit |source=warwick18 }} 1. Consider the problem of reporting statistics about online ads to the advertisers that..."

2019-08-07T22:11:19Z

Created page with "{{Header |title=Ads, Impressions, and Statistics&action=submit |source=warwick18 }} 1. Consider the problem of reporting statistics about online ads to the advertisers that..."

New page

{{Header
|title=Ads, Impressions, and Statistics&action=submit
|source=warwick18
}}

1. Consider the problem of reporting statistics about online ads to the advertisers that pay for them. Advertisers may wish to know how many unique people matching some demographic conditions have seen an ad at a particular time range. There are many ads and many demographic conditions and time ranges; and the service provider should be able to accurately answer any possible query over ads, demographics, and time. In other words, it’s an OLAP cube problem where the aggregate is ''distinct count'' rather than ''sum''. How can one construct a sketch that will provide estimates with "good" errors that solves this with, say, $O(1)$ time complexity per query? Loosely speaking, "good" here means every query for an ad (without demographic conditions) has good relative error and any query that has a high count should have good relative error.

2. Suppose a company tracks an important high level metric (e.g. Total network traffic) that is the sum of a metric over many discrete categories (e.g., country, device, network provider, etc.). When that metric changes in an unexpected way, it wishes to figure out what combinations of categories are driving that change. How can one find a small set of combination of categories that explains most of the change and can be described succinctly? Can this be done in real-time so that a streaming sketch can keep track of the appropriate approximate aggregates and the search for a succinct description of category can be done in "reasonable" time?

@@ Line 2: / Line 2: @@
 |source=warwick18
 }}
-. Consider the problem of reporting statistics about online ads to the advertisers that pay for them. Advertisers may wish to know how many unique people matching some demographic conditions have seen an ad at a particular time range. There are many ads and many demographic conditions and time ranges; and the service provider should be able to accurately answer any possible query over ads, demographics, and time. In other words, it’s an [https://en.wikipedia.org/wiki/OLAP_cube OLAP cube] problem where the aggregate is ''distinct count'' rather than ''sum''.  How can one construct a sketch that will provide estimates with "good" errors that solves this with, say, $O(1)$ time complexity per query? Loosely speaking, "good" here means every query for an ad (without demographic conditions) has good relative error and any query that has a high count should have good relative error.
+. Consider the problem of reporting statistics about online ads to the advertisers that pay for them. Advertisers may wish to know how many unique people matching some demographic conditions have seen an ad at a particular time range. There are many ads and many demographic conditions and time ranges; and the service provider should be able to accurately answer any possible query over ads, demographics, and time. In other words, it’s an [https://en.wikipedia.org/wiki/OLAP_cube OLAP cube] problem where the aggregate is ''distinct count'' rather than ''sum''.  How can one construct a sketch that will provide estimates with &ldquo;good&rdquo; errors that solves this with, say, $O(1)$ time complexity per query? Loosely speaking, &ldquo;good&rdquo; here means every query for an ad (without demographic conditions) has good relative error and any query that has a high count should have good relative error.
-. Suppose a company tracks an important high level metric ''(e.g., total network traffic)'' that is the sum of a metric over many discrete categories ''(e.g., country, device, network provider, etc.)''. When that metric changes in an unexpected way, it wishes to figure out what combinations of categories are driving that change. How can one find a small set of  combination of categories that explains most of the change and can be described succinctly? Can this be done in real-time so that a streaming sketch can keep track of the appropriate approximate aggregates and the search for a succinct description of category can be done in "reasonable" time?
+. Suppose a company tracks an important high level metric (e.g., total network traffic) that is the sum of a metric over many discrete categories (e.g., country, device, network provider, etc.). When that metric changes in an unexpected way, it wishes to figure out what combinations of categories are driving that change. How can one find a small set of  combination of categories that explains most of the change and can be described succinctly? Can this be done in real-time so that a streaming sketch can keep track of the appropriate approximate aggregates and the search for a succinct description of category can be done in &ldquo;reasonable&rdquo; time?

@@ Line 1: / Line 1: @@
 {{Header
 |source=warwick18
 }}
 . Consider the problem of reporting statistics about online ads to the advertisers that pay for them. Advertisers may wish to know how many unique people matching some demographic conditions have seen an ad at a particular time range. There are many ads and many demographic conditions and time ranges; and the service provider should be able to accurately answer any possible query over ads, demographics, and time. In other words, it’s an [https://en.wikipedia.org/wiki/OLAP_cube OLAP cube] problem where the aggregate is ''distinct count'' rather than ''sum''.  How can one construct a sketch that will provide estimates with "good" errors that solves this with, say, $O(1)$ time complexity per query? Loosely speaking, "good" here means every query for an ad (without demographic conditions) has good relative error and any query that has a high count should have good relative error.
 . Suppose a company tracks an important high level metric ''(e.g., total network traffic)'' that is the sum of a metric over many discrete categories ''(e.g., country, device, network provider, etc.)''. When that metric changes in an unexpected way, it wishes to figure out what combinations of categories are driving that change. How can one find a small set of  combination of categories that explains most of the change and can be described succinctly? Can this be done in real-time so that a streaming sketch can keep track of the appropriate approximate aggregates and the search for a succinct description of category can be done in "reasonable" time?

← Older revision	Revision as of 03:53, 20 August 2019
(No difference)