Years ago I worked at a large advertising network that was concerned about fraudulent impressions: ads placed "below the fold", hidden, stacked behind other elements, or otherwise generating impressions that weren't real.
I suggested we could build a small piece of supplemental ad code that would load alongside one of our ads in a real page and "look around" — see where the ads were placed and so on.
The idea was rejected because it would create too much data. I argued that we could trigger the fraud-detection code once per n impressions, with n being 100 or 1,000, and still identify fraudulent sites with statistical certainty (our problem would be false negatives, not data volume), but they couldn't wrap their heads around sampling just enough data to answer a question rather than collecting ALL the data.
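The arithmetic behind that argument is simple binomial sampling. As a rough sketch (the impression counts, fraud rates, and thresholds below are made-up illustrative numbers, not figures from the actual network): a site serving a million impressions a day, sampled once per 1,000 impressions, still yields about a thousand observations, which is far more than enough to flag a site where a meaningful fraction of placements are hidden.

```python
import math

def detection_probability(impressions, sample_rate, fraud_rate, min_hits):
    """Probability that sampling 1-in-sample_rate impressions yields at
    least min_hits observations of a hidden/fraudulent placement, modeling
    the sampled checks as Binomial(n, fraud_rate)."""
    n = impressions // sample_rate  # expected number of sampled impressions
    # P(X < min_hits) summed directly, then complemented.
    p_below = sum(
        math.comb(n, k) * fraud_rate**k * (1 - fraud_rate)**(n - k)
        for k in range(min_hits)
    )
    return 1 - p_below

# Hypothetical site: 1M impressions/day, sampled once per 1,000 impressions
# (~1,000 samples), with 20% of placements hidden. Requiring at least 50
# fraud observations before flagging, detection is a near certainty.
p = detection_probability(1_000_000, 1_000, 0.20, 50)
```

With those assumed numbers, `p` is indistinguishable from 1, which is exactly the point: at this scale the risk is missing marginal offenders (false negatives), not drowning in data.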
Of course it's also highly likely that they didn't actually want to detect fraud.