
This article is arguing for working hours that equate to about 65% of waking hours. So your range seems too high.


I think from context a more likely interpretation of the parent comment is that it is saying "60% to 80% utilisation [of working hours]".


Perhaps. But what defines working hours?


"Housing as a vehicle for building wealth and housing becoming unaffordable for the younger generation are two sides of the same coin." That's a great insight.


It has always surprised me that many technology professionals (and business professionals in general) don't have a strong intuition for the power of sampling. For example, in this case, the author states: "With 100 samples, our estimates are accurate to within about 5%. The magic of sampling is that we can derive accurate estimates about a very large population using a relatively small number of samples. In the last scenario (100 billion M&Ms), we have 1% accuracy despite only sampling 0.00001% of the M&Ms."

I bet many would think n=100 would be worthless once the population reaches millions, or especially billions.

One HN-related piece of evidence for that is when I pointed out what the margin of error would be for an n=164 survey sample, I got downvoted hard! https://news.ycombinator.com/item?id=8050801

But I saw this hundreds of times talking to customers when I ran a survey sampling product out of YC.
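For anyone who wants to sanity-check numbers like these: under simple random sampling, the worst-case 95% margin of error for an estimated proportion depends only on n, not on the population size. A quick sketch of the standard formula (this is the generic worst-case bound, not the article's exact calculation):

    import math

    def moe_95(n, p=0.5):
        # 1.96 * sqrt(p * (1 - p) / n); p = 0.5 is the worst case
        return 1.96 * math.sqrt(p * (1 - p) / n)

    for n in (100, 164, 1000):
        print(n, round(moe_95(n) * 100, 1), "%")
    # 100 -> ~9.8%, 164 -> ~7.7%, 1000 -> ~3.1%

Note that the population size never appears; the finite-population correction only matters if you sample a sizable fraction of the whole population.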


The impact of sample size completely defies human intuition. I give approximately zero weight to any opinion about a sample size that doesn't come with the formula, except from a few coworkers talking about specific topics where they've run the numbers so many times that they've internalized them.

The problem is that the relationship between sample size and power is quadratic, and it also involves the effect size and the variance. Quadratic relationships are unintuitive, many-orders-of-magnitude differences are unintuitive, and variance is unintuitive. So it's like a superformula for the human brain to get wrong.

I write experimentation software, and with samples in the millions, data scientists still want more power. Then you run some internal experiment with n=20 and it’s like “oh yeah, super significant”.
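To make the quadratic part concrete, here's a back-of-envelope sketch (toy numbers of my own, not anyone's production power calculation) of the required n per arm for a two-sided, two-sample z-test:

    from scipy.stats import norm

    def n_per_arm(delta, sigma, alpha=0.05, power=0.8):
        z_a = norm.ppf(1 - alpha / 2)   # ~1.96
        z_b = norm.ppf(power)           # ~0.84
        return 2 * ((z_a + z_b) * sigma / delta) ** 2

    print(round(n_per_arm(delta=1.0, sigma=10)))   # ~1,570 per arm
    print(round(n_per_arm(delta=0.1, sigma=10)))   # ~157,000 per arm

A 10x smaller effect needs 100x the sample, which is exactly the kind of scaling people don't guess correctly.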


I think the issue is that it isn't really taught: in science class and in popular science it's often mentioned that you should be on the lookout for small sample sizes in studies as a measure of quality, but it doesn't really go into much more detail than that. And in my mathematics education I wasn't taught about this effect until university (despite doing a lot of stats exercises involving sampling). And sample size is pretty much always an easy thing to pull out of a paper without much familiarity with the subject of the paper, while sample bias (a much bigger issue) can be much harder to address (though at least some people seem to be getting the idea that sampling from university students, as many papers do, isn't exactly representative of the general population).


Statistics has got to have the highest density of paradoxes and unintuitive results. Math and quantum mechanics seem like they have plenty of paradoxes, but then when I studied statistics, it felt like I was blindsided by new paradoxes every month for a year.


I was thinking about this last week[1] and I think that both the "math" people and "common sense" people are correct in a sense, and talking past each other. The math people are of course mathematically correct within the limits of the constructed model, given all the assumptions of perfectly random sampling, no systemic error, etc. Meanwhile common sense people are correct in a practical sense: small samples are vulnerable to sampling error, p-hacking, outright fraud, etc. Even before you're aware of the exact mechanisms by which things can go wrong you intuitively know that small/cheap studies are more vulnerable to some kind of honest human error or dishonest... shenanigans.

---

[1] I was listening to a podcast where trolley problems were brought up and the speaker was lamenting how clearly "unethical" and "irrational" your evolved intuition is, given that most people would rather let the train hit 10 men working on the tracks than divert it and kill 1 innocent person. Trolley problems are intellectually interesting for various reasons, but jumping to that conclusion is clearly absurd. Your intuitions are shaped by millions of years of genetic and social evolution precisely to be most rational for actual real-life problems. If you were actually standing at that switch you'd be thinking...

* do I actually trust my eyes in this situation? Are the workers on a parallel track and there's no actual problem here?

* if I pull the switch, will it derail the train and kill N+1 people instead of the 10?

* will the workers just notice the train in time and scurry off the track? Or will the train just stop? How good are brakes on a train anyway?

* how much time do judges and juries spend solving trolley problems?

... and while you were paralyzed thinking about these and a million other things, whatever was about to happen would happen and there would be no trolley problem.


Yeah, I confess I'm having a hard time understanding this. How much of the underlying population do you have to accurately know for such a small sample to be worth so much?

Edit: I see that the article describes some of the limitations. I'm curious how to work with unknown populations. That said, it does have me revisiting some ideas. Looking forward to it.


For people who can write code, the simplest way to convince yourself of foundational statistics results is simulation.

Create a simulated population with some distribution of a metric & run multiple sampling simulations. You'll be surprised. You can even put in sampling biases and test the impact.

Monte Carlo simulations are a surprisingly powerful tool. I once discovered that FAANG data scientists were misunderstanding statistical significance in a reporting product they made by half an order of magnitude, because they didn't understand the impact of observational methodology and sampling bias in their product. At my company, we set our own thresholds much higher than what the product recommended.
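A minimal version of that exercise (the metric and its distribution are made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(42)
    # A skewed "metric" over 10 million simulated individuals
    population = rng.lognormal(mean=3.0, sigma=0.5, size=10_000_000)

    # Draw 1,000 samples of n=100 and estimate the mean each time
    # (with replacement; at a 0.001% sampling fraction this is effectively
    # the same as sampling without replacement)
    estimates = [rng.choice(population, size=100).mean() for _ in range(1_000)]

    print(f"true mean:    {population.mean():.2f}")
    print(f"sample means: {np.mean(estimates):.2f} +/- {np.std(estimates):.2f}")
    # The spread of the estimates tracks sigma/sqrt(n); the population
    # having 10 million members never enters into it.

Swap in whatever distribution or sampling bias you like and watch how little the estimate spread changes as the population grows.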


Right, but this just reinforces my thought here. In order to simulate sampling, I have to know the data well enough to simulate it. Which, for many things I'd care about, means that if I knew the underlying distribution that well, I probably wouldn't need to sample. :(


I meant doing it as a theoretical planning exercise. You can throw in any number of weird distributions you might guess, and you'll be surprised at how quickly sampling will fairly reliably pick up patterns; this helps you plan your sampling around uncertainty.

Of course if your underlying distribution is likely to be Gaussian which is true for many phenomena, you don't need to bother except as a pedagogical exercise.


If you know a bit of programming, that's actually sufficient to explore these ideas and verify them for yourself.

Allen Downey has a ton of open source books that use this philosophy [0] and Peter Norvig has used Python notebooks in a similar manner (look at the ones in the Probability section) [1].

[0] https://greenteapress.com/wp/ [1] https://github.com/norvig/pytudes#pytudes-index-of-jupyter-i...


> Which, for many things I'd care about, if I knew the underlying distribution that well, I probably don't need to sample

You don't have to sample directly. The entire field of Bayesian variational learning exists to deal with that very problem. Look up Markov chain Monte Carlo, the Metropolis algorithm, conjugate priors, and reparametrization tricks.


Thanks for the pointers, will be looking into these!


> How much of the underlying population do you have to accurately know for such a small sample to be worth so much?

That's the magic of random sampling: you don't need to accurately know anything about the underlying population. If you do know things about the underlying population then you can do clever things like stratified sampling to get even more accurate measurements, but that's not necessary. The magic is that a randomly selected group of 100/1000/10000 is unlikely to be too different from the population as a whole, no matter what that population looks like.

You do have to be able to sample randomly--truly randomly[1]--from the population, though, and that's often an issue. Picking 100 people randomly from the population of "likely voters in the next US presidential election" is a very nontrivial thing. To start with, that population is not even very well defined; who is likely to vote changes over time and is difficult to pin down. Pollsters do various things to try to account for this, but if they fail to predict say a surge in young voters their numbers will end up being off.

Even if the population is clearly defined, it's not easy to survey a truly random sample from it. Some people are hard to reach. Some people don't want to talk to you, and whether or not they're willing to talk to you might be correlated with the thing you're interested in (like who they plan to vote for). You can do things to try to correct for that, but again if you get that wrong (and it's very hard to get right) your estimates will be off.

And of course, if you're interested in things that are rare, like third party voters, you need a much larger sample to get an accurate read. If you sample 100 likely voters there's a pretty good chance you won't get a single person who plans to vote for the Libertarian Party candidate.

[1] For the most basic form of random sampling, simple random sampling, you need not just every individual in the population to have the same probability of getting sampled, but every possible sample (i.e. every possible set of 100) needs to have the same probability of being sampled.
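On the rare-candidate point, the arithmetic is easy to check: assuming a 1% true share (purely for illustration), a simple random sample of 100 misses those voters entirely about a third of the time.

    # Chance that a simple random sample of n voters contains zero supporters
    # of a candidate with true share p
    p = 0.01
    for n in (100, 500, 2000):
        print(n, round((1 - p) ** n, 3))
    # 100 -> 0.366, 500 -> 0.007, 2000 -> ~0.0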


Hmmmm… I have code where I'm sampling from an exponential distribution, and even thousands of samples are insufficient to pass a chi-squared test at the 95% level that the observed distribution matches my expected ground-truth exponential. The reason? Chi-squared wants at least 5 expected counts per bin, and the tail bins have an effective probability of 0. And if I try to flip it and say "run the experiment with 500 samples 100 times, but verify the observed matches the expected within a 5% error", I'll still see more than 5 runs that fail this.

Is there something special about exponential functions or is it just my misunderstanding of statistics/calculus at play here for doing this correctly? I assume it’s the latter but I haven’t figured out what I’m doing wrong.


I'm not sure what exactly you're doing--binning the observations into ranges to run the chi square test?

In any case, it sounds like maybe it falls under the "if you're interested in things that are rare" paragraph in my post above. You can always design statistics that are arbitrarily hard to estimate. The things that we're typically interested in estimating in real life, though--averages, proportions, and similar--are typically estimable with reasonable sample sizes.


> thousands of samples are insufficient to pass chi-squared tests at 95% accuracy that the observed distribution matches my expected ground truth exponential function

It doesn't sound like your test statistic is chi-squared distributed, in which case it's not surprising that your samples fail the test, and sampling more just makes the failure more obvious.

> Is there something special about exponential functions

It's not that exponential functions are special; almost any other function would likely also fail the test. Rather, they're insufficiently special. The chi-squared distribution with k degrees of freedom arises from the sum of the squares of k independent standard normal random variables. Some computations (e.g. sample variance of k draws from a normal distribution) can be expressed using such a sum, but others (e.g. sample variance of k draws from an exponential distribution) cannot.

You'll need to switch to a different test statistic and use that test statistic's distribution (which is unlikely to be chi-squared) to compute your confidence intervals.


Which test statistic should I use? I’ve been trying to figure this out but have been unsuccessful in finding it.


If you can post a detailed explanation of what exactly you're trying to do, and/or your code, I'm happy to try to help you sort it out.


I have a random number function that has an exponentially decreasing probability of generating a given integer within [0, R). So for example, if the range of values is [0, 100), 99 has a 50% probability of being generated, 98 has a 25% chance, and so on.

I’m trying to confirm that if I run this function N times (let’s say 1000), that the frequency of the numbers generated match the expected distribution.
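Roughly like this, for concreteness (a simplified sketch, not my actual code; the generator below is just one way to produce that distribution, with the vanishing leftover mass folded into 0):

    import numpy as np
    from scipy.stats import chisquare

    R, n = 100, 1000
    rng = np.random.default_rng(0)

    # Value R-1 with prob 1/2, R-2 with prob 1/4, ... (a geometric tail)
    k = rng.geometric(0.5, size=n)       # 1, 2, 3, ... with prob 1/2, 1/4, ...
    values = np.clip(R - k, 0, R - 1)

    # Expected counts per value, with the leftover mass added to bin 0
    probs = 0.5 ** (R - np.arange(R))
    probs[0] += 1 - probs.sum()
    expected = n * probs

    # Merge everything below the first bin whose expected count is >= 5
    cutoff = int(np.argmax(expected >= 5))
    obs = np.bincount(values, minlength=R)
    obs_bins = np.concatenate(([obs[:cutoff].sum()], obs[cutoff:]))
    exp_bins = np.concatenate(([expected[:cutoff].sum()], expected[cutoff:]))

    print(chisquare(obs_bins, f_exp=exp_bins))

With the low-expectation bins merged like that, the p-value should bounce around from seed to seed rather than consistently fail.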


Ok, so the big issue is that statistical tests like the chi-squared test are not designed to show that a sample matches a certain distribution. Statistical tests are designed to show the opposite--"this sample does not match that distribution".

If the sample matches the distribution, by design the p-value is going to be uniformly distributed--i.e. a p-value of 0.01 is equally likely as a p-value of 0.99.


It's the fact that you need rare samples. The power of sample size is that you can see finer details relative to the fully zoomed out view. If you are interested in an effect which is rare or want to find a small difference between two effects, then you will potentially need a much larger sample size. (For the extremes of this, see the truly gigantic number of samples (trillions+) that are taken in high-energy physics experiments like the LHC: they are looking for very small differences in very rare events. This is also related to why standards for statistical tests are much higher in this field.)


I don’t actually care about the tails. I’m fine cutting off the comparison and treating sufficiently rare events as having an expected value of 0. And indeed, the bins that show up with “errors” (ie deviating > 5%) are the ones where events are reasonably expected. The tails are indeed always within 5% of expected.


Yeah, I realize I was more than a little off in my terminology. Being able to randomly sample, though, feels like it also needs a lot of knowledge about what you are sampling from. That's what I meant to include in my question. :D


The most important thing for a sample is that it’s representative. The sample must have the same characteristics as the population. If it doesn’t, it just destroys the usefulness of the analysis.

If you know absolutely nothing about your population, the only thing to look at is the mechanism of sampling. Is there some step in the process that would bias selection?

In the real world, you never know nothing about a population (you heard me, Frequentists) and you can check that the known attributes of the population match the sample. If they don’t, that could hint at something wrong.

You don’t even need to know the attributes ahead of time. Let’s say you want to spot check 100 API calls. You could find the ratio of user agents for the whole population and make sure your sample is close (detecting Sample Ratio Mismatch). Same for distribution of response times and so on. Just be aware that the more you look at, the more likely you’ll find something weird! You need to correct for that if doing math, or keep it in mind if eyeballing it.
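A concrete version of the user-agent spot check (the counts here are made up for illustration): compare the sample's mix to the population's with a chi-squared test, and treat a tiny p-value as a sample-ratio-mismatch red flag.

    import numpy as np
    from scipy.stats import chisquare

    population_share = np.array([0.60, 0.30, 0.10])   # e.g. Chrome / Safari / other
    sample_counts = np.array([48, 36, 16])            # user agents in the 100 sampled calls

    stat, p = chisquare(sample_counts,
                        f_exp=population_share * sample_counts.sum())
    print(p)   # very small p => the sample's mix doesn't look like the population's

And per the caveat above: run this check on user agents, response times, endpoints, and so on, and you should expect a few "weird" results by chance alone, so correct for the multiple looks.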


I forgot to mention an easy example: look out for anything that reminds you of calling 100 landline home phone numbers and concluding the average American is a retired 70-year-old homeowner.


The example I was falling back on is a fun exercise I saw on reservoir sampling from the dictionary on your computer. This seems like a good methodology for picking how big a reservoir to make.

That said, I'm curious how it does against different questions about the words. For example, is the MOE really the same for such questions as "How many words are more than 5 characters?" and "How many words start with the letter M?" Feels like this should /not/ be the case to me, but I will have fun doing some of the simulations.
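For reference, the exercise I have in mind looks roughly like this (Algorithm R; the dictionary path is just whatever word list happens to be on your machine):

    import random

    def reservoir_sample(iterable, k, rng=random):
        reservoir = []
        for i, item in enumerate(iterable):
            if i < k:
                reservoir.append(item)
            else:
                j = rng.randint(0, i)   # new item enters with prob k / (i + 1)
                if j < k:
                    reservoir[j] = item
        return reservoir

    with open("/usr/share/dict/words") as f:
        words = reservoir_sample((line.strip() for line in f), k=100)

    print(sum(len(w) > 5 for w in words) / len(words))                  # share of words > 5 chars
    print(sum(w.lower().startswith("m") for w in words) / len(words))   # share starting with "m"

On the MOE question: for an estimated proportion the standard error is sqrt(p(1-p)/n), so it does depend on the answer itself; it's largest near 50% and shrinks as the true share approaches 0% or 100%.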


Let me know!


Maybe as a way to rewire your intuition a bit, imagine that the population is infinite. A smooth curve. Yet if you sample 100 random points on the curve, you'll know quite a lot about it.


Only if I know the max frequency of change, though. The Nyquist rate kicks in there, after all. Right? That's why we use way, way more than 100 samples for a second of sound.


> I bet many would think n=100 would be worthless once the population reaches millions, or especially billions.

That depends on a uniform distribution of the population and an unbiased sampling method. One of the polls in the 2016 US presidential election [1] would shift Trump's position by a full percentage point based on input from a single man depending on which week he participated in the panel.

[1] https://www.nytimes.com/2016/10/13/upshot/how-one-19-year-ol...


Yes, but if your sampling is biased then the solution is not simply a larger sample size. The only way in which larger sample sizes help is giving more power to any methods you use to try to control for sampling bias.


> a full percentage point

If your goal is to be within 10%, that's not a problem.


Yes, I’ve heard Michael Bloomberg say the secret weapon was adding chat, which built the network effect.


How often do people reach out to random people? Don't people get tons of spam?


I can't say there is no spam, but in 10 years of use I don't think I've personally seen any. "Random" people sometimes reach out, but it's usually for ok reasons. Spamming people is a very quick way to ruin your firm's reputation. And depending on what you do, there may be regulations against that type of stuff.


> I wouldn't put too much weight into any sort of imputed probability from the price.

It's absolutely fair to impute a rough probability of deal closure from the stock price. The whole "merger arbitrage" industry works around that premise.

Sometimes the market doesn't think a deal has a 100% chance of closing (like MSFT and LinkedIn) and it still closes. There were valid antitrust concerns surrounding that deal, e.g. https://thehill.com/policy/technology/298573-salesforce-rais...
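The back-of-envelope version of imputing that probability (using Musk's $54.20 offer as the deal price; the no-deal price and current price here are made up, and this ignores time value and dividends):

    # implied probability ~= (current - no-deal price) / (offer - no-deal price)
    offer_price, no_deal_price, current_price = 54.20, 40.00, 47.00
    implied_prob = (current_price - no_deal_price) / (offer_price - no_deal_price)
    print(round(implied_prob, 2))   # ~0.49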


That's not the same as saying the market wasn't taking the offer seriously.

> The whole "merger arbitrage" industry works around that premise.

If the market price reflected the probability, then an arbitrage strategy should not be profitable.

Usually, these folks have better experience/skills/knowledge about M&A, antitrust, etc than the market average. In other words, the market doesn't reflect the probability of an event happening.


Yes, I wasn't commenting on the original "taking it seriously" language.

> If the market price reflected the probability, then an arbitrage strategy should not be profitable

> The market doesn't reflect the probability of an event happening.

No, the market's implied probability could be right, on average, across all deals...and the top merger arb funds could absolutely still be profitable by selecting deals when they think the market is mispricing the probability (for the reasons you mention: better experience, knowledge, etc.)

It's like the sports betting market: you can roughly impute a team's win probability from the (opening) betting line...and even if that's right on average, the top gamblers are still profitable.

And, of course, sometimes things with a say, 40% chance of happening do happen...so that doesn't mean the market was "wrong" about the chance (i.e. your LinkedIn mispricing example).

But sounds like we're in full agreement you can't look at the implied probability from the market price and draw some conclusion about it definitely happening, or definitely not happening (e.g. the market not taking it seriously).


Yeah, I think we're in agreement.

Another point however, about the market voting that the Musk takeover won't happen - we can only speculate as to why they predict it won't happen.

It doesn't necessarily mean they think he can't line up the financing. It could just mean they don't think the board will accept his offer.


The answer is less attribution measurement, more incrementality measurement. Incrementality solves the famous "handing out coupons outside the pizzeria door" problem: https://www.adweek.com/programmatic/lower-ad-fraud-will-be-a...


A root cause of this is an overreliance on Multi-touch Attribution (MTA) models, instead of true incrementality experiments: https://www.adweek.com/programmatic/lower-ad-fraud-will-be-a...


Google had a similar product called Google Website Optimizer that it shut down years ago:

https://en.wikipedia.org/wiki/Google_Website_Optimizer

https://support.google.com/analytics/answer/2661700?hl=en

Some screenshots still floating around:

http://imgur.com/a/eReWQ

http://imgur.com/a/7XydI


"Shut down" is a little deceptive:

"Google announced that GWO as a separate product would be retired as of 1 August 2012, and its functionality would be integrated into Google Analytics as Google Analytics Content Experiments."

They just merged it into another product, where it still exists today: https://support.google.com/analytics/answer/1745152?hl=en


I wonder how this Google Optimize is different from what's available in GA today.


tl;dr Yes


Seems to be a rare instance that violates Betteridge's Law.


It's not true: sugar is not worse for you than white bread. White bread has a higher glycemic index than sugar.

Sugar does have a lot of fructose in it, but fruits have higher fructose content because they have less glucose.

Reality, turns out, is more complicated than black and white.


Probably depends on your white bread. My understanding is that any glycemic delta in white bread is primarily a function of the fiber. White bread without fiber is, to a first approximation, sugar.


White bread is just syntactic sugar for simpler sugars, and destructures into them in your gut.


So far as I know, the same is true of sucrose. Only simple sugars pass into the bloodstream.


A lot of white bread has actual sugar or high-fructose corn syrup in it as an ingredient. How much of that is to start the yeast and how much is for taste varies by brand.

Whole-grain/white can be misleading as far as sugar goes. I've had whole-grain brands that were so sweet that I couldn't stand to use them for sandwiches.


> White bread has a higher glycemic index than sugar.

Doesn't seem to be so: http://nutritiondata.self.com/topics/glycemic-index


Your link has bread at 70 and sugar at 68. QED


I believe I read a "study" recently where a news headline analysis found that roughly a third of question-headlines were "yes", the other two thirds? "No" and "Maybe"

Turns out Betteridge's law is just something snarky Internet people say.


For further reference (as it seems I've been downvoted possibly for my own snark on this one) - since I can no longer edit the original comment, here are some references

This appears to be the one I recall: http://calmerthanyouare.org/2015/03/19/betteridges-law.html

My numbers were off: it's actually 46% "non-polar", 20% "yes", 16% "maybe", and 15% "no".

https://news.ycombinator.com/item?id=9232419

Additionally there was a publication in a statistical journal which also discussed the "phenomenon" http://link.springer.com/article/10.1007/s11192-016-2030-2 which also found that it wasn't a factual assertion (in academic papers).

Anyways - if the downvote was for something other than my own snark I'd love to hear what it was.

Apologies for coming off as a jerk.


This is fantastic news for founders. Survata (S12) was pumped to have Initialized in our Series A last year. Garry worked the closest with us of all YC partners, and Alexis had already been a customer! They make valuable customer intros, always offer time to help, and have such a pro-founder view of the world.

