
Where are the confidence intervals and error bars? To be taken seriously when aggregating 1.8 million comments, you need them. The variability in pg's plots makes me think the data for any individual commenter will be just as noisy.
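As a rough illustration of what error bars on a monthly bucket could look like, here is a minimal sketch. The scores are made up, `mean_with_ci` is a hypothetical helper, and the interval is a normal approximation (mean plus or minus 1.96 standard errors) that assumes the comments within a bucket are independent and the bucket is large.

```python
import math
import statistics

def mean_with_ci(scores, z=1.96):
    """Return (mean, low, high) using a normal-approximation 95% interval."""
    n = len(scores)
    m = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return m, m - z * sem, m + z * sem

# Hypothetical per-comment scores for two monthly buckets (made-up numbers).
monthly_scores = {
    "2010-01": [0.2, -0.1, 0.4, 0.0, 0.3, 0.1],
    "2010-02": [0.5, 0.1, -0.2, 0.6, 0.2, 0.3],
}

for month, scores in monthly_scores.items():
    m, low, high = mean_with_ci(scores)
    print(f"{month}: mean={m:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

With only six values per bucket the 1.96 multiplier is too optimistic; it is used here only to keep the example short.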


This is a time-series analysis of the monthly means for PG and for the HN community. The null hypothesis is that the slope of the regression line is 0. Since the analysis is of the means over time, the variance within each bucket is irrelevant. I re-ran the analysis to confirm that my results are statistically significant:

Anxiety/Confidence: PG p <= 0.0007, HN p <= 0.0001

Hostility/Compassion: PG p <= 0.0005, HN p <= 0.0001

Depression/Happiness: PG p <= 0.0067, HN p <= 0.0001

Note that for the HN comments, even though I used a second-order (parabolic) fit for the graphs, the p-values above are for a linear regression, as that is the more appropriate fit for determining statistical significance here.
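For anyone who wants to reproduce this kind of check, here is a minimal sketch of a slope test against the null hypothesis of zero slope. The monthly means are made up and `slope_test` is a hypothetical helper, not the code behind the numbers above. To stay within the standard library, the p-value uses a normal approximation to the t distribution, which is only reasonable with many months of data; `scipy.stats.linregress` reports the t-based value.

```python
import math
from statistics import NormalDist, fmean

def slope_test(y):
    """OLS fit of y against month index 0..n-1; returns (slope, std_err, p_value)."""
    n = len(y)
    x = range(n)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares; the variance estimate uses n - 2 degrees of freedom.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    z = slope / std_err
    # Two-sided p-value (normal approximation to the t distribution).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, std_err, p_value

# Hypothetical monthly means with an upward trend (made-up numbers).
monthly_means = [0.10, 0.12, 0.09, 0.15, 0.14, 0.18,
                 0.17, 0.21, 0.19, 0.24, 0.23, 0.26]

slope, std_err, p_value = slope_test(monthly_means)
print(f"slope={slope:.4f} per month, std_err={std_err:.4f}, p={p_value:.4g}")
```

This test treats the monthly means as independent observations with constant variance, so it is only as good as those assumptions.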


P-values only make sense when the residuals are normal and independent. Is that true for this data set?
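Both assumptions can be checked on the regression residuals. Below is a minimal sketch using two standard diagnostics: the Durbin-Watson statistic for lag-1 autocorrelation and the Jarque-Bera test for normality. The residuals are made up and the function names are hypothetical helpers. The Jarque-Bera p-value relies on a chi-square approximation with 2 degrees of freedom, which is unreliable for short series.

```python
import math
from statistics import fmean

def durbin_watson(residuals):
    """Near 2: no lag-1 autocorrelation; toward 0: positive; toward 4: negative."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(r ** 2 for r in residuals)
    return num / den

def jarque_bera(residuals):
    """Return (statistic, p_value) for the Jarque-Bera normality test."""
    n = len(residuals)
    mu = fmean(residuals)
    m2 = sum((r - mu) ** 2 for r in residuals) / n
    m3 = sum((r - mu) ** 3 for r in residuals) / n
    m4 = sum((r - mu) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # A chi-square variable with 2 degrees of freedom has survival function exp(-x / 2).
    return jb, math.exp(-jb / 2)

# Hypothetical residuals from a fitted trend line (made-up numbers).
residuals = [0.02, 0.03, 0.01, -0.01, -0.02, -0.03, -0.01, 0.00, 0.02, 0.01, -0.01, -0.01]

dw = durbin_watson(residuals)
jb, p_normal = jarque_bera(residuals)
print(f"Durbin-Watson={dw:.2f}, Jarque-Bera={jb:.2f}, p={p_normal:.3f}")
```

A Durbin-Watson value well below 2 would suggest that successive months are positively correlated, in which case the ordinary regression p-values overstate the significance.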



