Thanks for answering! I'm sure it's just my lack of elementary statistics knowledge, but I still don't understand re #2. I get that the weights aren't important, but the shape of the curve seems to invalidate the notion of a p-value, because each font had more "strongly X"s than "weakly X"s. When your sample results are clustered at the extremes, what do you do to apply these statistical techniques?
Here is something I tried: arbitrarily choose a font as the "control," in my case Georgia. Then to measure each other font, say Baskerville, randomly pair a Georgia data point with a Baskerville data point, and measure how much Baskerville improved agreement. (Comparing different shuffles, each font's mean improvement is pretty stable.) That gives what looks like a normal curve, or at least it is big in the middle and small on the edges. So now I can find a p-value, and my null hypothesis is that changing the font has no effect. I ran a t-test with R, and I got a p-value of 0.2069. So much less impressive than the article claims. But I assume my approach is wildly invalid, so what is the right approach?
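To make the pairing procedure concrete, here is a minimal Python sketch (the analysis above was run in R; Python is used here only for illustration). The scores are synthetic, drawn from a made-up distribution clustered at the extremes, not the actual survey responses, and the names `paired_t_statistic`, `levels`, and `weights` are hypothetical.

```python
import math
import random
import statistics


def paired_t_statistic(control, treatment, rng):
    """Randomly pair control and treatment scores.

    Returns (mean difference, t statistic, degrees of freedom) for a
    one-sample t-test on the paired differences (treatment - control).
    """
    n = min(len(control), len(treatment))
    # rng.sample returns shuffled copies, so the inputs are left untouched.
    shuffled_control = rng.sample(control, n)
    shuffled_treatment = rng.sample(treatment, n)
    diffs = [b - a for a, b in zip(shuffled_control, shuffled_treatment)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_err, n - 1


# Synthetic data (an assumption, NOT the real responses): agreement scores
# where the "strongly" answers are more common than the "weakly" ones.
rng = random.Random(0)
levels = [-5, -3, -1, 1, 3, 5]
weights = [3, 1, 1, 1, 1, 3]
georgia = rng.choices(levels, weights=weights, k=500)
baskerville = rng.choices(levels, weights=weights, k=500)

mean_diff, t_stat, df = paired_t_statistic(georgia, baskerville, rng)
print(f"mean improvement = {mean_diff:.3f}, t = {t_stat:.3f}, df = {df}")
```

When the two groups are the same size, the mean of the paired differences equals the difference between the group means no matter how the points are shuffled; only the spread of the differences, and so the t statistic, varies from shuffle to shuffle.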
I thought re-analyzing Morris's data would be a fun "homework" assignment to give myself as I try to learn statistics. It looks like the simplest approach is nothing like what I proposed above, but a "two-sample t-test." I performed that analysis and wrote it up here, if anyone is interested:
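For readers who want a feel for what a two-sample t-test computes, here is a rough Python sketch. It is an illustration only, not the linked analysis: it assumes the unequal-variance (Welch) form, which is what R's `t.test` uses by default, and the function name `welch_t` and the toy numbers are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test comparing b against a."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # Squared standard error of the difference between the two means.
    se2 = va / na + vb / nb
    t_stat = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t_stat, df


# Toy numbers for illustration only, not survey data.
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

Unlike the random-pairing approach, this compares the two groups directly, so the result does not depend on any shuffle. Turning `t` and `df` into a p-value needs the t distribution, for example `pt` in R or `scipy.stats.t.sf` in Python.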