Alternative take: there isn't that much low-hanging fruit there.
Hear me out.
"To the person who only has a hammer, everything looks like a nail."
The data in front of you is the data you want to analyze, but it doesn't follow that it's the data you ought to analyze. I predict that most of the data you look at will turn up nothing: the null hypothesis will not be rejected in the vast majority of cases.
I think we -- machine learning learners -- have a fantasy that the signal is lurking in there, and that if we just employ that one very clever technique it will emerge. Sure, random forests failed, neural nets failed, and the SVR failed, but if I reduce the step size, plug the output of the SVR into the net, and change the kernel...
Let me give an example: suppose you want to predict the movement of the stock market using the movement of the stars. Adding more information on the stars, and throwing more techniques at it, may feel like progress, but it isn't.
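To make that concrete, here's a minimal sketch (the simulated data and the scikit-learn model are my assumptions, not part of the example): regress "stock returns" on "star positions" that are independent by construction, and compare the in-sample and out-of-sample fit.

```python
# Sketch of the astrology example: both series are generated independently here,
# so any "signal" a model finds in-sample is noise by construction.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_days = 2000
star_features = rng.normal(size=(n_days, 50))   # stand-in for "movement of the stars"
stock_returns = rng.normal(size=n_days)         # stand-in for market returns, independent of the stars

X_train, X_test, y_train, y_test = train_test_split(
    star_features, stock_returns, test_size=0.5, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
print("in-sample R^2:     ", model.score(X_train, y_train))  # mildly positive: the model memorizes noise
print("out-of-sample R^2: ", model.score(X_test, y_test))    # ~0 or negative: no real signal to extract
```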
Conversely, even a single piece of simple information that requires minimal analysis (this company's sales are way up and no one but you knows it) would be very useful for making that prediction.
The first data set is rich, but simply doesn't have the required signal. The second is simple, but has the required signal. The data that is widely available is unlikely to have unextracted signal left in it.
I've been selling good data in a particular industry for three years. In this industry at least, the so-called "low-hanging fruit" only seems low-hanging until you realize that the people who could benefit most from the data are the ones who are mentally lazy and least likely to adopt it. Selling data has the same problems as selling any other product, and may be even harder, because you need to 1) acquire the data and 2) build tools that reliably solve difficult problems using huge amounts of noisy information...
Isn't there utility in accepting the null hypothesis? Knowing that there is no signal in the data is almost as valuable as knowing the opposite, i.e., it tells you where not to look for information.
I think your example is really justifying a "machine learner" that has some domain expertise and doesn't blindly apply algorithms to some array of numbers.
I think his argument is that some null hypotheses can be accepted out of hand, and that people are wasting time and effort obtaining evidence that, if they had better priors, would be multiplied by 0.0000000000001 to end up with an insignificant posterior. That's what the astrology example indicates.
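A quick sketch of that arithmetic in odds form (the 1e-13 prior and the 100:1 likelihood ratio are illustrative numbers, not anything from the comment):

```python
# Even strong-looking evidence (a 100:1 likelihood ratio in favour of a signal)
# barely moves a prior that is effectively zero.
prior_prob = 1e-13                               # assumed prior probability that this dataset holds a tradable signal
prior_odds = prior_prob / (1 - prior_prob)
likelihood_ratio = 100                           # how much likelier the evidence is under "signal" than "no signal"
posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)
print(posterior_prob)                            # ~1e-11: still an insignificant posterior
```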
The effort to evaluate the null hypothesis can be costly. In the competitive environment found in most hedge funds, how do you justify allocating resources to accepting the null hypothesis?
As in, if you worked at a data acquisition desk, and spent a quarter churning through terabytes of null hypothesis data, what's your attribution to the fund's performance?
Accepting the null hypothesis has utility only if you have some reason to believe it would not be accepted.
Accepting it per se has no particular value. You could generate several random datasets, and accept/reject the null hypothesis between them ad infinitum.
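A quick sketch of that point (pure-noise data and a plain t-test are my assumptions here): compare pairs of random datasets and count how often the null hypothesis gets rejected anyway.

```python
# Test pairs of samples drawn from the same distribution, so the null hypothesis
# is true by construction, and count the spurious rejections.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
rejections = 0
n_trials = 10_000

for _ in range(n_trials):
    a = rng.normal(size=100)
    b = rng.normal(size=100)
    _, p = ttest_ind(a, b)
    if p < 0.05:
        rejections += 1

print(rejections / n_trials)   # ~0.05: rejections happen at the false-positive rate, nothing more
```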
To put it another way, it's only interesting if it's surprising.
Bingo. You nailed it. I work in finance. Developed markets have efficient, highly liquid stock markets, and the reality is that there are a lot of people competing for the same profits. When there are that many players, if there's a profit to be had from a dataset you can buy from a vendor, chances are one of your many competitors already bought it and found it. This is why we now say don't try to beat the market: you likely can't, and mostly you just need to get lucky by holding the right position when an unforeseen event occurs. There are too many variables at play that we just don't understand. Most firms are buying these datasets to stay relevant, but they really make no difference in their actual investing strategies.
This is where you might use something like a genetic algorithm to learn which data to use for a particular prediction. Good AI won't use all the data; it will trim it down to the signal.
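A hedged sketch of what that could look like: a toy genetic algorithm that evolves binary feature masks and scores them by cross-validated fit. The toy data, the ridge model, and all the GA settings are assumptions for illustration, not a claim about what "good AI" actually does.

```python
# Genetic-algorithm feature selection: evolve binary masks over the columns and
# keep the ones whose cross-validated score is highest.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# toy data: only the first 3 of 20 columns actually carry signal
X = rng.normal(size=(500, 20))
y = X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500)

def fitness(mask):
    if not mask.any():
        return -np.inf
    return cross_val_score(Ridge(), X[:, mask], y, cv=5).mean()

pop = rng.random((30, X.shape[1])) < 0.5              # initial population of random feature masks
for generation in range(40):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]           # keep the 10 best masks
    children = []
    while len(children) < len(pop):
        a, b = parents[rng.integers(10, size=2)]
        child = np.where(rng.random(X.shape[1]) < 0.5, a, b)  # uniform crossover
        child ^= rng.random(X.shape[1]) < 0.02                # occasional bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected columns:", np.flatnonzero(best))      # ideally {0, 1, 2}: the informative ones
```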
I read a neat criticism of AI techniques. The author pointed out that humans can pick out a strong signal as well as or better than AI, and that humans could also pick out signal from an array of weak sources. AI would identify that case with fewer weak signals required, but it was hard to trust because it was sometimes wrong.
I wish I could remember the source. I’m sure it was an article here a few years ago. I want to say it was medical diagnosis based on charts.
Anyway, the point was that there is a very narrow valley where AI is useful beyond an expert. That valley is expensive to explore. And there might not be anything there.