Research Directions for Machine Learning and Algorithms (acm.org)
87 points by yarapavan on May 17, 2011 | 7 comments


As always, John Langford has one foot firmly planted in practice and knows deeply what practitioners will care about over the next five years. Incidentally, he is the author of Vowpal Wabbit, one of the fastest out-of-core learning implementations. He also has profound theoretical knowledge.

This article is a more open-ended, forward-looking version of his blog post "The Ideal Large Scale Learning Class" (http://hunch.net/?p=1729). That blog post is required reading for anyone who wants to do large-scale learning and understand the current state of the art.

"Almost all the big impact algorithms operate in pseudo-linear or better time."

This is one of the reasons for the resurgence in neural networks. It's useful to be able to learn a non-linear model over 1 billion examples, which is something you can't practically do with a kernel SVM.
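
A minimal sketch of what "linear-time, out-of-core" looks like in practice, using scikit-learn's SGDClassifier as a stand-in for Vowpal Wabbit (the streamed data source and all parameters below are invented for illustration):

    # Stream chunks of a dataset too large for RAM and update a linear
    # model with SGD; each example is touched a constant number of times,
    # so total cost grows roughly linearly with the number of examples.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    def stream_chunks(n_chunks=100, chunk_size=10_000, n_features=50, seed=0):
        """Stand-in for reading shards of a dataset off disk."""
        rng = np.random.default_rng(seed)
        w_true = rng.normal(size=n_features)
        for _ in range(n_chunks):
            X = rng.normal(size=(chunk_size, n_features))
            y = (X @ w_true > 0).astype(int)
            yield X, y

    clf = SGDClassifier(loss="hinge", alpha=1e-4)  # linear model, hinge loss
    classes = np.array([0, 1])
    for X, y in stream_chunks():
        clf.partial_fit(X, y, classes=classes)     # O(chunk) work per chunk

The same streaming pattern applies to a neural network trained with SGD: the per-example cost is constant for a fixed architecture, which is what keeps total training time roughly linear.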

"How do we efficiently learn in settings where exploration is required?"

I've been exploring this problem recently. For example, I am designing an interface where a user can interact with search results to improve their quality. Besides the ML issues, this also touches on UX questions: Is it better to force the user to give feedback on five crucial search results? Or should you show 50 results and have the user cherry-pick the ones that are particularly good or bad?

"How can we learn to index efficiently?"

Another important question. One particularly interesting approach is using a dense hash code to do semantic search; see, for example, Semantic Hashing, which I describe in this forward-looking O'Reilly Strata talk about ML: http://strataconf.com/strata2011/public/schedule/detail/1693... (YouTube: http://www.youtube.com/watch?v=fEUw8igr1IY) That talk overlaps a bit with Langford's post, except that mine was about new developments for people in industry, not upcoming topics of interest for academics.
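
A rough sketch of the indexing idea: map each item to a short binary code and retrieve by Hamming distance instead of scanning every document. Semantic Hashing learns the codes with a deep autoencoder; here signed random projections stand in for the learned encoder, so this shows the flavour of the index, not the method from the talk:

    import numpy as np

    rng = np.random.default_rng(0)
    n_docs, n_features, n_bits = 10_000, 300, 32

    docs = rng.normal(size=(n_docs, n_features))    # e.g. TF-IDF or embedding vectors
    planes = rng.normal(size=(n_features, n_bits))  # stand-in for a learned encoder

    def encode(X):
        """One n_bits-long binary code per row."""
        return (X @ planes > 0).astype(np.uint8)

    codes = encode(docs)

    def search(query, k=5):
        """Indices of the k docs whose codes are closest in Hamming distance."""
        q = encode(query[None, :])[0]
        dists = np.count_nonzero(codes != q, axis=1)
        return np.argsort(dists)[:k]

    print(search(docs[42]))  # doc 42 should be its own nearest neighbour

With codes this short you can also treat them as memory addresses and find neighbours by flipping a few bits, which is what makes the approach attractive at scale.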


"Is it better to force the user to give feedback on five crucial search results? Or should you show 50 results and have the user cherry pick results that are particularly good/bad?"

What metaphor are you trying to support? Is the system one that learns (think child) or one that adapts (think adult)?

If you present a system as one that needs to be taught, you will invoke a whole different set of expectations than one that is assumed to know a great deal but can adapt.

Personally, I always recommend forcing a user to teach the system (see Netflix's new-user flow) because it breaks the assumption that the system will 'just know', which is often unreasonable.


The argument that important ML algorithms should be highly scalable (O(log N), O(N), O(N log N)) holds in fields that are rich in "big data," with millions to trillions of data points.

However, there are also many fields where acquiring a large dataset (more than hundreds to thousands of samples) is infeasible. This is especially relevant in medicine and biology. Many applications are constrained by small sample sizes and may have a feature count that is orders of magnitude larger than the sample count; fMRI studies and gene-expression studies are typical examples. Don't discount research in methodologies with superlinear running time (such as SVMs and many graphical models) as impractical for real-world applications, because these are used heavily in certain fields.
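
A quick synthetic illustration of why the superlinear cost is a non-issue in that regime (data and dimensions are made up; the point is only that a kernel SVM on ~100 samples trains almost instantly even with 20,000 features):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_samples, n_features = 100, 20_000   # e.g. subjects x gene-expression probes
    X = rng.normal(size=(n_samples, n_features))
    y = (X[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

    clf = SVC(kernel="rbf").fit(X, y)     # cost is superlinear in n, but n is tiny
    print(clf.score(X, y))

(In practice you would cross-validate and regularise heavily at this sample size; the snippet only makes the cost argument.)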


My impression was that the OP didn't say superlinear algorithms are somehow useless; merely that there are reasons why the linear (or better) ones can be used in much more general settings, which is what makes them "big impact".


Experimental settings like these are also interesting because they provide opportunities for the application of active learning.

For example, if we wish to learn the properties of some family of new materials, we may have to choose which particular elements of the family to synthesise before we can begin measuring anything. Even if we can only afford 100 samples or fewer, the synthesis procedure might have a dozen parameters or more.

We then have the problem of sensibly selecting a small number of samples from a relatively high-dimensional space. In this situation it is easy to justify serious computational effort if it can save months of wasted work in the lab.
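
A hedged sketch of that loop, with an invented toy "experiment" standing in for synthesis and measurement, and a Gaussian process choosing the most uncertain candidate at each round (plain uncertainty sampling; a real campaign might use an acquisition function that also rewards promising regions):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(0)
    n_params, budget = 12, 20                        # a dozen parameters, ~20 runs

    def run_experiment(x):
        """Stand-in for synthesising a material and measuring one property."""
        return np.sin(3 * x[0]) + x[1] ** 2 + 0.05 * rng.normal()

    candidates = rng.uniform(size=(5000, n_params))  # feasible synthesis settings
    tried_X = [candidates[0]]                        # seed with one arbitrary run
    tried_y = [run_experiment(candidates[0])]

    for _ in range(budget - 1):
        gp = GaussianProcessRegressor().fit(np.array(tried_X), np.array(tried_y))
        _, std = gp.predict(candidates, return_std=True)
        pick = int(np.argmax(std))                   # most uncertain candidate goes next
        tried_X.append(candidates[pick])
        tried_y.append(run_experiment(candidates[pick]))

Each model fit costs far less than a day in the lab, which is the trade-off being argued for above.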


Agreed, and I'd add that I found the title to be misleading. It's one researcher's take on what's important, but of necessity it's constrained by that person's interests.

Yahoo cares about big data, but not everyone is in that domain.


This is a really valuable list, especially because reading journals and going to conferences would not lead you to think these are the major problems (most are too hard to be tackled directly, and they need simple solutions that would not translate into dozens of publications). At the same time, talking to non-researchers brings in too much noise to see this clear picture.

I'm trying to focus more on these problems in my research, but it is no accident that they are still unsolved.

I think the main recent ideas for tackling these are: sampling instead of computing things exactly, random projections to compress information when needed, limiting the memory footprint of existing algorithms, and controlling shared state so that algorithms distribute better. The hardest of these problems, how to act when what you do changes the world and the data you see, desperately needs at least some guiding principles.
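
For one item on that list, a tiny illustration of random projections as compression (a plain Gaussian projection in the Johnson-Lindenstrauss style; the dimensions are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    n_points, d, k = 1_000, 10_000, 400

    X = rng.normal(size=(n_points, d))          # high-dimensional data
    R = rng.normal(size=(d, k)) / np.sqrt(k)    # random projection matrix
    Z = X @ R                                   # compressed representation

    # Pairwise distances are approximately preserved:
    i, j = 3, 7
    orig = np.linalg.norm(X[i] - X[j])
    proj = np.linalg.norm(Z[i] - Z[j])
    print(orig, proj, abs(orig - proj) / orig)  # relative error stays small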




