Fisher information is key and turns up in a lot of fundamental places. I'm currently slowly working my way through a text on information geometry and another on ideals and varieties. There is only so much time one can devote to constant learning, so I try to learn things that cut through as much territory as possible. I feel strong discomfort when reading about subjects like, say, machine learning, where a lot of the material is seemingly arbitrary rules of thumb*.
It turns out that a bunch of geometric ideas useful in physics also unify ML concepts: the idea of information geometry. There are three ways I have seen the concept used. One is based on differential geometry and treats sets of probability distributions as manifolds, with their parameters as coordinates. Many concepts are unified and tricky ideas become near-tautologies within a solid framework (Fisher information as a metric): http://www.cscs.umich.edu/~crshalizi/notabene/info-geo.html
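To make the "Fisher information as a metric" point concrete, here is a rough sketch (my own toy example, not from the link above): for a one-parameter Bernoulli model, the Fisher information is the variance of the score function, and that single number plays the role of the metric g(theta) on the one-dimensional statistical manifold.

    import numpy as np

    # Sketch: Fisher information of a Bernoulli(theta) model, estimated as the
    # mean squared score. Closed form is 1 / (theta * (1 - theta)).
    def fisher_info_bernoulli(theta, n_samples=1_000_000, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.binomial(1, theta, size=n_samples)
        score = x / theta - (1 - x) / (1 - theta)   # d/dtheta of log-likelihood
        return np.mean(score ** 2)                  # E[score^2] = Fisher information

    theta = 0.3
    print(fisher_info_bernoulli(theta))   # roughly 4.76
    print(1 / (theta * (1 - theta)))      # exact: 4.7619...

In the differential-geometric picture, that quantity (a matrix in the multi-parameter case) is exactly what you use to measure distances between nearby distributions.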
Another approach is in terms of varieties from algebraic geometry. Here statistical models of discrete random variables are the zero sets of certain collections of polynomials, intersected with the probability simplex (a hypertetrahedron). Graphical models (hidden Markov models, neural nets, Bayes nets) are all treated on the same footing.
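A small, standard example of what "zero set of polynomials" means here (my illustration, not from any specific text): the independence model for two binary variables is exactly the part of the probability simplex where the 2x2 determinant p00*p11 - p01*p10 vanishes.

    import numpy as np

    # Sketch: independence of two binary variables as a polynomial condition.
    # A joint distribution p[i, j] lies in the independence model iff
    # p00*p11 - p01*p10 = 0 (inside the probability simplex).
    def independence_defect(p):
        return p[0, 0] * p[1, 1] - p[0, 1] * p[1, 0]

    px, py = np.array([0.3, 0.7]), np.array([0.6, 0.4])
    independent = np.outer(px, py)            # product distribution
    dependent = np.array([[0.4, 0.1],
                          [0.1, 0.4]])

    print(independence_defect(independent))   # ~0.0  -> on the variety
    print(independence_defect(dependent))     # 0.15  -> off the variety

More complicated graphical models just give you more complicated polynomial constraints of the same flavor.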
The final approach is an interesting set of techniques where a researcher abstracts information retrieval using methods from quantum mechanics. A side benefit is that you get a basic education in the math of QM along the way.
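A loose sketch of the flavor (my own toy example, not the researcher's actual method): represent a document as a unit vector in term space and a query as a projector onto the span of its terms; the "relevance probability" is then the squared length of the projection, which is the same rule QM uses for measurement probabilities.

    import numpy as np

    def projector(vectors):
        # Orthogonal projector onto the span of the given vectors.
        Q, _ = np.linalg.qr(np.column_stack(vectors))
        return Q @ Q.T

    def relevance(doc, query_proj):
        d = doc / np.linalg.norm(doc)         # normalize to a unit "state" vector
        return float(d @ query_proj @ d)      # <d|P|d> = squared projection length

    # 3 terms; the query spans terms 0 and 1, the document weights terms 0 and 2.
    P = projector([np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])])
    doc = np.array([2.0, 0.0, 1.0])
    print(relevance(doc, P))                  # 0.8: most of the doc lies in the query subspace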
* Arbitrary in the sense that you just have to accept a lot of things that only become less fuzzy with time, whereas a proper framework provides handholds that reward effort with proportional amounts of understanding. The last time I felt this way was when I was first learning functional programming seven years ago. The terminology was different and the going was heavy compared with imperative programming, but I knew the rewards in understanding, expressiveness and flexibility would be well worth the effort. Confusion dissipated linearly with effort (unlike C++'s nonlinear relationship), and I knew I was picking up a bunch of CS theory at the same time that would make learning programming (and C++) much easier.