What does any of that have to do with modifying P(sentence) relative to P(sentence is a translation of X), though? Everything you're talking about involves getting that initial P(sentence) estimate, but I don't see anywhere in there why uniformly raising all naive estimates to a power would come close to accounting for power law word frequencies (which presumably would be incorporated in the initial model that gave us a P(sentence) estimate).
According to Norvig, the real problem this is intended to address is that P(sentence is a translation of X) tends to be a crappy estimate, and that in fact P(sentence) itself is much more reliable. Which would seem to conflict with your suggestion that this is really applying a correction for P(sentence).
Ah I see your point. Norvig uses a positive exponent on the prior whereas I have written it in the form where there is a negative exponent on the conditional. According to Bayes, we predict class 1 when
P(C_1) P(words | C_1) > P(C_2) P(words | C_2)
The positive exponent on P(C_i) and the negative exponent on P(words | C_i) are equivalent as far as classification is concerned: raise both sides to a suitable (negative) power so that the exponent on the conditional becomes one (remembering that a negative power reverses the inequality), and you end up with a positive exponent on P(C_i). Since the same monotone transformation is applied to both sides, the predicted class is unchanged.
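For concreteness, here is a small numerical sketch of that monotone-transform argument (the exponent value and the toy probabilities are invented purely for illustration): raising both sides of the decision rule to a power moves the exponent between the prior and the conditional, and a negative power just flips which side of the comparison wins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: two classes, made-up prior and conditional probabilities for a
# few candidate word sequences.  All numbers here are invented for illustration.
priors = np.array([0.7, 0.3])                        # P(C_1), P(C_2)
likelihoods = rng.uniform(1e-6, 1e-3, size=(5, 2))   # P(words_j | C_i)

a = 2.5  # assumed exponent on the prior in the positive-exponent form

for lik in likelihoods:
    # Form 1: positive exponent on the prior.
    s1 = priors**a * lik
    # Form 2: raise form 1 to the power 1/a -- the exponent moves onto the conditional.
    s2 = priors * lik**(1.0 / a)
    # Form 3: raise form 2 to the power -1 -- the exponents go negative and the
    # comparison flips, so the winner is now the class with the *smallest* score.
    s3 = priors**-1.0 * lik**(-1.0 / a)

    assert s1.argmax() == s2.argmax() == s3.argmin()

print("All three formulations select the same class.")
```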
Thanks for catching this.
The constant c can and does vary with the class label. As a result, the Bayes classifier will use conditional distributions raised to fixed exponents, and in general those exponents will differ per word and per class. The model Norvig is using is a very simple variant in which he fixes a single exponent. As I said, it's not the exact expression one would obtain by assuming a power law, but it is very similar.
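To make that contrast concrete, here is a toy sketch (all probabilities and exponents are invented for illustration) of the general form, where every word gets its own class-dependent exponent, next to the fixed-exponent simplification:

```python
import numpy as np

rng = np.random.default_rng(1)

n_words, n_classes = 4, 2
prior = np.array([0.6, 0.4])                              # P(C_i)
word_lik = rng.uniform(1e-4, 1e-2, (n_words, n_classes))  # P(word_j | C_i), toy values

# Per-word, per-class exponents -- what a class-dependent power-law correction
# would give you in general.  These values are made up.
alpha = rng.uniform(0.5, 1.5, (n_words, n_classes))

# General form: each word's conditional gets its own exponent for each class.
score_general = prior * np.prod(word_lik**alpha, axis=0)

# Simplified variant: one fixed exponent shared by every word and class.
alpha_fixed = 0.8  # assumed value, for illustration
score_fixed = prior * np.prod(word_lik, axis=0)**alpha_fixed

print("general:", score_general, "->", score_general.argmax())
print("fixed  :", score_fixed,   "->", score_fixed.argmax())
```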