
The description of Naive Bayes is misleading. Almost all supervised learning problems assume that "Inputs are classified in isolation where no input has an effect on any other inputs" (quote from the article), but that's not why Naive Bayes is called naive.

The naive assumption made by Naive Bayes is that the features (or attributes) of each input point are independent. Let me explain with a simple example:

Suppose you want to find people who receive benefits they are not entitled to. The input data might have two attributes: the cash balance in a person's bank account and the amount they receive in benefits. Although you could look for people with a high value in both attributes, the naive assumption says that you can make your classification without correlating multiple attributes; Naive Bayes assumes you can explain the labeling of the data by looking at each attribute in isolation. In this example, that assumption is clearly unfounded: if you look only at benefits or only at cash balance, you won't be able to tell how a person should be classified.
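
To make the point about attributes in isolation concrete, here is a minimal sketch of a count-based Naive Bayes in plain Python. Note that it uses a different, deliberately extreme toy data set (an XOR-style labeling, not the benefits example), and the function names `fit_naive_bayes` and `posterior` are made up for illustration. On this data each attribute on its own says nothing about the label, so the naive factorization cannot do better than a coin flip.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Estimate P(y) and P(x_j | y) by simple counting (no smoothing)."""
    n = len(labels)
    label_counts = Counter(labels)
    priors = {y: c / n for y, c in label_counts.items()}
    # counts[(j, y)][v] = number of rows with label y whose j-th attribute is v
    counts = defaultdict(Counter)
    for x, y in zip(rows, labels):
        for j, v in enumerate(x):
            counts[(j, y)][v] += 1
    cond = {
        (j, y): {v: c / label_counts[y] for v, c in ctr.items()}
        for (j, y), ctr in counts.items()
    }
    return priors, cond

def posterior(x, priors, cond):
    """P(y | x) under the naive factorization P(y) * prod_j P(x_j | y)."""
    scores = {}
    for y, p in priors.items():
        for j, v in enumerate(x):
            p *= cond[(j, y)].get(v, 0.0)  # unseen value -> probability 0
        scores[y] = p
    total = sum(scores.values())
    if total == 0:
        # Every class scored 0 (unseen attribute values); fall back to the priors.
        return dict(priors)
    return {y: s / total for y, s in scores.items()}

# Hypothetical XOR-style data: the label is 1 exactly when the attributes differ.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

priors, cond = fit_naive_bayes(rows, labels)
xor_posteriors = [posterior(x, priors, cond) for x in rows]
for x, p in zip(rows, xor_posteriors):
    print(x, p)
```

Every per-attribute distribution P(x_j | y) is identical for both labels here, so the model assigns the same posterior to each class for every point, even on the data it was fitted to. A classifier that is allowed to look at both attributes jointly would separate these four points perfectly.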

The data independence assumption made by almost all ML algorithms is that different data points are not correlated: the label of a single data point (person in the above problem) does not depend on the attributes of other data points.



I agree. Most supervised learning classifiers are derived under the assumption that the (x, y) pairs are independent and identically distributed.

To be more specific about the Naive Bayes assumption: the features of a data point are conditionally independent, not simply independent. This means that, given a certain label, the features are independent of one another.
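
Written out, with x_1, ..., x_n for the features and y for the label (notation chosen here for illustration), the distinction looks roughly like this:

```latex
% Conditional independence given the label (what Naive Bayes assumes):
P(x_1, \dots, x_n \mid y) = \prod_{j=1}^{n} P(x_j \mid y)

% Combined with Bayes' rule, this gives the usual classification rule:
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{j=1}^{n} P(x_j \mid y)

% Plain (marginal) independence, which is NOT what is assumed:
P(x_1, \dots, x_n) = \prod_{j=1}^{n} P(x_j)
```

The first line does not imply the third: features can be strongly correlated across the whole data set and still be independent once you condition on the label.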


Hi, I'm Stephanie Kim and I wrote the talk/post. Thanks for the comment! Yes, you are correct: I should have specified that it is the features of each input, rather than the inputs themselves, that are regarded as independent of one another. I will revise that in the post. Again, thanks for pointing that out, since it's an important distinction, especially for people just starting out!



