Naive Bayes Theorem Everyday

Naive Bayes is “a family of probabilistic algorithms” that uses probability theory and Bayes’ Theorem to determine whether a random element is associated with a reference element. In the context of image recognition, if our reference element is an apple, we would look at the probabilities that the given element is red, round, oval shaped, and so on. If those probabilities are high enough, the machine classifies the element as an apple. This is the logic of the Naive Bayes classifier, which can be considered a form of artificial intelligence. In this article, I shall walk through a basic classifier that uses the Naive Bayes algorithm and then explore the concept’s many applications in today’s technologies.
First, our example tries to determine whether a piece of text is music related. Given a training data set, Naive Bayes can use probability theory to decide whether a piece of text is music related, based on prior knowledge of certain conditions, such as how similar texts in the training data were classified. This is a very common Natural Language Processing (NLP) problem within Machine Learning (ML).
In our training data, we label each sentence or piece of text as ‘music’ or ‘not music.’ In ML, the pieces of information we give to the computer are called ‘features,’ which the machine uses in an algorithm to produce the desired results. For NLP, the features in our text classifier are word frequencies (word counts), which we combine with Bayes’ Theorem to classify a piece of text. For simplicity, we will classify text as music or not music. We shall now introduce our training data.
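To make the idea of word-count features concrete, here is a minimal Python sketch. The five labeled sentences are hypothetical stand-ins, not the actual training set introduced here; only the split of three ‘music’ and two ‘not music’ texts is taken from the example that follows.

```python
from collections import Counter

# Hypothetical training sentences (NOT the article's actual table):
# three labeled 'music' and two labeled 'not music', matching P(music) = 3/5.
training_data = [
    ("the song had a happy melody", "music"),
    ("she sang a sad song", "music"),
    ("the band played a slow tune", "music"),
    ("the dog ran across the park", "not music"),
    ("he cooked a sad dinner", "not music"),
]

# Word-count features: how often each word appears in each class.
word_counts = {"music": Counter(), "not music": Counter()}
for sentence, label in training_data:
    word_counts[label].update(sentence.lower().split())

print(word_counts["music"]["song"])       # 2
print(word_counts["not music"]["song"])   # 0
```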

The piece of text that we want to be classified is:
“The song had a sad melody.”
Here, we can apply Bayes Theorem:
P(A|B) = P(B|A) * P(A) / P(B) to our piece of text to get:
P(music|the song had a sad melody) =
P(the song had a sad melody|music) * P(music) / P(the song had a sad melody)
Under the “naive” assumption that each word appears independently of the others given the class, the likelihood factors into a product of per-word probabilities:
P(the song had a sad melody|music) = P(the|music) * P(song|music) * P(had|music) * P(a|music) * P(sad|music) * P(melody|music)
Because we know P(music) is 3/5 and P(not music) is 2/5 from our training data, we can substitute those values into Bayes’ Theorem. Below is a table that summarizes our numbers.

But first, there are zero probabilities in some of our data, which is a problem: if we multiplied them, we would get zero. To solve this, we use Laplace smoothing, which adds 1 to the numerator and the total number of unique words in our entire training data to the denominator. More information on Laplace smoothing can be found online.
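As a sketch of how the smoothing works in code, here is a small helper; the counts and vocabulary size below are illustrative placeholders, not the values from the article’s table.

```python
from collections import Counter

def smoothed_prob(word, class_counts, vocab_size):
    """Laplace-smoothed P(word | class): add 1 to the word's count and add the
    number of unique words in the whole training set to the denominator, so a
    word never seen in a class still gets a small nonzero probability."""
    total_words_in_class = sum(class_counts.values())
    return (class_counts[word] + 1) / (total_words_in_class + vocab_size)

# Illustrative counts for one class (not the article's table values).
music_counts = Counter({"the": 3, "song": 2, "sad": 1, "melody": 1, "happy": 1})
vocab_size = 14  # assumed number of unique words across the whole training set

print(smoothed_prob("melody", music_counts, vocab_size))  # (1 + 1) / (8 + 14)
print(smoothed_prob("guitar", music_counts, vocab_size))  # (0 + 1) / (8 + 14), not zero
```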
Because we are finding the probability of “the song had a sad melody” in music and not music, and the denominator P(the song had a sad melody) is the same for both, we may just compare numerators to see which tag this text belongs to. Remember to multiply the values in the Laplace smoothing column by 3/5 (P(music)) and 2/5 (P(not music)) respectively to get the full numerators, which are as follows:
P(music|the song had a sad melody) = (4/30)*(2/30)*(2/30)*(2/30)*(1/30)*(3/5)
and
P(not music|the song had a sad melody) = (3/25)*(1/17)*(1/17)*(1/17)*(1/17)*(2/5)
These come out to roughly 7.9*10^-7 and 5.7*10^-7, respectively, which means the text gets the music tag. This probability method using Bayes’ Theorem is called Naive Bayes because it naively assumes the words have no dependence on each other; we simply multiply the probabilities of our words as if they appeared independently.
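Plugging the fractions above straight into Python confirms the comparison; only the numerators are computed, since the shared denominator cancels out.

```python
from math import prod

# Numerators from the worked example above: per-word smoothed probabilities
# times the class prior. The denominator is identical for both classes,
# so it can be ignored when comparing.
p_music     = prod([4/30, 2/30, 2/30, 2/30, 1/30]) * (3/5)
p_not_music = prod([3/25, 1/17, 1/17, 1/17, 1/17]) * (2/5)

print(f"{p_music:.2e}")      # ~7.9e-07
print(f"{p_not_music:.2e}")  # ~5.7e-07
print("music" if p_music > p_not_music else "not music")  # music
```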
This is a very basic example of the Naive Bayes classifier, but its range of applications is enormous. Bayes’ Theorem lends a hand in recommendation systems, where associations learned from past selections drive future recommendations of music artists, movie genres, books, and food. These applications are already seen in Netflix, Yelp, and Spotify.
Besides powering recommendation systems on websites, it is used in medical diagnostics, where we can predict a certain disease given a patient’s other health parameters associated with that disease. It can also be used as a spam filter, tagging mail that shares features in common with known spam. Beyond that, it is used in face recognition software, where certain characteristics of the face serve as training data for calculating the probability that a given image contains a face. We can also use it for weather prediction: given the conditions recorded in our training data, the computer can calculate the probabilities behind a future forecast.
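As one concrete illustration of the spam-filter use case, a few lines with scikit-learn’s off-the-shelf multinomial Naive Bayes show the same word-count idea at work; the tiny emails here are made up for the sketch.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training emails; a real filter would use thousands of labeled messages.
emails = [
    "win a free prize now",
    "claim your free money today",
    "meeting rescheduled to friday",
    "lunch with the team tomorrow",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Word counts as features, exactly as in the music example above.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

model = MultinomialNB()  # applies Laplace-style smoothing (alpha=1.0) by default
model.fit(X, labels)

print(model.predict(vectorizer.transform(["free prize waiting for you"])))  # likely ['spam']
```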
Finally, if anyone searches for the usefulness of Naive Bayes in modern society, much information is readily available and accessible to the public. It is accurate, reproducible, and easily applied in the everyday conveniences that technology gives us. It demystifies how our camera phones can detect our family’s and friends’ faces as we take a snapshot of them, or how Spotify tells me I would enjoy Alessia Cara because it learned that I downloaded Sam Smith two weeks ago. It told me that I would find ‘Primal Fear’ captivating because I highly rated ‘The Usual Suspects.’ It also saved time and clutter in my email account by moving all my spam to another folder. These real-world applications make me thank math and probability for introducing me to new restaurants, music, movies, books and more.