Build News Recommendation Model Using Python, BERT and FAISS

Zahra Ahmad
7 min read · Jun 9, 2022
Photo by Filip Mishevski on Unsplash

In this story, I will show how to build a topic-based news recommendation model. We will use the Microsoft News Dataset (MIND) for the experiments.

The Proposed Solution

The proposed solution is based on content-level recommendations. This means I will not take collaborative recommendations into consideration (readers who read this also read the following); I will keep those for a future post.

The system is built to measure semantic similarity between the current news item (the one the user is reading) and the rest of the news items in the index. Semantic similarity differs from term-matching methods such as TF-IDF, which compare documents by the words they share rather than by what they mean.

Term frequency-inverse document frequency (TF-IDF) is commonly used to create document vectors. The algorithm does not take word order into account. Instead, it uses a bag-of-words approach in which each term receives a weight equal to its frequency in the document multiplied by its inverse document frequency in the corpus. This assigns large weights to terms that appear often in a given document but rarely in the rest of the corpus, which are potentially more representative of the document than more common words.
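To make the weighting concrete, here is a minimal sketch of TF-IDF in plain Python. It is only an illustration, not the code used in this story: the `tfidf_vectors` function and the three toy headlines are made up for the example, tokenization is a simple whitespace split, documents are assumed to be non-empty, and the IDF is the unsmoothed `log(N / df)` (libraries usually apply a smoothed variant).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of non-empty text documents."""
    # Bag of words: lowercase and split on whitespace, ignoring word order.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Unsmoothed inverse document frequency (an assumption of this sketch).
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Weight = term frequency in the document * inverse document frequency.
        vectors.append([counts[t] / len(tokens) * idf[t] for t in vocab])
    return vocab, vectors


# Toy headlines, invented for illustration only.
docs = [
    "stocks rally as markets open",
    "markets fall as stocks slide",
    "team wins the final match",
]
vocab, vectors = tfidf_vectors(docs)
```

In the first headline, "rally" appears in only one document while "stocks" appears in two, so "rally" ends up with the larger weight even though both occur once in that headline.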

Then a scoring function such as cosine similarity is used to calculate the similarity between…
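As a rough illustration of the scoring step, the sketch below computes cosine similarity between document vectors and ranks candidates against the current item. The vectors and the item names (`a`, `b`, `c`) are invented for the example, and returning 0.0 for an all-zero vector is a convention chosen here, not something the method prescribes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention for this sketch: an empty vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical vectors: the item being read and three candidates.
current = [1.0, 2.0, 0.0]
candidates = {
    "a": [2.0, 4.0, 0.0],  # same direction as `current`
    "b": [0.0, 0.0, 3.0],  # shares no non-zero dimension
    "c": [1.0, 0.0, 0.0],  # partial overlap
}

# Rank candidates from most to least similar to the current item.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(current, candidates[name]),
    reverse=True,
)
```

Because cosine similarity depends only on direction, candidate `a` scores highest even though its vector is twice as long as `current`.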

Zahra Ahmad

MSc in Data Science, I love to extract the hell out of any raw data, sexy plots and figures are my coffee