Build News Recommendation Model Using Python, BERT and FAISS
In this story, I will show how to build a topic-based news recommendation model. We will use the Microsoft News Dataset (MIND) for the experiments.
The Proposed Solution
The proposed solution is based on content-level recommendations. That means I will not take collaborative recommendations into consideration (readers who read this also read the following); I will keep those for a future post.
The system is built to measure the semantic similarity between the current news item (the one the user is reading) and the rest of the news items in the index. Semantic similarity is different from term-matching methods such as TF-IDF.
Term frequency-inverse document frequency (TF-IDF) is commonly used to create document vectors. The algorithm does not take word order into account. Instead, it uses a bag-of-words approach in which each term receives a weight equal to its frequency in a document multiplied by its inverse document frequency, a factor that shrinks as the term appears in more documents of the corpus. This assigns large weights to terms that appear often in a given document but rarely in the rest of the corpus, and such terms are potentially more representative of the document than more common words.
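To make the weighting concrete, here is a minimal sketch of TF-IDF in plain Python. It assumes whitespace tokenization and the unsmoothed formula tf * log(N / df); libraries such as scikit-learn use smoothed and normalized variants, so their numbers will differ. The function name `tfidf_vectors` and the three sample headlines are made up for illustration and are not taken from MIND.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    Illustrative helper: whitespace tokenization, unsmoothed IDF.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Inverse document frequency: rare terms get a larger factor.
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty text
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical headlines, not real MIND data.
docs = [
    "stocks fall as markets react to rate hike",
    "markets rally as rate fears ease",
    "local team wins championship as fans celebrate",
]
vocab, vectors = tfidf_vectors(docs)

# Term weights for the third headline.
weights = dict(zip(vocab, vectors[2]))
```

In this toy corpus the word "as" occurs in every headline, so its weight is zero, while "championship" occurs in only one headline and receives a positive weight. Word order plays no role: shuffling the words of a headline leaves its vector unchanged.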
Then a similarity score function such as cosine similarity is used to calculate the difference between…
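As a rough illustration of cosine similarity, the sketch below compares document vectors by the angle between them. The helper `cosine_similarity` and the three toy vectors are assumptions for the example; in practice the vectors would come from a weighting scheme such as TF-IDF or from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen here: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy term-weight vectors over a shared three-term vocabulary.
doc_a = [0.5, 0.0, 0.2]
doc_b = [0.4, 0.0, 0.1]  # weights spread over the same terms as doc_a
doc_c = [0.0, 0.7, 0.0]  # shares no weighted term with doc_a

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

Here `sim_ab` is close to 1 because the two vectors point in nearly the same direction, while `sim_ac` is 0 because the vectors share no non-zero term. With term-matching vectors, two articles that use different words for the same topic would score low, which is the limitation semantic similarity is meant to address.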