What is it about?

Most retrieval models, including the popular TF-IDF and BM25, rely mainly on two statistics extracted from a document collection: 1) term frequency (TF), the number of times a word occurs in a document, and 2) document frequency (DF), the number of documents in which a word occurs. A third statistic, collection term frequency (CTF), the total number of times a word occurs in the whole collection, can be used to improve these retrieval methods.
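
To make the three statistics concrete, here is a minimal Python sketch that gathers TF, DF, and CTF from a toy collection in a single pass. The function name `collection_statistics` and the three sample documents are assumptions made for illustration. The sketch does not implement the BM25-CTF weighting described in the paper; it only shows how the underlying counts differ.

```python
from collections import Counter


def collection_statistics(docs):
    """Return (tf, df, ctf) for a list of tokenized documents.

    tf  : list of Counters, one per document (term -> count in that document)
    df  : Counter (term -> number of documents containing the term)
    ctf : Counter (term -> total occurrences across the whole collection)
    """
    tf = [Counter(doc) for doc in docs]
    df = Counter()
    ctf = Counter()
    for counts in tf:
        df.update(counts.keys())  # a document counts once per distinct term
        ctf.update(counts)        # every occurrence of the term is added
    return tf, df, ctf


# Hypothetical toy collection, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "a bird sang".split(),
]

tf, df, ctf = collection_statistics(docs)
print(tf[0]["the"], df["the"], ctf["the"])  # prints: 2 2 4
```

Here "the" and "cat" both appear in two documents, so their DF is the same, but "the" has a CTF of 4 while "cat" has a CTF of 2. CTF therefore carries information that DF alone does not, and both counters are filled in the same loop.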

Why is it important?

BM25 is a very popular and effective retrieval method, and many attempts have been made to improve it. In the paper, we summarize these attempts and conclude that our approach produces significant and consistent relative improvements that are superior to those of previous approaches. Gathering and using CTFs costs the same as gathering and using DFs, which makes the approach conceptually simple and easy to implement in systems already based on BM25.

Read the Original

This page is a summary of: BM25-CTF: Improving TF and IDF factors in BM25 by using collection term frequencies, Journal of Intelligent & Fuzzy Systems, May 2018, IOS Press,
DOI: 10.3233/jifs-169475.