Given a document with terms A, B, C with the following term frequencies: A:3, B:2, C:1. The document belongs to a collection of 10000 docs. The document frequencies are A:50, B:1300, C:250. Compute the TF-IDF weight of each term and compare them.
tf(t) = f(t) / f_max
where tf(t) is the term frequency weight of term t,
f(t) is the frequency of term t within the document, and
f_max is the max frequency of any term within the document.
idf(t) = log(x/y)
where idf(t) = inverse document frequency of specific term t (log is the natural logarithm here),
x = Total number of documents
y = Number of documents with specific term t in it
Hence the tf-idf weight of a term is the product of its tf weight and its idf weight: tf-idf(t) = tf(t) × idf(t).
For A: tf = 3/3 = 1 and idf = log(10000/50) = 5.30
Hence tf-idf = 1 × 5.30 = 5.30
For B: tf = 2/3 and idf = log(10000/1300) = 2.04
Hence tf-idf = (2/3) × 2.04 = 1.36
For C: tf = 1/3 and idf = log(10000/250) = 3.69
Hence tf-idf = (1/3) × 3.69 = 1.23
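Clearly, A has the largest weight and is therefore the most important term in this document. Then comes B and then C. Note that B ranks above C even though B is much more common across the collection, because B occurs twice as often as C within the document.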
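To double-check the arithmetic, the calculation can be sketched in a few lines of Python. This is only a minimal illustration, assuming the max-normalized tf and natural-log idf that the worked values imply; the function name tf_idf and the variable names are made up for this example and do not come from any library.

```python
import math

def tf_idf(term_counts, doc_freqs, n_docs):
    """Return {term: tf-idf weight} using tf = f / f_max and idf = ln(N / df)."""
    max_count = max(term_counts.values())  # f_max: highest term count in the document
    weights = {}
    for term, count in term_counts.items():
        tf = count / max_count                     # max-normalized term frequency
        idf = math.log(n_docs / doc_freqs[term])   # natural-log inverse document frequency
        weights[term] = tf * idf
    return weights

# Values taken from the question.
term_counts = {"A": 3, "B": 2, "C": 1}
doc_freqs = {"A": 50, "B": 1300, "C": 250}
weights = tf_idf(term_counts, doc_freqs, 10000)

# Terms sorted from most to least important.
ranking = sorted(weights, key=weights.get, reverse=True)
for term in ranking:
    print(f"{term}: {weights[term]:.2f}")
# prints:
# A: 5.30
# B: 1.36
# C: 1.23
```

With a different log base (for example base 10) every weight is scaled by the same constant, so the ranking A, B, C stays the same.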