Question

A document contains the terms A, B, and C with term frequencies A:3, B:2, C:1. The document belongs to a collection of 10,000 documents. The document frequencies are A:50, B:1300, C:250. Compute the TF-IDF weight of each term and compare them.

Homework Answers

Answer #1

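The term-frequency (tf) weight of term t is

$$\mathrm{tf}(t) = \frac{f_t}{f_{\max}}$$

where $f_t$ is the frequency of term t within the document and $f_{\max}$ is the maximum frequency of any term within the document.

The inverse document frequency (IDF) of term t is

$$\mathrm{idf}(t) = \log\frac{x}{y}$$

where

x = total number of documents in the collection

y = number of documents that contain term t

Hence the tf-idf weight is the product of the tf weight and the idf weight.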

Taking log as the natural logarithm:

For A: tf = 3/3 = 1 and idf = log(10000/50) ≈ 5.30

Hence tf-idf ≈ 1 × 5.30 = 5.30

For B: tf = 2/3 and idf = log(10000/1300) ≈ 2.04

Hence tf-idf ≈ (2/3) × 2.04 ≈ 1.36

For C: tf = 1/3 and idf = log(10000/250) ≈ 3.69

Hence tf-idf ≈ (1/3) × 3.69 ≈ 1.23

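Clearly, the weight of A is the largest, so A is the most important term: it is both the most frequent term in the document and the rarest in the collection. B comes next and then C, but only narrowly. C has the higher idf because it appears in fewer documents, yet B's higher term frequency in this document outweighs that.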
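The calculation can be checked with a short script. This is only a sketch under the same assumptions as the answer: tf is the raw count divided by the largest count in the document, idf uses the natural logarithm, and the function name `tf_idf` and the dictionary names are made up for illustration.

```python
import math


def tf_idf(term_counts, doc_freqs, n_docs):
    """Return {term: tf * idf} using max-normalised tf and natural-log idf."""
    max_count = max(term_counts.values())  # highest term frequency in the document
    scores = {}
    for term, count in term_counts.items():
        tf = count / max_count                     # e.g. 2/3 for B
        idf = math.log(n_docs / doc_freqs[term])   # natural log of x / y
        scores[term] = tf * idf
    return scores


# Values taken from the question.
term_counts = {"A": 3, "B": 2, "C": 1}
doc_freqs = {"A": 50, "B": 1300, "C": 250}

scores = tf_idf(term_counts, doc_freqs, n_docs=10000)

# Terms ordered from highest to lowest tf-idf weight.
ranking = sorted(scores, key=scores.get, reverse=True)
for term in ranking:
    print(f"{term}: {scores[term]:.2f}")
```

A different logarithm base (for example base 10) would scale all three weights by the same constant, so the ordering of the terms would not change.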
