Describe the structure of inverted lists, and how they can be used for efficient retrieval.
Suppose we want to search the texts “hello everyone, ” “this article is used for inverted index, ” “which is like hashmap data structure”. If we index by (text, word within the text), the index with location in text is:
hello (1, 1) everyone (1, 2) this (2, 1) article (2, 2) is (2, 3); (3, 2) used (2, 4) for (2, 5) inverted (2, 6) index (2, 7) which (3, 1) like (3, 3) like (3, 4) data (3, 5) structure (3, 6)
The word “hello” is in document 1 (“hello everyone”) starting at word 1, so has an entry (1, 1) and word “is” is in document 2 and 3 at ‘3rd’ and ‘2nd’ positions respectively (here position is based on word).
With the inverted index created, we can efficiently retrieve documents because the query can now be resolved by jumping to the word ID (via random access) in the inverted index.
Get Answers For Free
Most questions answered within 1 hours.