Explain why it is not good to do sorting on a hard drive. List and explain the steps of blocked sort-based indexing.
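Sorting directly on a hard drive is a poor choice because disk access is far slower than memory access, and every seek adds further delay. A general-purpose sort algorithm touches data in a scattered pattern, so running it over disk-resident data (for example, through a virtual memory implementation) triggers many slow reads, writes, and seeks. An external sort is designed to minimise the amount of data read from and written to external storage (and, historically, seek times), so a sort algorithm not designed for this will not be competitive with one designed to minimise I/O.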
The steps of blocked sort-based indexing (BSBI) are:
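1. Parse documents into (TermID, DocID) pairs and accumulate them in memory until a "block" is full.
2. Sort the block's (TermID, DocID) pairs in memory, by TermID and then by DocID.
3. Invert the block: collect all pairs with the same TermID into a postings list.
4. Write the inverted block to disk, then repeat steps 1 to 4 until all documents have been processed.
5. Merge all blocks into one large postings file.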
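The steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not a reference implementation: the function names (`parse_pairs`, `write_block`, `read_block`, `bsbi_index`), the whitespace tokenizer, and the one-pair-per-line block file format are all assumptions made for the example. As a simplification, each block is written as sorted pairs and the postings lists are built during the final merge, and the merged index is returned as a dictionary where a real system would write it to a postings file.

```python
import heapq
import os
import tempfile
from itertools import groupby


def parse_pairs(docs, term_ids):
    """Yield (term_id, doc_id) pairs; term_ids maps each term to an integer ID."""
    for doc_id, text in docs:
        for term in text.lower().split():  # assumed tokenizer: split on whitespace
            term_id = term_ids.setdefault(term, len(term_ids))
            yield term_id, doc_id


def write_block(pairs, directory):
    """Sort one block in memory and write it to disk, one pair per line."""
    pairs.sort()  # sorts by term_id, then doc_id
    fd, path = tempfile.mkstemp(dir=directory, suffix=".blk", text=True)
    with os.fdopen(fd, "w") as f:
        for term_id, doc_id in pairs:
            f.write(f"{term_id} {doc_id}\n")
    return path


def read_block(path):
    """Stream pairs back from a block file without loading the whole file."""
    with open(path) as f:
        for line in f:
            term_id, doc_id = line.split()
            yield int(term_id), int(doc_id)


def bsbi_index(docs, block_size, directory):
    """Build an index from (doc_id, text) pairs using blocks of block_size pairs."""
    term_ids, block, paths = {}, [], []
    for pair in parse_pairs(docs, term_ids):
        block.append(pair)
        if len(block) >= block_size:  # block is full: sort it and write it out
            paths.append(write_block(block, directory))
            block = []
    if block:  # write the last, partially filled block
        paths.append(write_block(block, directory))

    # Merge the sorted blocks sequentially; only one pair per block is in memory.
    merged = heapq.merge(*(read_block(p) for p in paths))

    # Group the merged pairs by term_id into postings lists, skipping duplicates.
    index = {}
    for term_id, group in groupby(merged, key=lambda pair: pair[0]):
        postings = []
        for _, doc_id in group:
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
        index[term_id] = postings
    return term_ids, index
```

For example, calling `bsbi_index([(1, "a b"), (2, "b c"), (3, "a c c")], 2, some_directory)` with a block size of 2 pairs writes four small block files and should return the postings `{0: [1, 3], 1: [1, 2], 2: [2, 3]}` for the terms a, b, and c. The merge reads each block file sequentially, which is the access pattern that keeps disk I/O cheap.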