In Locality-Sensitive Hashing (LSH), we divide the signature columns into b bands, with r rows (elements) per band in each column. Consider two columns C1 and C2 of length m = br. Suppose that for each i = 1, ..., m, the probability that C1(i) = C2(i) is s, independently of the other rows. Note that s is, in fact, the similarity of the two columns.
(a) Show that the probability that the two columns agree in a given band (i.e., all r rows of that band are equal) is s^r.
(b) Show that the probability that the columns agree in none of the b bands is (1 − s^r)^b.
(c) Conclude that the probability that the columns are chosen as a candidate pair (i.e., that they agree in at least one band) is 1 − (1 − s^r)^b.
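This is a numerical sanity check, not a proof. The Python sketch below simulates the model stated above, where each of the m = br rows agrees independently with probability s, and compares Monte Carlo estimates with the formulas in (a), (b) and (c). The function names `simulate` and `candidate_probability` and the example values s = 0.5, r = 4, b = 10 are illustrative choices, not part of the original problem.

```python
import random


def candidate_probability(s: float, r: int, b: int) -> float:
    """Closed form from part (c): P(at least one band matches) = 1 - (1 - s^r)^b."""
    return 1.0 - (1.0 - s ** r) ** b


def simulate(s: float, r: int, b: int, trials: int = 50_000, seed: int = 0):
    """Monte Carlo estimates of the probabilities in parts (a), (b) and (c).

    Each row agrees independently with probability s, as the problem assumes.
    Returns (P(a given band matches), P(no band matches), P(candidate pair)).
    """
    rng = random.Random(seed)
    matching_bands = 0  # total matching bands over all trials (for part a)
    no_match = 0        # trials in which no band matches (for part b)
    for _ in range(trials):
        # A band matches only if all r of its rows agree.
        matches = [all(rng.random() < s for _ in range(r)) for _ in range(b)]
        matching_bands += sum(matches)
        if not any(matches):
            no_match += 1
    p_band = matching_bands / (trials * b)
    p_none = no_match / trials
    return p_band, p_none, 1.0 - p_none


if __name__ == "__main__":
    s, r, b = 0.5, 4, 10
    est_a, est_b, est_c = simulate(s, r, b)
    print(f"(a) simulated {est_a:.4f}, formula s^r = {s ** r:.4f}")
    print(f"(b) simulated {est_b:.4f}, formula (1 - s^r)^b = {(1 - s ** r) ** b:.4f}")
    print(f"(c) simulated {est_c:.4f}, formula = {candidate_probability(s, r, b):.4f}")
```

Averaging over all b bands in part (a) uses every simulated band rather than only the first, which reduces the variance of that estimate. It is valid because the bands are identically distributed under the stated independence assumption.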