We share our admiration for a set of results called “locality sensitive hashing” by demonstrating a greatly simplified example that exhibits the spirit of the techniques.Locality sensitive hashing is awe inspiring in its originality, simplicity, beauty and effectiveness. In addition locality sensitive hashing is a remarkable technique as it works even when drastically abridged and simplified. In this paper (link to pdf) we give a description of conditions where the technique works and a heuristic argument why it works (using only elementary math).

Categories: Computer Science Exciting Techniques Expository Writing math programming Opinion Tutorials

### jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

Some simple example code now up on GitHub: https://github.com/WinVector/Locality-Sensitive-Hashing-Example.

It is just a simple example on random data- but even on as few as 10,000 vectors we see reliable results and a 25 times speedup over brute force. Also the simple code deals two major problems seen when working in Java: expense of many small objects and inability to reliably key things off floating point (due to non-repeatiablity default Java floating point).

The detailed log given in the example show how we sweep all projection widths until we find a sweet spot where we are getting sets small enough to inspect (look at fraction inspected, when it is low we are skipping large components).