LSH – an efficient approach to nearest neighbour search

In Image similarity search with LIRE we explained how to compare and find similar images using the Java library LIRE. The idea was to transform the complicated problem of comparing a large bunch of pixels to the simpler problem of comparing vectors representing histograms and other higher-level properties. In other words, if we can compress the information inherent in a bunch of pixels to a point in n-dimensional space (an array with n entries – the so called feature vector of the image), we can regard the distance between two such points as a similarity measure for the corresponding images. We can then find the images similar to a search image by selecting those images whose feature vectors have a small distance to the feature vector of the search image.

However, a naive approach to the problem – comparing the feature vector of the search image to all feature vectors in the database – is rather slow if our database is large. In this article, we show how to implement a fast similarity search for even very large databases.