In Image similarity search with LIRE we explained how to compare and find similar images using the Java library LIRE. The idea was to transform the complicated problem of comparing a large bunch of pixels to the simpler problem of comparing vectors representing histograms and other higher-level properties. In other words, if we can compress the information inherent in a bunch of pixels to a point in n-dimensional space (an array with n entries – the so called feature vector of the image), we can regard the distance between two such points as a similarity measure for the corresponding images. We can then find the images similar to a search image by selecting those images whose feature vectors have a small distance to the feature vector of the search image.
However, a naive approach to the problem – comparing the feature vector of the search image to all feature vectors in the database – is rather slow if our database is large. In this article, we show how to implement a fast similarity search for even very large databases.