
It was used extensively in classic computer vision descriptor matching, including most image lookup (e.g. Google image search) today. There are many reasons it is not much used in recent computer vision. The first is that it does not work when the descriptors themselves are explicitly designed to make visually similar features descriptively distinct, and the second is that it generally works poorly for binary vectors, which dominate. With regard to ML, it's been tried, but the datasets are either too small or not available to researchers, and the result would be far too slow compared to the current approaches. It's actually one of the areas where deep learning hasn't reached parity, to my current knowledge. It would be a great project, the dataset issue aside.


For binary vectors you can choose a different distance metric (a non-geometric one, e.g. Jaccard) that can be used to effectively hash similar data points into the same buckets.

Treating your binary vector as a set allows you to use min-hashing as your LSH scheme (min-hashing just applies a random permutation to the set's elements and keeps the smallest resulting index). This simple trick makes LSH with min-hashing quite a powerful tool for binary vectors, which are used extensively in recommender systems and other domains.
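
To make the idea concrete, here is a rough sketch of min-hashing a binary vector and splitting the signature into LSH bands. It is not taken from any particular library; the function names (`make_permutations`, `minhash_signature`, `lsh_buckets`) and the parameters (256 hashes, 64 bands) are my own assumptions for illustration.

```python
import random


def make_permutations(dim, num_hashes, seed=0):
    """Build num_hashes random permutations of the indices 0..dim-1."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(dim))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def minhash_signature(bits, perms):
    """One min-hash per permutation: the smallest permuted index of a set bit."""
    active = [i for i, b in enumerate(bits) if b]
    if not active:
        raise ValueError("min-hash is undefined for an all-zero vector")
    return [min(perm[i] for i in active) for perm in perms]


def lsh_buckets(signature, bands):
    """Split the signature into bands; each (band, rows) pair is a bucket key."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


def jaccard(a, b):
    """Exact Jaccard similarity of two binary vectors treated as sets."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


a = [1, 1, 1, 1, 0, 0, 0, 0]
b = [1, 1, 1, 0, 1, 0, 0, 0]
perms = make_permutations(dim=len(a), num_hashes=256)
sig_a = minhash_signature(a, perms)
sig_b = minhash_signature(b, perms)

# The fraction of matching min-hashes estimates the Jaccard similarity.
estimate = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
# Two vectors become candidates if they share at least one band bucket.
shared = set(lsh_buckets(sig_a, bands=64)) & set(lsh_buckets(sig_b, bands=64))
```

The number of bands trades recall for precision: more bands with fewer rows each means more candidate pairs collide. In practice you would use cheap hash functions instead of explicit permutations once the vectors get large.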

I've used LSH + Min-Hash for image search (and subsequently for audio fingerprinting). If interested, I've blogged about it here [1].

[1] - https://emysound.com/blog/open-source/2020/06/12/how-audio-f...


Agree. Also cosine distance.
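
For cosine distance, one common LSH scheme is random hyperplanes (often called SimHash). I'm assuming that is the kind of thing meant here; the sketch below is only an illustration, and the names `make_hyperplanes`, `simhash_bits` and `estimated_angle` are made up for it.

```python
import math
import random


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian normal vectors, one per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash_bits(vec, planes):
    """Each bit records which side of a random hyperplane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def cosine_similarity(a, b):
    """Exact cosine similarity, for comparison with the hashed estimate."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def estimated_angle(bits_a, bits_b):
    """The fraction of differing bits approximates angle / pi."""
    differing = sum(x != y for x, y in zip(bits_a, bits_b))
    return math.pi * differing / len(bits_a)


planes = make_hyperplanes(dim=3, num_bits=128)
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]     # same direction as u, different length
w = [-1.0, -2.0, -3.0]  # opposite direction

bits_u = simhash_bits(u, planes)
bits_v = simhash_bits(v, planes)
bits_w = simhash_bits(w, planes)
```

Vectors pointing the same way get the same bits regardless of length, which is why this pairs with cosine rather than Euclidean distance. As with min-hash, you would band the bits to form bucket keys.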



