Full-text search is contentious; it's a rabbit hole of holy wars. I don't bel...

thaumaturgy · on Jan 24, 2021

> For instance, sometimes stemming words is the right way to go, but if you are say doing a medical system where two very different medicines could be spelled similarly and stem to the same word...

FWIW, lemmatization may be a good alternative to stemming. Stemming is algorithmic (typically rule-based suffix stripping) and can generate errors, as you point out; "caring", for example, might naively be stemmed to "car". Lemmatization instead uses a dictionary of words and their root forms (lemmas) to avoid this. For common English, there's Princeton's WordNet (https://wordnet.princeton.edu/). Lemmatizing technical niches, like medicine, would require an additional domain-specific dictionary.
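To make the difference concrete, here is a toy sketch (not WordNet's actual algorithm). `naive_stem`, `lemmatize`, `GENERAL_LEMMAS`, and `MEDICAL_LEMMAS` are hypothetical names, and the tiny dictionaries stand in for a real lexicon. The naive stemmer just strips suffixes, so "caring" and "cars" collapse to the same token; the lemmatizer only maps words it knows, and a domain dictionary can be layered on top of the general one. Real lemmatizers typically also take the part of speech into account, which this sketch ignores.

```python
# Hypothetical toy data: a real system would load a full lexicon such as WordNet.
GENERAL_LEMMAS = {
    "caring": "care",
    "cared": "care",
    "cars": "car",
    "running": "run",
}

# Hypothetical domain-specific additions (e.g. for a medical index).
MEDICAL_LEMMAS = {
    "stents": "stent",
}

SUFFIXES = ("ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix; no dictionary, so it can over-stem."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def lemmatize(word: str, *extra: dict[str, str]) -> str:
    """Check any domain dictionaries first, then the general one.
    Unknown words are returned unchanged rather than guessed at."""
    w = word.lower()
    for table in (*extra, GENERAL_LEMMAS):
        if w in table:
            return table[w]
    return w


print(naive_stem("caring"), naive_stem("cars"))  # car car
print(lemmatize("caring"), lemmatize("cars"))    # care car
print(lemmatize("stents"), lemmatize("stents", MEDICAL_LEMMAS))  # stents stent
```

The trade-off shows up in the last line: a dictionary-based approach never conflates words it doesn't know, but it also can't normalize them until someone adds the niche vocabulary.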

yorwba · on Jan 24, 2021

> there are also Sphinx and Xapian, also fairly widespread.

Sphinx is now Manticore, and as luck has it, a Manticore dev is in this thread, offering support: https://news.ycombinator.com/item?id=25890828

gutensearch · on Jan 24, 2021

Thanks! I really appreciate the pointers. I already had planned to explore some of these and you've expanded and directed the search nicely.