
Depends on the nature of the content you’re working with, but I’ve had some good results using an LLM during indexing to generate a search document by rephrasing the original text in a standardized way. Then you can search against the embeddings of that document, and perhaps boost based on keyword similarity to the original text.
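Rough sketch of what that flow can look like. To be clear this is just one way to wire it up: sentence-transformers is an arbitrary choice of embedding model, call_llm() is a stand-in for whatever LLM client you use, and the 0.2 keyword-boost weight is made up:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def call_llm(prompt: str) -> str:
        # Stand-in for your LLM API of choice.
        raise NotImplementedError("plug in your LLM client here")

    def build_search_document(original: str) -> str:
        # Index time: rephrase the original text in a standardized way.
        return call_llm(
            "Rewrite the following passage as a concise, plainly worded "
            "summary of its key facts and terms:\n\n" + original
        )

    def index(corpus: list[str]):
        search_docs = [build_search_document(t) for t in corpus]
        vecs = model.encode(search_docs, normalize_embeddings=True)
        return corpus, vecs

    def search(query: str, corpus, vecs, top_k=5, keyword_weight=0.2):
        q = model.encode([query], normalize_embeddings=True)[0]
        sims = vecs @ q  # cosine similarity against the rephrased documents
        # Boost by keyword overlap with the *original* text.
        q_terms = set(query.lower().split())
        boosts = np.array([
            len(q_terms & set(doc.lower().split())) / max(len(q_terms), 1)
            for doc in corpus
        ])
        scores = sims + keyword_weight * boosts
        return [corpus[i] for i in np.argsort(-scores)[:top_k]]

The point being: the vectors come from the rephrased "search document", while the keyword boost is still computed against the original text.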


This is also often referred to as Hypothetical Document Embeddings (https://arxiv.org/abs/2212.10496).
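For reference, the paper's variant works at query time rather than index time: you embed an LLM-generated hypothetical answer instead of the raw query and retrieve by similarity to the real document embeddings. Roughly something like this (call_llm() and the embedding model are placeholders, not anything the paper ships):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your LLM client here")

    def hyde_search(query: str, doc_texts: list[str], doc_vecs: np.ndarray, top_k: int = 5):
        # Generate a plausible (possibly wrong) answer passage for the query,
        # then use its embedding as the search vector.
        hypothetical = call_llm("Write a short passage that answers: " + query)
        q_vec = model.encode([hypothetical], normalize_embeddings=True)[0]
        sims = doc_vecs @ q_vec
        return [doc_texts[i] for i in np.argsort(-sims)[:top_k]]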


Do you have examples of this? Please say more!


Nice workaround. I just wish there was a less 'lossy' way to go about it!


Could you explicitly train a set of embeddings that performs that step in the process? For example, when computing the loss, you compare the difference against the normalized text rather than the original. Or, alternatively, do this as a fine-tuning step. Then you would have embeddings optimized for the characteristics you care about.
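Something like this fine-tuning sketch, maybe? Here normalize() is assumed to be the LLM rephrasing step from the parent comment, and the pair-based contrastive loss from sentence-transformers pulls each raw text toward its normalized rewrite (model name, batch size etc. are arbitrary):

    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def normalize(text: str) -> str:
        # Stand-in for the LLM rephrasing step from the parent comment;
        # replace with a real LLM call.
        return text

    raw_texts = [
        "Q3 rev was up 14% YoY driven by intl expansion.",
        "The patch fixes a race in the connection pool teardown.",
    ]
    # Each training pair is (raw text, normalized rewrite), so the encoder
    # learns to map messy input close to its cleaned-up form.
    pairs = [InputExample(texts=[t, normalize(t)]) for t in raw_texts]
    loader = DataLoader(pairs, shuffle=True, batch_size=2)
    loss = losses.MultipleNegativesRankingLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)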


Normal full-text search techniques help reduce the search space - e.g. lemmatization, stemming, and query simplification were all around well before LLMs.
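e.g. the classic pre-LLM normalization in a few lines, here with NLTK's Porter stemmer and English stopword list (full-text engines like Lucene do the equivalent inside their analyzers):

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    nltk.download("stopwords", quiet=True)
    stemmer = PorterStemmer()
    stops = set(stopwords.words("english"))

    def simplify(query: str) -> list[str]:
        # Lowercase, drop non-alphabetic tokens and stopwords, then stem,
        # so "Running the fastest servers" and "run fast server" share terms.
        terms = [w.lower() for w in query.split() if w.isalpha()]
        return [stemmer.stem(w) for w in terms if w not in stops]

    print(simplify("Running the fastest servers"))  # ['run', 'fastest', 'server']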



