We generally run UMAP on regular semi-structured data like database query result...

abhgh · 2025-12-16T13:12:05 1765890725

Thank you. Your comment about using LLMs to semantically parse diverse data as a first step makes sense. In fact, come to think of it, the same thing happens in prompt optimization: in MIPROv2 [1], for example, the LLM is used to create initial prompt guesses based on its understanding of the data. And I agree that UMAP still works well out of the box and has done so pretty much since its introduction.

[1] Section C.1 in the Appendix here https://arxiv.org/pdf/2406.11695

nighthawk454 · 2025-12-16T20:41:41 1765917701

I’m working on a new UMAP alternative - curious what kinds of improvements you’d be interested in?