It’s impossible to answer that question without knowing what content/query domain you are embedding. Checkout MTEB leaderboard, dig into the retrieval benchmark, and look for analogous datasets.
So we're talking maximizing embedding model per use case? Medical dats would require differnet model than say sales data? Sounds very fragmented approach.