jamesgresql · 2026-01-11T20:54:47 1768164887

I know it sounds obvious, but some people are pretty determined to use it that way!

jamesgresql · 2025-12-14T19:39:24 1765741164

Hey HN! Author here. We added faceted search capabilities to our `pg_search` extension for Postgres, which is built on Tantivy (Rust's answer to Lucene). This brings Elasticsearch-style faceting directly into Postgres, with a 14x performance improvement over a CTE-based approach. The speedup comes from performing facet aggregations in a single BM25 index pass and making use of our columnar store.

You get the same faceting features you'd expect from a dedicated search engine while maintaining full ACID compliance. Happy to answer technical questions about the implementation!

PSeitz · 2025-12-15T02:51:39 1765767099

Hi, tantivy dev here. There are two recent performance improvements in tantivy, which should make term aggregations considerably faster.

https://github.com/quickwit-oss/tantivy/pull/2740 https://github.com/quickwit-oss/tantivy/pull/2759

stuhood · 2025-12-15T15:34:22 1765812862

Yes, thank you for your hard work! We rebased recently, and we'll likely talk about those improvements as part of our `0.21.x` release.

jamesgresql · 2025-12-12T18:49:19 1765565359

Haha, I like “good old tokenization”

jamesgresql · 2025-12-12T18:31:43 1765564303

Amazing, will have a read!

jamesgresql · 2025-12-12T18:31:18 1765564278

Chinese, Japanese, Korean, etc. don’t work like this either.

However, even though the approach is “old fashioned” it’s still widely used for English. I’m not sure there is a universal approach that semantic search could use that would be both fast and accurate?

At the end of the day people choose a tokenizer that matches their language.

I will update the article to make all this clearer though!

jamesgresql · 2025-12-12T18:25:47 1765563947

100%, maybe we should do a follow-up on other types of tokenization.

jamesgresql · 2025-12-08T19:19:17 1765221557

Hello HN, author here. It seems like everyone is talking about 'hybrid search' (lexical/BM25 + semantic/vector) these days, so I wanted to show how it's possible (and fully customizable) using reciprocal rank fusion in SQL.
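For anyone who hasn't met reciprocal rank fusion before, here is a minimal sketch of the idea in Python rather than SQL. It is not the code from the article: the function name, the document ids, and the two example result lists are made up for illustration, and `k=60` is just the constant commonly used with RRF. Each document scores `1 / (k + rank)` in every list it appears in, and the scores are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. `k` dampens the influence of top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a lexical (BM25) query and a vector query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}\t{score:.5f}")
```

Documents that appear in both lists (`doc_a`, `doc_c`) end up ahead of those that appear in only one. Because only ranks are used, the BM25 and vector scores never need to be put on a common scale.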

jamesgresql · 2025-10-12T22:11:33 1760307093

The original title of this post was "When Tokenization Becomes Token", but nobody got it.

I'm curious: after reading this article, how many people can tell me why that title would have been great?

(Also, I'd love feedback on the interactive components. I think they came out well!)

anonymoushn · 2025-10-15T18:03:29 1760551409

hello. it would be great because it celebrates the total destruction of the search capabilities of the tool, just like the article does.

jamesgresql · 2025-09-18T15:12:37 1758208357

Author here — you beat me to it!

Hi everyone! A lot of you will probably see the title of this post and immediately think “of course, just use the right tool for the job.”

But for those who don’t … here's a thing for you.

I’d really love to hear from both sides:

- Folks who’ve been burned by using Elastic as a primary datastore.

- Folks who haven’t — and can make the case for why it works just fine. (I know some of you are out there!)

jamesgresql · 2025-06-18T08:07:47 1750234067

Not the case!

gangstead · 2025-06-18T15:28:54 1750260534

Then why reference the "Agentic Era" in the title?