To get optimum performance with PostgreSQL full text search you need to create a column to store the tsvector values with an index on it. Then you need to fill this column with the tsv values and create a trigger to update the field on INSERT and UPDATE.
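Concretely, that setup looks something like this (a sketch, assuming a documents table with a text column as in the example):

ALTER TABLE documents ADD COLUMN tsv tsvector;
UPDATE documents SET tsv = to_tsvector('english', coalesce(text, ''));
CREATE INDEX tsv_idx ON documents USING gin(tsv);
CREATE TRIGGER tsv_update BEFORE INSERT OR UPDATE ON documents
FOR EACH ROW EXECUTE FUNCTION tsvector_update_trigger(tsv, 'pg_catalog.english', text);

The built-in tsvector_update_trigger keeps the column current on writes, so the index always reflects the text.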
I've been playing around with full text search in Postgres, and I took this sort of approach when starting out, but then realized I could just have the index be an expression.
So instead of (per the example)
CREATE INDEX tsv_idx ON documents USING gin(tsv);
doing something like
CREATE INDEX tsv_idx ON documents USING gin(to_tsvector('english', text));
Is there any reason you wouldn't do this? For multi-language you'd have to detect the language of the text, but there's no reason you couldn't parameterize that too.
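For instance, something like this should work, assuming a hypothetical lang column of type regconfig holding the detected configuration (the two-argument form of to_tsvector is immutable, so it's allowed in an expression index):

CREATE INDEX tsv_idx ON documents USING gin(to_tsvector(lang, text));
-- searches then have to repeat the same expression for the index to be used
SELECT * FROM documents
WHERE to_tsvector(lang, text) @@ to_tsquery('english', 'some & query');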
You can precompute to_tsvector() in parallel ahead of time if you're storing it in a dedicated column. CREATE INDEX runs on a single thread, including the part where it evaluates to_tsvector() for each row. If you ever need to recreate the index, it'll be faster if the tsvector is in a dedicated column.
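For example, something like this (a sketch, assuming an integer id column), with each UPDATE run from its own connection:

UPDATE documents SET tsv = to_tsvector('english', text) WHERE id < 1000000;
UPDATE documents SET tsv = to_tsvector('english', text) WHERE id >= 1000000 AND id < 2000000;
-- ...and so on for the remaining ranges; the index build then only reads the precomputed values
CREATE INDEX tsv_idx ON documents USING gin(tsv);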
I have a table with 30 million documents using pgsql's full text index. Creating the index takes ages, and search performance is generally very poor. The difference between creating the index with the precomputed column versus creating the index with the expression in the index itself (which is how I originally did it) was substantial.
No, there's no real reason why you would not use a functional index. The reason so many tutorials use a dedicated column is simply lack of up-to-date information, I think.
That is the way I do it also. I have used Lucene and Solr a lot in the past, but I now find Postgres text search to be more than adequate for my needs and it does make software development and deployment simpler.
You're not including the title from the JSON column or setting weights in your alternate version. I'm not sure if that can be included in the index or not, but that might be the reasoning.
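For reference, including the title and weights amounts to something like this when filling the stored column (a sketch, assuming the title sits under a 'title' key in a hypothetical payload jsonb column):

UPDATE documents SET tsv =
    setweight(to_tsvector('english', coalesce(payload->>'title', '')), 'A') ||
    setweight(to_tsvector('english', coalesce(text, '')), 'D');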
If you want to preprocess your document, or aggregate different parts of it with different labels and weights, it can be helpful to store the result in a separate column.
Of course you can always just index your preprocessing/aggregating function and call it every time you want to search, but depending on how expensive that is, it might be in your interest to do it upfront and make your searches a bit quicker.
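A rough sketch of both options, assuming a hypothetical aggregating function and the same columns as above (payload jsonb holding the title, text holding the body):

-- hypothetical preprocessing/aggregating function; declared IMMUTABLE so it can be indexed
CREATE FUNCTION document_tsv(title text, body text) RETURNS tsvector
LANGUAGE sql IMMUTABLE AS $$
  SELECT setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
         setweight(to_tsvector('english', coalesce(body, '')), 'D')
$$;

-- option 1: index the expression and call the function at search time
CREATE INDEX tsv_expr_idx ON documents USING gin(document_tsv(payload->>'title', text));

-- option 2: do the work upfront, store the result in the dedicated column, and index that
UPDATE documents SET tsv = document_tsv(payload->>'title', text);
CREATE INDEX tsv_idx ON documents USING gin(tsv);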