skip to Main Content

I have an articles table with these (relevant) fields

CREATE TABLE
  IF NOT EXISTS articles (
    ...
    ...
    title TEXT NOT NULL,
    content JSONB NOT NULL,
    search_index TSVECTOR NOT NULL,
    ...
    ...
  );

with an index on search_index

CREATE INDEX articles_search_index ON articles USING gin(search_index);

And I want to be able to search both title and content without any particular priority.

I want to treat entire content data as a blob of text without actually changing its structure.

What I’d hope would work:

CREATE TRIGGER articles_search_index_update BEFORE INSERT
OR
UPDATE
  ON articles FOR EACH ROW EXECUTE FUNCTION 
     tsvector_update_trigger
        (search_index,
         'pg_catalog.english', 
          title, 
          content
        );

Error – column "content" is not of a character type which is understandable since tsvector_update_trigger expects text columns.

Is it possible to have jsonb content in that function?

2

Answers


  1. You could cast the entire json to a string by using content::text instead of content in the update trigger, but then all the brackets, keys and commas of the json would also get thrown in which could be problematic. If content always contains the same simple properties it will be much cleaner to manually extract the (relevant) properties, cast them to text where necessary and concatenating them together instead.

    Login or Signup to reply.
  2. You can use a generated column with jsonb_to_tsvector(). As you also want to index the title, you will need to concatenate the content column with a JSON value containing the title.

    Something along the lines:

    CREATE TABLE IF NOT EXISTS articles 
    (
      ...
      title TEXT NOT NULL,
      content JSONB NOT NULL,
      search_index TSVECTOR generated always as 
         (jsonb_to_tsvector('english', content||jsonb_object(array['title', title]), '"all"')) stored
    );
    

    Note that you can not use jsonb_build_object() as that is unfortunately not marked as "immutable", but jsonb_object() is.

    If the JSON stored in the content column can contain a top-level key title, you will need to use a different key, maybe jsonb_object(array['__title', title]).

    One advantage I can see (apart from not having to manage the trigger) is, that this only indexes the actual JSON values not the keys, so I would expect this to result in more accurate search results.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search