skip to Main Content

I am trying to avoid duplicate entries in the table based on type and names however the unique constraint not getting applied.

    CREATE TABLE IF NOT EXISTS test
(
    id integer NOT NULL DEFAULT nextval('threshold_retry.threshold_details_id_seq'::regclass),
    component_type text COLLATE pg_catalog."default" NOT NULL,
    component_names text[] COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT test_pk PRIMARY KEY (id)
)

Data to be inserted:

INSERT INTO test(
    id, component_type, component_names)
    VALUES (1, 'INGESTION', '{ingestiona,atul, ingestiona, ingestionb}'),
    (2, 'INGESTION', '{test_s3_prerit, atul}'),
    (3, 'DQM', '{testmigration}'),
    (4, 'SCRIPT', '{scripta}'),
    (5, 'SCRIPT', '{testimportscript, scripta}'),
    (6, 'SCRIPT', '{Script_Python}'),
    (7, 'BUSINESS_RULES', '{s3_testH_Graph}'),
    (8, 'EXPORT', '{Export2}')
    ;

I want the result to be similar like :

component_type  component_names
INGESTION       {ingestiona,atul,ingestionb}
INGESTION       {test_s3_prerit}
DQM             {testmigration}
SCRIPT          {scripta}
SCRIPT          {testimportscript}
SCRIPT          {Script_Python}
BUSINESS_RULES  {s3_testH_Graph}
EXPORT          {Export2}

Where atul, ingestiona and scripta is getting removed.
I tried using gist but it is not working for text and text[] cobination

    ALTER TABLE threshold_retry.test
ADD CONSTRAINT exclude_duplicate_names EXCLUDE USING gist (component_type with =, component_names  with &&);

Tried creating Operator class but that too not working.

How to achieve this while insterting or updating the value?

2

Answers


  1. I would suggest to manage the deduplication logic in the application layer or via a function before inserting or updating data in the table like the following

    CREATE OR REPLACE FUNCTION normalize_component_names() RETURNS TRIGGER AS $$
    DECLARE
        existing_names text[];
    BEGIN
        SELECT array_agg(DISTINCT unnest(component_names)) INTO existing_names
        FROM test
        WHERE component_type = NEW.component_type;
        
        IF existing_names IS NOT NULL THEN
            NEW.component_names := array(SELECT DISTINCT unnest(array_cat(existing_names, NEW.component_names)));
        END IF;
    
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    

    Then Create a trigger to use this function

    CREATE TRIGGER normalize_names_before_insert_or_update
    BEFORE INSERT OR UPDATE ON test
    FOR EACH ROW
    EXECUTE FUNCTION normalize_component_names();
    

    You will insert data without checking duplicates at this point because the function will handle deduplication. After inserting the data, you should query the table to ensure that duplicates within the same component_type have been handled appropriately:

    SELECT component_type, component_names FROM test;
    
    Login or Signup to reply.
  2. You can do this using a function and a CHECK constraint. This is likely more efficient than a trigger.

    The function checks that all values are unique: in other words after grouping by the value there is no group with more than one value.

    CREATE FUNCTION is_unique(input text[])
    RETURNS bool AS $$
      SELECT NOT EXISTS(SELECT 1
        FROM unnest(input) AS v(value)
        GROUP BY value
        HAVING COUNT(*) > 1) AS result;
    $$ LANGUAGE SQL;
    

    Then:

    CREATE TABLE IF NOT EXISTS test
    (
        id integer NOT NULL DEFAULT nextval('threshold_retry.threshold_details_id_seq'::regclass),
        component_type text COLLATE pg_catalog."default" NOT NULL,
        component_names text[] COLLATE pg_catalog."default" NOT NULL,
        CONSTRAINT test_pk PRIMARY KEY (id),
        CONSTRAINT uq_names CHECK ( is_unique(component_names) )
    );
    

    db<>fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search