skip to Main Content

I need to join two tables: our_sample and tls207_pers_appln from PATSTAT.

our_sample‘ has 4 columns: appln_id, appln_auth, appln_nr, appln_kind.

tls207_pers_appln has 4 columns: appln_id, person_id, applt_seq_nr, invt_seq_nr.

our_sample has 2191 rows and some (60) of them are missing in tls207_pers_appln.

Because I want to join the tables keeping all the appln_id in our_sample (even if they do not have the matching information from tls207_pers_appln) I join the two tables doing a RIGHT JOIN.

However, the resulting view t2_tot_in_patent has only 2096 appln_id.

This is in part due to the restriction I put (35 patents are dropped because I select only those HAVING MAX(invt_seq_nr) > 0, which is fine). But this would yield 2191-35 = 2156 patents.

Instead, I get 2096 of them that is:
2191 (in our_sample) – 60 (appln_ids in our_sample that miss from tls207) – 35 (appln_ids for which invt_seq_nr = 0).

BUT the whole point of using RIGHT JOIN is that I should not loose those 60 patents. Why then?

-- compiling total count of inventors per patent: t2_tot_in_patent

DROP VIEW IF EXISTS t2_tot_in_patent;
CREATE VIEW t2_tot_in_patent AS
SELECT m.appln_id, MAX(invt_seq_nr) AS tot_in_patent
FROM patstat2022a.tls207_pers_appln AS t7
RIGHT OUTER JOIN cecilia.our_sample AS m
ON t7.appln_id = m.appln_id 
GROUP BY appln_id
HAVING MAX(invt_seq_nr) > 0

2

Answers


  1. Suggestion:

    GROUP BY m.appln_id
    

    Maybe the SQL is grouping by appln_id of the other table.

    Try this:

    HAVING MAX(invt_seq_nr) > 0 OR MAX(invt_seq_nr) IS NULL
    

    Right now your code is getting only rows where this max value is more than 0 but those that don’t exist in the other table are getting ignored because they would have this value as NULL.

    Login or Signup to reply.
  2. A right-join to our_sample in the second position is the same as a left-join starting with our_sample in the first position. No alias for the max( invt_seq_nr ) table which I would imply comes from the t7 table.

    Now, it appears you want to have EVERYONE from the our_sample table regardless of the tls207_pers_appln. I would reverse to a left-join putting our_sample in primary position.

    SELECT
            m.appln_id, 
            count(t7.invt_seq_nr ) t7Entries,
            MAX( t7.invt_seq_nr) AS tot_in_patent
        FROM
            cecilia.our_sample AS m
                LEFT JOIN patstat2022a.tls207_pers_appln AS t7
                    ON m.appln_id = t7.appln_id
        GROUP BY 
            m.appln_id 
        order by
            case when count(t7.invt_seq_nr ) = 0
                then 1 else 2 end
    

    Remove the HAVING clause, that is going to chop off those people who never had an event. By adding the order by as I have here, I am basically floating to the top of the list all people who did NOT have any records in the t7 table.

    At this point you should be able to get your entire EXPECTED count to be correct, but also see those that dont have and see if that pairs-up with the missing count encountered.

    Once confirmed, you can re-apply the HAVING clause to only restrict those who HAD activity in the t7 table vs not.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search