skip to Main Content

Data Table

Name    Company     Continent   country     state   district    
Tom     HP          Asia        India       Assam   Kdk    
George  SAP         Africa      Sudan       Chak    ksk     
Bill    EBAY        Europe      Denmark     Lekh    Sip     
Charles WM          Asia        India       Haryana Jhat    
Chip    WM          Asia        India       Punjab  Chista    
Chia    WM          Asia        India       Punjab  Mast

Rule Table

Continent   country     state   district    Pass    
Asia        India       ALL     ALL         Yes    
Asia        India       Punjab  ALL         NO
Asia        India       Punjab  Mast        Yes

I have two tables in Hive. Depending on the rule I have to filter out the data in the data table.

In the rule table there is a column called pass which determines whether a record in data table needs to be filtered or not.

In this example there are different kinds of rules. They are the ones at broader level and at narrow level.
The rules at narrow level should not affect the rules at broader level. This means the rules at narrow level is an exception to rules at broader level.
For ex: in the rules table, there are 3 records. The first record is the rule at broader level. The other ones are at narrow level.

The first rules says to pass all the records that have country as india,state as any/all and district as any/all.
The second rule says to not pass all the records that have country as India, state as punjab and district as any/all.
The third rule says to pass all records that have country as India,state as punjab and district as Mast.

The second rule is an exception to first rule. The third rule is an exception to second rule.

Considering the data in the data table and rules in the rules table, the pass columns will be as follows for the Indian(country) records.

Name    Company     Continent   country     state   district    Pass    
Tom     HP          Asia        India       Assam   Kdk         Yes    
Charles WM          Asia        India       Haryana Jhat        Yes    
Chia    WM          Asia        India       Punjab  Mast        Yes
Chip    WM          Asia        India       Punjab  Chista      No

This is just an example. In production the data will be different.

How do I implement this using SQL/Sql script?

Help is much appreciated.

2

Answers


  1. You want the most specific rule. In Hive, you can use multiple left joins:

    select d.*, coalesce(r1.pass, r2.pass, r3.pass)
    from data d left join
         rules r1
         on r1.Continent = d.Continent and
            r1.country = d.country and
            r1.state = d.state and
            r1.district = d.district left join
         rules r2
         on r2.Continent = d.Continent and
            r2.country = d.country and
            r2.state = d.state  and
            r2.district = 'ALL'  left join
         rules r3
         on r3.Continent = d.Continent and
            r3.country = d.country and
            r3.state = 'ALL' and
            r3.district = 'ALL' ;
    

    You might want to continue with the LEFT JOINs if 'ALL' is allowed for continent and country.

    Login or Signup to reply.
  2. @TomG : Please see the below code if that helps

    select * from TEMP_TESTING  where country ='India'  and district<>'Chista'
    union 
    (select * from TEMP_TESTING where country ='India'   except
    select * from TEMP_TESTING where  country ='India' and state='Punjab')
    union 
    select * from TEMP_TESTING where country ='India'and state='Punjab' and district='Mast'
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search