skip to Main Content

I am trying to create a view to fetch data from a bucket by excluding certain folders inside S3 on Hive. I was able to successfully create view on Athena, but couldn’t do the same on Hive.

Athena View:

CREATE VIEW test
as
SELECT *
FROM TABLE_A
WHERE NOT ("$PATH LIKE '%PASSENGER_DATA%')
AND NOT ("$PATH LIKE '%CUSTOMER_DATA%');

Could you please advise how the same could be achieved on Hive?

2

Answers


  1. There isn’t the same facility to filter by path. However, depending on what version you are using you could use Ranger to exclude the data so it wasn’t shown.

    If you must do it by view try using:

    CREATE TABLE filter_out [blah blah blah]
    LOCATION '%CUSTOMER_DATA%'
    
    SELECT *
    FROM TABLE_A
    WHERE NOT EXISTS (SELECT ID FROM filter_out WHERE TABLE_A.ID = filter_out.ID)
    
    Login or Signup to reply.
  2. You may actually wish to consider moving the data into its own folders. Then you could build them as a table with:

    CREATE VIEW TABLE_DATA
    as
    SELECT *
    FROM TABLE_A --("$PATH LIKE '%PASSENGER_DATA%')
    UNION
    SELECT *
    FROM TABLE_B -- ("$PATH LIKE '%CUSTOMER_DATA%');
    

    THis likely will also make your permission issues easier to manage.

    And when needed you could easily use one table or both tables.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search