I have the table below:
create table tablex (fileID int, file varchar(10), name varchar(10));
insert into tablex (fileID, file, name) values (1, 'file1' , 'AAA'),(1, 'file1' , 'AAA'),(1, 'file1' , 'AAA'),(2, 'file2' , 'AAA'),(2, 'file2' , 'AAA'),(2, 'file2' , 'AAA'),(1, 'file1' , 'BBB'),(1, 'file1' , 'BBB'),(2, 'file2' , 'BBB'),(2, 'file2' , 'BBB'),(3, 'file3' , 'BBB'),(3, 'file3' , 'BBB'),(1, 'file1' , 'CCC'),(1, 'file1' , 'CCC');
tablex
fileID | file | name
1 | file1 | AAA
1 | file1 | AAA
1 | file1 | AAA
2 | file2 | AAA
2 | file2 | AAA
2 | file2 | AAA
1 | file1 | BBB
1 | file1 | BBB
2 | file2 | BBB
2 | file2 | BBB
3 | file3 | BBB
3 | file3 | BBB
1 | file1 | CCC
1 | file1 | CCC
I expect to get only the names that appears in more than one file, and then count the number of occurrences in the file with max id, and also the number of occurrences in the other files. Ex.
AAA – 3 occurrences in file2, 3 occurrences in other files
BBB – 2 occurrences in file3, 4 occurrences in other files
I’m trying to use window function (still learning), but not sure if this approach is the best path to go as i had to add a distinct to make it work
select distinct t.name, count(t.name) over (partition by t.name) countAll,
count(t.name) over (partition by t.name, fileId) countLatestFile,
count(t.name) over (partition by t.name) - count(t.name) over (partition by t.name, fileId) countOthers
from tablex t
join (select name from tablex group by name having count(distinct fileId) > 1) tdups
on t.name = tdups.name;
Any other ideas?
3
Answers
Using the window function
dense_rank
, do it like this:First
cte
was used to arrange names by rank according to fileID. Get the latest file from each name using cte2 (having max rank).Then, we join both CTEs to produce the desired result:
Demo here
You may use the max window function and conditional aggregation as the following:
See demo
Possible to do without using a window function: