skip to Main Content

I have a database containing different information from our website. One table (called Raw_Pages) contains a list of every page on our site and the path to it (along with other fields of course). Another table (called Paths) contains a list of various branches of the site that are owned by different departments.

I’m trying to run a query to basically find all pages on the site that do not fall under one of the branches specified.

table Raw_Pages
+-------------------------+--------------+
| Field                   | Type         |
+-------------------------+--------------+
| ID                      | int(11)      |
| Path                    | varchar(500) |
| Title                   | varchar(255) |
+-------------------------+--------------+

table Paths
+----------+--------------+
| Field    | Type         |
+----------+--------------+
| ID       | int(11)      |
| Path     | varchar(255) |
+----------+--------------+

We currently have 64,002 pages I’m checking against 757 paths (All departments have multiple branches due to different ones for different file types). I’m also planning to do a similar query for files, of which we have 306,625 and pulls from the same list of 757 paths. Yes, our site is a giant mess.

From what I can tell, a LEFT JOIN is what would work best for me with a wildcard on the right side. I am a novice at code so I could be far off.

SELECT * FROM Raw_Pages LEFT JOIN Paths ON Raw_Pages.path LIKE CONCAT(Paths.Path,'%') WHERE Paths.ID IS NULL

I’m honestly not sure if the above code works or not since it just freezes phpMyAdmin when I try it. I’m assuming something is wrong in it, or there is a better way.

Thank you!

2

Answers


  1. If you have an index on Paths(Path), you might be able to do:

    select rp.*
    from raw_pages rp
    where not exists (select 1
                      from paths p
                      where p.path <= rp.path and
                            p.path > concat(rp.path, '(')
                     );
    

    It is possible for the subquery to use an index. I’m not sure it will.

    Login or Signup to reply.
  2. If the value of the path field is identical in the two tables you could use:

    SELECT * 
    FROM Raw_Pages AS R
    LEFT JOIN Paths AS P ON (R.path=P.path)
    WHERE R.ID IS NULL
    

    If it matches only the name of the page or a piece of the route

    SELECT * 
    FROM Raw_Pages AS R
    LEFT JOIN Paths AS P ON (R.path LIKE CONCAT('%',P.path,'%'))
    WHERE R.ID IS NULL
    

    You can check this page to verify the type of query you need

    It is good practice to index the path fields in both tables so that the query is faster due to the number of records

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search