I have a database containing different information from our website. One table (called Raw_Pages) contains a list of every page on our site and the path to it (along with other fields of course). Another table (called Paths) contains a list of various branches of the site that are owned by different departments.
I’m trying to run a query to basically find all pages on the site that do not fall under one of the branches specified.
table Raw_Pages
+-------------------------+--------------+
| Field | Type |
+-------------------------+--------------+
| ID | int(11) |
| Path | varchar(500) |
| Title | varchar(255) |
+-------------------------+--------------+
table Paths
+----------+--------------+
| Field | Type |
+----------+--------------+
| ID | int(11) |
| Path | varchar(255) |
+----------+--------------+
We currently have 64,002 pages I’m checking against 757 paths (All departments have multiple branches due to different ones for different file types). I’m also planning to do a similar query for files, of which we have 306,625 and pulls from the same list of 757 paths. Yes, our site is a giant mess.
From what I can tell, a LEFT JOIN is what would work best for me with a wildcard on the right side. I am a novice at code so I could be far off.
SELECT * FROM Raw_Pages LEFT JOIN Paths ON Raw_Pages.path LIKE CONCAT(Paths.Path,'%') WHERE Paths.ID IS NULL
I’m honestly not sure if the above code works or not since it just freezes phpMyAdmin when I try it. I’m assuming something is wrong in it, or there is a better way.
Thank you!
2
Answers
If you have an index on
Paths(Path)
, you might be able to do:It is possible for the subquery to use an index. I’m not sure it will.
If the value of the path field is identical in the two tables you could use:
If it matches only the name of the page or a piece of the route
You can check this page to verify the type of query you need
It is good practice to index the path fields in both tables so that the query is faster due to the number of records