I have a main table called pics
that has an ID number and the origin site from which it was pulled from. I also have another table called tags
that has a list of all the tags that were also pulled from that site. Here’s the kicker, there are known duplicates. The way in which duplicates are known is irrelevant. The duplicates
are also stored in their own table. The first column, "original", is the first instance of it in the "pics" table. The second column is any additional
duplicate. The website searches images based on tags, which can be more than one tag.
(I know having the source site in the pics
table may not be optimal, nor proper database design.) This must work on MySQL/phpMyAdmin.
1. What is the SQL join query that will list all of the image ID numbers when searching for a specific tag? For example:
- Search String:
fire
, Result:[null]
- Search String:
red
, Result:#1
- Search String:
vines
, Result:#2
- Search String:
green
, Result:#2
- Search String:
forest brown
, Result:#2
- Search String:
bird tree
, Result:[null]
2. What is the SQL join query that will merge the image source sites? For example:
- Site(s) for image:
1
, Result:colors.com
- Site(s) for image:
2
, Result:colors.com
nature.com
- Site(s) for image:
4
, Result:nature.com
- (Getting the sites for #5 and #6 doesn’t matter because they would always reference #2 anyway.)
3. OR would it honestly just be easier to not do this complicated join stuff and do simpler join queries and have an automated, scheduled procedure manually merge tags and delete duplicate entries?
Here is the SQL to copy paste into a blank database:
START TRANSACTION;
CREATE TABLE `duplicates` (`original` int(2) NOT NULL,`additional` int(2) NOT NULL);
INSERT INTO `duplicates` (`original`, `additional`) VALUES (2, 5),(2, 6);
CREATE TABLE `pics` (`id` int(2) NOT NULL,`sourceSite` text NOT NULL);
INSERT INTO `pics` (`id`, `sourceSite`) VALUES (1, 'colors.com'),(2, 'colors.com'),(3, 'colors.com'),(4, 'nature.com'),(5, 'nature.com'),(6, 'nature.com');
CREATE TABLE `tags` (`id` int(2) NOT NULL,`tag` varchar(16) NOT NULL);
INSERT INTO `tags` (`id`, `tag`) VALUES (1, 'red'),(1, 'orange'),(2, 'green'),(2, 'brown'),(3, 'blue'),(3, 'cyan'),(4, 'bird'),(4, 'flight'),(5, 'tree'),(5, 'forest'),(6, 'vines'),(6, 'tree');
ALTER TABLE `duplicates` ADD KEY `original` (`original`), ADD KEY `additional` (`additional`);
ALTER TABLE `pics` ADD PRIMARY KEY (`id`);
ALTER TABLE `tags` ADD KEY `id` (`id`);
ALTER TABLE `duplicates` ADD CONSTRAINT `duplicates_ibfk_1` FOREIGN KEY (`original`) REFERENCES `pics` (`id`) ON DELETE CASCADE ON UPDATE CASCADE, ADD CONSTRAINT `duplicates_ibfk_2` FOREIGN KEY (`additional`) REFERENCES `pics` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `tags` ADD CONSTRAINT `tags_ibfk_1` FOREIGN KEY (`id`) REFERENCES `pics` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
COMMIT;
2
Answers
Here’s a graphic representation of your schema. Let’s work with this.
SQL query to list images under a given tag:
That is, given an input tag, e.g.
green
:tags.tag
equal"green"
,duplicates
that are related bytags.id
,duplicates.original
, so that only one duplicate "prevails", thenpics.*
based on the result above.In hindsight, you might want to consider creating foreign key constraints between tables
tags
andduplicates
.SQL query to list source sites of images:
That is, given an input image ID, e.g.
2
:duplicates
that references the image ID2
,pics
associated to any duplicates in the result above, thenIdeally, you want to prune duplicates and normalize tags. But in reality, there might be valid factors for not doing so. In short, it depends.™
Doing this with your current structure is very inefficient but can be done.
Searching the pic IDs based on a set of tags:
Searching for sources of a given pic ID:
If you want to tie both of the above together so you can search for pics based on a set of tags and return the id, sources and full list of tags you can do the following.
First step: get the ids of the images with all of the required tags:
Output:
Second step: get sources for each of the pics returned in previous step:
Output:
Third step: get full list of tags for result of previous step:
Output:
The full query is:
Obviously, this task would be much better served by a normalized structure, without the duplication.
Something like:
With the revised structure the query becomes:
Here’s a db<>fiddle to play with.