I have the following data in my table:
Name | Age | Shirt Size | User Id
-------------------------------------------------------------
Jon Doe | 12 | L | 3
-------------------------------------------------------------
Jonathan Doe | 13 | XL | 3
-------------------------------------------------------------
John Doe | 14 | XL | 3
-------------------------------------------------------------
Jane Doe | 12 | M | 3
-------------------------------------------------------------
All of the above are family members of the same user, provided by the customer over the last few years. Is there any way to predictably get unique family members with the most recent data? We are trying to consolidate the data so that we keep good recent data and remove the old records.
The output I am expecting is the following:
Name | Age | Shirt Size | User Id
-------------------------------------------------------------
Jonathan Doe | 14 | XL | 3
-------------------------------------------------------------
Jane Doe | 12 | M | 3
-------------------------------------------------------------
I am not quite sure where to start. The data is in a MySQL table, and I am trying to see what the best approach would be to do this. I don’t know if this would be a good candidate for AI to get the proper data.
2 Answers
You can simply use the distinct method for this:
ModelName::distinct()->get(['ColumnName']);
As mentioned in the comments by @Shadow, there’s no 100% reliable way to determine if any one name matches another.
However, I assume User Id is the ID of the user each record is related to? In which case, grouping your data by User Id and group concatenating the name (and potentially age) fields will narrow the scope, since users are likely to have a limited set of family members. If you also include a count, you can immediately eliminate any users with a single family member.
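As a rough sketch of that grouping step, the snippet below runs the query against an in-memory SQLite database as a stand-in for MySQL. The table name `family_members`, the column names, and the extra single-member user are assumptions made for illustration. In MySQL the aggregate would be written `GROUP_CONCAT(name SEPARATOR '|')`.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE family_members "
    "(name TEXT, age INTEGER, shirt_size TEXT, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO family_members VALUES (?, ?, ?, ?)",
    [
        ("Jon Doe", 12, "L", 3),
        ("Jonathan Doe", 13, "XL", 3),
        ("John Doe", 14, "XL", 3),
        ("Jane Doe", 12, "M", 3),
        ("Sam Roe", 40, "L", 7),  # made-up user with a single family member
    ],
)

# Concatenate the names per user and drop users with only one record,
# since those cannot contain duplicates.
rows = conn.execute(
    """
    SELECT user_id, GROUP_CONCAT(name, '|') AS names, COUNT(*) AS members
    FROM family_members
    GROUP BY user_id
    HAVING COUNT(*) > 1
    """
).fetchall()

# Map each remaining user to the list of names that still need matching.
candidates = {user_id: names.split("|") for user_id, names, _ in rows}
```

Only user 3 survives the `HAVING` filter, so the name matching only has to look at that user's four records.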
Then you can apply some matching techniques. Maybe consider using a dictionary of common name variations such as:
https://github.com/carltonnorthern/nicknames
Also, if you have timestamps on the records, you may be able to follow the incrementing of the ages in line with the timestamps; that could serve as a verification if nothing else.
You may be able to eliminate a large proportion of your records, but I would expect that you’ll need to do at least some manual combing and normalising.
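One possible way to express that age check, assuming each record carries a date it was submitted (the question does not show such a column, so the dates here are hypothetical): two records can only describe the same person if the age difference matches the number of birthdays that could have fallen between the two dates.

```python
from datetime import date


def ages_consistent(age_a, seen_a, age_b, seen_b):
    """Return True if two (age, date) observations could describe one person."""
    # Order the two observations chronologically.
    (early_age, early), (late_age, late) = sorted(
        [(age_a, seen_a), (age_b, seen_b)], key=lambda pair: pair[1]
    )
    # Whole calendar years fully elapsed between the two dates.
    full_years = (
        late.year
        - early.year
        - ((late.month, late.day) < (early.month, early.day))
    )
    # In that span a person has at least full_years and at most
    # full_years + 1 birthdays, so the age can only grow by that much.
    return full_years <= late_age - early_age <= full_years + 1
```

For example, `ages_consistent(12, date(2020, 3, 1), 13, date(2021, 2, 1))` passes, while an age that stays at 12 across three full years does not. A passing check is only supporting evidence, not proof that the records match.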
EDIT:
I’m assuming you can pass the data through a program rather than trying to do this all in MySQL. It may be possible to do it in the database, but once you group concat the names, working through the grouped values will be tricky in the DB. I’m envisioning a function that explodes your concatenated sets and loops through each, comparing the names via the dictionary.
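A minimal sketch of such a function follows. The `CANONICAL` mapping is a tiny hand-written stand-in for a real variant dictionary (such as the nicknames list linked above), and treating "Jon", "John" and "Jonathan" as the same person is an assumption taken from the asker's expected output, not a general rule. The `|` separator is also an assumption about how the names were concatenated.

```python
from collections import defaultdict

# Hypothetical variant dictionary: lowercase first name -> canonical first name.
CANONICAL = {"jon": "jonathan", "john": "jonathan", "jonathan": "jonathan"}


def cluster_names(concatenated, separator="|"):
    """Split a concatenated name list and group names sharing a canonical form."""
    clusters = defaultdict(list)
    for full_name in concatenated.split(separator):
        full_name = full_name.strip()
        # Treat the first word as the given name, the remainder as the surname.
        first, _, rest = full_name.partition(" ")
        # Unknown first names fall back to their own lowercase form.
        canonical_first = CANONICAL.get(first.lower(), first.lower())
        clusters[(canonical_first, rest.lower())].append(full_name)
    return dict(clusters)
```

Calling `cluster_names("Jon Doe|Jonathan Doe|John Doe|Jane Doe")` puts the three Jon variants in one cluster and Jane Doe in another. Each cluster with more than one entry is a candidate for merging, ideally after a manual review.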