I have a strange problem. I am trying to write code that groups strings by the longest suffix they share.
For example, I have a collection of strings:
["samsung.phone.com", "lg.phone.com", "phone.com", "camera.dsrl.nikon.com", "amd.gpu.com", "intel.cpu.com" ]
I want to create a dictionary in which they are grouped by their longest shared ending (excluding the last part, .com).
The dictionary should look like this:
{
"phone.com" : ["lg.phone.com", "samsung.phone.com"],
"camera.dsrl.nikon.com" : [],
"amd.gpu.com": [],
"intel.cpu.com": []
}
And if I add "cpu.com" to the collection, the new dictionary should look like this:
{
"phone.com" : ["lg.phone.com", "samsung.phone.com"],
"camera.dsrl.nikon.com" : [],
"amd.gpu.com" : [],
"cpu.com": ["intel.cpu.com"]
}
And if I add "hello.samsung.phone.com" to the collection, the new dictionary should look like this:
{
"phone.com": ["lg.phone.com"],
"samsung.phone.com": ["hello.samsung.phone.com"],
"camera.dsrl.nikon.com": [],
"amd.gpu.com": [],
"cpu.com": ["intel.cpu.com"]
}
Any ideas?
2 Answers
Not the prettiest code, but it works (I think):
Make each element of x a key of a dictionary dict.
For each element A of x, find all other elements B which have A as subdomain.
Add B to the corresponding list in dict.
It’s best to step through the above algorithm with breakpoints and console logs to understand what’s happening.
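Here is a minimal Python sketch in the spirit of the steps above. It is not the answerer's original code. The function name group_by_longest_suffix and the helper nearest_parent are made up for illustration. One assumption, drawn from the expected output in the question: each name is attached only to its longest matching key, and a name that has children of its own becomes a key instead of being listed under its parent.

```python
def group_by_longest_suffix(domains):
    """Group names under the longest other name that is a dot-aligned suffix."""
    names = set(domains)

    def nearest_parent(name):
        # Drop leading labels one at a time, so the first hit is the longest
        # suffix. Stop before the bare last label (".com" is never a key).
        labels = name.split(".")
        for i in range(1, len(labels) - 1):
            candidate = ".".join(labels[i:])
            if candidate in names:
                return candidate
        return None

    parent = {name: nearest_parent(name) for name in names}
    has_children = {p for p in parent.values() if p is not None}

    groups = {}
    # Assumption: a name is a key if it has no parent or has its own children.
    for name in domains:
        if parent[name] is None or name in has_children:
            groups.setdefault(name, [])
    # Every remaining name is listed under its nearest parent.
    for name in sorted(names):
        p = parent[name]
        if p is not None and name not in has_children:
            groups[p].append(name)
    return groups


if __name__ == "__main__":
    sample = ["samsung.phone.com", "lg.phone.com", "phone.com",
              "camera.dsrl.nikon.com", "amd.gpu.com", "intel.cpu.com"]
    print(group_by_longest_suffix(sample))
```

With the three collections from the question, this sketch yields the three dictionaries shown there, with each list sorted alphabetically.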
You can do so using a regex like the one below. What it does is simply split the given domain into a subdomain and a main domain.
The code would look like this:
The output would be like this:
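A rough Python sketch of the regex-splitting idea follows. The pattern, DOMAIN_RE, split_domain and group_by_main_domain are all assumptions made for illustration, not the regex or code from this answer. The pattern takes the first label as the subdomain and everything after the first dot as the main domain, provided that remainder still contains a dot.

```python
import re

# Assumed pattern: first label, a dot, then a remainder that still has a dot.
DOMAIN_RE = re.compile(r"([^.]+)\.(.+\..+)")


def split_domain(name):
    """Return (subdomain, main_domain); subdomain is None if nothing to split."""
    match = DOMAIN_RE.fullmatch(name)
    if match is None:
        return None, name
    return match.group(1), match.group(2)


def group_by_main_domain(domains):
    """List each name under its main domain when that domain is in the input."""
    known = set(domains)
    groups = {}
    for name in domains:
        _, main = split_domain(name)
        if main != name and main in known:
            groups.setdefault(main, []).append(name)
        else:
            # No known main domain: the name becomes its own (empty) key.
            groups.setdefault(name, [])
    return groups


if __name__ == "__main__":
    print(split_domain("samsung.phone.com"))
    print(group_by_main_domain(["samsung.phone.com", "lg.phone.com", "phone.com"]))
```

This sketch only looks one label up, so it would miss a parent that is two or more labels shorter. It also keeps a name such as samsung.phone.com in the phone.com list after it becomes a key itself, so the third example in the question would need an extra pass.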