
I have a strange problem. I am trying to write code that groups strings by the longest suffix they share with another string in the collection.

For example, I have a collection of strings:

["samsung.phone.com", "lg.phone.com", "phone.com", "camera.dsrl.nikon.com", "amd.gpu.com", "intel.cpu.com" ]

I want to create a dictionary in which each string is grouped under the longest string in the collection that it ends with (the last part, .com, does not count as a match on its own).

The dictionary should look like this:

{
"phone.com" : ["lg.phone.com", "samsung.phone.com"],
"camera.dsrl.nikon.com" : [], 
"amd.gpu.com": [], 
"intel.cpu.com": []
}

And if I add "cpu.com" to the collection, the new dictionary should look like this:

{
"phone.com" : ["lg.phone.com", "samsung.phone.com"],
"camera.dsrl.nikon.com" : [], 
"amd.gpu.com" : [], 
"cpu.com": ["intel.cpu.com"]
}

And if I add "hello.samsung.phone.com" to the collection, the new dictionary should look like this:

{
"phone.com": ["lg.phone.com"],
"samsung.phone.com": ["hello.samsung.phone.com"],
"camera.dsrl.nikon.com": [], 
"amd.gpu.com": [], 
"cpu.com": ["intel.cpu.com"]
}

Any ideas?

2 Answers


  1. Not the prettiest code, but it works (I think):

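    • Make each element of the initial list x a key of a dictionary dict.
    • For each element A of x, find all other elements B whose parent domain (everything after the first dot) is A.
    • Push B to the corresponding list in dict.
    • Deal with corner cases: drop keys that are already listed under a parent and have no children of their own, then remove from each list any entry that is still a key itself.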
      const x = [
        "samsung.phone.com",
        "lg.phone.com",
        "phone.com",
        "camera.dsrl.nikon.com",
        "amd.gpu.com",
        "intel.cpu.com",
        "cpu.com",
        "hello.samsung.phone.com"
      ];
      const dict = {};
      x.forEach((el) => (dict[el] = []));
      // console.log(dict);
      
      for (let i = 0; i < x.length; i++) {
        for (let j = 0; j < x.length; j++) {
          if (i !== j) {
            // Strip the first label: "lg.phone.com" -> "phone.com"
            const parent = x[j].substring(x[j].indexOf(".") + 1);
            if (parent === x[i]) {
              // x[j] is a direct subdomain of x[i]
              dict[x[i]].push(x[j]);
            }
          }
        }
      }
      
      // Every domain that ended up listed under some parent
      const deleteKeys = Object.values(dict).flat();
      
      // A listed domain with no children of its own should not be a key
      deleteKeys.forEach((key) => {
        if (dict[key].length === 0) delete dict[key];
      });
      
      const toRemove = Object.keys(dict);
      for (const [key, value] of Object.entries(dict)) {
        dict[key] = value.filter(function(el) {
          return !toRemove.includes(el);
        });
      }
      
      console.log(dict);

    It’s best to step through the above algorithm with breakpoints and console logs to understand what’s happening.
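A more compact variant of the same idea is sketched below. It is not the answerer's code: the function name groupByLongestSuffix is hypothetical, and it assumes the bare top-level part ("com") never appears in the input. Unlike the loop above, which only looks at the immediate parent, this version walks up label by label until it finds the longest suffix that is present in the collection.

```javascript
// Hypothetical helper (not from the original answer).
// Assumption: the bare TLD ("com") is not itself in the input.
function groupByLongestSuffix(domains) {
  const known = new Set(domains);
  const parentOf = new Map(); // domain -> longest known proper suffix
  const children = new Map(); // domain -> domains grouped directly under it

  for (const d of known) children.set(d, []);

  for (const d of known) {
    // Walk up one label at a time: "a.b.c.com" -> "b.c.com" -> "c.com" -> "com"
    let dot = d.indexOf(".");
    while (dot !== -1) {
      const suffix = d.slice(dot + 1);
      if (known.has(suffix)) {
        parentOf.set(d, suffix);
        children.get(suffix).push(d);
        break; // the first hit is the longest suffix
      }
      dot = d.indexOf(".", dot + 1);
    }
  }

  const result = {};
  for (const d of known) {
    const kids = children.get(d);
    // A domain becomes a key if it has children, or if it has no parent at all.
    if (kids.length > 0 || !parentOf.has(d)) {
      // Children that are keys themselves are not repeated under their parent.
      result[d] = kids.filter((k) => children.get(k).length === 0);
    }
  }
  return result;
}

console.log(
  groupByLongestSuffix([
    "samsung.phone.com",
    "lg.phone.com",
    "phone.com",
    "camera.dsrl.nikon.com",
    "amd.gpu.com",
    "intel.cpu.com",
    "cpu.com",
    "hello.samsung.phone.com"
  ])
);
```

With the collection from the question, this should produce the same grouping as the last dictionary there, though the order of keys follows the input order. The Set lookup also avoids the nested loop over all pairs.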

  2. You can do so using a regex like the one below:

    (?<!\.)([^.]+)(?=\.)\.(.*)
    

    It simply splits the given domain at its first dot into a subdomain (the first label) and a main domain (everything after it).

    The code would look like this:

    const regex = /(?<!\.)([^.]+)(?=\.)\.(.*)/;
    
    const list = ["samsung.phone.com", "lg.phone.com", "phone.com", "camera.dsrl.nikon.com", "amd.gpu.com", "intel.cpu.com"];
    
    const result = {};
    
    list.forEach(d => {
      const matches = d.match(regex);
      if (!matches) return; // no dot in the string, so nothing to split
    
      // matches[1] is the subdomain and
      // matches[2] is the main domain.
      // If an entry for the main domain already exists,
      // push the new one to it; otherwise create a new entry.
      const main = matches[2];
      if (result[main]) result[main].push(d);
      else result[main] = [d];
    })
    
    console.log(result);

    The output would be like this:

    {
        phone.com:["samsung.phone.com", "lg.phone.com"]
        com:["phone.com"]
        ...
    }
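The same split can also be done without a regex. The sketch below is only an illustration: splitFirstLabel and groupByParent are hypothetical names, not part of the answer above. Like the regex version, it groups each domain under its immediate parent only, so "phone.com" still ends up under "com".

```javascript
// Hypothetical helper: "samsung.phone.com" -> ["samsung", "phone.com"]
function splitFirstLabel(domain) {
  const dot = domain.indexOf(".");
  if (dot === -1) return null; // no dot: nothing to split
  return [domain.slice(0, dot), domain.slice(dot + 1)];
}

// Hypothetical helper: group every domain under everything after its first dot.
function groupByParent(list) {
  const grouped = {};
  for (const d of list) {
    const parts = splitFirstLabel(d);
    if (!parts) continue;
    const main = parts[1];
    if (!grouped[main]) grouped[main] = [];
    grouped[main].push(d);
  }
  return grouped;
}

console.log(groupByParent(["samsung.phone.com", "lg.phone.com", "phone.com"]));
// { 'phone.com': [ 'samsung.phone.com', 'lg.phone.com' ], com: [ 'phone.com' ] }
```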
    