I have data like this:

Record = [
  { "tid": 1, "token_text": "Canis", "spanid": 1, "label": "Name" },
  { "tid": 2, "token_text": "Familiaris", "spanid": 1, "label": "Name" },
  { "tid": 3, "token_text": "is" },
  { "tid": 4, "token_text": "the" },
  { "tid": 5, "token_text": "scientific" },
  { "tid": 6, "token_text": "name" },
  { "tid": 7, "token_text": "of" },
  { "tid": 8, "token_text": "dog", "spanid": 2, "label": "species" },
  { "tid": 9, "token_text": "." }
] 

I want to create another array of objects from this data in which the token_text strings that share a spanid are joined, with the tid and label taken from the first occurrence. The order of the elements must stay the same as in the Record array. I want to do this in JS/React and would appreciate any leads. The output should look like this:

Record = [
  { "tid": 1, "token_text": "Canis Familiaris", "spanid": 1, "label": "Name" },
  { "tid": 3, "token_text": "is" },
  { "tid": 4, "token_text": "the" },
  { "tid": 5, "token_text": "scientific" },
  { "tid": 6, "token_text": "name" },
  { "tid": 7, "token_text": "of" },
  { "tid": 8, "token_text": "dog", "spanid": 2, "label": "species" },
  { "tid": 9, "token_text": "." }
]

I tried looping but couldn't figure out exactly how to do it.

for (let j = 0; j < spanid.length; j++) {
  var arr = [];
  Records.forEach(i => {
    if (i.spanid === spanid[j]) {
      arr.push(i.tid);
    }
  })
}

Is there a way to join the token_text values that share a spanid and delete the entries with duplicate spanids?

3 Answers


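  1. One solution is to loop over the array with a for loop and, for each object that has a spanid, compare it with the spanid of the next object. If they match, append the next token_text to the current one and remove the next object.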

    For example:

    for (let i = 0; i < Record.length - 1; i++) {
      if (Record[i].hasOwnProperty("spanid")) {
        if (Record[i].spanid === Record[i + 1].spanid) {
          Record[i].token_text += " " + Record[i + 1].token_text;
          Record.splice(i + 1, 1);
          i--;
        }
      }
    }
    

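    The i-- after the splice makes the loop re-check the merged element, so runs of more than two tokens with the same spanid are handled as well. Alternatively, you can use a nested loop to make this explicit: the inner loop keeps appending while the following token has the same spanid, with a space between the strings. Both versions modify Record in place and only merge tokens that are adjacent.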

    For example:

    for (let i = 0; i < Record.length - 1; i++) {
      if (Record[i].hasOwnProperty("spanid")) {
        let token_text = Record[i].token_text;
        let spanid = Record[i].spanid;
        for (let j = i + 1; j < Record.length && Record[j].spanid === spanid; j++) {
          token_text += " " + Record[j].token_text;
          Record.splice(j, 1);
          j--;
        }
        Record[i].token_text = token_text;
      }
    }
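
    Because both loops above modify Record in place, they can be awkward when the array lives in React state. The following is a minimal sketch of a non-mutating variant, not part of the original answer. The name mergeAdjacentSpans is hypothetical, the sample data is shortened, and it assumes (as in the question's data) that tokens sharing a spanid sit next to each other.

    ```javascript
    // Hypothetical helper: returns a new array and leaves the input untouched.
    // Assumes tokens that share a spanid are adjacent in the input.
    function mergeAdjacentSpans(tokens) {
      const merged = [];
      for (const token of tokens) {
        const last = merged[merged.length - 1];
        const sameSpan =
          last !== undefined &&
          token.spanid !== undefined &&
          token.spanid === last.spanid;
        if (sameSpan) {
          // Extend the first token of the span; its tid and label are kept.
          last.token_text += " " + token.token_text;
        } else {
          // Shallow copy so the caller's objects are never modified.
          merged.push({ ...token });
        }
      }
      return merged;
    }

    // Shortened sample data in the shape used in the question.
    const Record = [
      { tid: 1, token_text: "Canis", spanid: 1, label: "Name" },
      { tid: 2, token_text: "Familiaris", spanid: 1, label: "Name" },
      { tid: 3, token_text: "is" },
      { tid: 8, token_text: "dog", spanid: 2, label: "species" },
      { tid: 9, token_text: "." },
    ];

    const result = mergeAdjacentSpans(Record);
    console.log(result);
    ```

    Since the input is only read, the result could be passed to a state setter without touching the previous state.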
    
  2. const record = [
      { "tid": 1, "token_text": "Canis", "spanid": 1, "label": "Name" },
      { "tid": 2, "token_text": "Familiaris", "spanid": 1, "label": "Name" },
      { "tid": 3, "token_text": "is" },
      { "tid": 4, "token_text": "the" },
      { "tid": 5, "token_text": "scientific" },
      { "tid": 6, "token_text": "name" },
      { "tid": 7, "token_text": "of" },
      { "tid": 8, "token_text": "dog", "spanid": 2, "label": "species" },
      { "tid": 9, "token_text": "." }
    ];
    
    const transformedRecord = [];
    const spanidSet = new Set();
    
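    record.forEach(item => {
      // Compare against null/undefined so a spanid of 0 is still treated as a span.
      const hasSpan = item.spanid != null;
      if (hasSpan && !spanidSet.has(item.spanid)) {
        // First token of this span: join the text of every token with the same spanid.
        const joinedText = record
          .filter(i => i.spanid === item.spanid)
          .map(i => i.token_text)
          .join(' ');

        transformedRecord.push({
          tid: item.tid,
          token_text: joinedText,
          spanid: item.spanid,
          label: item.label
        });

        spanidSet.add(item.spanid);
      } else if (!hasSpan) {
        // Tokens without a spanid are copied over unchanged.
        transformedRecord.push(item);
      }
    });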
    
    console.log(transformedRecord);
    
  3. You can reduce the array of tokens, and check the previous token’s label against the current.

    In order to append to the previous token_text you need to verify three things:

    1. There is a previous token (false only on the first iteration)
    2. The current token has a label
    3. The current and previous token labels match

    Working example

    Note: You could shorten the if-condition (below) to simply:

    if (token.label && token.label === prev?.label) {
    
    const record = [
      { "tid": 1, "token_text": "Canis", "spanid": 1, "label": "Name" },
      { "tid": 2, "token_text": "Familiaris", "spanid": 1, "label": "Name" },
      { "tid": 3, "token_text": "is" },
      { "tid": 4, "token_text": "the" },
      { "tid": 5, "token_text": "scientific" },
      { "tid": 6, "token_text": "name" },
      { "tid": 7, "token_text": "of" },
      { "tid": 8, "token_text": "dog", "spanid": 2, "label": "species" },
      { "tid": 9, "token_text": "." }
    ];
    
    const joinTokensByLabel = (tokens) => {
      return tokens.reduce((result, token) => {
        const prev = result[result.length - 1];
        if (prev && token.label != null && token.label === prev.label) {
          prev.token_text += ` ${token.token_text}`;
        } else {
          result.push(structuredClone(token));
        }
        return result;
      }, []);
    };
    
    const fixed = joinTokensByLabel(record);
    
    console.log(fixed);
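
    The reducer above compares label, so two neighbouring spans that happen to share a label would be merged into one. If that is not wanted, the same reduce can compare spanid instead, as the question describes. A sketch under that assumption follows; joinTokensBySpan and the sample data are illustrative and not part of the original answer.

    ```javascript
    // Illustrative variant: merge only when the previous token has the same spanid.
    const joinTokensBySpan = (tokens) =>
      tokens.reduce((result, token) => {
        const prev = result[result.length - 1];
        if (prev && token.spanid != null && token.spanid === prev.spanid) {
          prev.token_text += ` ${token.token_text}`;
        } else {
          // Shallow copy so the input objects are left unchanged.
          result.push({ ...token });
        }
        return result;
      }, []);

    // Made-up data: two adjacent spans that share the label "Name".
    const sample = [
      { tid: 1, token_text: "Canis", spanid: 1, label: "Name" },
      { tid: 2, token_text: "Familiaris", spanid: 1, label: "Name" },
      { tid: 3, token_text: "Felis", spanid: 2, label: "Name" },
      { tid: 4, token_text: "catus", spanid: 2, label: "Name" },
    ];

    const joined = joinTokensBySpan(sample);
    console.log(joined.map((t) => t.token_text));
    // prints [ 'Canis Familiaris', 'Felis catus' ]
    ```

    Comparing label on the same input would instead produce a single token containing all four words.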