
I have a text containing Shamsi dates like this:

Pathology report on 01.09.1402 (22.11.2023): Baso-squamous carcinoma in right thigh skin.
Surgical pathology report on 30.03.1403, Multiple lymphoid tissue involved by metastatic epithelial tumor of right inguinal mass.

I want to extract all the dates into an array.

How would you do that?

So far we have this:

const text = `Pathology report on 01.09.1402: Baso-squamous carcinoma in right thigh skin. Surgical pathology report on 30.03.1403, Multiple lymphoid tissue involved by metastatic epithelial tumor of right inguinal mass.`

console.log(getDate(text));

function getDate(d) {
  var day, month, year;

  result = d.match("[0-9]{2}([-/ .])[0-9]{2}[-/ .][0-9]{4}");
  if (null != result) {
    dateSplitted = result[0].split(result[1]);
    day = dateSplitted[0];
    month = dateSplitted[1];
    year = dateSplitted[2];
  }
  result = d.match("[0-9]{4}([-/ .])[0-9]{2}[-/ .][0-9]{2}");
  if (null != result) {
    dateSplitted = result[0].split(result[1]);
    day = dateSplitted[2];
    month = dateSplitted[1];
    year = dateSplitted[0];
  }

  if (month > 12) {
    aux = day;
    day = month;
    month = aux;
  }

  return `${day}.${month}.${year}`;
  
  
}

3 Answers


  1. First, note that 1402 is a Shamsi (Solar Hijri) year, not a Gregorian one, so the sample text below uses Gregorian years.

    Then you need to loop to get all of the matches:

    function getAllDates(d) {
      let dates = [];
      const regex = /[0-9]{2}([-/ .])[0-9]{2}\1[0-9]{4}/g; // \1 requires the same separator both times
      let match;
    
      while ((match = regex.exec(d)) !== null) {
        let dateSplitted = match[0].split(match[1]);
        let day = dateSplitted[0];
        let month = dateSplitted[1];
        let year = dateSplitted[2];
    
        // Swap day and month if necessary
        if (parseInt(month, 10) > 12) {
          let aux = day;
          day = month;
          month = aux;
        }
    
        dates.push(`${day}.${month}.${year}`);
      }
    
      return dates;
    }
    
    const text = `Pathology report on 01.09.2023: Baso-squamous carcinoma in right thigh skin. Surgical pathology report on 30.03.2023, Multiple lymphoid tissue involved by metastatic epithelial tumor of right inguinal mass.`;
    
    console.log(getAllDates(text));
  2. When the information you want follows a fixed pattern, extracting it with a regular expression makes sense.

    I create an array called dates to store each date that is found.
    The pattern is held in a variable called regex; that is not a very descriptive name, so you may want to rename it.

    regex.exec(interesting_date) is called in a loop to find each date match within the string you provide. Then we simply return dates.

    Here is the code snippet, which matches the day and month parts as well as the year:

    const text = `Pathology report on 01.09.1402 (22.11.2023): Baso-squamous carcinoma in right thigh skin. Surgical pathology report on 30.03.1403, Multiple lymphoid tissue involved by metastatic epithelial tumor of right inguinal mass.`;
    
    console.log(getDates(text));
    
    function getDates(interesting_date) {
      const dates = [];
      const regex = /\b\d{2}[-/ .]\d{2}[-/ .]\d{4}\b/g; // Matches DD MM YYYY separated by -, /, space, or .
    
      let match;
      while ((match = regex.exec(interesting_date)) !== null) {
        dates.push(match[0]); // Add each matched date to the array
      }
    
      return dates;
    }
    

    The final result is this:

    [ '01.09.1402', '22.11.2023', '30.03.1403' ]
    
  3. The easiest way to achieve this is to call String.prototype.match with the global (g) flag and the following pattern: \d{2}\.\d{2}\.\d{4}.

    I separated the extraction from the formatting so that they are not coupled.

    const text = "Pathology report on 01.09.1402 (22.11.2023): Baso-squamous carcinoma in right thigh skin. Surgical pathology report on 30.03.1403, Multiple lymphoid tissue involved by metastatic epithelial tumor of right inguinal mass.";
    
    // match() returns null when nothing matches, so fall back to an empty array
    const extractDates = (textStr) =>
      (textStr.match(/\d{2}\.\d{2}\.\d{4}/g) ?? []).map(dateStr =>
        (([date, month, year]) => new Date(year, month - 1, date))
        (dateStr.split(/\./g)));
    
    // Close to date.toLocaleDateString('de-DE') with explicit padding
    const formatDate = (dateObj) => {
      const day = String(dateObj.getDate()).padStart(2, '0');
      const month = String(dateObj.getMonth() + 1).padStart(2, '0');
      const year = dateObj.getFullYear();
      return `${day}.${month}.${year}`;
    }
    
    const extractedDates = extractDates(text).map(formatDate);
    
    console.log(extractedDates);
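
For comparison, here is a minimal sketch of the same extraction using String.prototype.matchAll instead of an exec loop or match. The function name extractDateStrings and the variable sampleReport are hypothetical and not taken from any of the answers. It assumes dates are always written as two digits, two digits, four digits, with the same separator in both positions, and it returns the matched strings as-is without converting between Shamsi and Gregorian calendars.

```javascript
// Hypothetical helper: collects every DD<sep>MM<sep>YYYY string found in the text.
// The \1 backreference requires the second separator to equal the first one.
function extractDateStrings(text) {
  const pattern = /\b\d{2}([-\/ .])\d{2}\1\d{4}\b/g;
  // matchAll returns an empty iterator when nothing matches, so no null check is needed.
  return Array.from(text.matchAll(pattern), (match) => match[0]);
}

const sampleReport =
  "Pathology report on 01.09.1402 (22.11.2023): Baso-squamous carcinoma in right thigh skin. " +
  "Surgical pathology report on 30.03.1403, Multiple lymphoid tissue involved by metastatic " +
  "epithelial tumor of right inguinal mass.";

console.log(extractDateStrings(sampleReport));
console.log(extractDateStrings("no dates in this sentence"));
```

Because the regex needs the g flag for matchAll to work, leaving it off throws a TypeError rather than silently returning only the first match, which makes that mistake easier to spot.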