
I need to pull values from a JSON object that’s within a script tag in an HTML file. The HTML is actually an email (.eml) file.

I’m using Node’s "fs" module to read the file, and that works fine. In general, I also know how to select HTML elements (using document.getElementById, innerHTML, etc.) and how to work my way through a JSON object hierarchy to select values (using JSON.parse, dot notation, etc.). What I’m not sure about is how to select values from within code like this:

X-Account-Key: account31
X-UIDL: 00001b5f073425
X-Mozilla-Status: 0000
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
... more email header info ...
<html lang=3D"en-US"> <head> </head> <body> <div>  <script data-scope=3D"in=
boxmarkup" type=3D"application/json">{
  "api_version": "1.0",
  "publisher": {
    "api_key": "67892787u2cfedea31b225240gg3423t9",
    "name": "Google Alerts"
  },
  "cards": [ {
    "title": "Google Alert - \"search keywords\"",
    "subtitle": "Highlights from the latest email",
    "actions":
... and so on with JSON object, then closing script tag...
... email body wrapped in DIV tag ...

What if I want to grab publisher.name or any other property’s value from this code?

Any and all pointers appreciated.

2 Answers


  1. Chosen as BEST ANSWER

    This is a supplementary answer, built on that of @t-j-crowder. It's what I was shooting for, pulling key data from a nice and neat object in a Google Alert .eml file, rather than scraping the messy HTML of the email itself. If there's already an object in there, why not make use of it?

    Check out the "OUTPUT" comment at the end of the JS below to really see what I was going for with this.

    If you want to test it yourself, save the JavaScript and the example email below to separate files. You'll also need to install two npm packages: mailparser and jsdom.

    import fs from 'fs/promises';
    import path from 'path';
    import { fileURLToPath } from 'url';
    import { simpleParser } from 'mailparser';
    import { JSDOM } from 'jsdom';
    
    const __filename = fileURLToPath(import.meta.url);
    const __dirname = path.dirname(__filename);
    
    const alertInfoArr = [];
    
    const mailText = await fs.readFile(
      `${__dirname}/GoogleAlert-chatgtp+kenya_2023-01-20.eml`
    );
    const email = await simpleParser(mailText);
    const dom = new JSDOM(email.html);
    const script = dom.window.document.querySelector(
      "script[type='application/json']"
    );
    const json = script.textContent;
    const obj = JSON.parse(json);
    const alertKey = obj.entity.external_key;
    const targetKey = alertKey.replace('Google Alert - ', '').replaceAll('"', '');
    const alertDate = obj.entity.subtitle;
    const targetDate = alertDate.replace('Latest: ', '');
    const urlsParent = obj.cards[0].widgets;
    // Build one record per widget (each widget is one alert result).
    for (const widget of urlsParent) {
      // The widget URL is a Google redirect; the real link is in its "url" query parameter.
      const targetURL = new URL(widget.url).searchParams.get('url');
      alertInfoArr.push({
        key: targetKey,
        title: widget.title,
        description: widget.description,
        url: targetURL,
        date: targetDate,
      });
    }
    console.log(alertInfoArr);
    
    /*
    OUTPUT:
    [
      {
        key: 'chatgtp + kenya',
        title: 'Mentally scarred: Kenyan workers taught ChatGPT to recognize offensive text - The Register',
        description: 'OpenAI reportedly hired workers in Kenya – screening tens of thousands of text samples for sexist, racist, violent and pornographic content – to ...',
        url: 'https://www.theregister.com/2023/01/20/kenyan_workers_chatgpt/',
        date: 'January 21, 2023'
      },
      {
        key: 'chatgtp + kenya',
        title: 'Unethical outsourcing: ChatGPT uses Kenyan workers for traumatic moderation - The Brussels Times',
        description: 'Unethical outsourcing: ChatGPT uses Kenyan workers for traumatic moderation. Credit: The Brussels Times. The artificial intelligence (AI) ...',
        url: 'https://www.brusselstimes.com/business/355283/unethical-outsourcing-chatgpt-uses-kenyan-workers-for-traumatic-moderation',
        date: 'January 21, 2023'
      }
    ]
    */
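
The URL step in the script above can be tried on its own. This is a minimal sketch, assuming the redirect links keep the shape seen in the sample email, with the real address in a `url` query parameter. `unwrapRedirect` is a hypothetical helper name, and the link below is a shortened version of one from the sample with the tracking values dropped.

```javascript
// Hypothetical helper (not part of the original script): returns the
// destination stored in a Google redirect link's "url" query parameter.
// Assumption: links look like https://www.google.com/url?...&url=<target>&...
function unwrapRedirect(redirectUrl) {
  const target = new URL(redirectUrl).searchParams.get('url');
  // Fall back to the input when there is no "url" parameter.
  return target ?? redirectUrl;
}

// Shortened version of a link from the sample email (tracking values dropped).
const sampleLink =
  'https://www.google.com/url?rct=j&sa=t&url=https://www.theregister.com/2023/01/20/kenyan_workers_chatgpt/&ct=ga';

console.log(unwrapRedirect(sampleLink));
// https://www.theregister.com/2023/01/20/kenyan_workers_chatgpt/
```

Note that this only works after JSON.parse has run: in the raw JSON the separators are written as `\u0026`, and parsing is what turns them into `&`.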
    

    Example Google Alert code:

    X-Account-Key: account11
    X-UIDL: UID6723-1602672813
    X-Mozilla-Status: 0001
    X-Mozilla-Status2: 00000000
    X-Mozilla-Keys:                                                                                 
    Return-Path: <[email protected]>
    Delivered-To: [email protected]
    Received: from ema.email.com
        by ema.email.com with LMTP
        id WEWiHN9uy2PWEAAAMqKFlg
        (envelope-from <[email protected]>)
        for <[email protected]>; Sat, 21 Jan 2023 04:49:35 +0000
    Return-path: <3sG7LYxQKBt8HPPFPDOS82971PMJDFGJMFT.PSH@alerts.bounces.google.com>
    Envelope-to: [email protected]
    Delivery-date: Sat, 21 Jan 2023 04:49:35 +0000
    Received: from mail-yb1-f199.google.com ([209.85.219.199]:54213)
        by ema.email.com with esmtps  (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
        (Exim 4.94.2)
        (envelope-from <3sG7lYxQKBt8HPPFPPDPOS82971PMJDFGJMFT.PSH@alerts.bounces.google.com>)
        id 1pJ5on-00016q-Lv
        for [email protected]; Sat, 21 Jan 2023 04:49:35 +0000
    Received: by mail-yb1-f199.google.com with SMTP id a4-20020a5b0004008800b006fdc6aaec4fso7354172ybp.20
            for <[email protected]>; Fri, 20 Jan 2023 20:49:09 -0800 (PST)
    DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
            d=google.com; s=20210112;
            h=to:from:subject:message-id:list-unsubscribe:list-id:date
             :mime-version:from:to:cc:subject:date:message-id:reply-to;
            bh=wlbb4h1OkKGMEGEHyfSp/gOrY346qC9WPsNFLv7aoDA=;
    X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
            d=1e100.net; s=20210112;
            h=to:from:subject:message-id:list-unsubscribe:list-id:date
             :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
             :reply-to;
            bh=wlbb4h1OkKGMEGEHyfSp/gOrY346qC9WPsNFLv7aoDA=;
    X-Gm-Message-State: AFqh2kol4r/6gHBIlaMH2MFhzhXz5s7Abaw3vI8srl50X2GjsiTwk5c+
        CzWDFrWOIPE=
    X-Google-Smtp-Source: AMrXdXspIcFsq82rJ65AFyIIPUkY3GzreaIQgx8qoU7HItw+z4fWV9Yrbd/77PIoAH2/gmr+ZP4=
    MIME-Version: 1.0
    X-Received: by 2002:a81:7c88:0:b0:4eb:2b95:a29e with SMTP id
     x130-20020a817c880078200b004eb2b95a29emr2069504ywc.241.1674249828593; Fri, 20
     Jan 2023 20:48:48 -0800 (PST)
    Date: Fri, 20 Jan 2023 20:48:48 -0800
    List-Id: <12791515946235186142.alerts.google.com>
    List-Unsubscribe: <mailto:[email protected]?subject=AB2Xq4h5F1WWKAFWbW6o-Oo5IIJup1CEAsz2RPc>
    Message-ID: <[email protected]>
    Subject: Google Alert - chatgtp + kenya
    From: Google Alerts <[email protected]>
    To: [email protected]
    Content-Type: multipart/alternative; boundary="000000000000be1c6035f2beef1b2"
    X-Spam-Status: No, score=-7.7
    X-Spam-Score: -76
    X-Spam-Bar: -------
    X-Ham-Report: Spam detection software, running on the system "ema.email.com",
     has NOT identified this incoming email as spam.  The original
     message has been attached to this so you can view it or label
     similar future email.  If you have any questions, see
     root@localhost for details.
     Content preview:  === News - 2 new results for [chatgtp + kenya] === Mentally
        scarred: Kenyan workers taught ChatGPT to recognize offensive text - The
       Register The Register OpenAI reportedly hired workers in Kenya – screening
        tens of thousands of text samples for sex [...] 
     Content analysis details:   (-7.7 points, 5.0 required)
      pts rule name              description
     ---- ---------------------- --------------------------------------------------
      0.0 URIBL_BLOCKED          ADMINISTRATOR NOTICE: The query to URIBL was
                                 blocked.  See
                                 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
                                  for more information.
                                 [URIs: brusselstimes.com]
     -7.5 USER_IN_DEF_DKIM_WL    From: address is in the default DKIM
                                 welcome-list
     -0.0 SPF_PASS               SPF: sender matches SPF record
      0.0 HTML_MESSAGE           BODY: HTML included in message
     -0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
      0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                                 valid
     -0.1 DKIM_VALID_EF          Message has a valid DKIM or DK signature from
                                 envelope-from domain
     -0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                                 author's domain
    X-Spam-Flag: NO
    
    --000000000000be1c6035f2beef1b2
    Content-Type: text/plain; charset="UTF-8"; format=flowed; delsp=yes
    Content-Transfer-Encoding: base64
    
    PT09IE5ld3MgLSAyIG5ldyByZXN1bHRzIGZvciBbY2hhdGd0cCArIGtlbnlhXSA9PT0NCg0KTWVu
    dGFsbHkgc2NhcnJlZDogS2VueWFuIHdvcmtlcnMgdGF1Z2h0IENoYXRHUFQgdG8gcmVjb2duaXpl
    IG9mZmVuc2l2ZSB0ZXh0DQotIFRoZSBSZWdpc3Rlcg0KVGhlIFJlZ2lzdGVyDQpPcGVuQUkgcmVw
    b3J0ZWRseSBoaXJlZCB3b3JrZXJzIGluIEtlbnlhIOKAkyBzY3JlZW5pbmcgdGVucyBvZiB0aG91
    c2FuZHMgb2YNCnRleHQgc2FtcGxlcyBmb3Igc2V4aXN0LCByYWNpc3QsIHZpb2xlbnQgYW5kIHBv
    cm5vZ3JhcGhpYyBjb250ZW50IOKAkyB0byAuLi4NCjxodHRwczovL3d3dy5nb29nbGUuY29tL3Vy
    bD9yY3Q9aiZzYT10JnVybD1odHRwczovL=
    --000000000000be1c6035f2beef1b2
    Content-Type: text/html; charset="UTF-8"
    Content-Transfer-Encoding: quoted-printable
    
    <html lang=3D"en-US"> <head> </head> <body> <div>  <script data-scope=3D"in=
    boxmarkup" type=3D"application/json">{
      "api_version": "1.0",
      "publisher": {
        "api_key": "668269e72cfedea31b22524041ff21d9",
        "name": "Google Alerts"
      },
      "entity": {
        "external_key": "Google Alert - chatgtp + kenya",
        "title": "Google Alert - chatgtp + kenya",
        "subtitle": "Latest: January 21, 2023",
        "avatar_image_url": "https://www.gstatic.com/images/branding/product/1x=
    /gsa_512dp.png",
        "main_image_url": "https://www.gstatic.com/bt/C3341AA7A1A076756462EE2E5=
    CD71C11/smartmail/mobile/il_newspaper_header_r1.png"
      },
      "updates": {
        "snippets": [ {
          "icon": "BOOKMARK",
          "message": "Mentally scarred: Kenyan workers taught ChatGPT to recogn=
    ize offensive text - The Register"
        }, {
          "icon": "BOOKMARK",
          "message": "Unethical outsourcing: ChatGPT uses Kenyan workers for tr=
    aumatic moderation - The Brussels Times"
        } ]
      },
      "cards": [ {
        "title": "Google Alert - chatgtp + kenya",
        "subtitle": "Highlights from the latest email",
        "actions": [ {
          "name": "See more results",
          "url": "https://www.google.com/alerts"
        } ],
        "widgets": [ {
          "type": "LINK",
          "title": "Mentally scarred: Kenyan workers taught ChatGPT to recogniz=
    e offensive text - The Register",
          "description": "OpenAI reportedly hired workers in Kenya =E2=80=93 sc=
    reening tens of thousands of text samples for sexist, racist, violent and p=
    ornographic content =E2=80=93 to ...",
          "url": "https://www.google.com/url?rct=3Dj\u0026sa=3Dt\u0026url=3Dhtt=
    ps://www.theregister.com/2023/01/20/kenyan_workers_chatgpt/\u0026ct=3Dga\u0=
    026cd=3DCAEYACoUMTI3NDQ4MjEyNzcxODk4MzI4ODIyGjlmZTE1ZTNiYzdlMDE5MGM6Y29tOmV=
    uOlVT\u0026usg=3DAOvVaw2yLGqNbNV5mcqGgXZhgz1S"
        }, {
          "type": "LINK",
          "title": "Unethical outsourcing: ChatGPT uses Kenyan workers for trau=
    matic moderation - The Brussels Times",
          "description": "Unethical outsourcing: ChatGPT uses Kenyan workers fo=
    r traumatic moderation. Credit: The Brussels Times. The artificial intellig=
    ence (AI) ...",
          "url": "https://www.google.com/url?rct=3Dj\u0026sa=3Dt\u0026url=3Dhtt=
    ps://www.brusselstimes.com/business/355283/unethical-outsourcing-chatgpt-us=
    es-kenyan-workers-for-traumatic-moderation\u0026ct=3Dga\u0026cd=3DCAEYASoUM=
    TI3NDQ4MjEyNzcxODk4MzI4ODIyGjlmZTE1ZTNiYzdlMDE5MGM6Y29tOmVuOlVT\u0026usg=3D=
    AOvVaw1vnDYspAyAx44Qw2AVhZCG"
        } ]
      } ]
    }
    </script> <!--[if mso]>
     <table><tr><td width=3D650>
    <![endif]-->
     <div style=3D"width:100%;max-width:650px"> <div style=3D"font-family:Arial=
    "> <table style=3D"border-collapse:collapse;border-left:1px solid #e4e4e4;b=
    order-right:1px solid #e4e4e4"> <tr> <td style=3D"background-color:#f8f8f8;=
    padding-left:18px;border-bottom:1px solid #e4e4e4;border-top:1px solid #e4e=
    4e4"></td> <td valign=3D"middle" style=3D"padding:13px 10px 8px 0px;backgro=
    und-color:#f8f8f8;border-top:1px solid #e4e4e4;border-bottom:1px solid #e4e=
    4e4"> <a href=3D"https://www.google.com/alerts?source=3Dalertsmail&amp;hl=
    =3Den&amp;gl=3DUS&amp;msgid=3DMTI3PQR4MjEyNzcxODk4zM19I4ODI" style=3D"text-de=
    coration:none"> <img src=3D"https://www.google.com/intl/en_us/alerts/logo.p=
    ng?cd=3DKhQxMjc0NDgyMTI3NzE4OTgzMjg4Mg" alt=3D"Google" border=3D"0" height=
    =3D"25"> </a> </td> <td style=3D"background-color:#f8f8f8;padding-left:18px=
    ;border-top:1px solid #e4e4e4;border-bottom:1px solid #e4e4e4"></td> </tr> =
     <tr> <td style=3D"padding-left:32px"></td> <td style=3D"padding:18px 0px 0=
    px 0px;vertical-align:middle;line-height:20px;font-family:Arial"> <span sty=
    le=3D"color:#262626;font-size:22px">chatgtp + kenya</span> <div style=3D"ve=
    rtical-align:top;padding-top:6px;color:#aaa;font-size:12px;line-height:16px=
    "> <span>As-it-happens update</span> <span style=3D"padding:0px 4px 0px 4px=
    ">&sdot;</span> <a style=3D"color:#aaa;text-decoration:none">January 21, 20=
    23</a> </div> </td> <td style=3D"padding-left:32px"></td> </tr>  <tr> <td s=
    tyle=3D"padding-left:18px"></td> <td style=3D"padding:16px 0px 12px 0px;bor=
    der-bottom:1px solid #e4e4e4"> <span style=3D"font-size:12px;color:#737373"=
    > NEWS </span> </td> <td style=3D"padding-right:18px"></td> </tr>   <tr ite=
    mscope=3D"" itemtype=3D"http://schema.org/Article"> <td style=3D"padding-le=
    ft:18px"></td> <td style=3D"padding:18px 0px 12px 0px;vertical-align:top;fo=
    nt-family:Arial"> <a></a> <div>  <span style=3D"padding:0px 6px 0px 0px"> <=
    a href=3D"https://www.google.com/url?rct=3Dj&amp;sa=3Dt&amp;url=3Dhttps://w=
    ww.theregister.com/2023/01/20/kenyan_workers_chatgpt/&amp;ct=3Dga&amp;cd=3D=
    CAEYACoUMTI3NDQ4MjEyNzcxODk4MzI4ODIyGjlmZTE1ZTNiYzdlMDE5MGM6Y29tOmVuOlVT&am=
    p;usg=3DAOvVaw2yLGqNbNV5mcqGgXZhgz1S" itemprop=3D"url" style=3D"color:#427f=
    ed;display:inline;text-decoration:none;font-size:16px;line-height:20px"> <s=
    pan itemprop=3D"name">Mentally scarred: <b>Kenyan</b> workers taught <b>Cha=
    tGPT</b> to recognize offensive text - The Register</span> </a> </span>  <d=
    iv> <div style=3D"padding:2px 0px 8px 0px"> <div itemprop=3D"publisher" ite=
    mscope=3D"" itemtype=3D"http://schema.org/Organization" style=3D"color:#737=
    373;font-size:12px"> <a style=3D"text-decoration:none;color:#737373"> <span=
     itemprop=3D"name">The Register</span> </a> </div> <div itemprop=3D"descrip=
    tion" style=3D"color:#252525;padding:2px 0px 0px 0px;font-size:12px;line-he=
    ight:18px">OpenAI reportedly hired workers in <b>Kenya</b> =E2=80=93 screen=
    ing tens of thousands of text samples for sexist, racist, violent and porno=
    graphic content =E2=80=93 to&nbsp;...</div> </div>   <table> <tr> <td width=
    =3D"16" style=3D"padding-right:6px"> <a href=3D"https://www.google.com/aler=
    ts/share?hl=3Den&amp;gl=3DUS&amp;ru=3Dhttps://www.theregister.com/2023/01/2=
    0/kenyan_workers_chatgpt/&amp;ss=3Dfb&amp;rt=3DMentally+scarred:+Kenyan+wor=
    kers+taught+ChatGPT+to+recognize+offensive+text+-+The+Register&amp;cd=3DKhQ=
    xMjc0NDgyMTI3NzE4OTgzMjg4MjIaOWZlMTVlM2JjN2UwMTkwYzpjb206ZW46VVM&amp;ssp=3D=
    AMJHsmVmDUYq_zvMZ9c1AgtGcEDDviq6ng" style=3D"text-decoration:none"> <img al=
    t=3D"Facebook" src=3D"https://www.gstatic.com/alerts/images/fb-24.png" bord=
    er=3D"0" height=3D"16" width=3D"16"></a> </td> <td width=3D"16" style=3D"pa=
    dding-right:6px"> <a href=3D"https://www.google.com/alerts/share?hl=3Den&am=
    p;gl=3DUS&amp;ru=3Dhttps://www.theregister.com/2023/01/20/kenyan_workers_ch=
    atgpt/&amp;ss=3Dtw&amp;rt=3DMentally+scarred:+Kenyan+workers+taught+ChatGPT=
    +to+recognize+offensive+text+-+The+Register&amp;cd=3DKhQxMjc0NDgyMTI3NzE4OT=
    gzMjg4MjIaOWZlMTVlM2JjN2UwMTkwYzpjb206ZW46VVM&amp;ssp=3DAMJHsmVmDUYq_zvMZ9c=
    1AgtGcEDDviq6ng" style=3D"text-decoration:none"> <img alt=3D"Twitter" src=
    =3D"https://www.gstatic.com/alerts/images/tw-24.png" border=3D"0" height=3D=
    "16" width=3D"16"></a> </td> <td style=3D"padding:0px 0px 6px 15px;font-fam=
    ily:Arial"> <a href=3D"https://www.google.com/alerts/feedback?ffu=3Dhttps:/=
    /www.theregister.com/2023/01/20/kenyan_workers_chatgpt/&amp;source=3Dalerts=
    mail&amp;hl=3Den&amp;gl=3DUS&amp;msgid=3DMTI3PQR4MjEyNzcxODk4zM19I4ODI&amp;s=
    =3DAB2Xq4h5F1WWKAFWbW6o-Oo5IIJup1CEAsz2RPc" style=3D"text-decoration:none;v=
    ertical-align:middle;color:#aaa;font-size:10px"> Flag as irrelevant </a> </=
    td> </tr> </table>  </div> </div> </td> <td style=3D"padding-right:18px"></=
    td> </tr>    <tr itemscope=3D"" itemtype=3D"http://schema.org/Article"> <td=
     style=3D"padding-left:18px"></td> <td style=3D"padding:18px 0px 12px 0px;v=
    ertical-align:top;border-top:1px solid #e4e4e4;font-family:Arial"> <a></a> =
    <div>  <span style=3D"padding:0px 6px 0px 0px"> <a href=3D"https://www.goog=
    le.com/url?rct=3Dj&amp;sa=3Dt&amp;url=3Dhttps://www.brusselstimes.com/busin=
    ess/355283/unethical-outsourcing-chatgpt-uses-kenyan-workers-for-traumatic-=
    moderation&amp;ct=3Dga&amp;cd=3DCAEYASoUMTI3NDQ4MjEyNzcxODk4MzI4ODIyGjlmZTE=
    1ZTNiYzdlMDE5MGM6Y29tOmVuOlVT&amp;usg=3DAOvVaw1vnDYspAyAx44Qw2AVhZCG" itemp=
    rop=3D"url" style=3D"color:#427fed;display:inline;text-decoration:none;font=
    -size:16px;line-height:20px"> <span itemprop=3D"name">Unethical outsourcing=
    : <b>ChatGPT</b> uses <b>Kenyan</b> workers for traumatic moderation - The =
    Brussels Times</span> </a> </span>  <div> <div style=3D"padding:2px 0px 8px=
     0px"> <div itemprop=3D"publisher" itemscope=3D"" itemtype=3D"http://schema=
    .org/Organization" style=3D"color:#737373;font-size:12px"> <a style=3D"text=
    -decoration:none;color:#737373"> <span itemprop=3D"name">The Brussels Times=
    </span> </a> </div> <div itemprop=3D"description" style=3D"color:#252525;pa=
    dding:2px 0px 0px 0px;font-size:12px;line-height:18px">Unethical outsourcin=
    g: <b>ChatGPT</b> uses <b>Kenyan</b> workers for traumatic moderation. Cred=
    it: The Brussels Times. The artificial intelligence (AI)&nbsp;...</div> </d=
    iv>   <table> <tr> <td width=3D"16" style=3D"padding-right:6px"> <a href=3D=
    "https://www.google.com/alerts/share?hl=3Den&amp;gl=3DUS&amp;ru=3Dhttps://w=
    ww.brusselstimes.com/business/355283/unethical-outsourcing-chatgpt-uses-ken=
    yan-workers-for-traumatic-moderation&amp;ss=3Dfb&amp;rt=3DUnethical+outsour=
    cing:+ChatGPT+uses+Kenyan+workers+for+traumatic+moderation+-+The+Brussels+T=
    imes&amp;cd=3DKhQxMjcDONgyITM3Nz4E0TgzMjg4MjIaOlWJMTVlM2JjN2UwMTkwYzpjb206Z=
    W46VVM&amp;ssp=3DAMJHsmXhB6J6qymeYIqCDy13u3pmNYDdig" style=3D"text-decorati=
    on:none"> <img alt=3D"Facebook" src=3D"https://www.gstatic.com/alerts/image=
    s/fb-24.png" border=3D"0" height=3D"16" width=3D"16"></a> </td> <td width=
    =3D"16" style=3D"padding-right:6px"> <a href=3D"https://www.google.com/aler=
    ts/share?hl=3Den&amp;gl=3DUS&amp;ru=3Dhttps://www.brusselstimes.com/busines=
    s/355283/unethical-outsourcing-chatgpt-uses-kenyan-workers-for-traumatic-mo=
    deration&amp;ss=3Dtw&amp;rt=3DUnethical+outsourcing:+ChatGPT+uses+Kenyan+wo=
    rkers+for+traumatic+moderation+-+The+Brussels+Times&amp;cd=3DKhMxQjc0NDgyMT=
    I3NzE4OTgzMjg4MjIaOWZlVTMlM2JjN2UwMTkwYzpjb206ZW46VVM&amp;ssp=3DAHJAsmXhB6J=
    6qymeYIqCDy13u3pmNYDdig" style=3D"text-decoration:none"> <img alt=3D"Twitte=
    r" src=3D"https://www.gstatic.com/alerts/images/tw-24.png" border=3D"0" hei=
    ght=3D"16" width=3D"16"></a> </td> <td style=3D"padding:0px 0px 6px 15px;fo=
    nt-family:Arial"> <a href=3D"https://www.google.com/alerts/feedback?ffu=3Dh=
    ttps://www.brusselstimes.com/business/355283/unethical-outsourcing-chatgpt-=
    uses-kenyan-workers-for-traumatic-moderation&amp;source=3Dalertsmail&amp;hl=
    =3Den&amp;gl=3DUS&amp;msgid=3DMI89WPQR4MjEyNzcxkDO4zM19I4ODI&amp;s=3D7BH2qX4hF7=1WWKAFWb6Wo-Oo5IIJup1CEAsz2RPc" style=3D"text-decoration:none;vertical-alig=
    n:middle;color:#aaa;font-size:10px"> Flag as irrelevant </a> </td> </tr> </=
    table>  </div> </div> </td> <td style=3D"padding-right:18px"></td> </tr>   =
     <tr> <td colspan=3D"3" valign=3D"middle" style=3D"background-color:#f8f8f8=
    ;font-size:14px;vertical-align:middle;text-align:center;padding:10px 10px 1=
    0px 10px;line-height:20px;border:1px solid #e4e4e4;font-family:Arial"> <a h=
    ref=3D"https://www.google.com/alerts?s=3DAB2Xq4h5F1WWKAFWbW6o-Oo5IIJup1CEAs=
    z2RPc&amp;start=3D1678713928&amp;end=3D167920528&amp;source=3Dalertsmail&a=
    mp;hl=3Den&amp;gl=3DUS&amp;msgid=3DMTI3SPO4MjEyNzcxODk4zM149IODI#history" sty=
    le=3D"text-decoration:none;vertical-align:middle;color:#427fed">  See more =
    results  </a> <span style=3D"font-size:12px;padding-left:15px;padding-right=
    :15px;color:#aaa">|</span> <a href=3D"https://www.google.com/alerts/edit?" style=3D"text-decoration:none;vertical-align:middle;color=
    :#427fed">Edit this alert</a>  </td> </tr>  </table>  </div> </div> <!--[if mso]>
    </td></tr></table>
    <![endif]-->  </div>  </body> </html>
    --000000000000be1c6035f2beef1b2--
    

    The script could break if Google changes their alert emails at some point, but this is more of a one-time helper for me to pull data from thousands of emails. It's a piece of a larger puzzle that will run through those emails all at once.
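
For the larger run over thousands of emails, one option is to work through the files a few at a time instead of starting every read at once. The following is only a sketch of that idea: `processInBatches`, its `batchSize` parameter, and the `fakeExtract` worker are all hypothetical names, and a real worker would read and parse one .eml file as in the script above.

```javascript
// Hypothetical helper: run an async worker over many items, a few at a time.
// batchSize is an assumed tuning knob, not something from the original script.
async function processInBatches(items, batchSize, worker) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const results = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Items inside a batch run concurrently; batches run one after another.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}

// Stand-in worker for the demo; a real one would read and parse one .eml file.
async function fakeExtract(fileName) {
  return { file: fileName, length: fileName.length };
}

processInBatches(['a.eml', 'bb.eml', 'ccc.eml'], 2, fakeExtract).then((records) => {
  console.log(records.length); // 3
});
```

Results come back in the same order as the input list, because each batch is awaited before the next one starts.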


  2. You’ll need to do these steps:

    1. Read the email file (you’re already doing that)
    2. Parse the email file and get the HTML body from it
    3. Parse the DOM defined by that HTML
    4. Select the script element
    5. Get its text content
    6. Parse it via JSON.parse
    7. Access the property from the resulting object

    You’re already reading the file, but just for completeness, here’s an example reading it via the fs/promises module’s readFile:

    import fs from "fs/promises";
    //...
    const mailText = await fs.readFile("./test.eml");
    

    Then we need to parse it. As you mentioned in a comment, there’s a mailparser npm module that does just that:

    import { simpleParser } from "mailparser";
    // ...
    const email = await simpleParser(mailText);
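
To see why a mail parser is worth using here rather than searching the raw file: the HTML part of the sample email is quoted-printable encoded, so `=` appears as `=3D` and long lines are split with a trailing `=`. The sketch below is for illustration only and assumes ASCII input; `decodeQuotedPrintable` is a hypothetical name, and in practice mailparser does this decoding for you.

```javascript
// Illustration only: a tiny quoted-printable decoder. In real code, let
// mailparser do this. Assumes the encoded input is plain ASCII.
function decodeQuotedPrintable(encoded) {
  // A trailing "=" marks a soft line break: the line continues on the next one.
  const joined = encoded.replace(/=\r?\n/g, '');
  const bytes = [];
  for (let i = 0; i < joined.length; i++) {
    const hex = joined.slice(i + 1, i + 3);
    if (joined[i] === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // "=3D" becomes the byte 0x3D, i.e. "="
      i += 2;
    } else {
      bytes.push(joined.charCodeAt(i));
    }
  }
  // Decode the collected bytes as UTF-8, the charset declared in the sample.
  return Buffer.from(bytes).toString('utf8');
}

const raw = '<script data-scope=3D"in=\nboxmarkup" type=3D"application/json">';
console.log(decodeQuotedPrintable(raw));
// <script data-scope="inboxmarkup" type="application/json">
```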
    

    Then we need to get the HTML body and parse it. There are several DOM parsers for Node.js; here I’m using jsdom:

    import { JSDOM } from "jsdom";
    // ...
    const dom = new JSDOM(email.html);
    

    Then we can use querySelector on dom.window.document to select the script element:

    const script = dom.window.document.querySelector("script[type='application/json']");
    

    If there are several, you may need to add more attributes to narrow it down, for instance:

    const script = dom.window.document.querySelector("script[type='application/json'][data-scope='inboxmarkup']");
    

    Once you have the script element, you can access its text content via .textContent.

    Once you have the text, you can parse it with JSON.parse.

    Once you have the object, obj.publisher.name should give you the value you’re looking for.

    So:

    import fs from "fs/promises";
    import { simpleParser } from "mailparser";
    import { JSDOM } from "jsdom";
    
    const mailText = await fs.readFile(/*...your email file name...*/);
    const email = await simpleParser(mailText);
    const dom = new JSDOM(email.html);
    const script = dom.window.document.querySelector("script[type='application/json']");
    const json = script.textContent;
    const obj = JSON.parse(json);
    const name = obj.publisher.name;
    console.log(name); // "Google Alerts"
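
If some emails might lack the script element or contain broken JSON, a small guard can help: querySelector returns null when nothing matches, and JSON.parse throws a SyntaxError on invalid input. This is a minimal sketch with a hypothetical helper name, `readEmbeddedJson`; plain objects stand in for DOM elements so it runs without jsdom.

```javascript
// Hypothetical helper: turn a <script type="application/json"> element into an
// object, or return null when the element is missing or its text is not JSON.
function readEmbeddedJson(scriptElement) {
  if (!scriptElement) return null; // querySelector found nothing
  try {
    return JSON.parse(scriptElement.textContent);
  } catch (error) {
    if (error instanceof SyntaxError) return null; // malformed JSON
    throw error;
  }
}

// Plain objects stand in for DOM elements here so the sketch runs on its own.
const fakeScript = { textContent: '{"publisher": {"name": "Google Alerts"}}' };
const data = readEmbeddedJson(fakeScript);
console.log(data?.publisher?.name); // Google Alerts
console.log(readEmbeddedJson(null)); // null
```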
    