I need to pull values from a JSON object that’s within a script tag in an HTML file. The HTML is actually an email (.eml) file.
I am using Node’s `fs` module to read the file and that works fine. And, generally, I know how to select HTML elements (using `document.getElementById`, `innerHTML`, etc.) and how to work my way through JSON object hierarchies to select values (using `JSON.parse` and dot notation, etc.). But I’m not sure how to go about selecting values from within code like this:
X-Account-Key: account31
X-UIDL: 00001b5f073425
X-Mozilla-Status: 0000
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
... more email header info ...
<html lang=3D"en-US"> <head> </head> <body> <div> <script data-scope=3D"in=
boxmarkup" type=3D"application/json">{
"api_version": "1.0",
"publisher": {
"api_key": "67892787u2cfedea31b225240gg3423t9",
"name": "Google Alerts"
},
"cards": [ {
"title": "Google Alert - "search keywords"",
"subtitle": "Highlights from the latest email",
"actions":
... and so on with JSON object, then closing script tag...
... email body wrapped in DIV tag ...
What if I want to grab `publisher.name` or any other property’s value from this code?
Any and all pointers appreciated.
2 Answers
This is a supplementary answer, built on that of @t-j-crowder. It's what I was shooting for, pulling key data from a nice and neat object in a Google Alert .eml file, rather than scraping the messy HTML of the email itself. If there's already an object in there, why not make use of it?
Check out the "OUTPUT" comment at the end of the JS below to really see what I was going for with this.
If you want to test it yourself, save both the JavaScript and the example email code below to separate files. You'll also need to install two npm packages: `mailparser` and `jsdom`.
Example Google Alert code:
The script could break if Google changes their alert emails at some point, but this is more of a one-time helper for me to pull data from thousands of emails. It's a piece of a larger puzzle that will run through those emails all at once.
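You’ll need to do these steps:
1. Read the file.
2. Parse the email.
3. Get the HTML body and parse it.
4. Select the `script` element.
5. Get the element's text content.
6. Parse that text with `JSON.parse`.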
You’re already reading the file, but just for completeness, you can read it via the `fs/promises` module’s `readFile`.
Then we need to parse it. As you mentioned in a comment, there’s a `mailparser` npm module that does just that.
Then we need to get the HTML body and parse it. There are several DOM parsers for Node.js; here I’m using `jsdom`.
Then we can use `querySelector` on `dom.window.document` to select the `script` element. If there are several, you may need to add more attributes to the selector to narrow it down, for instance the element's `type` or `data-scope` attribute.
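A rough sketch of the steps up to this point (read the file, parse the email, parse the HTML body, select the `script` element) might look like the following. It assumes `mailparser`'s `simpleParser` and `jsdom`'s `JSDOM` are installed from npm. The names `MARKUP_SELECTOR`, `findMarkupScript`, and `loadMarkupScript` are made up for illustration, and the selector is built from the attributes shown in the question's sample, so it may need adjusting for other emails.

```javascript
// Sketch only: needs `npm install mailparser jsdom` before loadMarkupScript can run.
const { readFile } = require("node:fs/promises");

// Selector built from the attributes in the question's sample markup (an assumption).
const MARKUP_SELECTOR =
    'script[type="application/json"][data-scope="inboxmarkup"]';

// Works with any document-like object that offers querySelector.
function findMarkupScript(document) {
    return document.querySelector(MARKUP_SELECTOR);
}

// Hypothetical helper: resolves to the script element, or null if none matches.
async function loadMarkupScript(emlPath) {
    // Required here so the selector helper above has no third-party dependencies.
    const { simpleParser } = require("mailparser");
    const { JSDOM } = require("jsdom");

    const raw = await readFile(emlPath);   // Buffer holding the raw .eml contents
    const mail = await simpleParser(raw);  // decodes headers and the encoded body
    if (!mail.html) {
        throw new Error("No HTML body found in " + emlPath);
    }
    const dom = new JSDOM(mail.html);      // parse the HTML body into a DOM
    return findMarkupScript(dom.window.document);
}
```

With a file named, say, `alert.eml` (a placeholder name), `loadMarkupScript("alert.eml")` returns a promise for the element. Letting `mailparser` decode the body first matters here because the raw file is quoted-printable encoded (the `=3D` sequences and the `in=` line break in the sample).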
Once you have the `script` element, you can access its text content via `.textContent`.
Once you have the text, you can parse it with `JSON.parse`.
Once you have the object, `obj.publisher.name` should give you the value you’re looking for.
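The last two steps need nothing beyond plain JavaScript. Here is a minimal sketch, assuming the `script` element's text has already been obtained. `readPublisherName` is a hypothetical helper name, and `sampleText` is a trimmed-down stand-in for the element's `textContent`, not the full object from the email.

```javascript
// Parse the text of the script element and read one nested value from it.
function readPublisherName(scriptText) {
    let obj;
    try {
        obj = JSON.parse(scriptText);
    } catch (err) {
        // JSON.parse throws a SyntaxError on malformed input.
        throw new Error("Script content is not valid JSON: " + err.message);
    }
    // Optional chaining yields undefined instead of throwing if "publisher" is missing.
    return obj?.publisher?.name;
}

// Trimmed-down stand-in for scriptElement.textContent.
const sampleText =
    '{"api_version": "1.0", "publisher": {"name": "Google Alerts"}}';

console.log(readPublisherName(sampleText)); // Google Alerts
```

Any other property can be read the same way once the object is parsed, for example `obj.cards[0].subtitle` for the first card in the sample.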