I need to extract a value (dip) out of an HTML file:
</span><span class="pron dpron">/<span class="ipa dipa lpr-2 lpl-1">buːm</span>/</span></span></div><div class="pos-body">
My code results in this error:
Microsoft JScript runtime error: Object doesn’t support this property or method
@if (@CodeSection == @Batch) @then
@echo off
setlocal
curl https://dictionary.cambridge.org/de/worterbuch/englisch/boom >phoneme.html
set "htmlfile=phoneme.html"
rem // invoke JScript hybrid code and capture its output
for /f %%I in ('cscript /nologo /e:JScript "%~f0" "%htmlfile%"') do set "converted=%%I"
echo %converted%
rem // end main runtime
PAUSE
goto :EOF
@end // end batch / begin JScript chimera
var fso = WSH.CreateObject('scripting.filesystemobject'),
DOM = WSH.CreateObject('htmlfile'),
htmlfile = fso.OpenTextFile(WSH.Arguments(0), 1),
html = htmlfile.ReadAll();
DOM.write(html);
htmlfile.Close();
var scrape = DOM.getElementsByTagName('pron dpron').getElementsByClassName('ipa dipa lpr-2 lpl-1')[0].innerText;
WSH.Echo(scrape.match(/^.*=\s+(\S+).*$/)[0]);
I copied and pasted this code and edited it slightly.
I need to get "bu:m" into a variable or echoed.
Many thanks.
3 Answers
Thank you for all the tips. With @Reino's help and by echoing the Unicode character, I was able to get what I need.
Replace the redirectors with spaces, then process the value as a series of space-separated tokens. When the token "dip" appears, set grab to obtain the next token, grab it and exit the for loop.
Note: converted will be mangled if "dip" does not appear.
And: what appears to be a colon between bu and m is some Unicode character. I replaced it with a colon for testing.
You don’t need JScript in order to extract such a value from the .html file; you can do it directly with a Batch file.
If the structure of the desired line is always the same:
… you can do it as simply as this line:
If the line could change, first get the line with the "dip" value via a findstr command, and then extract the dip value:
New code added
This new method was designed and extracted from OP’s comments…
1- In your question you specified that you are looking for this string: "dip". However, in your comment it seems that the real string you want is this: "ipa dipa lpr-2 lpl-1". Please note that the second string is very different from the first one because it contains spaces, and most Batch commands are sensitive to spaces, so the code must be modified accordingly. BTW, it is very bad "netiquette" to provide us with certain data, test the code we wrote with different data, and then say: "Your code not works"! Did you test our code with the data you provided?
2- In my answer I specified: "If the structure of the desired line is always the same:"
However, it seems that the real line is very different:
I added: "If the line could change…" use the second code.
Why did you test the first code if the real line is entirely different from the line you posted? You should use the second code instead… Helping gets over-complicated if simple instructions are not followed…
3- In your comment you indicated that the html file is created with this line:
When I tested such a line I got this:
… but your complaint was:
I just get: 'a href='https:'
I really don’t know what else to say…
I prepared a test file with these contents:
This is the new code:
… and this is the output:
It seems that the output contains a Unicode character that, of course, cannot be properly managed by a Batch file…
:(
PS – The Unicode character could be properly generated if the chcp 65001 command is used…
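On the Unicode character: the mark between bu and m in the posted line does not look like an ASCII colon. Below is a minimal JavaScript sketch, kept to JScript-compatible syntax since that is the language of the second half of the question's script. It does not use the DOM at all: it pulls the span text out of the posted line with a regular expression, then lists the code point of every non-ASCII character. The names extractIpa and describeNonAscii are made up for this illustration, the regex assumes the class attribute is exactly "ipa dipa lpr-2 lpl-1" as in the posted snippet, and U+02D0 is only what the snippet appears to contain.

```javascript
// Sample line copied from the question; \u02D0 stands in for the length
// mark (assumption: that is the character in the posted snippet).
var sampleHtml = '</span><span class="pron dpron">/' +
    '<span class="ipa dipa lpr-2 lpl-1">bu\u02D0m</span>/</span></span></div>' +
    '<div class="pos-body">';

// Hypothetical helper: return the text of the first span whose class
// attribute is exactly "ipa dipa lpr-2 lpl-1", or null if there is none.
function extractIpa(html) {
    var m = /<span class="ipa dipa lpr-2 lpl-1">([^<]*)<\/span>/.exec(html);
    return m ? m[1] : null;
}

// Hypothetical helper: list every non-ASCII character as "U+XXXX".
function describeNonAscii(text) {
    var found = [];
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code > 127) {
            var hex = code.toString(16).toUpperCase();
            while (hex.length < 4) { hex = '0' + hex; }
            found.push('U+' + hex);
        }
    }
    return found;
}

var ipa = extractIpa(sampleHtml);
var nonAscii = describeNonAscii(ipa);
```

Inside the hybrid script, the html string read from the file would take the place of sampleHtml and the result would go to WSH.Echo. Whether the character then displays correctly still depends on the file's encoding and the console code page, as noted above.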