
Need to extract a value (dip) out of an HTML file

</span><span class="pron dpron">/<span class="ipa dipa lpr-2 lpl-1">buːm</span>/</span></span></div><div class="pos-body">

My code results in this error:
Microsoft JScript runtime error: Object doesn’t support this property or method

@if (@CodeSection == @Batch) @then

@echo off
setlocal

curl https://dictionary.cambridge.org/de/worterbuch/englisch/boom >phoneme.html

set "htmlfile=phoneme.html"

rem // invoke JScript hybrid code and capture its output
for /f %%I in ('cscript /nologo /e:JScript "%~f0" "%htmlfile%"') do set "converted=%%I"

echo %converted%

rem // end main runtime
PAUSE
goto :EOF

@end // end batch / begin JScript chimera

var fso = WSH.CreateObject('scripting.filesystemobject'),
    DOM = WSH.CreateObject('htmlfile'),
    htmlfile = fso.OpenTextFile(WSH.Arguments(0), 1),
    html = htmlfile.ReadAll();

DOM.write(html);
htmlfile.Close();

var scrape = DOM.getElementsByTagName('pron dpron').getElementsByClassName('ipa dipa lpr-2 lpl-1')[0].innerText;
WSH.Echo(scrape.match(/^.*=\s+(\S+).*$/)[0]);

I copied and pasted this and edited it slightly.

I need to get "bu:m" into a variable or echoed.

Many thanks.

3 Answers


  1. Chosen as BEST ANSWER

    Thank you for all the tips. With help from @Reino and the tip about echoing the Unicode character, I was able to get what I needed.

    @ECHO OFF
    chcp 65001
    
    xidel -s "https://dictionary.cambridge.org/de/worterbuch/englisch/boom" -e "(//span[@class='pron dpron']/span[@class='ipa dipa lpr-2 lpl-1'])[1]"
    
    PAUSE
    GOTO :EOF


  2.
    @ECHO OFF
    SETLOCAL
    
    SET "converted=<span class="pro">/<span class="dip">bu:m</span>/</span>"
    SET converted
    SET "converted=%converted:<= %"
    SET "converted=%converted:>= %"
    SET "grab="
    FOR %%e IN (%converted%) DO IF DEFINED grab (
     SET "converted=%%e"
     GOTO done
     ) ELSE IF %%e=="dip" SET "grab=y"
    :done
    SET converted
    GOTO :EOF
    

    Replace the redirection characters (< and >) with spaces, then process the value as a series of space-separated tokens. When the token "dip" appears, set grab so that the next token is captured, then exit the for loop.

    Note: converted will be left mangled if "dip" does not appear.
    Also: what appears to be a colon between bu and m is actually a Unicode character. I replaced it with a colon for testing.
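
    For comparison, the same token scan can be sketched in JavaScript (written in ES3 style, so it should also run under JScript). The function name grabTokenAfter and its parameters are hypothetical, and the tokenizer only approximates how FOR splits its list: it breaks on spaces, commas, semicolons and equals signs, and keeps a double-quoted string together as one token.

    ```javascript
    // Sketch only: grabTokenAfter is a hypothetical helper, not part of the answer above.
    function grabTokenAfter(line, marker) {
        // Replace the redirection characters with spaces, as the batch code does.
        var spaced = line.replace(/[<>]/g, ' ');
        // Approximate FOR's list splitting: quoted strings stay whole,
        // everything else is split on whitespace, '=', ',' and ';'.
        var tokens = spaced.match(/"[^"]*"|[^\s=,;]+/g) || [];
        for (var i = 0; i < tokens.length - 1; i++) {
            // The marker keeps its double quotes, e.g. '"dip"'.
            if (tokens[i] === marker) {
                return tokens[i + 1]; // the token right after the marker
            }
        }
        return null; // marker not found
    }
    ```

    Unlike the batch version, this returns null when the marker is missing instead of leaving a mangled value behind.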

  3. You don’t need JScript in order to extract such a value from the .html file; you can do it directly with a Batch file.

    If the structure of the desired line is always the same:

    <span class="pro">/<span class="dip">buːm</span>/</span>
    

    … you can do it as simply as this:

    for /F "tokens=3 delims=</>" %%a in ('findstr "\"dip\"" phoneme.html') do set "dip=%%a"
    echo %dip%
    

    If the line could change, first get the line with "dip" value via a findstr command, and then extract the dip value:

    for /F "delims=" %%a in ('findstr "\"dip\"" phoneme.html') do set "html=%%a"
    set "dip=%html:*"dip">=%"
    set "dip=%dip:<=" & rem "%"
    echo %dip%
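
    The two substitutions above amount to "drop everything up to and including the marker, then cut at the next <". Here is a minimal JavaScript sketch of that logic, in ES3 style so it should also run under JScript; textAfterMarker and its parameter names are hypothetical.

    ```javascript
    // Sketch only: textAfterMarker is a hypothetical helper mirroring the two SET lines.
    function textAfterMarker(line, marker) {
        // Equivalent of %html:*"dip">=% : remove everything through the first marker.
        // Note: batch substitution ignores case when matching; indexOf does not.
        var start = line.indexOf(marker);
        if (start === -1) {
            return null; // marker not present on this line
        }
        var rest = line.substring(start + marker.length);
        // Equivalent of %dip:<=" & rem "% : keep only what precedes the next '<'.
        var end = rest.indexOf('<');
        return end === -1 ? rest : rest.substring(0, end);
    }
    ```

    The marker includes the closing quote and angle bracket, for example '"dip">', just like the search string in the SET line.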
    

    New code added

    This new method is based on the OP’s comments.

    1- In your question you specified that you are looking for this string: "dip". However, from your comment it seems that the real string you want is this: "ipa dipa lpr-2 lpl-1". Please note that the second string is very different from the first one because it contains spaces, and most Batch commands are sensitive to spaces, so the code must be modified accordingly. BTW, it is very bad "netiquette" to provide us with certain data, test the code we wrote with different data, and then say: "Your code not works"! Did you test our code with the data you provided?

    2- In my answer I specified: "If the structure of the desired line is always the same:"

    <span class="pro">/<span class="dip">buːm</span>/</span>
    

    However, it seems that the real line is very different:

    </span><span class="pron dpron">/<span class="ipa dipa lpr-2 lpl-1">buːm</span>/</span></span> <span class="us dpron-i "><span class="region dreg">us</span><span class="daud"> converted= /span span class="pron dpron" / span class="ipa dipa lpr-2 lpl-1" buːm /span / /span /span span class="us dpron-i " span class="region dreg" us /span span class="daud"
    

    I also wrote: "If the line could change…", in which case you should use the second code.

    Why did you test the first code if the real line is entirely different from the line you posted? You should use the second code instead… Helping becomes needlessly complicated if simple instructions are not followed…

    3- In your comment you indicated that the html file is created with this line:

    curl dictionary.cambridge.org/de/worterbuch/englisch/boom
    

    When I tested such a line I got this:

    <html>
    <head><title>301 Moved Permanently</title></head>
    <body>
    <center><h1>301 Moved Permanently</h1></center>
    <hr><center>nginx</center>
    </body>
    </html>
    

    … but your complaint was: I just get: 'a href='https:'

    I really don’t know what else to say…


    I prepared a test file with these contents:

    Any other line...
    </span><span class="pron dpron">/<span class="ipa dipa lpr-2 lpl-1">buːm</span>/</span></span> <span class="us dpron-i "><span class="region dreg">us</span><span class="daud"> converted= /span span class="pron dpron" / span class="ipa dipa lpr-2 lpl-1" buːm /span / /span /span span class="us dpron-i " span class="region dreg" us /span span class="daud"
    Any other line...
    

    This is the new code:

    @echo off
    setlocal EnableDelayedExpansion
    
    REM  curl dictionary.cambridge.org/de/worterbuch/englisch/boom > phoneme.html
    
    for /F "delims=" %%a in ('findstr /C:"\"ipa dipa lpr-2 lpl-1\"" phoneme.html') do set "html=%%a"
    set "dip=%html:*"ipa dipa lpr-2 lpl-1">=%"
    set "dip=%dip:<=" & rem "%"
    echo %dip%
    

    … and this is the output:

    buːm
    

    It seems that the output contains a Unicode character that, of course, cannot be properly managed by a Batch file… :(

    PS – The Unicode character can be displayed properly if the chcp 65001 command is used first…
