Extracting <h2> title text from HTML where the title text might include newlines

I have an HTML file with some <h2> tags, such as:

a <- '<section id="sec-standard-stoet-geary" class="level2" data-number="9.4"> <h2 data-number="9.4" class="anchored" data-anchor-id="sec-standard-stoet-geary"> <span class="header-section-number">9.4</span> Standardising PISA results</h2>'

b <- '<span class="fu">read_parquet</span>(<span class="st">"&lt;folder&gt;PISA_2015_student_subset.parquet"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre> </div> </div> </section><section id="sec-leftjoin"…


Convert the text input file to JSON in Python

I am using Python to convert a text input file to JSON. My code:

import json
import re

filename = "text.txt"
text = {}
pattern = re.compile(r'\s*([^=\t]+)\s*=\s*(.*)')
with open(filename, encoding='utf8') as file:
    for line in file:
        match = pattern.match(line.strip())
        if…
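The excerpt above is cut off, so the asker's input format and the rest of the loop are unknown. As a minimal sketch only, assuming each line of the file looks like `key = value`, the general approach might look like the following. The function name `parse_key_value_lines`, the sample lines, and the lazy quantifier in the pattern are illustrative assumptions, not part of the original question.

```python
import json
import re

# Assumed line format: "key = value". The question's real input is not shown.
# The lazy "+?" keeps trailing spaces out of the captured key.
PATTERN = re.compile(r"\s*([^=\t]+?)\s*=\s*(.*)")


def parse_key_value_lines(lines):
    """Collect key/value pairs from lines that look like 'key = value'."""
    result = {}
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:  # blank lines and lines without "=" are skipped
            key, value = match.groups()
            result[key] = value
    return result


# Hypothetical sample input standing in for the lines of text.txt
sample = ["name = Alice", "city=Paris", "", "not a pair"]
parsed = parse_key_value_lines(sample)
print(json.dumps(parsed, indent=2))
```

To read from a real file instead, the same function can be called on the open file object, for example `parse_key_value_lines(open("text.txt", encoding="utf8"))`, since iterating over a file yields its lines.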
