
I have an HTML file.
I build a list of DataFrames from it with this command:

html_df = pd.read_html(folder + '/' + file)

I deleted values from a few of the DataFrames in the list, and now I need to save the result as a new HTML file with the original structure.

P.S. The DataFrames must be kept as separate tables.

How can I do this?

2 Answers


  1. Chosen as BEST ANSWER

    I found a solution. If you have a better one, feel free to add it. The resulting HTML file looks ugly, but pandas can read it.

    import pandas as pd
    import re
    import codecs
    
    # read file
    folder = 'folder_path'
    file = 'file_name.html'
    html_df = pd.read_html(folder + '/' + file)
    
    # find the tables to keep: those where any cell ends with _TOM or _TOD
    html_match = re.compile(r'_TOM$|_TOD$')
    df_check = []
    for i, df in enumerate(html_df):
        for col in df.columns:
            try:
                if df[col].str.contains(html_match, na=False).any():
                    df_check.append(i)
                    break  # one matching column is enough
            except AttributeError:
                # non-string columns have no .str accessor; skip them
                continue
    
    # blank every column except the first in tables without a _TOM/_TOD match
    for i, df in enumerate(html_df):
        if i not in df_check:
            df[df.columns[1:]] = ''
    
    
    # pattern matching runs of two or more spaces
    pat = re.compile(r"[ ]{2,}")
    
    # join all tables into one string
    # df.to_markdown(tablefmt="html") produces the HTML (it needs the tabulate package)
    html_all = ''
    for df in html_df:
        new_str = df.to_markdown(tablefmt="html", index=False)
        new_str = pat.sub('<br>\n', new_str)
        html_all = '<br>'.join([html_all, new_str])
    
    # save result
    # choosing the right encoding is very important; utf-8-sig writes a BOM at the start of the file
    file1 = 'new_file.html'
    with codecs.open(folder + '/' + file1, 'w', encoding='utf-8-sig') as f:
        f.write(html_all)
    

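  2. Assuming the HTML is not complex, you can use pandas to render each DataFrame back to HTML. Note that `to_html` is a DataFrame method, so it has to be called on every table in the list.

    For example:

    with open(folder + '/' + output_file, 'w', encoding='utf-8') as f:
        f.write('<br>'.join(df.to_html(index=False) for df in html_df))
    

    Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_html.html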
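
Building on the `DataFrame.to_html` suggestion, here is a minimal sketch of the whole flow: find the tables that contain `_TOM`/`_TOD` values, blank the rest, and write each DataFrame as its own `<table>`. The helper names (`has_match`, `blank_unmatched`, `tables_to_html`, `save_tables`) and the sample data are made up for illustration, and the output contains only the tables, not any other markup from the original file.

```python
import pandas as pd

# Same suffix rule as in the accepted answer, kept as a plain regex string.
PATTERN = r'_TOM$|_TOD$'


def has_match(df):
    """Return True if any cell in df, viewed as text, ends with _TOM or _TOD."""
    for col in df.columns:
        # astype(str) lets the .str accessor work on numeric columns too
        if df[col].astype(str).str.contains(PATTERN, regex=True).any():
            return True
    return False


def blank_unmatched(tables):
    """Return copies of the tables; tables without a match keep only column 1."""
    cleaned = []
    for df in tables:
        df = df.copy()  # leave the caller's DataFrames untouched
        if not has_match(df):
            for col in df.columns[1:]:
                df[col] = ''
        cleaned.append(df)
    return cleaned


def tables_to_html(tables):
    """Render every DataFrame as a separate <table>, joined by <br>."""
    return '<br>\n'.join(df.to_html(index=False) for df in tables)


def save_tables(tables, path):
    """Write the rendered tables to path as UTF-8."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(tables_to_html(tables))


# Hypothetical sample data standing in for the list from pd.read_html
tables = [
    pd.DataFrame({'name': ['a_TOM', 'b'], 'value': [1, 2]}),
    pd.DataFrame({'name': ['c', 'd'], 'value': [3, 4]}),
]
html = tables_to_html(blank_unmatched(tables))
```

To use it on the real file, pass the list from `pd.read_html(...)` through `blank_unmatched` and call `save_tables(cleaned, folder + '/new_file.html')`. Since every DataFrame becomes its own `<table>` element, `pd.read_html` on the new file should again return one DataFrame per table.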
