
Suppose I have the following HTML table:

 <table>
  <tr>
    <th>Data</th>
    <th>More data</th>
    <th>Header 2</th>
    <th>Header 3</th>
    <th>Header 4</th>
  </tr>
  <tr>
    <td> -1234.1596 </td>
    <td> -0.15 </td>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td> 3714.8146 </td>
    <td>0.48</td>
    <td>Centro comercial </td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr>
</table> 

Rendered in a browser, this HTML snippet looks like this:

[image of the rendered table]

However, I need to find a way of left-aligning all text and right-aligning all numbers. Note that this is just a minimal reproducible example. I generate giant tables in Python and need to strategically align everything without knowing exactly what the table looks like. For example, in Python I currently have

table.replace("<th>", '<th align="left">')

but this also left-aligns the number cells.

2 Answers


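  1. You could first find a tr that holds data cells and check each cell for text or a number.

    For example, parse the table with BeautifulSoup and grab the td children of the first tr that has any (the very first tr only contains th headers).

    Then use that row to assign a type to each column index.

    Try to convert the first td’s content to a float (int() would reject values such as -1234.1596). If the conversion fails, record column 0 as text; if it succeeds, record column 0 as a number. Repeat for the remaining tds.

    Then find the 0th th, and every td in column 0, and assign an align attribute based on what you recorded for that column.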

  2. If you insist on using Python to transform your string, you can use re.sub with a custom replacement function that returns a different string depending on whether the value in the cell is numeric. (I assume you want to work on the cells, not the headers; if not, the code is easy to adapt or extend.)

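    import re
    
    def align_replace(m):
        # m.group(2) is the cell text; fullmatch requires the whole cell to be a number
        numeric = re.fullmatch(r'[+-]?\d+(\.\d+)?', m.group(2).strip())
        if numeric:
            return f'<td align="right">{m.group(2)}'
        else:
            return f'<td align="left">{m.group(2)}'
    
    # Rewrite every plain <td> together with the text that follows it
    aligned_table = re.sub(r'(<td>)([^<]+)', align_replace, table)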
    

    On your example string, this yields:

    '<table>  <tr>    <th>Data</th>    <th>More data</th>    <th>Header 2</th>    <th>Header 3</th>    <th>Header 4</th>  </tr>  <tr>    <td align="right"> -1234.1596 </td>    <td align="right"> -0.15 </td>    <td align="left">Alfreds Futterkiste</td>    <td align="left">Maria Anders</td>    <td align="left">Germany</td>  </tr>  <tr>    <td align="right"> 3714.8146 </td>    <td align="right">0.48</td>    <td align="left">Centro comercial </td>    <td align="left">Francisco Chang</td>    <td align="left">Mexico</td>  </tr></table>'
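
The column-based idea from the first answer can be combined with the numeric test from the second, so that the headers get the same alignment as their columns. The sketch below is one possible way to do that, not the method of either answer. It uses only the standard-library re module instead of BeautifulSoup, and the names column_alignments and align_by_column are made up for illustration. It assumes every cell is a plain <th> or <td> with no attributes, that there are no nested tables, and that the first row of <td> cells is representative of each column.

```python
import itertools
import re

NUMERIC = re.compile(r'[+-]?\d+(\.\d+)?')
ROW = re.compile(r'<tr>(.*?)</tr>', re.DOTALL)
CELL = re.compile(r'<(t[hd])>(.*?)</\1>', re.DOTALL)


def column_alignments(table):
    """Return 'right' or 'left' per column, judged from the first all-<td> row."""
    for row in ROW.finditer(table):
        cells = CELL.findall(row.group(1))  # list of (tag, text) pairs
        if cells and all(tag == 'td' for tag, _ in cells):
            return ['right' if NUMERIC.fullmatch(text.strip()) else 'left'
                    for _, text in cells]
    return []  # no data row found


def align_by_column(table):
    """Add an align attribute to every plain <th> and <td>, column by column."""
    alignments = column_alignments(table)

    def fix_row(row_match):
        counter = itertools.count()  # column index within this row

        def fix_cell(cell_match):
            index = next(counter)
            tag, text = cell_match.groups()
            # Columns beyond the sampled row fall back to left alignment
            side = alignments[index] if index < len(alignments) else 'left'
            return f'<{tag} align="{side}">{text}</{tag}>'

        return '<tr>' + CELL.sub(fix_cell, row_match.group(1)) + '</tr>'

    return ROW.sub(fix_row, table)


example = ('<table><tr><th>Data</th><th>Name</th></tr>'
           '<tr><td> -1234.1596 </td><td>Maria Anders</td></tr></table>')
print(column_alignments(example))  # ['right', 'left']
print(align_by_column(example))
```

Because the type is decided once per column, a stray non-numeric value further down a numeric column keeps the column's alignment. If cells should be judged individually instead, the per-cell approach in the second answer is the simpler choice.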
    