
I’m currently trying to build an HTML generator that takes YAML files like the one below and converts them into actual HTML. I’ve built the parser to take the dict and pull out what’s required, but I didn’t realize YAML doesn’t support duplicate keys in the same mapping; the loader keeps only the last one as the value. Is there a way to force it to keep the duplicates, or is there a similar config format, other than JSON, that keeps the same minimal look?

The main reason I picked YAML is that it doesn’t need anything like XML’s opening and closing tags (at that point it might as well be an XML-generated website) or JSON’s masses of braces, quotes, etc.


html:
  head:
    title: 
      content: Always Under Development
    meta: 
      attributes:
        charset: utf-8
    link: 
      attributes: 
        rel: stylesheet
        href: static/css/base.css
  body:
    div:
      attributes:
        class: Index_Div_Nav_Bar
      ul:
        li:
          content:
            index : Home
        li:
          content:
            projects : Projects
    div:
      attributes:
        class: foreground
    footer:
      div:
        attributes:
          class: Index_Div_Social_Media
        a:
          content:
            img:
              attributes:
                src: static/images/Linkedin-logo-2011-2019-241642781.png
                style: 'width:8%; height:5%;'
          attributes:
            href: 'https://www.linkedin.com/'
        br:
        a:
          content:
            img:
              attributes:
                src: static/images/gitlab-logo-square-1426672963.png
                style: 'width:5%; height:5%'
          attributes:
            href: 'https://gitlab.com/AltTabLife'

I’ve tried to find other config formats and tried using yaml.load directly; both failed. Formats like JSON and plain Python literals require quotes around everything and an overwhelming number of curly braces. Formats like INI don’t support nested structures (to my understanding).

yaml.load() doesn’t seem to have a way to load the file the way I want. I just want to use the whitespace syntax to define where each element is nested.

Lastly, I’ve tried ruamel.yaml with duplicate keys allowed, but all that does is issue a warning and then process the file as before, letting only one div tree through.

2 Answers


  1. If your keys aren’t unique, you probably want to use a list – note the dashes denoting each element.

    import yaml
    
    
    vals = """
    A:
       - 1
       - 2
       - 3
    B:
       C: 4
       D: 5
    """
    
    print(yaml.load(vals,yaml.SafeLoader))
    

    Yields:

    {'A': [1, 2, 3], 'B': {'C': 4, 'D': 5}}
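
Applied to the markup in the question, the list approach could look roughly like the sketch below. The YAML layout shown in the comment and the `render` helper are assumptions made for illustration, not something from the original post; the `nav` literal is what a standard loader would return for that layout, so the sketch runs without any YAML library.

```python
# Hypothetical list-based layout for the navigation bar. In YAML this would be:
#
#   div:
#     - attributes: {class: Index_Div_Nav_Bar}
#     - ul:
#         - li: Home
#         - li: Projects
#
# A standard loader turns that into the nested dicts and lists below.
nav = {
    "div": [
        {"attributes": {"class": "Index_Div_Nav_Bar"}},
        {"ul": [{"li": "Home"}, {"li": "Projects"}]},
    ]
}


def render(node, level=0):
    """Return the HTML lines for a single-key dict of the form {tag: children}."""
    indent = "  " * level
    tag, children = next(iter(node.items()))
    if isinstance(children, str):
        # A plain string is treated as the element's text.
        return [f"{indent}<{tag}>{children}</{tag}>"]
    attrs = ""
    lines = []
    for child in children:
        if "attributes" in child:
            # Attribute entries decorate the opening tag instead of nesting.
            attrs += "".join(f' {k}="{v}"' for k, v in child["attributes"].items())
        else:
            lines.extend(render(child, level + 1))
    return [f"{indent}<{tag}{attrs}>", *lines, f"{indent}</{tag}>"]


print("\n".join(render(nav)))
```

Because every repeated element is its own list item, both `li` entries survive loading and keep their order, at the cost of one dash per child.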
    
  2. YAML, by specification, is not allowed to have duplicate keys in a mapping. If a parser incorrectly ignores this
    requirement, there is no guarantee which of the values for the key is taken, especially because
    the YAML specification also states that the order of the keys is not significant.
    Since the usual data structure created for a YAML mapping is a Python dict, there is no
    way for it to hold multiple values for one key and also keep the order of all key-value pairs
    (you could make each dict value a list of one or more elements, but that would only
    preserve the order of the values for a single key, and would still lose
    the original ordering of the key-value pairs).

    What you are looking for is parsing something that is not YAML, but since it
    is close to YAML, that doesn’t mean
    you can’t start with a YAML parser and derive a parser for your purposes from it.
    E.g. when ruamel.yaml parses a mapping, all the key-value pairs are kept in order, and you can
    change the method that constructs a mapping to forgo checking for duplicate keys and
    create a data structure that keeps the info you need for generating HTML.

    Assuming your input is in a file input.yaml:

    from pathlib import Path
    import ruamel.yaml
    from ruamel.yaml.constructor import ConstructorError
    
    class MyConstructor(ruamel.yaml.RoundTripConstructor):
        def construct_mapping(self, node, datatyp, deep=False):
            # collect (key, value) tuples in a list instead of a dict, skipping the duplicate-key check
            if not isinstance(node, ruamel.yaml.nodes.MappingNode):
                raise ConstructorError(
                    None, None, f'expected a mapping node, but found {node.id!s}', node.start_mark,
                )
            ret_val = datatyp
            for key_node, value_node in node.value:
                # keys can be list -> deep
                key = self.construct_object(key_node, deep=True)
                assert isinstance(key, str)
                value = self.construct_object(value_node, deep=deep)
                ret_val.append((key, value))
            return ret_val
    
        def construct_yaml_map(self, node):
            data = []
            yield data
            self.construct_mapping(node, data, deep=True)
    
    MyConstructor.add_constructor(
        'tag:yaml.org,2002:map', MyConstructor.construct_yaml_map
    )
    
    file_in = Path('input.yaml')
        
    yaml = ruamel.yaml.YAML()
    yaml.Constructor = MyConstructor
    data = yaml.load(file_in)
    
    def attr(data):
        ret_val = ''
        if not isinstance(data, list):
            return ret_val
        for key, value in data:
            if key == 'attributes':
                for k1, v1 in value:
                    ret_val += f' {k1}="{v1}"'
        return ret_val
    
    def html(data, level=0):
        indent = '  ' * level
        if isinstance(data, list):
            for elem in data:
                if elem[0] == 'attributes':
                    continue
                if elem[1] is None or len(elem[1]) == 1 and elem[1][0] == 'attributes':
                    print(f'{indent}<{elem[0]}{attr(elem[1])}>')
                    continue
                print(f'{indent}<{elem[0]}{attr(elem[1])}>')
                html(elem[1], level+1)
                print(f'{indent}</{elem[0]}>')
        elif isinstance(data, str):
            print(f'{indent}{data}')
        else:
            print('type', type(data))
            # raise NotImplementedError
    
    html(data)
    

    which gives:

    <html>
      <head>
        <title>
          <content>
            Always Under Development
          </content>
        </title>
        <meta charset="utf-8">
        </meta>
        <link rel="stylesheet" href="static/css/base.css">
        </link>
      </head>
      <body>
        <div class="Index_Div_Nav_Bar">
          <ul>
            <li>
              <content>
                <index>
                  Home
                </index>
              </content>
            </li>
            <li>
              <content>
                <projects>
                  Projects
                </projects>
              </content>
            </li>
          </ul>
        </div>
        <div class="foreground">
        </div>
        <footer>
          <div class="Index_Div_Social_Media">
            <a href="https://www.linkedin.com/">
              <content>
                <img src="static/images/Linkedin-logo-2011-2019-241642781.png" style="width:8%; height:5%;">
                </img>
              </content>
            </a>
            <br>
            <a href="https://gitlab.com/AltTabLife">
              <content>
                <img src="static/images/gitlab-logo-square-1426672963.png" style="width:5%; height:5%">
                </img>
              </content>
            </a>
          </div>
        </footer>
      </body>
    </html>
    

    The data structure you end up with has a list of tuples where you would normally get a dict-like object. That
    way you preserve the order and can handle tuples that have the same first element (i.e. the "key"). YAML merge
    keys (<<) are not handled, but anchors/aliases can probably still be used in the input (if you do use them, make
    sure you check for (infinite) recursion in the routine that processes the data).
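
The output above still closes elements such as meta, link and img with an end tag and emits content as if it were an element. A possible refinement of the rendering step, sketched under some assumptions, is shown below. The `VOID` set, the decision to treat `content` as a transparent wrapper, and the hand-written `data` sample are illustrative choices, not part of the original code. The sample mimics the list-of-tuples shape the custom constructor produces, so no YAML library is needed to run it.

```python
# Hand-written sample in the shape the custom constructor produces:
# every mapping is a list of (key, value) tuples, so duplicate keys survive.
data = [
    ("head", [
        ("title", [("content", "Always Under Development")]),
        ("meta", [("attributes", [("charset", "utf-8")])]),
    ]),
    ("body", [
        ("div", [("attributes", [("class", "Index_Div_Nav_Bar")])]),
        ("div", [("attributes", [("class", "foreground")])]),
        ("br", None),
    ]),
]

# Assumption: a small subset of HTML void elements, which take no end tag.
VOID = {"br", "img", "link", "meta"}


def attrs(children):
    """Format the entries under an 'attributes' key as HTML attributes."""
    if not isinstance(children, list):
        return ""
    return "".join(
        f' {k}="{v}"'
        for key, value in children
        if key == "attributes"
        for k, v in value
    )


def render(data, level=0):
    """Return HTML lines for a list of (key, value) tuples or a scalar."""
    indent = "  " * level
    if data is None:
        return []
    if not isinstance(data, list):
        return [f"{indent}{data}"]  # scalar: emit as text
    lines = []
    for key, value in data:
        if key == "attributes":
            continue  # already consumed by attrs() on the parent
        if key == "content":
            # Transparent wrapper: render its value at the parent's child level.
            lines.extend(render(value, level))
            continue
        lines.append(f"{indent}<{key}{attrs(value)}>")
        if key in VOID:
            continue  # no children and no end tag
        lines.extend(render(value, level + 1))
        lines.append(f"{indent}</{key}>")
    return lines


print("\n".join(render(data)))
```

With this sample, both div elements are emitted in their original order, the title text appears directly inside title, and meta and br get no end tag. Attribute values are not escaped here; real input would probably want `html.escape` applied to them.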
