I’m currently trying to build an HTML parser that takes YAML files like the one below and converts them into actual HTML. I’ve built up the parser to take the dict and parse out what’s required, but didn’t realize YAML doesn’t support duplicate elements/keys in the same dict; it takes the last one as its value. Is there a way I can force it to disregard this, or is there a similar config format, other than JSON, that would keep the minimal look?
The main reason I picked YAML is that it doesn’t need anything like XML’s open/close brackets (because at that point it should just be an XML-generated website), or JSON’s massive amounts of open/close braces, quotes, etc.
html:
  head:
    title:
      content: Always Under Development
    meta:
      attributes:
        charset: utf-8
    link:
      attributes:
        rel: stylesheet
        href: static/css/base.css
  body:
    div:
      attributes:
        class: Index_Div_Nav_Bar
      ul:
        li:
          content:
            index : Home
        li:
          content:
            projects : Projects
    div:
      attributes:
        class: foreground
    footer:
      div:
        attributes:
          class: Index_Div_Social_Media
        a:
          content:
            img:
              attributes:
                src: static/images/Linkedin-logo-2011-2019-241642781.png
                style: 'width:8%; height:5%;'
          attributes:
            href: 'https://www.linkedin.com/'
        br:
        a:
          content:
            img:
              attributes:
                src: static/images/gitlab-logo-square-1426672963.png
                style: 'width:5%; height:5%'
          attributes:
            href: 'https://gitlab.com/AltTabLife'
I’ve attempted to find other config formats and to use yaml.load directly; both have failed. Config formats like JSON and plain Python require quotes for everything and massive amounts of brain-overloading curly braces. Formats like INI don’t support complex nested structures (to my understanding).
yaml.load() doesn’t seem to have a way to load it the way I want. I just want to use the whitespace syntax to define where each element is meant to be nested.
Lastly, I’ve attempted using ruamel.yaml with duplicate keys allowed, but all that does is give a warning, then process the file as before, allowing only one div tree to make it through.
2 Answers
If your keys aren’t unique, you probably want to use a list – note the dashes denoting each element.
Yields:
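A minimal sketch of the list-based layout, assuming a trimmed-down version of the question's body section. The YAML is shown in the comment; the dict literal is what a standard YAML loader would be expected to return for it, and the helper name children is made up for illustration.

```python
# YAML with dashes, so repeated tags become list items instead of duplicate keys:
#
# body:
#   - div:
#       attributes:
#         class: Index_Div_Nav_Bar
#   - div:
#       attributes:
#         class: foreground
#
# Expected result of loading the YAML above (written out by hand here so the
# example needs no third-party package):
LOADED = {
    "body": [
        {"div": {"attributes": {"class": "Index_Div_Nav_Bar"}}},
        {"div": {"attributes": {"class": "foreground"}}},
    ]
}


def children(items):
    """Yield (tag, value) for each single-key dict in a YAML list, in order."""
    for item in items:
        for tag, value in item.items():
            yield tag, value


# Both div elements survive, in document order.
tags = [(tag, value["attributes"]["class"]) for tag, value in children(LOADED["body"])]
print(tags)
```

The cost is one dash per child element, but the file stays valid YAML and any loader will keep every div.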
YAML, by specification, is not allowed to have duplicate keys in a mapping. So if a parser incorrectly ignores this directive, there is no guarantee which of the values for the key is taken, especially because the YAML specification also states that the order of the keys is not significant.
Since the usual data structure created for a YAML mapping is a Python dict, there is no way for it to contain information about multiple values and keep the order of all key-value pairs (you can make each dict value a list of one or more elements, but that would only be able to keep the order of the values for a key, and still mean loss of the original ordering of the key-value pairs).
What you are looking for is parsing something that is not YAML, but since it is close to YAML, that doesn’t mean you can’t start with a YAML parser and derive a parser for your purposes from it. E.g. when ruamel.yaml parses a mapping, all the key-value pairs are kept in order, and you can change the method that constructs a mapping to forgo checking duplicate keys and create a data structure that keeps the info you need for generating HTML.
Assuming your input is in a file input.yaml:
which gives:
The data structure you end up with has a list of tuples where you normally would get a dict-like object. That way you preserve the order and can handle tuples that have the same first element (i.e. the "key"). YAML merge keys (<<) are not handled, but anchors/aliases can probably still be used in the input (if you do, make sure you check for (infinite) recursion in the routine processing data).
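To illustrate how such a list of tuples might be turned into HTML, here is a small sketch. It assumes the conventions from the question (an attributes entry holding attribute pairs and a content entry holding text or nested elements); the function name render is hypothetical, and void elements such as br are not special-cased.

```python
from html import escape


def render(pairs):
    """Render a list of (tag, value) tuples as an HTML string."""
    out = []
    for tag, value in pairs:
        attrs, text, children = "", "", []
        if value is not None and not isinstance(value, list):
            # A bare scalar value is treated as the element's text.
            text, value = escape(str(value)), []
        for key, sub in value or []:
            if key == "attributes":
                attrs = "".join(
                    f' {name}="{escape(str(val))}"' for name, val in sub
                )
            elif key == "content":
                if isinstance(sub, list):
                    children.extend(sub)      # nested elements inside content
                else:
                    text = escape(str(sub))   # plain text content
            else:
                children.append((key, sub))   # any other key is a child tag
        out.append(f"<{tag}{attrs}>{text}{render(children)}</{tag}>")
    return "".join(out)


data = [("body", [
    ("div", [
        ("attributes", [("class", "nav")]),
        ("ul", [("li", [("content", "Home")]),
                ("li", [("content", "Projects")])]),
    ]),
    ("div", [("attributes", [("class", "foreground")])]),
])]

html_out = render(data)
print(html_out)
```

Because siblings are plain list entries, two div tuples in a row are rendered as two separate elements. The recursion has no cycle guard, so input built from self-referencing aliases would need an extra check.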