I have this input file, temp.html:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<div id="ext-comp-1725" class="x-window FM-Msg-cls utility-window q-fileExplorer-window q-window show-header-line x-window-noborder x-window-plain x-resizable-pinned q-modal-window" style="position: absolute; z-index: 8020; visibility: visible; left: 188px; top: 62px; width: 900px; display: block;">
<div class="x-window-tl"><div class="x-window-tr"><div class="x-window-tc"><div class="x-window-header x-window-header-noborder x-unselectable x-window-draggable" id="ext-gen1530" style="user-select: none;">
<div class="x-tool-ct x-tool x-tool-bg" id="ext-gen1536"><div class="x-tool x-tool-icon x-tool-close"> </div></div>
<span class="x-window-header-text" id="ext-gen1541">Hello</span>
</div></div></div></div>
</body></html>
I was hoping I could pretty-print it and indent the tags hierarchically using xmlstarlet:
$ xmlstarlet fo --html --recover --indent-spaces 2 --omit-decl temp.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<div id="ext-comp-1725" class="x-window FM-Msg-cls utility-window q-fileExplorer-window q-window show-header-line x-window-noborder x-window-plain x-resizable-pinned q-modal-window" style="position: absolute; z-index: 8020; visibility: visible; left: 188px; top: 62px; width: 900px; display: block;">
<div class="x-window-tl"><div class="x-window-tr"><div class="x-window-tc"><div class="x-window-header x-window-header-noborder x-unselectable x-window-draggable" id="ext-gen1530" style="user-select: none;">
<div class="x-tool-ct x-tool x-tool-bg" id="ext-gen1536"><div class="x-tool x-tool-icon x-tool-close"> </div></div>
<span class="x-window-header-text" id="ext-gen1541">Hello</span>
</div></div></div></div>
</div></body>
</html>
However, as is obvious from the command output above, it only indents some tags (e.g. it split <html><body> and put those tags on separate lines) but fails on others (e.g. it kept </div></div></div></div> on a single line).
Is it possible to persuade/set up xmlstarlet to split off and indent all tags, one tag per line, with proper indentation?
$ xmlstarlet --version
srcinfo-cache
compiled against libxml2 2.9.10, linked with 21209
compiled against libxslt 1.1.34, linked with 10142
2 Answers
Well, it seems tidy works here (found it via A command-line HTML pretty-printer: Making messy HTML readable):

First convert the input file to XML (a </div> is missing).
By default, format uses an indentation of 2 spaces.
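
A minimal sketch of the tidy approach mentioned above, assuming HTML Tidy is installed and available as `tidy`. The helper name `pretty_html` and the particular option choices are illustrative assumptions, not the command from the original answer. Note that tidy also repairs markup (for example, it may close the missing </div> and add a <head> with an empty <title>), so the output is not a pure re-indentation of the input.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration): pretty-print an
# HTML file with HTML Tidy, indenting nested block-level tags by 2 spaces.
pretty_html() {
    if ! command -v tidy >/dev/null 2>&1; then
        echo "tidy is not installed" >&2
        return 127
    fi
    rc=0
    # -q                   suppress the summary banner
    # --indent yes         indent block-level content
    # --indent-spaces 2    two spaces per nesting level
    # --wrap 0             disable line wrapping of long tags
    # --tidy-mark no       do not add a <meta name="generator"> tag
    # --show-warnings no   hide warnings such as the missing </div>
    tidy -q --indent yes --indent-spaces 2 --wrap 0 --tidy-mark no \
         --show-warnings no "$1" || rc=$?
    # tidy exits with 1 when it only had warnings and 2 on errors;
    # treat warnings as success here.
    [ "$rc" -le 1 ]
}
```

Usage would then be `pretty_html temp.html`, with the indented document written to standard output. `--indent yes` is used rather than `--indent auto` because the latter lets tidy decide per element whether to indent, which may leave some nested tags on one line.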