Use HTML Tidy to just indent HTML code?
Is it possible to use HTML Tidy to just indent HTML code?
Sample Code
<form action="?" method="get" accept-charset="utf-8">
<ul>
<li>
<label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q" />
</li>
<li><input class="submit" type="submit" value="Search" /></li>
</ul>
</form>
Desired Result
<form action="?" method="get" accept-charset="utf-8">
    <ul>
        <li>
            <label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q"/>
        </li>
        <li><input class="submit" type="submit" value="Search"/></li>
    </ul>
</form>
If I run it with the standard command, tidy -f errs.txt -m index.html, then I get this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 15.3.6), see www.w3.org">
<title></title>
</head>
<body>
<form action="?" method="get" accept-charset="utf-8">
<ul>
<li><label class="screenReader" for=
"q">Keywords</label><input type="text" name="q" value="" id=
"q"></li>
<li><input class="submit" type="submit" value="Search"></li>
</ul>
</form>
</body>
</html>
How can I omit all the extra markup and get it to just indent the code?
Forgive me if that's not a feature it's supposed to support. If it isn't, what library or tool should I be looking for?
Use the indent, tidy-mark, and quiet options:
tidy \
-indent \
--indent-spaces 2 \
-quiet \
--tidy-mark no \
index.html
Or, using a config file rather than command-line options:
indent: auto
indent-spaces: 2
quiet: yes
tidy-mark: no
Name it tidy_config.txt and save it in the same directory as the .html file. Run it like this:
tidy -config tidy_config.txt index.html
For more customization, use the tidy man page to find other relevant options such as markup: no or force-output: yes.
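As a minimal sketch of the config-file approach, the following shell snippet writes the four options above into tidy_config.txt and then runs tidy against index.html. The guard around the tidy call and the warning message are additions for illustration; the file names are the ones used in this answer.

```shell
# Write the config from this answer into tidy_config.txt (overwrites any existing file).
cat > tidy_config.txt <<'EOF'
indent: auto
indent-spaces: 2
quiet: yes
tidy-mark: no
EOF

# Run tidy only if it is installed and index.html exists.
# Without -m the result is printed to stdout and index.html is left untouched.
if command -v tidy >/dev/null 2>&1 && [ -f index.html ]; then
    tidy -config tidy_config.txt index.html ||
        echo "tidy reported warnings or errors (exit status $?)" >&2
fi
```

Redirect stdout to a new file if you want to keep the result, or add -m once you are happy with what you see.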
I didn't find a way to "only re-indent, without any changes". The following config file makes tidy "repair" as little as possible and (mostly) only re-indent the HTML. Tidy still corrects some erroneous conditions, such as duplicated (repeated) attributes.
#based on http://tidy.sourceforge.net/docs/quickref.html
#HTML, XHTML, XML Options Reference
anchor-as-name: no #?
doctype: omit
drop-empty-paras: no
fix-backslash: no
fix-bad-comments: no
fix-uri: no
hide-endtags: yes #?
#input-xml: yes #?
join-styles: no
literal-attributes: yes
lower-literals: no
merge-divs: no
merge-spans: no
output-html: yes
preserve-entities: yes
quote-ampersand: no
quote-nbsp: no
show-body-only: auto
#Diagnostics Options Reference
show-errors: 0
show-warnings: 0
#Pretty Print Options Reference
break-before-br: yes
indent: yes
indent-attributes: no #default
indent-spaces: 4
tab-size: 4
wrap: 132
wrap-asp: no
wrap-jste: no
wrap-php: no
wrap-sections: no
#Character Encoding Options Reference
char-encoding: utf8
#Miscellaneous Options Reference
force-output: yes
quiet: yes
tidy-mark: no
For example, the following HTML fragment
<div>
<div>
<p>
not closed para
<h1>
h1 head
</h1>
<ul>
<li>not closed li
<li>closed li</li>
</ul>
some text
</div>
</div>
will be changed to
<div>
<div>
<p>
not closed para
<h1>
h1 head
</h1>
<ul>
<li>not closed li
<li>closed li
</ul>some text
</div>
</div>
As you can see, hide-endtags: yes removes the closing </li> from the second bullet in the input. Setting hide-endtags: no gives the following:
<div>
<div>
<p>
not closed para
</p>
<h1>
h1 head
</h1>
<ul>
<li>not closed li
</li>
<li>closed li
</li>
</ul>some text
</div>
</div>
So tidy adds a closing </p>, and a closing </li> to the first bullet.
I didn't find a way to preserve everything in the input and only re-indent the file.
You need the following option:
tidy --show-body-only yes -i --indent-spaces 4 -w 80 -m file.html
http://tidy.sourceforge.net/docs/quickref.html#show-body-only
-i --indent-spaces 4 - indent by 4 spaces (-i itself takes no argument, so the width goes in --indent-spaces); tidy indents with spaces unless you pass --indent-with-tabs yes, in which case --tab-size may affect wrapping
-w 80 - wrap at column 80 (default on my system: 68, very narrow)
-m - modify the file in place (you may want to leave this option out and examine the output first)
Showing only the body will naturally leave out the tidy-mark (generator meta).
Another useful option is:
--quiet yes - doesn't print W3C advertisements and other unnecessary output (errors are still reported)
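The advice to examine the output before letting -m overwrite the file can be sketched as a small shell helper. The function name preview_tidy is made up for this example, and it assumes tidy, diff, and mktemp are on the PATH. The indent width is spelled as --indent-spaces 4 because, as far as I know, -i does not take a value.

```shell
# preview_tidy FILE: show a unified diff of what tidy would change in FILE,
# without modifying FILE. (Hypothetical helper name, not part of tidy.)
preview_tidy() {
    file=$1
    tmp=$(mktemp) || return 1
    # Same options as in this answer minus -m, so the result goes to stdout.
    tidy --show-body-only yes -i --indent-spaces 4 -w 80 "$file" > "$tmp" 2>/dev/null
    # diff exits 0 when nothing would change and 1 when the files differ.
    diff -u "$file" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Run preview_tidy file.html first; if the diff looks right, rerun the original command with -m.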
To answer the poster's original question, using Tidy to just indent HTML code, here's what I use:
tidy --indent auto --quiet yes --show-body-only auto --show-errors 0 --wrap 0 input.html
input.html
<form action="?" method="get" accept-charset="utf-8">
<ul>
<li>
<label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q" />
</li>
<li><input class="submit" type="submit" value="Search" /></li>
</ul>
</form>
Output:
<form action="?" method="get" accept-charset="utf-8">
  <ul>
    <li><label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q"></li>
    <li><input class="submit" type="submit" value="Search"></li>
  </ul>
</form>
No extra HTML code added. Errors are suppressed. To find out what each option does, it's best to refer to the official reference.
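If this command ends up in a script, tidy's exit status is worth handling: per the man page it is 0 for clean input, 1 when there were warnings, and 2 when there were errors, and that is independent of whether the messages are shown. The sketch below wraps the command in a stdin-to-stdout helper that treats warnings as success. The name reindent_html is hypothetical.

```shell
# reindent_html: read HTML on stdin, write re-indented HTML on stdout.
# (Hypothetical helper name; the tidy options are the ones from this answer.)
reindent_html() {
    status=0
    tidy --indent auto --quiet yes --show-body-only auto --show-errors 0 --wrap 0 ||
        status=$?
    # Exit status 1 means "warnings only", so accept 0 and 1 and fail on anything higher.
    [ "$status" -le 1 ]
}
```

Usage would look like reindent_html < input.html > output.html, which leaves input.html untouched.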
I am very late to the party :)
In your tidy config file, set
tidy-mark: no
By default this is set to yes.
Once that is done, tidy will not add the meta generator tag to your HTML.
If you'd like to simply format whatever HTML you receive, ignore errors, and indent the code nicely, this is a good one-liner using tidy (it reads from stdin):
tidy --show-body-only yes -i --indent-spaces 4 -quiet --force-output yes -wrap 0 2>/dev/null
You can use it with curl too:
curl -s someUrl | tidy --show-body-only yes -i --indent-spaces 4 -quiet --force-output yes -wrap 0 2>/dev/null