Using sed with html data
I'm having some problems using sed in combination with HTML. The following sample illustrates the problem:
HTML="<html><body>ENTRY</body></html>"
TABLE="<table></table>"
echo $HTML | sed -e s/ENTRY/$TABLE/
This outputs:
sed: -e expression #1, char 18: unknown option to `s'
If I leave out the / from $TABLE, so that it becomes <table><table>, it works OK.
Any ideas on how to fix it?
Update
Here's a sample that can reproduce the problem:
template.html:
<html>
<body>
<table>
ENTRIES
</table>
</body>
</html>
gui_template:
<tr>
<td class="td_tut_title">TITLE</td>
<td class="td_tut_content">
<a href="../tutorials/GUI/FILENAME"><img src="img/bbp.png" alt="bbp" /></a>
</td>
</tr>
genhtml.sh:
#!/bin/bash
HTML=`cat template.html`
ENTRIES=`cat gui_template | sed -e s/FILENAME/test/ | sed -e s/TITLE/title/`
DELIM=$'\377'
echo $HTML | sed -e "s${DELIM}ENTRIES${DELIM}$ENTRIES${DELIM}"
Output:
~/htmlgen $ ./genhtml.sh
sed: -e expression #1, char 14: unterminated `s' command
Use a different delimiter, @ for example:
echo $HTML | sed -e s@ENTRY@$TABLE@
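Quoting matters here as well: left unquoted, a space anywhere in $TABLE would split the sed script into several arguments. A minimal sketch, assuming $TABLE never contains an @ (the class attribute below is made up for illustration and is not from the question):

```shell
HTML='<html><body>ENTRY</body></html>'
# Hypothetical value containing both a space and a slash.
TABLE='<table class="list"></table>'

# Double quotes keep the whole s command as a single argument to sed;
# @ is used as the delimiter, so the / in </table> needs no escaping.
result=$(printf '%s\n' "$HTML" | sed -e "s@ENTRY@$TABLE@")
printf '%s\n' "$result"
```

This should print the original document with ENTRY replaced by the full table element.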
Issuing these lines on a FreeBSD console:
HTML="<html><body>ENTRY</body></html>"
TABLE="<table></table>"
echo $HTML | sed -e "s#ENTRY#$TABLE#"
results in:
<html><body><table></table></body></html>
You need to use a delimiter that can't appear in $TABLE, and if $TABLE is unpredictable enough this can be tricky. I'd suggest using a nonprinting character as a delimiter; it's easier to find one that's not going to show up in $TABLE and break everything. The only problem is they're harder to type in, so I'd suggest putting it in a variable and using that in the sed command:
DELIM=$'\377'
HTML="<html><body>ENTRY</body></html>"
TABLE="<table></table>"
echo "$HTML" | sed -e "s${DELIM}ENTRY${DELIM}$TABLE${DELIM}"
Note that the $'...' construct is a bash-only feature; if you need this to run under generic sh you'll have to do something messier, like DELIM="$(printf "\377")". Also, I chose \377 (that's FF in hex) because it's illegal in the UTF-8 encoding, so it should be safe if you're using UTF-8 for your HTML; if you're using something else, like Windows-1252, then \177 (the 'DEL' character) might be a safer choice.
Oh, yeah, and if you ever try to debug this with bash -x, be prepared for comedy.
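A safe delimiter does not cover every case, though. In the replacement part of an s command, & and \ are special whatever delimiter is used, and an unescaped newline ends the command, which is likely what produces the "unterminated `s' command" error with the multi-line $ENTRIES in the update. One possible approach, sketched here rather than offered as a definitive fix, is to escape the replacement text first. The sample values below are hypothetical stand-ins for gui_template and template.html:

```shell
# Hypothetical multi-line replacement, similar in shape to gui_template.
ENTRIES='<tr>
<td>a & b</td>
</tr>'
HTML='<table>ENTRIES</table>'

# First expression: put a backslash before every \, / and &.
# Second expression: end every line except the last with a backslash,
# which is how a newline is written inside a sed replacement.
escaped=$(printf '%s\n' "$ENTRIES" | sed -e 's|[\\/&]|\\&|g' -e '$!s|$|\\|')

# With the replacement escaped, a plain / delimiter is safe again.
result=$(printf '%s\n' "$HTML" | sed -e "s/ENTRIES/$escaped/")
printf '%s\n' "$result"
```

Because the delimiter is escaped along with & and \, the choice of delimiter no longer depends on what the replacement happens to contain.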