
sed -n s/pattern/\1/p printing both match and non-match?

I want to retrieve the board name for a 4chan thread using this pattern:

echo $(cat ~/Desktop/test.html | sed -n "s/<title>\(.*\) - />\1</p")

test.html contains:

<link rel="shortcut icon" href="http://static.4chan.org/image/favicon.ico" /><link rel="stylesheet" type="text/css" href="http://static.4chan.org/css/yotsuba.9.css" title="Yotsuba"><link rel="alternate stylesheet" type="text/css" href="http://static.4chan.org/css/yotsublue.9.css" title="Yotsuba B"><link rel="alternate stylesheet" type="text/css" href="http://static.4chan.org/css/futaba.9.css" title="Futaba"><link rel="alternate stylesheet" type="text/css" href="http://static.4chan.org/css/burichan.9.css" title="Burichan"><title>/b/ - Random</title>

I want to match /b/, but instead it just removes "<title>" and "-" like so:

<link rel="shortcut icon" href="http://static.4chan.org/image/favicon.ico" /><link rel="stylesheet" type="text/css" href="http://static.4chan.org/css/yotsuba.9.css" title="Yotsuba"><link rel="alternate stylesheet" type="text/css" href="http://static.4chan.org/css/yotsublue.9.css" title="Yotsuba B"><link rel="alternate stylesheet" type="text/css" href="http://static.4chan.org/css/futaba.9.css" title="Futaba"><link rel="alternate stylesheet" type="text/css" href="http://static.4chan.org/css/burichan.9.css" title="Burichan">>/b/<Random</title>

Why?


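Because that's all you told it to substitute: s/// replaces only the text the pattern matches and leaves the rest of the line untouched. If you want to remove everything from the beginning of the line to the end, the pattern has to cover the whole line, so anchor the ends with ^ and $ and match all the characters in between as well.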
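To see the difference on a smaller input, here is a minimal sketch. The variable names and the shortened sample line are assumptions for illustration and stand in for test.html:

```shell
# Shortened stand-in for test.html (hypothetical sample, not the full page).
line='<link rel="stylesheet" title="Yotsuba"><title>/b/ - Random</title>'

# Unanchored: only the text matched by the pattern is replaced;
# everything before and after the match is printed unchanged.
unanchored=$(printf '%s\n' "$line" | sed -n 's/<title>\(.*\) - /\1/p')

# Anchored and padded with .* on both sides: the pattern now covers the
# whole line, so the replacement \1 is all that remains.
anchored=$(printf '%s\n' "$line" | sed -n 's/^.*<title>\(.*\) - .*$/\1/p')

printf 'unanchored: %s\n' "$unanchored"
printf 'anchored:   %s\n' "$anchored"
```

The first command still prints the link tag and the trailing "Random</title>", while the second prints only /b/.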


Something like this:

sed -n "s/.*<title>\([^<>]*\) - .*/\1/p" ~/Desktop/test.html

Your problem is that your regular expression doesn't match the beginning of the string (in my case the leading ".*" does this) or the end of the string (again, in my case it's the ".*" at the end).
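If the extraction is needed more than once, the same substitution can be wrapped in a small function that reads HTML on standard input. This is only a sketch: extract_board and the sample variable are hypothetical names, and the sample is a shortened stand-in for the real page.

```shell
# extract_board is a hypothetical helper: it reads HTML on stdin and prints
# the text between <title> and " - ".
#   .*<title>      consumes everything up to and including the opening tag
#   \([^<>]*\)     captures the title text; it cannot run into another tag
#    - .*          consumes the separator and the rest of the line
extract_board() {
    sed -n 's/.*<title>\([^<>]*\) - .*/\1/p'
}

sample='<link rel="stylesheet" title="Yotsuba"><title>/b/ - Random</title>'
board=$(printf '%s\n' "$sample" | extract_board)
printf '%s\n' "$board"
```

With a real file this would be called as extract_board < ~/Desktop/test.html. Because sed can read a file or standard input directly, the cat and echo $(...) wrapper from the question is not needed.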
