extract matches of a regex capturing group from a file
I want to perform the action named in the title on the Linux command line (a bash script will also do). The command I tried is:
sed 's/href="([^"])"/$1/g' page.html > list.lst
but obviously it failed.
To be precise, here is my input:
<link rel="stylesheet" type="text/css" href="style/css/colors.css" />
<link rel="stylesheet" type="text/css" href="style/css/global.css" />
<link rel="stylesheet" type="text/css" href="style/css/icons.css" />
The output I want is a comma-separated or space-separated list of all matches in the input file:
style/css/colors.css,style/css/global.css,style/css/icons.css
I think I got the right expression: href="([^"]*)"
but I have no clue how to apply it. sed does a search and replace, which is not exactly what I want: I only need to keep the matches and throw the rest away, not replace them.
grep href page.html | sed 's/^.*href="\([^"]*\)".*$/\1/' | xargs | sed 's/ /,/g'
This extracts all the lines that contain href, but it keeps only one href per line: because the leading .* is greedy, that is the last one on the line. Also, refer to this post about parsing HTML with regular expressions.
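If a line can contain more than one href, a variation along these lines might work better. This is only a sketch, assuming a grep that supports the -o option (GNU and BSD grep do, though it is not in POSIX); the function name extract_hrefs is made up for illustration.

```shell
# Hypothetical helper: print every href value found in a file
# as a single comma-separated line.
extract_hrefs() {
  # grep -o prints each match on its own line, so several
  # hrefs on the same input line are all kept.
  grep -o 'href="[^"]*"' "$1" |
    # Strip the surrounding href="..." and keep only the captured value.
    sed 's/^href="\([^"]*\)"$/\1/' |
    # Join all lines into one, separated by commas.
    paste -sd, -
}
```

It could be called as extract_hrefs page.html > list.lst. Using paste instead of xargs plus sed avoids turning spaces inside a value into commas.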