Reading file in a pattern using awk

2023-01-05 19:43

I have an input file in the following format:

<td> Name1 </td>
<td> <span class="test"><a href="url1">Link </a></span></td>
<td> Name2 </td>
<td> <span class="test"><a href="url2">Link </a></span></td>

I want an awk script to read this file and produce output in the following format:

url1 Name1
url2 Name2

Can anyone help me out with this trivial-looking problem? Thanks.

~~Extracting one href per line is relatively simple, as long as the markup conforms to XHTML standards, there is at most one href on a line, and you don't care about enclosing tags. Perl is easier, though:~~

~~$ perl -ne 'print "$1\n" if /href="([^"]+)"/'~~

If you care about enclosing tags, or the markup is not standards-conformant, you cannot use regular expressions to parse HTML. It is impossible.

Added: oops, you do care about context, so forget about regexps and use a real HTML parser.

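Here is an awk script that does the job:

awk '
# Link line: strip everything around the URL, then print it with the saved name.
/a href="[^"]*"/ { sub(/^.*a href="/, ""); sub(/".*/, ""); print $0, name; next }
# Any other line: remember the second field (the name).
{ name = $2 }
' infile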

This might work:

awk '{ line[NR] = $0 }
     END {
       # Join each link line with the name line that precedes it.
       for (j = 1; j < NR; j += 2) print line[j+1] line[j]
     }' yourfile | awk '{ split($4, q, "\""); print q[2], $6 }'

gawk '/^<td>/ {n = $2; getline; print gensub(/.*href="([^"]*).*/,"\\1",1), n}' infile

url1 Name1
url2 Name2
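
Since gensub() is specific to gawk, the same idea can be sketched with only POSIX awk features, using match() together with RSTART and RLENGTH. This is a minimal sketch, not a definitive solution: the function name extract_links and the temporary file are assumptions made for illustration, and it assumes at most one href per line, as in the sample input.

```shell
# Sample input from the question, written to a temporary file (illustrative path).
infile=$(mktemp)
cat > "$infile" <<'EOF'
<td> Name1 </td>
<td> <span class="test"><a href="url1">Link </a></span></td>
<td> Name2 </td>
<td> <span class="test"><a href="url2">Link </a></span></td>
EOF

# extract_links is a hypothetical helper name, not taken from the answers above.
extract_links() {
  awk '
    # Link line: locate href="..." and print the URL with the saved name.
    match($0, /href="[^"]*"/) {
      # Skip the 6 characters of href=" and drop the closing quote.
      print substr($0, RSTART + 6, RLENGTH - 7), name
      next
    }
    # Any other line: remember the second whitespace-separated field.
    { name = $2 }
  ' "$1"
}

result=$(extract_links "$infile")
printf '%s\n' "$result"
rm -f "$infile"
```

Because the name is saved on the preceding line and printed only when a link line is seen, this relies on the name row always coming before its link row.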

awk 'BEGIN{RS="></td>\n"; FS="> | </|\""}{print $7, $2}' infile

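This treats every two lines as one record. Note that a multi-character RS is handled as a regular expression by gawk; POSIX leaves its behaviour unspecified, so other awks may differ.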
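
To see why the URL ends up in field 7 and the name in field 2, the two-line record can be emulated with POSIX awk by joining each pair of lines by hand and splitting it with the same separator pattern. This is a rough sketch under the assumption that names and links strictly alternate; pair_fields and the temporary file are made-up names for illustration.

```shell
# Sample input from the question, written to a temporary file (illustrative path).
infile=$(mktemp)
cat > "$infile" <<'EOF'
<td> Name1 </td>
<td> <span class="test"><a href="url1">Link </a></span></td>
<td> Name2 </td>
<td> <span class="test"><a href="url2">Link </a></span></td>
EOF

# pair_fields is a hypothetical helper name used only for this sketch.
pair_fields() {
  awk '
    # Odd lines hold the name; keep them until the link line arrives.
    NR % 2 == 1 { first = $0; next }
    {
      # Emulate the two-line record: join the pair and cut the trailing ></td>.
      rec = first "\n" $0
      sub(/><\/td>$/, "", rec)
      # Split on the same pattern as the FS above: "> ", " </" or a double quote.
      split(rec, f, /> | <\/|"/)
      print f[7], f[2]
    }
  ' "$1"
}

result=$(pair_fields "$infile")
printf '%s\n' "$result"
rm -f "$infile"
```

The field positions depend entirely on the exact markup, so any extra attribute or quote in a row would shift the indices.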
