Java Regexp question

2023-01-20 10:42

There is a web site that I want to parse. Its source looks like this:

 <tr> <td><a
 href="http://www.z104.com/"><b>WNVZ</b></a>
 - Z104</td> <td>Norfolk</td> <td>Virginia</td> <td><img
 src="mp3.gif" alt="MP3" width="12"
 height="12"></td> <td><a
 href="http://provisioning.streamtheworld.com/pls/WNVZFM.pls">64
 Kbps</a></td> <td>Top 40</td> </tr>

 <tr> <td><a
 href="http://www.z104.com/"><b>WNVZ</b></a>
 - Z104</td> <td>Norfolk</td> <td>Virginia</td> <td><img
 src="mp3.gif" alt="MP3" width="12"
 height="12"></td> <td><a
 href="http://provisioning.streamtheworld.com/pls/WNVZFM.pls">64
 Kbps</a></td> <td>Top 40</td> </tr>

... etc

How can I extract all the data from it? I'd like to use a regexp. The output I need is:

WNVZ - Z104#Norfolk#Virginia#http://provisioning.streamtheworld.com/pls/WNVZFM.pls#Top 40

WNVZ - Z104#Norfolk#Virginia#http://provisioning.streamtheworld.com/pls/WNVZFM.pls#Top 40 etc.

So I want to extract every row whose link contains ".pls" or ".m3u".

Sorry, my English is poor.

Parsing HTML with a regex is difficult; you might have better luck with a parser. An XML parser such as SAX can work, provided the page is well-formed XML.

Don't try to use regexps: HTML isn't a regular language, and the number of edge cases makes a robust regexp impractical. You'll get a more reliable solution from an HTML parser such as JTidy.
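To illustrate the parser route, here is a minimal sketch. JTidy is a separate library, so this sketch swaps in the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator) to stay self-contained; it is not JTidy. The class name StationTableParserSketch and the method extract are made up for the example. It also assumes every row has six cells in the order shown in the question, and it keeps only rows that contain a .pls or .m3u link.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: walks the table with the JDK's built-in HTML parser.
class StationTableParserSketch {

    static List<String> extract(String html) {
        List<String> lines = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private List<String> cells;   // text of each <td> in the current row
            private StringBuilder cell;   // text of the <td> being read, or null
            private String streamUrl;     // first .pls/.m3u link seen in the row

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TR) {
                    cells = new ArrayList<>();
                    streamUrl = null;
                } else if (tag == HTML.Tag.TD && cells != null) {
                    cell = new StringBuilder();
                } else if (tag == HTML.Tag.A && cell != null && streamUrl == null) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null && isPlaylist(href.toString())) {
                        streamUrl = href.toString();
                    }
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Join text chunks with a space; whitespace is collapsed later.
                if (cell != null) {
                    cell.append(' ').append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TD && cell != null) {
                    cells.add(cell.toString().trim().replaceAll("\\s+", " "));
                    cell = null;
                } else if (tag == HTML.Tag.TR && cells != null) {
                    // Assumed layout: name, city, state, image, link, genre.
                    if (streamUrl != null && cells.size() >= 6) {
                        lines.add(String.join("#", cells.get(0), cells.get(1),
                                cells.get(2), streamUrl, cells.get(5)));
                    }
                    cells = null;
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    private static boolean isPlaylist(String url) {
        String lower = url.toLowerCase();
        return lower.endsWith(".pls") || lower.endsWith(".m3u");
    }

    public static void main(String[] args) {
        String page = "<html><body><table><tr> <td><a href=\"http://www.z104.com/\"><b>WNVZ</b></a>\n"
                + " - Z104</td> <td>Norfolk</td> <td>Virginia</td> <td><img src=\"mp3.gif\" alt=\"MP3\"></td>"
                + " <td><a href=\"http://provisioning.streamtheworld.com/pls/WNVZFM.pls\">64 Kbps</a></td>"
                + " <td>Top 40</td> </tr></table></body></html>";
        extract(page).forEach(System.out::println);
    }
}
```

Because the callback only tracks rows, cells and links, extra attributes or changed whitespace in the page should not matter; a change in column order would.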

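If you insist on using a regex, here is one I wrote for your sample rows (it matches the literal values WNVZ, Norfolk and Virginia, so generalize those groups for other stations):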

Search for:

  <tr\b[^><]*>\s*<td\b[^><]*>\s*<a\b[^><]*>\s*<b>\s*(WNVZ)\s*<\/b>\s*<\/a>\s*-\s*(\w+)<\/td>\s*<td\b[^><]*>\s*(Norfolk)\s*<\/td>\s*<td\b[^><]*>\s*(Virginia)\s*</td>\s*<td\b[^><]*>\s*<img\b[^><]*>\s*</td>\s*<td\b[^><]*>\s*<a\b[^><]*href\s*=\s*["']([^"'><]+)["'][^><]*>[^><]*<\/a>\s*<\/td>\s*<td\b[^><]*>([^><]*)</td>

Replace with:

  $1 - $2#$3#$4#$5#$6
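For use from Java, the search-and-replace above could be sketched as follows. This is an adaptation, not the exact pattern from the answer. The literal station, city and state are replaced by generic text groups. The link group is restricted to URLs ending in .pls or .m3u, as the question asks. Matching is case-insensitive. The class name StationRegexSketch and the helpers extract and clean are made up for the example, and the pattern still assumes the exact cell order of the sample rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies a generalized form of the row regex.
class StationRegexSketch {
    private static final String TD = "<td\\b[^><]*>\\s*";   // opening <td ...>
    private static final String TEXT = "([^><]*?)\\s*";     // captured cell text
    private static final String END_TD = "</td>\\s*";

    // Groups: 1 call sign, 2 station name, 3 city, 4 state, 5 stream URL, 6 genre.
    private static final Pattern ROW = Pattern.compile(
        "<tr\\b[^><]*>\\s*"
        + TD + "<a\\b[^><]*>\\s*<b>\\s*" + TEXT + "</b>\\s*</a>\\s*-\\s*" + TEXT + END_TD
        + TD + TEXT + END_TD
        + TD + TEXT + END_TD
        + TD + "<img\\b[^><]*>\\s*" + END_TD
        + TD + "<a\\b[^><]*href\\s*=\\s*[\"']([^\"'><]+\\.(?:pls|m3u))[\"'][^><]*>[^><]*</a>\\s*" + END_TD
        + TD + TEXT + "</td>",
        Pattern.CASE_INSENSITIVE);

    // Returns one "name#city#state#url#genre" line per matching row.
    static List<String> extract(String html) {
        List<String> lines = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            lines.add(clean(m.group(1)) + " - " + clean(m.group(2))
                + "#" + clean(m.group(3))
                + "#" + clean(m.group(4))
                + "#" + m.group(5)
                + "#" + clean(m.group(6)));
        }
        return lines;
    }

    // Collapses line breaks and repeated spaces inside a cell.
    private static String clean(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String row = " <tr> <td><a\n href=\"http://www.z104.com/\"><b>WNVZ</b></a>\n"
            + " - Z104</td> <td>Norfolk</td> <td>Virginia</td> <td><img\n"
            + " src=\"mp3.gif\" alt=\"MP3\" width=\"12\"\n height=\"12\"></td> <td><a\n"
            + " href=\"http://provisioning.streamtheworld.com/pls/WNVZFM.pls\">64\n"
            + " Kbps</a></td> <td>Top 40</td> </tr>";
        extract(row).forEach(System.out::println);
    }
}
```

Rows that deviate from this layout (an extra cell, a missing image, nested tags inside a cell) are silently skipped, which is the main reason the other answers recommend a parser.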
