Help with Regex. Need to extract `<A HREF`

2023-02-27 00:17 问答作者：

i have <A HREF="f110111.ZIP"> and f11开发者_运维知识库0111 - is an arbitrary char sequence. I need C# regex match expression to extract all above.

E. g. input is

<A HREF="f110111.ZIP"><A HREF="qqq.ZIP"><A HREF="gygu.ZIP">

I want the list:

f110111.ZIP
qqq.ZIP
gygu.ZIP

What you need is the htmlagility pack/! That will allow you to read HTML in an easy manner and provide an easy way to retrieve links.

If you can have multiple dots in the filename:

<A HREF="(^["]+?).zip

If you do not have dots in the filename (just one before the zip), you can use a faster one:

<A HREF="(^[".]+)

C# example:

Pattern pattern = Pattern.compile("<A HREF=\"(^[\"]+?).zip");

Matcher matcher = pattern.matcher(buffer);
while (matcher.find()) {
    // do something with: matcher.group(1)
}

NO NO! Do not use Regex to parse HTML!

Try an XML Parser. Or XPath perhaps.

Try this one:

/<a href="([^">]+.ZIP)/gi

I think Regular Expressions are a great way to filter text out of a given text.

This regex gets the File, Filename and Extension from the given text.

href="(?<File>(?<Filename>.*?)(?<Ext>\.\w{1,3}))"

Regex above expects an extension that exists out of word characters a-z A-Z 0-9, between 1 and 3 characters.

C# Code sample:

string regex = "href=\"(?<File>(?<Filename>.*?)(?<Ext>\\.\\w{1,3}))\"";
RegexOptions options = ((RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline) | RegexOptions.IgnoreCase);
Regex reg = new Regex(regex, options);

继续阅读：.net regex

Help with Regex. Need to extract `<A HREF`

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？