Java regex for parsing the HTML tag <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

I am new to regexps. Can someone help me write a regex for parsing the tag

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

with all the possibilities?


To cover "all the possibilities", you really should follow HTML5's "determining the character encoding" rules. These can't be expressed as a regular expression.

There is an open source Java implementation of those rules in validator.nu.


If you insist on using a regular expression, then this will probably cover most cases where the encoding is declared with a meta element (it won't, for instance, cover XML declarations). It is, however, dirty: it makes some assumptions that are usually (but not always) right, and I do not recommend it.

/<meta[^>]+charset=['"]?(.*?)['"]?[\/\s>]/i
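Since the question asks about Java, here is a minimal sketch of how the expression above might be used with java.util.regex. The class name MetaCharsetSketch and the method findCharset are made up for illustration. The /i flag is assumed to map to Pattern.CASE_INSENSITIVE, and find() is used rather than matches() because the tag is normally embedded in a larger document.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
class MetaCharsetSketch {

    // Java translation of /<meta[^>]+charset=['"]?(.*?)['"]?[\/\s>]/i
    // Quotes and backslashes are escaped for a Java string literal.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]+charset=['\"]?(.*?)['\"]?[/\\s>]",
            Pattern.CASE_INSENSITIVE);

    // Returns the first declared charset name, or empty if no meta
    // declaration matches. The name is returned as written in the markup.
    static Optional<String> findCharset(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String tag = "<meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=ISO-8859-1\">";
        System.out.println(findCharset(tag).orElse("(no charset found)"));
    }
}
```

The captured name is not validated. Before passing it to Charset.forName, it is probably worth checking it with Charset.isSupported, since pages can declare misspelled or unknown encodings.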