Java regex for parsing the HTML tag <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
I am new to regexes. Can someone help me write a regex for parsing the tag
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
with all the possibilities?
To cover "all the possibilities", you really should be using HTML5's "Determining the character encoding" rules. These rules cannot be expressed as a regular expression.
There is an open source Java implementation of them in validator.nu.
If you insist on using a regular expression, then this will probably cover most cases where the encoding is declared with a meta element (it won't, for instance, cover XML declarations). It is, however, dirty: it makes some assumptions that are usually (but may not always be) right, and I do not recommend it.
/<meta[^>]+charset=['"]?(.*?)['"]?[\/\s>]/i
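Since the question asks about Java, here is a minimal sketch of how the pattern above might be used with java.util.regex. Java has no /.../i literal syntax, so the case-insensitive flag is passed to Pattern.compile instead. The class name MetaCharsetSniffer and the method findCharset are made up for illustration, and the same caveats apply as for the regex itself.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class MetaCharsetSniffer {

    // The same pattern as above, written as a Java string literal:
    // the double quote and the backslash in \s need escaping.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]+charset=['\"]?(.*?)['\"]?[/\\s>]",
            Pattern.CASE_INSENSITIVE);

    // Returns the first charset value found in a meta tag, if any.
    static Optional<String> findCharset(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=ISO-8859-1\">";
        // prints ISO-8859-1
        System.out.println(findCharset(html).orElse("(none found)"));
    }
}
```

The captured value is only a label taken from the markup. Before using it to decode anything, it would be sensible to check it with java.nio.charset.Charset.isSupported, because pages can declare misspelled or unknown encodings.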