Regex to match charset

I've been trying to write a regex to match the charset of MIME multipart emails so that I can decode them correctly. However, the format varies between clients in ways I can't work out a regex for, as I'm no expert. Currently I'm using (?<=charset=).*(?=;), but the examples I've collected by sending emails from different clients are:

Content-Type: text/plain; charset=ISO-8859-1; format=flowed

charset=US-ASCII;

Content-Type: text/plain; charset=iso-8859-1

So my regex works on the first two but not the last. However, if I remove (?=;), then I also match the format=flowed part, which I don't want.


Instead of .*, you can use [^;]*. That is, match anything but the ;.

So, the pattern becomes:

(?<=charset=)[^;]*
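As a quick check, here is a minimal sketch that runs this pattern over the three sample lines from the question. Python is an assumption here (the question doesn't name a language), and the variable names are just for illustration:

```python
import re

# Lookbehind anchors the match right after "charset=";
# [^;]* then consumes everything up to (not including) the next ';'.
CHARSET_RE = re.compile(r"(?<=charset=)[^;]*")

# The three sample lines from the question.
headers = [
    "Content-Type: text/plain; charset=ISO-8859-1; format=flowed",
    "charset=US-ASCII;",
    "Content-Type: text/plain; charset=iso-8859-1",
]

charsets = [CHARSET_RE.search(h).group(0) for h in headers]
print(charsets)  # ['ISO-8859-1', 'US-ASCII', 'iso-8859-1']
```

Because the negated class stops at the first ; or at the end of the input, no trailing lookahead is needed.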

References

  • regular-expressions.info/Character Classes


Building on this, I've found that the following catches a couple more cases (a comma or a line break right after the charset):

(?<=charset=)(([^;,\r\n]))*

Hope that helps.
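To see why excluding \r and \n matters, here is a small sketch in Python (an assumed language choice) that applies both character classes to a multi-line header block. The sample text, including the Content-Transfer-Encoding line, is made up for illustration. The pattern is written without the capturing parentheses, which don't change what the overall match covers:

```python
import re

# Only ';' ends the match.
NARROW = re.compile(r"(?<=charset=)[^;]*")
# ';', ',' or a line break ends the match.
WIDER = re.compile(r"(?<=charset=)[^;,\r\n]*")

# Hypothetical raw header block: the charset is last on its line,
# so no ';' follows it before the line break.
raw = (
    "Content-Type: text/plain; charset=iso-8859-1\r\n"
    "Content-Transfer-Encoding: 7bit\r\n"
)

narrow_match = NARROW.search(raw).group(0)
wider_match = WIDER.search(raw).group(0)

print(repr(narrow_match))  # runs past the line break into the next header
print(repr(wider_match))   # 'iso-8859-1'
```

When matching against a whole message rather than a single line, the narrower class keeps going until the next ; or the end of the text, so the wider class is the safer choice.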


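Match on either ; or the end of line ($), i.e. change the lookahead to (?=;|$). Note that the .* before it must also become lazy (.*?); otherwise it runs to the end of the line and takes format=flowed with it.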
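A minimal sketch of this lookahead approach, again assuming Python and using the question's sample lines. It compares a greedy and a lazy quantifier in front of a lookahead that accepts either ; or the end of the line:

```python
import re

# Lookahead accepts ';' or end of line. Greedy vs. lazy body.
GREEDY = re.compile(r"(?<=charset=).*(?=;|$)")
LAZY = re.compile(r"(?<=charset=).*?(?=;|$)")

# The three sample lines from the question.
headers = [
    "Content-Type: text/plain; charset=ISO-8859-1; format=flowed",
    "charset=US-ASCII;",
    "Content-Type: text/plain; charset=iso-8859-1",
]

lazy_results = [LAZY.search(h).group(0) for h in headers]
greedy_first = GREEDY.search(headers[0]).group(0)

print(lazy_results)   # ['ISO-8859-1', 'US-ASCII', 'iso-8859-1']
print(greedy_first)   # 'ISO-8859-1; format=flowed'
```

The lazy version stops at the first position where the lookahead succeeds, while the greedy one grabs as much as possible and is satisfied by the end of the line.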
