How to detect if a Unicode char is supported by EBCDIC in .NET 4.0?

Posted: 2023-02-13 10:27

We have a web site and WinForms application written in .NET 4.0 that allows users to enter any Unicode char (pretty standard).

The problem is that a small amount of our data gets submitted to an old mainframe application. During testing, a user entered a name containing a character that ended up crashing the mainframe program. The name was BOËNS; the Ë is not supported.

What is the best way to detect if a Unicode char is supported by EBCDIC?

I tried using the following regular expression but that restricted some standard special chars (/, _, :) which are fine for the mainframe.

I would prefer either one method that validates each char, or a method that takes a string and returns true or false depending on whether it contains chars not supported by EBCDIC.

First, get the proper Encoding instance for EBCDIC by calling the static Encoding.GetEncoding method, which takes the code page ID as a parameter.

Once you have that, set its EncoderFallback property to the value of the static ExceptionFallback property on the EncoderFallback class. (Converting characters to bytes is encoding, so it is the encoder fallback that matters here, not the decoder fallback.) Instances returned by GetEncoding are read-only, so either call Clone first and set the property on the copy, or use the GetEncoding overload that accepts the encoder and decoder fallbacks directly.

Then, loop through each character in your string and call the GetBytes method to encode the character to a byte sequence. If it cannot be encoded, an EncoderFallbackException is thrown; wrap each call to GetBytes in a try/catch block to determine which character is in error.

Note that the per-character loop is only needed if you want to know the position of the character that failed. If you only care whether the string as a whole will encode, call the GetBytes overload that takes a string parameter; it throws the same EncoderFallbackException when it encounters a character that cannot be encoded.
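The steps in this answer can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the question does not say which EBCDIC code page the mainframe uses, so code page 37 (IBM037) is assumed here, and the class and method names (EbcdicCheck, IsEncodable, IndexOfFirstUnsupported) are made up for illustration. It targets the .NET Framework; on .NET Core and later, the EBCDIC code pages are only available after registering the code pages encoding provider.

```csharp
using System;
using System.Text;

public static class EbcdicCheck
{
    // ASSUMPTION: code page 37 (IBM037). Replace with the code page
    // your mainframe actually uses.
    private const int AssumedCodePage = 37;

    // Passing the fallbacks to GetEncoding avoids having to modify
    // a read-only Encoding instance afterwards.
    private static readonly Encoding Strict = Encoding.GetEncoding(
        AssumedCodePage,
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);

    // True if every character of the string can be encoded.
    public static bool IsEncodable(string text)
    {
        try
        {
            Strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    // Index of the first character that cannot be encoded, or -1 if none.
    public static int IndexOfFirstUnsupported(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            try
            {
                Strict.GetBytes(new[] { text[i] });
            }
            catch (EncoderFallbackException)
            {
                return i;
            }
        }
        return -1;
    }
}
```

Keep in mind that this only answers "does the chosen code page contain this character?". Some EBCDIC code pages cover far more than A-Z and digits, so a character can be encodable and still be rejected by the mainframe program itself.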

You can escape characters in a regex with \ . So if you want to match a dot, you can use @"\." . To match /._,:[]- for example: @"[/._,:\-\[\]]" . Now, EBCDIC is 8 bits, but many of its code points are control characters. Do you have a list of "valid" characters?

I have made this pattern:

string pattern = @"[^a-zA-Z0-9 ¢.<(+&!$*);¬/|,%_>?`:#@'=~{}\-\\" + '"' + "]";

It should find "illegal" characters. If IsMatch returns true, the string contains at least one character outside the allowed set.

I have used this: http://nemesis.lonestar.org/reference/telecom/codes/ebcdic.html

Note the special handling of the ". The @ at the beginning of the string disables \ escape expansion, so \" is not available. Inside a verbatim string a quote can also be written as "", but here I simply append it to the pattern as a separate character.

To test it:

Regex rx = new Regex(pattern);
bool m1 = rx.IsMatch(@"a-zA-Z0-9 ¢.<(+&!$*);¬/|,%_>?`:#@'=~{}\-\\" + '"');
bool m2 = rx.IsMatch(@"€a-zA-Z0-9 ¢.<(+&!$*);¬/|,%_>?`:#@'=~{}\-\\" + '"');

m1 is false (the string contains only "good" characters); m2 is true (the same string with a € symbol added).
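To get the single true/false method the question asks for, the pattern can be wrapped in a small helper. This is a sketch only: the class and method names (MainframeText, ContainsUnsupportedChars, IndexOfFirstUnsupported) are hypothetical, and the allowed set is exactly the one from the pattern in this answer, which may need adjusting for your mainframe. The double quote is written as "" inside the verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MainframeText
{
    // Same character set as the pattern in the answer; anything outside
    // the bracketed list counts as "illegal".
    private static readonly Regex Illegal = new Regex(
        @"[^a-zA-Z0-9 ¢.<(+&!$*);¬/|,%_>?`:#@'=~{}\-\\""]");

    // True if the string contains at least one disallowed character.
    public static bool ContainsUnsupportedChars(string text)
    {
        return Illegal.IsMatch(text);
    }

    // Index of the first disallowed character, or -1 if there is none.
    public static int IndexOfFirstUnsupported(string text)
    {
        Match m = Illegal.Match(text);
        return m.Success ? m.Index : -1;
    }
}
```

For example, MainframeText.ContainsUnsupportedChars("BOËNS") flags the name from the question, because Ë is not in the allowed list.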

Tags: .net-4.0 ebcdic unicode
