What to do when unescapable character(s) are escaped?

2023-02-12 14:54 Q&A author:

When designing a (mini)language in which certain characters must be escaped to lose their special meaning (like quotes in some programming languages), what should be done, especially from a security perspective, when characters that are not escapable (e.g. normal characters which never have special meaning) are escaped? Should an error be raised, should the character be discarded, or should it appear in the output the same as if it had not been escaped?

Example: In a simple language where strings are delimited by double quotes (") and any quotes inside a string are escaped with a backslash (\): for the input "We \said, \"We want Moshiach Now\"" -- what should be done with the letter s in said, which is escaped?

I prefer the lexer to whine when this occurs. A lexer/parser should be tight about syntax; one can always loosen it up later. If you are sloppy, you'll find you can't retract a decision you didn't think you made.

Assume that you initially decide to treat a backslash followed by a non-escape character as just that pair of characters, and that "T" is not an escape today. Sometime later you decide to extend the language so that "\T" means something special, and you change your language.

You'll find an angry mob of programmers storming your design castle, because for them, "\T" means "\" "T" (or "T" depending on your default decision), and you just broke their code. You hang your head in shame, retract the decision, and then realize... oops, there are no more available escape characters!

This lesson goes for any piece of syntax that isn't well defined in your language. If it isn't explicitly legal, it should be implicitly illegal and your compiler should check it. Or you'll never be able to extend your successful language.

If your language isn't going to be successful, you may not care as much.
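A minimal sketch of the strict approach, using the question's example language. The escape table, the `EscapeError` class, and the `decode_string_body` function are names made up for illustration; the only assumption is that `\"` and `\\` are the legal escapes.

```python
class EscapeError(ValueError):
    """Raised when a backslash precedes a character that is not escapable."""


# Hypothetical escape table for the example language: only \" and \\ are legal.
ESCAPES = {'"': '"', '\\': '\\'}


def decode_string_body(body: str) -> str:
    """Decode the text between the quotes, rejecting unknown escapes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != '\\':
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise EscapeError("dangling backslash at end of string")
        nxt = body[i + 1]
        if nxt not in ESCAPES:
            # Strict by default: anything not explicitly legal is an error.
            raise EscapeError(f"unknown escape \\{nxt} at offset {i}")
        out.append(ESCAPES[nxt])
        i += 2
    return "".join(out)
```

With this table, the body `We \said` is rejected, so adding a new entry such as `\T` to the table later cannot change the meaning of any string that was previously accepted.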

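Well, one way to solve the problem is for the backslash to just mean backslash when it precedes a non-escapable character. That's what Python does (the session below is Python 2; Python 3 still keeps the backslash, but recent versions warn about the unrecognized escape):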

>>> print "a\tb"
a   b
>>> print "a\tb\Rc"
a   b\Rc

Most systems take the escape character to mean "take the next character verbatim", so escaping a "non-escapable" character is usually harmless. The problem shows up later, in comparisons and the like, where the literal text does not represent the actual value (that's where you see a lot of security issues, especially with things like URLs).

So on the one hand, you can accept only a limited set of escaped characters. In that sense, you have an "escape sequence" rather than an escaped character (the \x is the entire sequence rather than a \ followed by an x). That's probably the safest mechanism, and it's not really burdensome to write.

The other option is to ensure that you are "canonicalizing" everything you compare, through some ruleset. This typically means resolving all of the escape sequences properly up front, before comparison, and comparing only the final values rather than the literals.
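A rough sketch of that second option, assuming the lenient rule "backslash means take the next character verbatim". The `canonicalize` and `same_value` helpers are hypothetical names, not part of any library.

```python
def canonicalize(literal: str) -> str:
    """Resolve 'take the next character verbatim' escapes into the final value."""
    out = []
    chars = iter(literal)
    for ch in chars:
        if ch == '\\':
            # Take the following character verbatim; a trailing
            # backslash with nothing after it is kept as-is.
            ch = next(chars, '\\')
        out.append(ch)
    return "".join(out)


def same_value(a: str, b: str) -> bool:
    """Compare final values, never the raw literals."""
    return canonicalize(a) == canonicalize(b)


# The literals differ, but they denote the same value under this rule.
print(r'adm\in' == 'admin')            # compares the literals
print(same_value(r'adm\in', 'admin'))  # compares the canonical values
```

A check such as a blocklist lookup that compares the raw literal would treat `adm\in` and `admin` as different, while the code that later consumes the value treats them as equal; canonicalizing first closes that gap.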

Most systems interpret the backslash as Will Hartung says, except before alphanumerics, which are variously used as aliases for control codes, character classes, word boundaries, the start of hex sequences, case region markers, hex or octal digits, etc. \s in particular often means whitespace in Perl 5 style regexes. JavaScript, which interprets it as 's' in one context and as whitespace in another, suffers from subtle bugs because of this choice. Consider /foo\sbar/ vs new RegExp('foo\sbar').
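The same kind of two-layer surprise can be shown in Python, though with a different escape than the JavaScript case above: `\b` is a backspace in an ordinary string literal but a word boundary in a regex. This is only an analogous illustration, not the JavaScript behaviour itself.

```python
import re

text = "a foo"

# Raw string: the regex engine receives backslash + 'b', i.e. a word boundary.
as_regex = re.search(r"\bfoo", text)

# Ordinary string: the string literal turns \b into a backspace (U+0008)
# before the regex engine ever sees it, so the pattern looks for that character.
as_literal = re.search("\bfoo", text)

print(as_regex is not None)
print(as_literal is not None)
```

The two patterns look identical in the source but have different lengths once the string layer has processed them (5 characters versus 4), which is why the same escape can silently mean two things.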
