How does code coloring work?
How do code coloring engines work, exactly? Do they just generate a parse tree that preserves whitespace, color the leaves, and reconstruct the original program? How does live code coloring manage to be efficient enough to do it on the fly?
Most syntax highlighters I know of do not work from the syntax tree; they just tokenize the source and color the text according to the kind of token each piece forms. The most difficult task such a highlighter has to do is recognizing multi-line comments (and/or strings, if the language allows them); everything else can be handled within a single source line.
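To make the multi-line comment point concrete, here is a minimal sketch of a line-at-a-time highlighter. The tiny C-like token set, the keyword list and the names `highlight_line` and `highlight_lines` are all made up for illustration and are not taken from any real editor. The only information carried from one line to the next is a single flag saying whether the line starts inside a block comment.

```python
import re

# Hypothetical token set for a small C-like language (an assumption for this sketch).
KEYWORDS = {"if", "else", "while", "return", "int"}
TOKEN_RE = re.compile(r"""
    (?P<comment_open>/\*)
  | (?P<line_comment>//.*)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_]\w*)
""", re.VERBOSE)

def highlight_line(line, in_comment=False):
    """Return (spans, in_comment_after); each span is (start, end, kind)."""
    spans = []
    pos = 0
    if in_comment:
        # The line starts inside a block comment opened on an earlier line.
        end = line.find("*/")
        if end == -1:
            return [(0, len(line), "comment")], True
        spans.append((0, end + 2, "comment"))
        pos = end + 2
    while True:
        m = TOKEN_RE.search(line, pos)
        if m is None:
            return spans, False
        kind = m.lastgroup
        if kind == "comment_open":
            end = line.find("*/", m.end())
            if end == -1:
                # Comment runs past the end of the line: remember that for the next line.
                spans.append((m.start(), len(line), "comment"))
                return spans, True
            spans.append((m.start(), end + 2, "comment"))
            pos = end + 2
            continue
        if kind == "line_comment":
            kind = "comment"
        elif kind == "word":
            # Plain identifiers are left uncolored in this sketch.
            kind = "keyword" if m.group() in KEYWORDS else None
        if kind:
            spans.append((m.start(), m.end(), kind))
        pos = m.end()

def highlight_lines(lines):
    """Highlight a whole buffer by threading the comment flag through the lines."""
    in_comment = False
    result = []
    for line in lines:
        spans, in_comment = highlight_line(line, in_comment)
        result.append(spans)
    return result
```

If an editor stores that flag per line, then after an edit it only has to re-highlight from the changed line down to the first line whose stored flag comes out the same as before, which is one plausible reason this approach is cheap enough to run on every keystroke.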
Automatic indentation engines are more involved. In theory the best results would come from reconstructing a full syntax tree, but that is slow and raises error-handling problems (most programs are not even well-formed while they are being edited). Instead, they use various kinds of simplified scanning and heuristics, which do not always match the true syntax of the language.
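As a rough illustration of such a heuristic (not how any particular editor does it), the sketch below guesses the indentation of a new line by looking only at the text of the previous line. The function name `guess_indent` and the 4-space `INDENT` width are assumptions for the example.

```python
INDENT = 4  # assumed indent width for this sketch

def guess_indent(previous_line):
    """Guess the indentation of the next line from the previous line's text alone."""
    stripped = previous_line.rstrip()
    # Start from the previous line's own leading spaces.
    current = len(previous_line) - len(previous_line.lstrip(" "))
    # Heuristic: a line that ends with an opening bracket starts a nested block.
    if stripped.endswith(("{", "(", "[")):
        return current + INDENT
    return current
```

It gets `if (x) {` right, but it also indents after `x = 1; // see {`, because it cannot tell that the brace sits inside a comment. That is exactly the kind of mismatch with the real syntax that a cheap scan accepts in exchange for speed and tolerance of half-typed code.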
(edit: on further thought, this is not completely true. For example, Eclipse's Java editor will also change the color of identifiers according to whether they name local variables, instance fields or static variables/methods. This happens in a separate pass from the basic lexical highlighting, after the editor has parsed and typechecked the code for live cross-referencing.)
Syntax highlighting usually works at the lexer level, not the parser level.
It's essentially a finite state machine derived from a set of regular expressions, so it's very quick to run.
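For a feel of what that state machine looks like, here is a small hand-written sketch. It is roughly what you would get for the two regular expressions `[A-Za-z_][A-Za-z0-9_]*` (identifier) and `[0-9]+` (number); the state names, the table and the `tokenize` function are made up for illustration, and a real lexer generator would build a much larger table automatically.

```python
def char_class(ch):
    """Map a character to the coarse class the transition table is keyed on."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# Transition table: (state, character class) -> next state.
# A missing entry means the machine cannot advance on that character.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("start", "digit"): "number",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("number", "digit"): "number",
}
ACCEPTING = {"ident", "number"}

def tokenize(text):
    """Split text into (kind, lexeme) pairs, taking the longest match each time."""
    tokens = []
    i = 0
    while i < len(text):
        state = "start"
        j = i
        # Run the machine until it can no longer advance.
        while j < len(text):
            nxt = TRANSITIONS.get((state, char_class(text[j])))
            if nxt is None:
                break
            state = nxt
            j += 1
        if state in ACCEPTING:
            tokens.append((state, text[i:j]))
            i = j
        else:
            i += 1  # this character starts no token, so skip it
    return tokens
```

For example, `tokenize("x1 = 42 + y")` yields the identifier `x1`, the number `42` and the identifier `y`. Each step is a single table lookup and no backtracking over earlier tokens is needed, so the cost grows about linearly with the length of the text.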