syntax highlighting design

2023-01-26 23:06 问答作者：

I'm writing my own syntax highlighter in javascript for fun and see a couple of approaches but they both have pros and some pretty serious cons that I can't get around. What do you guys think about these approaches and are there better methods that I'm missing?

Assumption

Code to highlight exists in a single string.

Approaches

Treat code in it's string form and use regular expressions to find patterns.
Pros
Simple to define and search for patterns
Cons
Hard to disregard keywords inside of quotes or comments

Split the string by spaces and linebreaks and 开发者_StackOverflow中文版loop over the array.
Pros
Easy to keep track of scope
Cons
Hard to keep track of spaces and linebreaks after the split

EDIT: Lexical Analysis

So, if I understand it, using Lexical Analysis you break the string into tokens. This somehow sounds a lot like approach number 2? How do you approach reassembling the tokens into the original string?

Note: This uses jQuery. It can pretty well be rewritten to work with straight javascript if you want.

I actually wrote a little plugin for fun that does this:

(function($) {
 $.fn.codeBlock = function(blockComment) {

  // Setup keyword regex
   var keywords = /(abstract|boolean|break|byte|case|catch|char|class|const|continue|debugger|default|delete|do|double|else|enum|export|extends|final|finally|float|for|function|goto|if|implements|import|in|instanceof|int|interface|long|native|new|package|private|protected|public|return|short|static|super|switch|synchronized|this|throw|throws|transient|try|typeof|var|void|volatile|while|with|true|false|prototype)(?!\w|=)/gi;

  // Booleans to toggle comment, regex, quote exclusions
   var comment = false;
   var quote = false;
   var regex = false;

  /*  Array used to store values of regular expressions, quotes, etc.
   so they can be used to ID locations to be skipped durring keyword
   regexing.
  */
   var locator = new Array();
   var locatorIndex = 0;

   if (blockComment) locator[locatorIndex++] = 0;

  var text = $(this).html();
  var continuation;
  var numerals = /[0-9]/;

  var arr = ($(this).html()).split("");
  var outhtml = "";

  for (key in arr) {
   // Assign three variables common 'lookup' values for faster aquisition
    var keyd = key;
    var val = arr[keyd];
    var nVal = arr[keyd - 1];
    var pVal = arr[++keyd];

   if ((val == "\"" || val == "'") && nVal != "\\") {
    if (quote == false) {
     quote = true;
     outhtml += val;
    }
    else {
     outhtml += val;
     quote = false;
    }
    locator[locatorIndex++] = parseInt(key);
   }
   else if (numerals.test(val) && quote == false && blockComment == false && regex == false) {
    outhtml += '<span class="num">' + val + '</span>';
   }
   else if (val == "/" && nVal != "<") {
    var keys = key;
    if (pVal == "/") {
     comment = true;
     continuation = key;
     break;
    }
    else if (pVal == "*") {
     outhtml += "/";
     blockComment = true;
     locator[locatorIndex++] = parseInt(key);
    }
    else if (nVal == "*") {
     outhtml += "/";
     blockComment = false;
     locator[locatorIndex++] = parseInt(key);
    }
    else if (pVal == "[" && regex == false) {
     outhtml += "<span class='res'>/";
     regex = true;
    }
    else {
     outhtml += "/";
    }
   }
   else if (val == "," || val == ";" && regex == true) {
    outhtml += "</span>" + val;
    regex = false;
   }
   else {
    outhtml += val;
   }
  }

  if (comment == true) {
   outhtml = outhtml.replace(keywords, "<span class='res'>$1</span>");
   outhtml += '<span class="com">';
   outhtml += text.substring(continuation, text.length);
   outhtml += '</span>';
  }
  else {
   if ((locator.length % 2) != 0) locator[locator.length] = (text.length - 1);

   if (locator.length != 0) {
    text = outhtml;

    outhtml  = text.substring(0, locator[0]).replace(keywords, "<span class=\"res\">$1</span>");

    for (var i = 0; i < locator.length;) {
     qTest = text.substring(locator[i], locator[i] + 1);
     if (qTest == "'" || qTest == "\"") outhtml += "<span class=\"quo\">";
     else outhtml += "<span class=\"com\">";

     outhtml += text.substring(locator[i], locator[++i] + 1) + "</span>";

     outhtml += text.substring(locator[i] + 1, locator[++i]).replace(keywords, "<span class=\"res\">$1</span>");
    }
   }
   else {
    outhtml = outhtml.replace(keywords, "<span class=\"res\">$1</span>");
   }
  }

  text = outhtml;
  $(this).html(text);
  return blockComment;
 }
})(jQuery);

I'm not going to claim it is the most efficient way of doing this or the best but it does work. There are still probably a few bugs in there I haven't ID'd yet (and 1 I know about but haven't gotten around to fixing) but this should give you an idea of how you could go about this if you like.

My suggested implementation of this is to create a textarea or something and have the plugin run when you click a button or something (as far as testing it goes that is a decent idea) and of course you can set the text in the textarea to some starting code to make sure it works (Tip: You can put tags in between the the <textarea> tag and it will render as text, not HTML).

Also, blockComment is a boolean, make sure to pass false because true will trigger the block quoting. If you decided to parse something line by line, like:

<a>code</a>
<a>some more code</a>

Do something like:

blockComment = false;
$("a").each(function() {
  blockComment = $(this).codeBlock(blockComment);
});

继续阅读：javascript syntax-highlighting

syntax highlighting design

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？