syntax highlighting design

I'm writing my own syntax highlighter in JavaScript for fun. I see a couple of approaches, but each has pros and some pretty serious cons that I can't get around. What do you think about these approaches, and are there better methods that I'm missing?

Assumption

Code to highlight exists in a single string.

Approaches

  1. Treat the code in its string form and use regular expressions to find patterns.

    Pros

    Simple to define and search for patterns

    Cons

    Hard to disregard keywords inside of quotes or comments

  2. Split the string by spaces and linebreaks and loop over the array.

    Pros

    Easy to keep track of scope

    Cons

    Hard to keep track of spaces and linebreaks after the split
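
One way around the con of approach 1 is to put comments and strings into the same regular expression as the keywords, ahead of them. The alternatives are tried in order at each position, so a comment or string gets consumed whole before the keyword alternative can look inside it. This is only a rough sketch: the function names, the class names and the short keyword list are made up for illustration, and it does not handle regex literals or strings that span lines.

```javascript
// Hypothetical helper: escape the characters that matter inside HTML text.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical single-pass highlighter built on one combined pattern.
function highlightWithRegex(code) {
  // Group 1: comments, group 2: strings, group 3: a few sample keywords.
  var pattern = /(\/\/[^\n]*|\/\*[\s\S]*?\*\/)|("(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')|\b(var|function|return|if|else|for|while)\b/g;
  var out = "";
  var last = 0;
  var m;
  while ((m = pattern.exec(code)) !== null) {
    // Copy the plain text between the previous match and this one.
    out += escapeHtml(code.slice(last, m.index));
    var cls = m[1] !== undefined ? "com" : m[2] !== undefined ? "quo" : "res";
    out += '<span class="' + cls + '">' + escapeHtml(m[0]) + "</span>";
    last = pattern.lastIndex;
  }
  // Copy whatever is left after the final match.
  return out + escapeHtml(code.slice(last));
}
```

Because every piece of the input is copied exactly once, either as plain text or inside a span, spaces and linebreaks survive without any extra bookkeeping.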

EDIT: Lexical Analysis

So, if I understand it correctly, lexical analysis breaks the string into tokens. That sounds a lot like approach 2. How do you reassemble the tokens into the original string?
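
On the reassembly question: a common trick is to make whitespace (and anything unrecognised) a token too, so that nothing from the input is ever thrown away. Joining the token texts then gives back the original string. Below is a minimal sketch of that idea. The token type names, the rule list and the tiny keyword table are assumptions made for the example, not a complete JavaScript lexer.

```javascript
// Illustrative keyword table; a real highlighter would list many more.
var KEYWORDS = { "var": true, "function": true, "return": true, "if": true, "else": true };

// Hypothetical tokenizer: every character of the source ends up in some token.
function tokenize(source) {
  var rules = [
    ["whitespace", /^\s+/],
    ["comment", /^(?:\/\/[^\n]*|\/\*[\s\S]*?\*\/)/],
    ["string", /^(?:"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')/],
    ["number", /^\d+(?:\.\d+)?/],
    ["word", /^[A-Za-z_$][\w$]*/]
  ];
  var tokens = [];
  var pos = 0;
  while (pos < source.length) {
    var rest = source.slice(pos);
    var type = "other";
    var text = rest.charAt(0); // fallback: a single unrecognised character
    for (var i = 0; i < rules.length; i++) {
      var m = rules[i][1].exec(rest);
      if (m) {
        type = rules[i][0];
        text = m[0];
        break;
      }
    }
    if (type === "word" && Object.prototype.hasOwnProperty.call(KEYWORDS, text)) {
      type = "keyword";
    }
    tokens.push({ type: type, text: text });
    pos += text.length; // always advances by at least one character
  }
  return tokens;
}

// Reassembly is just concatenation, because no input was discarded.
function untokenize(tokens) {
  return tokens.map(function (t) { return t.text; }).join("");
}
```

To highlight, you would walk the same token list and wrap each token's text in a span chosen by its type, instead of joining the raw text. The difference from approach 2 is that the split never loses the separators.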


Note: This uses jQuery. It could fairly easily be rewritten in plain JavaScript if you want.

I actually wrote a little plugin for fun that does this:

(function($) {
 $.fn.codeBlock = function(blockComment) {

  // Setup keyword regex
   var keywords = /(abstract|boolean|break|byte|case|catch|char|class|const|continue|debugger|default|delete|do|double|else|enum|export|extends|final|finally|float|for|function|goto|if|implements|import|in|instanceof|int|interface|long|native|new|package|private|protected|public|return|short|static|super|switch|synchronized|this|throw|throws|transient|try|typeof|var|void|volatile|while|with|true|false|prototype)(?!\w|=)/gi;

  // Booleans to toggle comment, regex, quote exclusions
   var comment = false;
   var quote = false;
   var regex = false;

  /*  Array used to store the positions of quotes, block comments, etc.
   so they can be used to ID locations to be skipped during keyword
   regexing.
  */
   var locator = new Array();
   var locatorIndex = 0;

   if (blockComment) locator[locatorIndex++] = 0;

  var text = $(this).html();
  var continuation;
  var numerals = /[0-9]/;

  var arr = ($(this).html()).split("");
  var outhtml = "";

  for (var key = 0; key < arr.length; key++) {
   // Cache the current, previous and next characters for faster lookup.
   // Note the naming: nVal holds the previous character, pVal the next one.
    var keyd = key;
    var val = arr[keyd];
    var nVal = arr[keyd - 1];
    var pVal = arr[++keyd];

   if ((val == "\"" || val == "'") && nVal != "\\") {
    if (quote == false) {
     quote = true;
     outhtml += val;
    }
    else {
     outhtml += val;
     quote = false;
    }
    locator[locatorIndex++] = parseInt(key);
   }
   else if (numerals.test(val) && quote == false && blockComment == false && regex == false) {
    outhtml += '<span class="num">' + val + '</span>';
   }
   else if (val == "/" && nVal != "<") {
    var keys = key;
    if (pVal == "/") {
     comment = true;
     continuation = key;
     break;
    }
    else if (pVal == "*") {
     outhtml += "/";
     blockComment = true;
     locator[locatorIndex++] = parseInt(key);
    }
    else if (nVal == "*") {
     outhtml += "/";
     blockComment = false;
     locator[locatorIndex++] = parseInt(key);
    }
    else if (pVal == "[" && regex == false) {
     outhtml += "<span class='res'>/";
     regex = true;
    }
    else {
     outhtml += "/";
    }
   }
   // Parenthesised so that a "," only closes the span while inside a regex
   else if ((val == "," || val == ";") && regex == true) {
    outhtml += "</span>" + val;
    regex = false;
   }
   else {
    outhtml += val;
   }
  }

  if (comment == true) {
   outhtml = outhtml.replace(keywords, "<span class='res'>$1</span>");
   outhtml += '<span class="com">';
   outhtml += text.substring(continuation, text.length);
   outhtml += '</span>';
  }
  else {
   if ((locator.length % 2) != 0) locator[locator.length] = (text.length - 1);

   if (locator.length != 0) {
    text = outhtml;

    outhtml  = text.substring(0, locator[0]).replace(keywords, "<span class=\"res\">$1</span>");

    for (var i = 0; i < locator.length;) {
     var qTest = text.substring(locator[i], locator[i] + 1);
     if (qTest == "'" || qTest == "\"") outhtml += "<span class=\"quo\">";
     else outhtml += "<span class=\"com\">";

     outhtml += text.substring(locator[i], locator[++i] + 1) + "</span>";

     outhtml += text.substring(locator[i] + 1, locator[++i]).replace(keywords, "<span class=\"res\">$1</span>");
    }
   }
   else {
    outhtml = outhtml.replace(keywords, "<span class=\"res\">$1</span>");
   }
  }

  text = outhtml;
  $(this).html(text);
  return blockComment;
 }
})(jQuery);

I'm not going to claim it is the most efficient or the best way of doing this, but it does work. There are probably still a few bugs in there that I haven't identified yet (and one I know about but haven't gotten around to fixing), but this should give you an idea of how you could go about it.

For testing, I suggest creating a textarea (or something similar) and having the plugin run when you click a button. You can set the text in the textarea to some starting code to make sure it works. (Tip: you can put tags between the <textarea> tags and they will render as text, not HTML.)

Also, blockComment is a boolean. Make sure to pass false, because true tells the plugin that the text starts inside a block comment. The plugin returns the updated value, so if you decide to parse something line by line, like:

<a>code</a>
<a>some more code</a>

Do something like:

var blockComment = false;
$("a").each(function() {
  blockComment = $(this).codeBlock(blockComment);
});
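
If you would rather not depend on jQuery, the same "pass the state in, get the state back" pattern can be sketched in plain JavaScript. To be clear, this is not the plugin above: markBlockComments is a made-up stand-in that only wraps block comments, assumes the text is already HTML-escaped, and ignores the case of "/*" appearing inside a string.

```javascript
// Hypothetical stand-in for the plugin: wraps block comments in one line and
// reports whether the line ends inside an unclosed block comment.
function markBlockComments(line, inBlock) {
  var out = "";
  var pos = 0;
  while (pos < line.length) {
    var from = pos; // where the current comment text starts
    if (!inBlock) {
      var start = line.indexOf("/*", pos);
      if (start === -1) {
        out += line.slice(pos); // no more comments on this line
        break;
      }
      out += line.slice(pos, start);
      from = start;
      pos = start + 2; // skip the opener so "/*/" is not read as closed
      inBlock = true;
    }
    var end = line.indexOf("*/", pos);
    if (end === -1) {
      // The comment continues on the next line.
      out += '<span class="com">' + line.slice(from) + "</span>";
      break;
    }
    out += '<span class="com">' + line.slice(from, end + 2) + "</span>";
    pos = end + 2;
    inBlock = false;
  }
  return { html: out, inBlock: inBlock };
}

// Thread the state through the lines, as with blockComment above.
var state = false;
var rendered = ["a /* one", "two */ b"].map(function (line) {
  var result = markBlockComments(line, state);
  state = result.inBlock;
  return result.html;
});
```

The second line is only recognised as starting inside a comment because the state returned for the first line is passed back in.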