How can I de-comment JavaScript code with this preg_replace?

2023-02-14 11:06

I'm trying to strip the // comments from my JavaScript with PHP's preg_replace(), and wrote a pattern that should do the following:

1. When a comment starts on a new line, delete that entire line: // COMMENTS .....

2. When a comment follows code on the same line, after one TAB, remove just the comment part: exampleScript(); // (1space) comments

3. Don't match the // in http://

This preg_replace does the above job. HOWEVER, it currently also matches three snippets of code that contain // (see the false matches below), which it should skip.

$buffer = preg_replace('/(?<!http:)\/\/\s*[^\r\n]*/', '', $buffer);

good matches

//something

// something *!&~@#^hjksdhaf

function();// comment

false matches

(/\/\.\//)
"//"  
"://"

So, how can I filter out these three false matches, and how should I change the regex below?

(?<!http:)\/\/\s*[^\r\n]*

PS: I don't wish to use others' code minifiers/frameworks with their own overheads. Just my own for now.

Why not use a preexisting JavaScript minifier, like the YUI Compressor (PHP bindings here)?

If you are really set on writing your own, have a look through the source code to see how it's done.

Short version: The Right Way is to use a proper parser/tokenizer approach.

JavaScript's grammar is (at least to a first approximation) context-free rather than regular, so it cannot be parsed in general with regular expressions.

In formal language theory there is a result known as the pumping lemma for regular languages. It can be used to prove that some context-free languages, such as arbitrarily nested balanced brackets, cannot be recognized by any regular expression.

The gist of the problem is this: you can't just look for the string //, because it could be contained inside otherwise valid code, for example, a string. You can't just look for a // between two quotation marks either, because then you'd get false positives like alert('no!') // can't do it, where the text ) // can is technically contained between two ' marks. Instead, you have to detect where strings begin and end. Worse, one kind of quote can appear inside a string delimited by the other kind, and strings (even half-open strings) can appear inside comments!
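To make the failure concrete, here is a small sketch that applies the pattern from the question to a line where // appears inside a string literal. The `$source` line is a made-up example, not taken from the question.

```php
<?php
// The pattern from the question.
$pattern = '/(?<!http:)\/\/\s*[^\r\n]*/';

// Hypothetical input: a string literal containing //, followed by a real comment.
$source = 'var sep = "//"; // real comment';

$result = preg_replace($pattern, '', $source);

// The first // the regex finds is the one inside the string, so everything
// from there to the end of the line is removed and the string is left open.
var_dump($result);
```

The regex has no notion of "inside a string", so it cannot tell the two occurrences of // apart.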

There is no simple general solution -- JavaScript syntactic elements like strings, brackets, parentheses, etc., can be nested arbitrarily many levels deep. The only way to accurately detect where any syntactic element begins and ends is to correctly parse all the syntactic elements that you might encounter along the way.

The correct answer is to use an actual parser.
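As a rough illustration of the tokenizer idea, the sketch below walks the source one character at a time and remembers whether it is currently inside a quoted string. It is only a partial scanner under stated assumptions: `stripLineComments` is a hypothetical helper name, and regex literals (such as `(/\/\.\//)` from the question), template literals and `/* ... */` block comments are deliberately not handled, so it is not a substitute for a real JavaScript tokenizer.

```php
<?php
// Hypothetical helper: removes // line comments while skipping over
// single- and double-quoted string literals. Regex literals, template
// literals and block comments are NOT recognized by this sketch.
function stripLineComments(string $js): string
{
    $out = '';
    $len = strlen($js);
    $quote = null; // quote character of the string we are inside, or null

    for ($i = 0; $i < $len; $i++) {
        $ch = $js[$i];

        if ($quote !== null) {
            // Inside a string: copy characters until the matching quote.
            $out .= $ch;
            if ($ch === '\\' && $i + 1 < $len) {
                $out .= $js[++$i]; // copy the escaped character verbatim
            } elseif ($ch === $quote) {
                $quote = null;
            }
            continue;
        }

        if ($ch === '"' || $ch === "'") {
            $quote = $ch;
            $out .= $ch;
            continue;
        }

        if ($ch === '/' && $i + 1 < $len && $js[$i + 1] === '/') {
            // Start of a line comment: skip ahead to the end of the line.
            while ($i < $len && $js[$i] !== "\n" && $js[$i] !== "\r") {
                $i++;
            }
            // Drop spaces and tabs that preceded the comment on this line.
            $out = rtrim($out, " \t");
            $i--; // let the for loop re-read the line break, if any
            continue;
        }

        $out .= $ch;
    }

    return $out;
}

$in = "var u = \"http://x\"; // note\nvar s = '//';\n// whole line\nf();// c";
echo stripLineComments($in), "\n";
```

Because string state is tracked explicitly, no special case for http:// is needed. A comment on its own line leaves an empty line behind; removing that as well would be a separate step.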

$buffer = preg_replace('/(?<!\S)\/\/\s*[^\r\n]*/', '', $buffer);

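Works on the instances mentioned in the question where // starts a line or follows whitespace, and skips all three false matches. Note that the lookbehind only allows a match at the start of a line or after whitespace, so function();// comment (no whitespace before the //) is left untouched, and a // preceded by a space inside a string would still be matched.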
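A quick way to check this pattern is to run it over the samples from the question. The sketch below does that; the `run();` line with a TAB is an assumed example of case 2, the rest are copied from the question.

```php
<?php
$pattern = '/(?<!\S)\/\/\s*[^\r\n]*/';

$cases = [
    '//something',            // comment at the start of a line
    "run();\t// comment",     // assumed example: code, one TAB, then a comment
    'function();// comment',  // no whitespace before //
    '(/\/\.\//)',             // regex literal from the question
    '"//"',
    '"://"',
];

$results = [];
foreach ($cases as $case) {
    $results[] = preg_replace($pattern, '', $case);
}

// Show each input next to what is left after the replacement.
foreach ($cases as $i => $case) {
    echo var_export($case, true), ' => ', var_export($results[$i], true), "\n";
}
```

Only the first two inputs change. In the second one the TAB before the comment is left in place, since the pattern starts matching at the //.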

Three awesome websites on the net that help with finding the correct regex:

http://gskinner.com/RegExr/

http://lumadis.be/regex/test_regex.php

http://cs.union.edu/~hannayd/csc350/simulators/RegExp/reg.htm
