Filtering Java comments using StreamTokenizer
My goal is to analyze Java source files to find the line numbers that contain non-comment code. Since StreamTokenizer has slashStarComments() and slashSlashComments(), I figured I would use it to filter out the lines that have only comments and no code.
The program below prints the line numbers and any string tokens on that line, for each line that has something that's not a comment.
It works most of the time, but not always. For example, line numbers get skipped every now and then, beginning with the comment at line 144 in the following source file from log4j, Category.java: http://logging.apache.org/log4j/1.2/xref/org/apache/log4j/Category.html StreamTokenizer sometimes just seems to skip some lines at the end of javadoc comments.
Here's my code:
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;

public class LinesWithCodeFinder {
    public static void main(String[] args) throws IOException {
        String filePath = args[0];
        Reader reader = new FileReader(filePath);
        StreamTokenizer tokenizer = new StreamTokenizer(reader);
        tokenizer.slashStarComments(true);
        tokenizer.slashSlashComments(true);
        tokenizer.eolIsSignificant(false);
        int ttype = 0;
        int lastline = -1;
        String s = "";
        while (ttype != StreamTokenizer.TT_EOF) {
            ttype = tokenizer.nextToken();
            int lineno = tokenizer.lineno();
            String sval = ttype == StreamTokenizer.TT_WORD ? tokenizer.sval : "";
            if (lineno == lastline) {
                s += " " + sval;
            } else {
                if (lastline != -1)
                    System.out.println(lastline + "\t" + s);
                s = sval;
            }
            lastline = lineno;
        }
    }
}
Does anyone understand why StreamTokenizer behaves as it does?
Any alternative ideas on how to filter out the comments would be appreciated.
Paragraphs within the comments are throwing off the line count. Starting at line 137...
/**
This constructor created a new <code>Category</code> instance and
sets its name.

<p>It is intended to be used by sub-classes only. You should not
create categories directly.

@param name The name of the category.
*/
...the two empty lines are shifting the line count off by two. So line 146 is being reported as line 144, etc. Not sure why, however. If you change the comment to the following:
/**
This constructor created a new <code>Category</code> instance and
sets its name.
<p>It is intended to be used by sub-classes only. You should not
create categories directly.
@param name The name of the category.
*/
...the line numbers after the comment will report correctly.
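If you want to check this on your own JDK without the log4j file, here is a minimal sketch. The class name LineNumberDemo and the helper reportedLineOfFirstWord are made up for illustration; only the StreamTokenizer calls come from the question. It feeds the tokenizer two small inputs that differ only by one empty line inside the comment and prints the line number reported for the first word after the comment next to the real one. I have not stated the expected numbers, since they depend on whether your JDK shows the behaviour described above.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original program.
final class LineNumberDemo {

    // Returns the line number StreamTokenizer reports for the first word
    // token it finds, or -1 if the input contains no word token.
    static int reportedLineOfFirstWord(String source) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(source));
        tokenizer.slashStarComments(true);
        tokenizer.slashSlashComments(true);
        tokenizer.eolIsSignificant(false);
        try {
            int ttype;
            while ((ttype = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
                if (ttype == StreamTokenizer.TT_WORD) {
                    return tokenizer.lineno();
                }
            }
        } catch (IOException e) {
            // A StringReader should not fail, but nextToken() declares IOException.
            throw new UncheckedIOException(e);
        }
        return -1;
    }

    public static void main(String[] args) {
        // "foo" is really on line 5 here: no empty line inside the comment.
        String withoutBlank = "/**\n first paragraph\n second paragraph\n*/\nfoo";
        // "foo" is really on line 6 here: one empty line inside the comment.
        String withBlank = "/**\n first paragraph\n\n second paragraph\n*/\nfoo";

        System.out.println("without empty line: actual 5, reported "
                + reportedLineOfFirstWord(withoutBlank));
        System.out.println("with empty line:    actual 6, reported "
                + reportedLineOfFirstWord(withBlank));
    }
}
```

If the second reported number is lower than the actual one, you are seeing the same drift as in Category.java.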
I think I found the bug in StreamTokenizer! I copied the class and renamed it to MyStreamTokenizer, and changed line 700 from:
if (c == '\n')
to
while (c == '\n')
and it works! A nasty bug by
@author James Gosling
@since JDK1.0
Try using the codehaus javancss library (NCSS = Non Commenting Source Statements).
There is a jar and source available on the central maven repo at http://repo1.maven.org/maven2/org/codehaus/javancss/javancss/32.53/
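If a library is more than you need, another option is to skip StreamTokenizer and count lines yourself with a small state machine. The following is only a sketch under some assumptions: the class name CodeLineFinder and the method codeLines are hypothetical, lines are assumed to end in \n (or \r\n), and Java text blocks and unicode escapes are not specially handled. It treats a line as "code" if it has at least one non-whitespace character outside // and /* */ comments, and it recognises string and char literals so that a "//" inside a string is not mistaken for a comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from any library.
final class CodeLineFinder {

    private enum State { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHAR }

    // Returns the 1-based numbers of all lines that contain non-comment code.
    static List<Integer> codeLines(String src) {
        List<Integer> result = new ArrayList<>();
        State state = State.CODE;
        int line = 1;
        boolean hasCode = false;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            char next = i + 1 < src.length() ? src.charAt(i + 1) : '\0';
            if (c == '\n') {
                // Every newline is counted here, whatever state we are in.
                if (hasCode) result.add(line);
                line++;
                hasCode = false;
                if (state == State.LINE_COMMENT) state = State.CODE;
                continue;
            }
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') {
                        state = State.LINE_COMMENT;
                        i++;
                    } else if (c == '/' && next == '*') {
                        state = State.BLOCK_COMMENT;
                        i++;
                    } else if (!Character.isWhitespace(c)) {
                        hasCode = true;
                        if (c == '"') state = State.STRING;
                        else if (c == '\'') state = State.CHAR;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        state = State.CODE;
                        i++;
                    }
                    break;
                case STRING:
                case CHAR:
                    hasCode = true;
                    if (c == '\\' && next != '\n') {
                        i++; // skip the escaped character
                    } else if (c == (state == State.STRING ? '"' : '\'')) {
                        state = State.CODE;
                    }
                    break;
                case LINE_COMMENT:
                    break; // ignore everything until the newline
            }
        }
        if (hasCode) result.add(line); // last line without trailing newline
        return result;
    }

    public static void main(String[] args) throws IOException {
        String src = new String(Files.readAllBytes(Paths.get(args[0])));
        for (int n : codeLines(src)) {
            System.out.println(n);
        }
    }
}
```

Run it with the path of a source file as the only argument; it prints one line number per line of code. Because the newline check happens before anything else, empty lines inside comments cannot shift the count.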
I just found that there is an unfixed bug in SDN's bugs database, bug 4517649 marked "Closed, Will Not Fix". http://localhost/hawk.html?gwt.codesvr=127.0.0.1:9997&locale=en
Due to compatibility restraints we will not further evolve this legacy class. xxxxx@xxxxx 2002-02-14
No workaround is given either :-(