Current state of the art for c++ parsers?
I understand that it's a very hard thing to do, what with #ifdef, #define, and开发者_运维知识库 templates, but what is the state of the art of c++ parsers (be it open source, or proprietary?).
I mean, for a university project I'm thinking of creating a tool for analysing c++ code bases, but it seems very hard to find a good parser for it.
Should I give up and settle for java parsers? Similarly, what's the state of the art for java parsers? What about c#?
Also, would ripping the parser part of g++ apart from it ever work for the purposes of code analysis, or is it too much effort trying to do so?
You're in luck! Clang just started being able to parse most c++ programs within the last few months: http://clang.llvm.org/ It's one of the few open source parsers actually able to parse most of C++. (Mostly just GCC and CLANG, I hear Oink(?) Can get pretty good sometimes) And it's built to be used as a library by IDEs and the like, even has architecture built to support code rewriting.
There are some proprietary parsers that get the job done, But none of them are really usable without source access.
Regarding ripping apart gcc, That's not very practical for code analysis depending on what you are trying to do, you could use the new plugin architecture to get some usable information out of it, however at a very early level in parsing, it does something called term folding, where the parser itself will optimize out things like "x = x" (A simplistic example) And other aspects of the compiler expects this to happen, so it's not trivial to remove. Thus making gcc nearly useless for anything resembling source rewriting.
For C++ you can use GCC with -fdump-translation-unit
& friends option to get AST from it.
See: http://www.manpagez.com/man/1/g++/
If you can compile something by g++ then you can get tree from it.
The industry stardard C++ parser, widely used in compilers, in EDG's C++ front end. I have no experience with this; but I understand it handles a huge variety of C++ dialects. I understand you can get it free for research purposes.
The open source standard is the GCC compiler. I hear is it difficult to understand and modify.
There's CLANG as mentioned in other answers. I have no experience here. My understanding is that it is fairly sophisticated especially in terms of supporting analysis.
Our proprietary DMS Software Reengineering Toolkit has full C++ parser with full name and type resolution, preprocessor expansion (or retention, which the other tools will not do). The C++ front end handles several dialects of C++: ANSI, GCC, MS Visual Studio. As you might guess, I have a lot of experience with this one.
DMS/CppFrontEnd has been used to carry out program analyses as well as massive program source-to-source transformations on C++ code, enabled by DMS's pattern parser, which will parse any fragment of C++ code. I believe the other C++ front ends don't provide source-to-source transformations. With those you can likely hack at the ASTs procedurally, but this is pretty inconvenient because you have to know the precise AST structure and for C++ this is pretty complicated.
DMS also has full C, Java and COBOL front ends with name and type resolution as well as control and data flow analysis. It has parsers (but not name and type analysis) for many other langauges, including C#. AFAIK, the other "C++ parsers" can't do this, sort of by definition. One can apply source-to-source transformations on any of these, or any mixture of these.
clang is worth looking into. it's fast and they provide apis to hook into their backend.
Xcode 4 uses clang for tasks such as parsing, error reporting/detection in some cases, auto-completion, and fix-its.
精彩评论