
Source to source manipulations

I need to do some source-to-source manipulations in the Linux kernel. I tried to use Clang for this purpose, but there is a problem: Clang preprocesses the source code, i.e. it expands macros and includes. As a result, Clang sometimes produces C code that is broken as far as the Linux kernel is concerned. I can't maintain all the changes manually, since I expect thousands of changes per file.

I tried ANTLR, but the publicly available grammars are incomplete and not suitable for a project like the Linux kernel.

So my question is the following: is there any way to perform source-to-source manipulations on C code without preprocessing it?

Assume the following code.

#define AAA 1
void f1(int a){
    if(a == AAA)
        printf("hello");
}

After applying the source-to-source manipulation, I want to get this:

#define AAA 1
void f1(int a){
    if(functionCall(a == AAA))
        printf("hello");
}

But Clang, for instance, produces the following code, which does not meet my requirements because it expands the macro AAA:

#define AAA 1
void f1(int a){
    if(functionCall(a == 1))
        printf("hello");
}

I hope I was clear enough.

Edit

The above code is only an example. The source-to-source manipulations I want to do are not restricted to rewriting if() conditions; they also include inserting a unary operator in front of an expression, replacing an arithmetic expression with its positive or negative value, etc.

Solution

There is one solution I found myself. I use gcc to produce preprocessed source code and then apply Clang to that output. This way I don't have any issues with macro expansion and includes, since that job is already done by gcc. Thanks for the answers!


You may consider Coccinelle (http://coccinelle.lip6.fr/): it provides a nice semantic patching framework.


An idea would be to replace all occurrences of

if(a == AAA)

with

if(functionCall(a == AAA))

You can do this easily using, e.g., the sed tool.

If you have a finite collection of patterns to be replaced, you can write a sed script to perform the substitutions.
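The same text-level idea can be sketched in C instead of sed. The function below is only an illustration: wrap_if_conditions is a hypothetical helper name, and functionCall is the wrapper name taken from the question. It works on the raw, unpreprocessed text, so macros such as AAA are left untouched, and it skips preprocessor lines such as "#if". It assumes that no parentheses inside the condition appear in string literals, character constants, or comments, which a real tool would have to handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if c can be part of a C identifier. */
static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Returns 1 if the text before src[pos] on the same line ends in '#',
   i.e. the keyword at pos belongs to a directive such as "#if" or "# if". */
static int after_hash(const char *src, size_t pos) {
    while (pos > 0 && (src[pos - 1] == ' ' || src[pos - 1] == '\t'))
        pos--;
    return pos > 0 && src[pos - 1] == '#';
}

/* Copies src to out, rewriting every "if (cond)" as
   "if (functionCall(cond))". Returns 0 on success, or -1 if out is too
   small or a condition has unbalanced parentheses. */
int wrap_if_conditions(const char *src, char *out, size_t outsz) {
    const char *wrapper = "functionCall(";   /* hypothetical wrapper name */
    size_t wlen, n, i = 0, o = 0;

    assert(src != NULL && out != NULL && outsz > 0);
    wlen = strlen(wrapper);
    n = strlen(src);

    while (i < n) {
        /* An "if" keyword: not part of a longer identifier, not "#if". */
        int is_if = src[i] == 'i' && i + 1 < n && src[i + 1] == 'f'
                 && (i == 0 || !is_ident_char(src[i - 1]))
                 && !is_ident_char(src[i + 2])
                 && !after_hash(src, i);
        size_t j = i + 2;

        if (is_if)
            while (j < n && isspace((unsigned char)src[j]))
                j++;

        if (!is_if || j >= n || src[j] != '(') {
            /* Ordinary character: copy it unchanged. */
            if (o + 1 >= outsz)
                return -1;
            out[o++] = src[i++];
            continue;
        }

        /* src[j] is the opening '('; find the matching ')'. */
        size_t k = j + 1;
        int depth = 1;
        while (k < n && depth > 0) {
            if (src[k] == '(')
                depth++;
            else if (src[k] == ')')
                depth--;
            k++;
        }
        if (depth != 0)
            return -1;

        size_t head_len = j + 1 - i;          /* "if" plus spaces plus '(' */
        size_t cond_len = (k - 1) - (j + 1);  /* text between the parens  */
        if (o + head_len + wlen + cond_len + 2 >= outsz)
            return -1;

        memcpy(out + o, src + i, head_len);     o += head_len;
        memcpy(out + o, wrapper, wlen);         o += wlen;
        memcpy(out + o, src + j + 1, cond_len); o += cond_len;
        out[o++] = ')';   /* closes functionCall( */
        out[o++] = ')';   /* closes if(           */
        i = k;
    }
    out[o] = '\0';
    return 0;
}
```

Called on the text of f1 from the question, this turns "if(a == AAA)" into "if(functionCall(a == AAA))" and leaves the #define line as it is. Like a sed script, it knows nothing about types or scopes, so it only suits transformations that can be recognised from the text alone.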

Would this solve your problem?


Handling the preprocessor is one of the most difficult problems in applying transformations to C (and C++) code.

Our DMS Software Reengineering Toolkit with its C Front End comes relatively close to doing this. DMS can parse C source code, preserving most preprocessor conditionals, macro definitions and uses.

It does so by allowing preprocessor actions in "well-structured" places. Examples: #defines are allowed where declarations or statements can occur, and macro calls and conditionals are allowed as replacements for many of the nonterminals in the language (e.g., function head, expression, statement, declarations) and in many non-structured places where people commonly put them (e.g., "#if foo" on one line, "if (...) {" on the next, then "#endif"). It parses the source code and preprocessor directives as if they were part of one language (they ARE, it's called "C"), and builds corresponding ASTs, which can be transformed and will regenerate correctly with the captured preprocessor directives. [This level of capability handles OP's example perfectly.]

Some directives are poorly placed (both in the syntax sense, e.g., across multiple fragments of the language, and in the "you've got to be kidding" understandability sense). DMS handles these by expanding them away, with some advance guidance from the engineer ("always expand this macro"). A less satisfactory approach is to hand-convert the unstructured preprocessor conditionals/macro calls into structured ones; this is a bit painful but more workable than one might expect, since the bad cases occur with considerably less frequency than the good ones.
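To make the distinction concrete, here is a small C sketch of the three situations. It is my own illustration, not DMS input or output; the macro FOO and all function names are hypothetical. The first function uses a conditional that wraps one complete statement, the second uses the "#if foo / if (...) { / #endif" pattern, where the conditional covers only part of a statement, and the third shows one possible hand conversion of the second into a structured form.

```c
#define FOO 1   /* hypothetical configuration macro */

/* Well-structured: the conditional wraps one complete statement. */
int structured(int x) {
    int r = 0;
#if FOO
    r += x;
#endif
    return r;
}

/* Unstructured: the conditional wraps only the head of the if statement,
   so neither branch of the #if is a complete syntactic unit on its own. */
int unstructured(int x) {
    int r = 0;
#if FOO
    if (x > 0) {
#endif
        r = x;
#if FOO
    }
#endif
    return r;
}

/* One possible hand conversion: each branch of the #if is now a complete
   statement, at the cost of duplicating the assignment. */
int converted(int x) {
    int r = 0;
#if FOO
    if (x > 0)
        r = x;
#else
    r = x;
#endif
    return r;
}
```

With FOO defined as 1, unstructured and converted behave identically; the same holds if FOO is changed to 0, which is what makes the hand conversion a safe rewrite.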

To do better than this, one needs to have symbol tables and flow analysis that take into account the preprocessor conditions, and capture all the preprocessor conditionals. We've done some experimental work with DMS to capture conditional declarations in the symbol table (seems to work fine), and we're just starting work on a scheme for the latter.

Not easy being green.


Clang maintains extremely accurate information about the original source code.

Most notably, the SourceManager is able to tell whether a given token has been expanded from a macro or written as is, and Chandler Carruth recently implemented macro diagnostics that display the actual macro expansion stack (at the various stages of expansion), tracing back to the code as actually written (3.0).

Therefore, it is possible to use the generated AST and then rewrite the source code with all its macros still in place. You would have to query virtually every node to know whether it comes from a macro expansion or not and, if it does, retrieve the original code of the expansion, but it still seems possible.

  • There is a rewriter module in Clang
  • You can dig up Chandler's code on the macro diagnosis stack

So I guess you should have all you need :) (And hope so because I won't be able to help much more :p)


I would advise using the ROSE framework. The source is available on GitHub.
