Source to source manipulations

2023-03-28 15:58 Q&A author:

I need to do some source-to-source manipulations in the Linux kernel. I tried to use Clang for this purpose, but there is a problem: Clang preprocesses the source code, i.e. it expands macros and includes. As a result, Clang sometimes produces C code that is broken as far as the Linux kernel is concerned. I can't maintain all the changes manually, since I expect thousands of changes per file.

I tried ANTLR, but the publicly available grammars are incomplete and not suitable for projects such as the Linux kernel.

So my question is the following: is there any way to perform source-to-source manipulations on C code without preprocessing it?

Assume the following code.

#define AAA 1
void f1(int a){
    if(a == AAA)
        printf("hello");
}

After applying the source-to-source manipulation, I want to get this:

#define AAA 1
void f1(int a){
    if(functionCall(a == AAA))
        printf("hello");
}

But Clang, for instance, produces the following code, which does not fit my requirements because it expands the macro AAA:

#define AAA 1
void f1(int a){
    if(functionCall(a == 1))
        printf("hello");
}

I hope I was clear enough.

Edit

The above code is only an example. The source-to-source manipulations I want to do are not restricted to if() statement substitution; they also include inserting a unary operator in front of an expression, replacing an arithmetic expression with its positive or negative value, etc.

Solution

There is one solution I found for myself. I use gcc to produce preprocessed source code and then apply Clang to it. That way I don't have any issues with macro expansion and includes, since that job is done by gcc. Thanks for the answers!

You may consider Coccinelle (http://coccinelle.lip6.fr/): it provides a nice semantic patching framework.

An idea would be to replace all occurrences of

if(a == AAA)

with

if(functionCall(a == AAA))

You can do this easily using, e.g., the sed tool.

If you have a finite collection of patterns to be replaced, you can write a sed script to perform the substitutions.
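The same purely textual idea can be sketched in C instead of sed. This is a minimal sketch, not a robust tool: `wrap_if_conditions` and its helpers are hypothetical names, and the scanner assumes that parentheses inside string literals, character literals and comments do not occur in the conditions. Because it never runs the preprocessor, macros such as AAA are left exactly as written, and `#if` directives are skipped.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c may appear inside a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Appends len bytes of s to dst at *out; returns -1 if dst is too small. */
static int append(char *dst, size_t dst_size, size_t *out,
                  const char *s, size_t len)
{
    if (*out + len + 1 > dst_size)
        return -1;
    memcpy(dst + *out, s, len);
    *out += len;
    dst[*out] = '\0';
    return 0;
}

/* Nonzero if the token at src[i] directly follows '#' (a directive). */
static int follows_hash(const char *src, size_t i)
{
    while (i > 0 && (src[i - 1] == ' ' || src[i - 1] == '\t'))
        i--;
    return i > 0 && src[i - 1] == '#';
}

/*
 * Copies src to dst, rewriting each `if (COND)` as `if (WRAPPER(COND))`.
 * Everything else, including macro uses and directives, is copied verbatim.
 * Returns 0 on success, -1 if dst is too small.
 */
int wrap_if_conditions(const char *src, const char *wrapper,
                       char *dst, size_t dst_size)
{
    size_t n, i = 0, out = 0;

    assert(src != NULL && wrapper != NULL && dst != NULL && dst_size > 0);
    n = strlen(src);
    dst[0] = '\0';

    while (i < n) {
        size_t open, close, depth = 0;
        /* An `if` keyword: not part of a longer identifier, not `#if`. */
        int is_if = src[i] == 'i' && src[i + 1] == 'f'
                    && !is_ident_char(src[i + 2])
                    && (i == 0 || !is_ident_char(src[i - 1]))
                    && !follows_hash(src, i);

        if (is_if) {
            open = i + 2;
            while (isspace((unsigned char)src[open]))
                open++;
            if (src[open] == '(') {
                /* Find the parenthesis that closes the condition. */
                for (close = open; close < n; close++) {
                    if (src[close] == '(')
                        depth++;
                    else if (src[close] == ')' && --depth == 0)
                        break;
                }
                if (close < n) {
                    if (append(dst, dst_size, &out, src + i, open - i + 1)
                        || append(dst, dst_size, &out, wrapper, strlen(wrapper))
                        || append(dst, dst_size, &out, "(", 1)
                        || append(dst, dst_size, &out, src + open + 1,
                                  close - open - 1)
                        || append(dst, dst_size, &out, "))", 2))
                        return -1;
                    i = close + 1;
                    continue;
                }
            }
        }
        /* Not a rewritable `if`: copy one byte unchanged. */
        if (append(dst, dst_size, &out, src + i, 1))
            return -1;
        i++;
    }
    return 0;
}
```

Calling it with the wrapper name "functionCall" on the question's example turns if(a == AAA) into if(functionCall(a == AAA)) and leaves the #define line alone. A text-level approach like this has no notion of types or scopes, so it only suits patterns that can be recognized from the characters alone.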

Would this solve your problem?

Handling the preprocessor is one of the most difficult problems in applying transformations to C (and C++) code.

Our DMS Software Reengineering Toolkit with its C Front End comes relatively close to doing this. DMS can parse C source code, preserving most preprocessor conditionals, macro definitions and uses.

It does so by allowing preprocessor actions in "well-structured" places. Examples: #defines are allowed where declarations or statements can occur, and macro calls and conditionals are allowed as replacements for many of the nonterminals in the language (e.g., function head, expression, statement, declarations) and in many non-structured places where people commonly put them (e.g., #if foo, if (...) {, and #endif on three consecutive lines). It parses the source code and preprocessor directives as if they were part of one language (they ARE, it's called "C"), and builds corresponding ASTs, which can be transformed and will regenerate correctly with the captured preprocessor directives. [This level of capability handles OP's example perfectly.]

Some directives are poorly placed (both in the syntax sense, e.g., across multiple fragments of the language, and in the "you've got to be kidding" understandability sense). DMS handles these by expanding them away, with some advance guidance from the engineer ("always expand this macro"). A less satisfactory approach is to hand-convert the unstructured preprocessor conditionals/macro calls into structured ones; this is a bit painful but more workable than one might expect, since the bad cases occur considerably less frequently than the good ones.
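To make the idea of hand-converting an unstructured conditional into a structured one concrete, here is a small sketch. The names LIMIT, CHECK_LIMIT, NEEDS_CLAMP and both functions are invented for illustration; whether a given tool accepts the first form as is depends on the tool, and the example only shows the conversion itself.

```c
#include <stdio.h>

#define LIMIT 10
#define CHECK_LIMIT 1   /* hypothetical feature switch; remove to disable */

/*
 * Unstructured: each conditional wraps only half of an if statement,
 * so neither #ifdef region is a complete statement on its own.
 */
int clamp_unstructured(int x)
{
#ifdef CHECK_LIMIT
    if (x > LIMIT) {
#endif
        x = LIMIT;
#ifdef CHECK_LIMIT
    }
#endif
    return x;
}

/*
 * Structured: the conditional now wraps complete macro definitions at
 * declaration level, and the macro call stands in for a whole expression.
 */
#ifdef CHECK_LIMIT
#define NEEDS_CLAMP(x) ((x) > LIMIT)
#else
#define NEEDS_CLAMP(x) 1
#endif

int clamp_structured(int x)
{
    if (NEEDS_CLAMP(x)) {
        x = LIMIT;
    }
    return x;
}
```

Both functions behave the same with or without CHECK_LIMIT defined, but in the second version every directive sits where a declaration could appear and the macro use parses as an ordinary expression.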

To do better than this, one needs to have symbol tables and flow analysis that take into account the preprocessor conditions, and capture all the preprocessor conditionals. We've done some experimental work with DMS to capture conditional declarations in the symbol table (seems to work fine), and we're just starting work on a scheme for the latter.

Not easy being green.

Clang maintains extremely accurate information about the original source code.

Most notably, the SourceManager is able to tell whether a given token has been expanded from a macro or written as is, and Chandler Carruth recently implemented macro diagnostics that can display the actual macro expansion stack (at the various stages of expansion), tracing back to the actual written code (3.0).

Therefore, it is possible to use the generated AST and then rewrite the source code with all its macros still in place. You would have to query virtually every node to know whether it comes from a macro expansion or not and, if it does, retrieve the original code of the expansion, but it still seems possible.

There is a rewriter module in Clang
You can dig up Chandler's code on the macro diagnosis stack
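The general idea of rewriting against the original text, rather than regenerating code from the AST, can be sketched without Clang at all. The code below is not Clang's API: `struct insertion` and `apply_insertions` are hypothetical names, and the byte offsets stand in for the written (unexpanded) source locations that a parser would report. Any byte not named by an edit is copied verbatim, which is why macro uses survive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One edit against the ORIGINAL buffer: insert text before byte offset. */
struct insertion {
    size_t offset;
    const char *text;
};

/*
 * Applies insertions, which must be sorted by ascending offset, to src
 * and writes the result to dst. Returns 0 on success, -1 if dst is too
 * small or an offset is out of order or out of range.
 */
int apply_insertions(const char *src, const struct insertion *edits,
                     size_t n_edits, char *dst, size_t dst_size)
{
    size_t src_len, in = 0, out = 0;

    assert(src != NULL && dst != NULL && (edits != NULL || n_edits == 0));
    src_len = strlen(src);

    for (size_t e = 0; e < n_edits; e++) {
        size_t off = edits[e].offset;
        size_t text_len = strlen(edits[e].text);
        size_t chunk;

        if (off < in || off > src_len)
            return -1;
        chunk = off - in;
        if (out + chunk + text_len + 1 > dst_size)
            return -1;
        /* Copy the untouched original bytes, then the inserted text. */
        memcpy(dst + out, src + in, chunk);
        out += chunk;
        in = off;
        memcpy(dst + out, edits[e].text, text_len);
        out += text_len;
    }

    /* Copy whatever follows the last edit. */
    if (out + (src_len - in) + 1 > dst_size)
        return -1;
    memcpy(dst + out, src + in, src_len - in);
    out += src_len - in;
    dst[out] = '\0';
    return 0;
}
```

For the line if(a == AAA), inserting "functionCall(" at offset 3 and ")" at offset 11 yields if(functionCall(a == AAA)). In a real tool the offsets would come from the locations of the condition as written in the file, not from the expanded tokens.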

So I guess you should have all you need :) (And hope so because I won't be able to help much more :p)

I would advise resorting to the ROSE framework. Its source is available on GitHub.
