开发者

What is AST,CFG,CLANG, how can we use it in deadcode removal algorithm? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I am about to write a dead-code removal algorithm using C language for an online event with our team.

The requirements are.....

  1. To read a C program source file,Which has many forms of dead-codes.
  2. And our output should be a file, Which is free from all dead-codes.

While surfing the internet, we came across the SO links...

How can I know which parts in the code are never used?

Dead code detection in legacy C/C++ project

Before seeing these links,we had the basic idea... Reading the input C file, line by line using normal file stream and store in an string array. Then to analyze those strings and determine the very basic dead codes like if(0) and if(1) etc.. And making a stack, for maintaining the parenthesis. And so more...

But this has a great problem, that this idea will lead us to do more with string operations rather than removing dead-codes.

But After seeing these link... We came to know about Clang library,Abstract Syntax Tree,Control-Flow-Graph etc...

But we are开发者_StackOverflow社区 very newbie to those libraries and those concepts. We came to know that they are used to parse the C code.

Hence we need some basic ideas about these AST,CFG and some basic guidance, explaining how can we use that in our code...

Can we include that clang library as a normal library like math.h?

Where can we download that library?

Does we can use those Clang libraries in windows?


I can explain to you the concept of control flow graphs, but I am not familiar with the library itself.

The concept is simple. Imagine any sequential lines of code (that is without if, goto or function call or labels) as one node of a graph. Every goto or function call creates a directional link from the current node to the node where the goto label is or the function it is calling. Remember that a function itself could be a graph and not a simple node, because it may have ifs or other function calls inside. Each function call also creates a directional link from leaf nodes of the function (where the function returns) to the node containing the codes right after the function call. (That can create a lot of links outgoing from the function graph because the function can be called in many parts of the code)

Likewise, if you have an if, you have two direction links from the current node to both the if part and the else part of the if statement (unless you detect if(0) or if(1) like you said in which case there is only one link to the proper location)

The root of your graph is the entry point of main. Now what you must do to find dead code is to simply traverse the graph from the root position (using DFS or BFS for example) and in the end see which nodes were NOT visited. This shows you the dead codes, that is places in the code that no matter what direction your program takes, it won't reach those locations.

If you want to implement this yourself, you can take a recursive approach (similar to parsing the code but simpler). For example if you see an if you say:

typedef char *line;
FlowGraph *get_flow_graph(line *code)
{
    FlowGraph *current_node = malloc(sizeof *current_node);
    current_node->flow_to = malloc(some_maximum * sizeof *current_node->flow_to);
    current_node->flow_to_count = 0;
    ...
    if (is_if_statement(code[0]))
    {
        FlowGraph *if_part = get_flow_graph(code + 1);
        FlowGraph *else_part = get_flow_graph(code + find_matching_else(code));
        current_node->flow_to[current_node->flow_to_count++] = if_part;
        current_node->flow_to[current_node->flow_to_count++] = else_part;
    }
    else
    ...
}


You can see examples of control and data flow graphs extracted by the DMS Software Reengineering Toolkit.

We have done this on very large applications (26 million lines of C) using DMS's data flow analysis machinery and its C Front End, including a points-to analysis, which is a practical necessity if you really want to find dead functions in a large C system.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜