
How can I delete all /* */ comments from a C source file?

I have a C file which I copied from somewhere else, but it has a lot of comments like below:

int matrix[20];
/* generate data */
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
/* print original data */
for (index = 0; index < 5 ;index++)

How can I delete all the comments enclosed by /* and */? Sometimes a comment consists of 4-5 lines, and I need to delete all of those lines.

Basically, I need to delete all text between /* and */, even when newlines (\n) come in between. Please help me do this using one of sed, awk or perl.


Why not just use the C preprocessor to do this? Why are you confining yourself to a home-grown regex?

[Edit] This approach also handles Bart's printf(".../*...") scenario cleanly.

Example:

[File: t.c]
/* This is a comment */
int main () {
    /* 
     * This
     * is 
     * a
     * multiline
     * comment
     */
    int f = 42;
    /*
     * More comments
     */
    return 0;
}


$ cpp -P t.c
int main () {







    int f = 42;



    return 0;
}

Or you can filter out the blank lines to condense everything:

$ cpp -P t.c | grep -v '^[[:space:]]*$'
int main () {
    int f = 42;
    return 0;
}

No use re-inventing the wheel, is there?

[Edit] If you do not want this approach to expand included files and macros, cpp provides flags for that. Consider:

[File: t.c]

#include <stdio.h>
int main () {
    int f = 42;
    printf("   /*  ");
    printf("   */  ");
    return 0;
}


$ cpp -P -fpreprocessed t.c | grep -v '^[[:space:]]*$'
#include <stdio.h>
int main () {
    int f = 42;
    printf("   /*  ");
    printf("   */  ");
    return 0;
}

There is one caveat: macro expansion is avoided this way, but the original macro definitions (the #define lines) are stripped from the output.


See perlfaq6. It's quite a complex scenario.

$/ = undef;
$_ = <>;
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
print;

A word of warning: once you've done this, do you have a test scenario to prove to yourself that you've removed only the comments and nothing valuable? If you're running such a powerful regexp, I'd ensure there is some sort of test (even if you simply record the behaviour before and afterwards).
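For readers who want to experiment with the same idea outside Perl, here is a rough Python port of the pattern above, together with the kind of small before/after check suggested. This is only a sketch: the function name strip_c_comments is made up for illustration, and it assumes plain C without // comments, trigraphs or line continuations inside comment delimiters.

```python
import re

# One alternation: either a /* ... */ comment, or (captured in group 1)
# a string literal, a character literal, or a run of ordinary code.
_COMMENT_OR_CODE = re.compile(
    r"""/\*[^*]*\*+(?:[^/*][^*]*\*+)*/     # a block comment
      | ( "(?:\\.|[^"\\])*"                # a double-quoted string literal
        | '(?:\\.|[^'\\])*'                # a character literal
        | .[^/"'\\]*                       # any other run of code
        )
    """,
    re.DOTALL | re.VERBOSE,
)


def strip_c_comments(source: str) -> str:
    """Drop block comments; keep everything captured in group 1 unchanged."""
    return _COMMENT_OR_CODE.sub(lambda m: m.group(1) or "", source)


# A tiny before/after check: the comment delimiters inside the string survive.
sample = 'int a; /* c1 */ printf("/* not a comment */"); /* multi\nline */ int b;'
print(strip_c_comments(sample))
```

Because literals are matched by the same alternation as comments, a /* inside a string is consumed as part of the string and never starts a comment match.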


Take a look at the strip_comments routine in Inline::Filters:

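sub strip_comments {
    my ($txt, $opn, $cls, @quotes) = @_;
    my $i = -1;
    while (++$i < length $txt) {
        my $closer;
        # At a quote character? Skip the whole quoted literal.
        # (skip_quoted is a separate helper, not shown here.)
        if (grep {my $r=substr($txt,$i,length($_)) eq $_; $closer=$_ if $r; $r}
            @quotes) {
            $i = skip_quoted($txt, $i, $closer);
            next;
        }
        # At a comment opener? Blank the comment out with spaces,
        # keeping newlines so that line numbers are preserved.
        if (substr($txt, $i, length($opn)) eq $opn) {
            my $e = index($txt, $cls, $i) + length($cls);
            substr($txt, $i, $e-$i) =~ s/[^\n]/ /g;
            $i--;
            next;
        }
    }
    return $txt;
}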
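The same scan-and-blank idea can be sketched in Python. This is not the module's code, just an illustration under simplifying assumptions: the name blank_c_comments is hypothetical, the delimiters are hard-coded to /* and */, and an unterminated comment is blanked to the end of the text.

```python
def blank_c_comments(text: str) -> str:
    """Replace each /* ... */ comment with spaces, keeping newlines."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in "\"'":
            # Copy a quoted literal verbatim, honouring backslash escapes.
            j = i + 1
            while j < n and text[j] != ch:
                j += 2 if text[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(text[i:j])
            i = j
        elif text.startswith("/*", i):
            # Find the end of the comment (or the end of the text).
            end = text.find("*/", i + 2)
            end = n if end == -1 else end + 2
            # Spaces for everything except newlines, so offsets stay stable.
            out.append("".join(c if c == "\n" else " " for c in text[i:end]))
            i = end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = 'a /* x\ny */ b "/* s */"'
print(blank_c_comments(sample))
```

Blanking instead of deleting keeps every remaining character at its original line and column, which helps when compiler diagnostics need to map back to the original file.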


Please do not use cpp for this unless you understand the ramifications:

$ cat t.c
#include <stdio.h>

#define MSG "Hello World"

int main(void) {
    /* ANNOY: print MSG using the puts function */
    puts(MSG);
    return 0;
}

Now, let's run it through cpp:

$ cpp -P t.c -fpreprocessed


#include <stdio.h>



int main(void) {


    puts(MSG);
    return 0;
}

Clearly, this file is no longer going to compile: the #define for MSG has been stripped, so puts(MSG) now refers to an undefined identifier.


Consider:

printf("... /* ...");
int matrix[20];
printf("... */ ...");

In other words: I wouldn't use regex for this task, unless you're doing a replace-once and are positive that the above does not occur.
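To make the failure concrete, here is a small Python sketch that applies a naive non-greedy pattern (the same shape as /\*.*?\*/) to the three lines above. The variable names are just for illustration.

```python
import re

snippet = (
    'printf("... /* ...");\n'
    'int matrix[20];\n'
    'printf("... */ ...");\n'
)

# Naive removal that knows nothing about string literals: it matches from
# the /* inside the first string to the */ inside the second one.
naive = re.sub(r"/\*.*?\*/", "", snippet, flags=re.DOTALL)

print(naive)
```

The declaration of matrix disappears along with parts of both string literals, which is exactly the kind of silent damage a before/after comparison would catch.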


You MUST use a C preprocessor for this, in combination with other tools that temporarily disable specific preprocessor functionality such as expanding #defines or #includes. All other approaches will fail in edge cases. This will work for all cases:

[ $# -eq 2 ] && arg="$1" || arg=""
eval file="\$$#"
sed 's/a/aA/g;s/__/aB/g;s/#/aC/g' "$file" |
          gcc -P -E $arg - |
          sed 's/aC/#/g;s/aB/__/g;s/aA/a/g'

Put it in a shell script and call it with the name of the file you want parsed, optionally prefixed by a flag like "-ansi" to specify the C standard to apply.


Try this on the command line (replacing 'file-names' with the list of files that need to be processed):

perl -i -wpe 'BEGIN{undef $/} s!/\*.*?\*/!!sg' file-names

This program changes the files in-place (overwriting the original file with the corrected output). If you just want the output without changing the original files, omit the '-i' switch.

Explanation:

perl -- call the perl interpreter
-i      switch to 'change-in-place' mode.
-w      print warnings to STDERR (if there are any)
 p      read the files and print $_ for each record; like while(<>){ ...; print $_;}
 e      process the following argument as a program (once for each input record)

BEGIN{undef $/} --- process whole files instead of individual lines.
s!      search and replace ...
  /\*     the starting /* marker
  .*?     followed by any text (non-greedy search)
  \*/     followed by the */ marker
!!      replace by the empty string (i.e. remove comments)
  s     treat newline characters \n like normal characters (remove multi-line comments)
   g    repeat as necessary to process all comments.

file-names   list of files to be processed.


When I want something short and simple for CSS, I use this:

awk -vRS='*/' '{gsub(/\/\*.*/,"")}1' FILE

This won't handle the case where comment delimiters appear inside strings but it's much simpler than a solution that does. Obviously it's not bulletproof or suitable for everything but you know better than the pedants on SO whether or not you can live with that.

I believe this one is bulletproof however.


Try the approach below to recursively find and remove JavaScript-style comments, XML-style comments, and single-line comments.

/* This is a multi line js comments.

Please remove me*/

find pages/ -type f -name "*.*" -exec perl -i -wpe 'BEGIN{undef $/} s!/\*.*?\*/!!sg' {} +

<!-- This is a multi line xml comments.

Please remove me -->

find pages/ -type f -name "*.*" -exec perl -i -wpe 'BEGIN{undef $/} s{<!--.*?-->}{}sg' {} +

//This is single line comment Please remove me.

find pages/ -type f -name "*.*" -exec sed -i 's|//.*||' {} +

Note: pages is the root directory; the commands above find and process all files in it and in its subdirectories.


A very simplistic example using gawk. Please test it thoroughly before relying on it. Of course, it doesn't take care of the other comment style, // (as in C++).

$ more file
int matrix[20];
/* generate data */
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
/* print original data */
for (index = 0; index < 5 ;index++)
/*
function(){
 blah blah
}
*/
float a;
float b;

$ awk -vRS='*/' '{ gsub(/\/\*.*/,"")}1' file
int matrix[20];


for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;


for (index = 0; index < 5 ;index++)


float a;
float b;