How to extract a comment from a header file using Python, Perl, or sed?
I have a header file like this:
/*
 * APP 180-2 ALG-254/258/772 implementation
 * Last update: 03/01/2006
 * Issue date: 08/22/2004
 *
 * Copyright (C) 2006 Somebody's Name here
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the project nor the names of its contributors
 * may be used to endorse or promote products derived from this software
 * without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */
#ifndef HEADER_H
#define HEADER_H
/* More comments and C++ code here. */
#endif /* End of file. */
I want to extract the contents of the first C-style comment only, dropping the " *" at the start of each line, to get a file with the following contents:
APP 180-2 ALG-254/258/772 implementation
Last update: 03/01/2006
Issue date: 08/22/2004
Copyright (C) 2006 Somebody's Name here
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the project nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
Please suggest an easy way to do this with Python, Perl, sed, or some other way on Unix. Preferably as a one-liner.
This should work for you:
sed -n '/\*\//q; /^\/\*/d; s/^ \* \?//p' <file.h >comment.txt
Here's an explanation: sed (as you may know) is a command that goes through a file applying a list of rules to each line. Each rule consists of a "selector" and commands that are applied to that line only if the selector matches.
The first rule has the selector /\*\//. This is a regular expression selector; it matches any line that contains the characters */. Both characters need to be backslash-escaped: * is a regexp metacharacter, and / is the delimiter that would otherwise end the selector. (I've assumed that this will only match the closing line of the comment in your case and that this entire line should be deleted.) The command is q, which means "quit": sed just stops. Ordinarily it would print out the line first, but I provided the -n option, which means "don't print unless explicitly instructed to."
The second rule has the selector /^\/\*/, which is again a regexp selector; it matches the characters /* at the start of the line. Again, I've assumed this line will not contain part of the comment. The d command tells sed to delete this line and move on.
The final rule has no selector, so it applies to all lines (unless a previous command prevented processing from reaching the final rule). The command in this last rule is a substitution command, s/PATTERN/REPLACEMENT/, which finds text in the line that matches some pattern and replaces it with a replacement text. The pattern here is ^ \* \?, which matches a space, an asterisk, and either 0 or 1 spaces, but only at the beginning of the line (\? is a GNU sed extension). The replacement is empty, so sed simply deletes the leading space-asterisk-(space)? sequence. The p is a flag to the substitution command that tells sed to print the result of the substitution. It's needed because of the -n option.
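For readers who find the rule-by-rule description easier to follow as ordinary code, here is a rough Python sketch of the same three rules. It is only an illustration of the logic described above, not a replacement for the sed command; the function name extract_first_comment and the sample text are made up for this example.

```python
import re

def extract_first_comment(text):
    """Apply the three sed rules to each line, in order (hypothetical helper)."""
    out = []
    for line in text.splitlines():
        if "*/" in line:              # rule 1: /\*\//q  -> stop at the closing line
            break
        if line.startswith("/*"):     # rule 2: /^\/\*/d -> skip the opening line
            continue
        m = re.match(r" \* ?", line)  # rule 3: s/^ \* \?//p
        if m:                         # like the p flag: emit only if something was replaced
            out.append(line[m.end():])
    return out

# Made-up sample in the same shape as the header in the question.
sample = "/*\n * Title line\n *\n * Second line\n */\n#ifndef HEADER_H\n/* More */\n"
print("\n".join(extract_first_comment(sample)))
```

As with the sed version, this assumes the opening /* and closing */ sit on lines of their own and that every comment line starts with " *".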
Pyparsing includes a built-in pattern for matching comment formats from various languages. Using cStyleComment and scanString to find the first comment in the source file makes the rest just string functions:
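from pyparsing import cStyleComment

# c_source_file is the path to the header file
with open(c_source_file) as f:
    c_src = f.read()
# scanString yields (tokens, start, end); take the text of the first match
cmt = next(cStyleComment.scanString(c_src))[0][0]
# drop the first three characters (" * ") of every line
lines = [l[3:] for l in cmt.splitlines()]
print('\n'.join(lines))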
scanString is a generator that returns each match before going to the next instance, so only the first comment gets processed. With your sample code, this returns:
APP 180-2 ALG-254/258/772 implementation
Last update: 03/01/2006
Issue date: 08/22/2004
Copyright (C) 2006 Somebody's Name here
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the project nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
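If installing pyparsing is not an option, the same idea (find the first comment, then clean it up with string functions) can be sketched with only the standard library's re module. This swaps in a plain regular expression for cStyleComment, so it is a simplification: it does not know about string literals that happen to contain "/*". The function name first_c_comment and the sample text are hypothetical.

```python
import re

def first_c_comment(src):
    """Return the cleaned lines of the first /* ... */ comment, or [] if none."""
    # Non-greedy match so we stop at the first closing */.
    m = re.search(r"/\*(.*?)\*/", src, re.DOTALL)
    if m is None:
        return []
    lines = []
    for line in m.group(1).splitlines():
        # Drop indentation, one leading '*', and one following space.
        lines.append(re.sub(r"^\s*\*? ?", "", line).rstrip())
    # Trim the empty lines left over from the opening and closing delimiters.
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    return lines

# Made-up sample in the same shape as the header in the question.
src = "/*\n * First\n *\n * Second\n */\n#define X /* other */\n"
print("\n".join(first_c_comment(src)))
```

Unlike the fixed l[3:] slice, the prefix is stripped with a pattern, so lines that contain only " *" or that lack the trailing space are handled as well.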
sed -i -r "s/[\/\ ]{1}\*[\/\ ]?//g" YOURFILENAME
This strips the comment markers from your file while keeping the content. Note that it applies to every line of the file, not only the first comment, and that -i modifies YOURFILENAME in place. If you don't want that, remove -i from the command.
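To see what this substitution does to individual lines before running it on a real file, here is a small Python approximation. The pattern below is my reading of the sed expression (a slash, backslash, or space, then an asterisk, then optionally one more slash, backslash, or space); treat it as an assumption, not an exact port. The name strip_markers is made up for the example.

```python
import re

# Assumed equivalent of the sed pattern; applied to every match on the line (the g flag).
MARKER = re.compile(r"[/\\ ]\*[/\\ ]?")

def strip_markers(line):
    """Remove every comment-marker-like sequence from one line."""
    return MARKER.sub("", line)

for line in ["/*", " * Last update: 03/01/2006", " */",
             "#endif /* End of file. */", "x = a * b;"]:
    print(repr(line), "->", repr(strip_markers(line)))
```

The last two inputs show why this is best kept to files that contain only the comment: markers in later comments are removed too, and a multiplication written with spaces around the asterisk loses its operator.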