script to convert date (month_name DD, YYYY) to (YYYY-MM-DD)
I have a text file with dates in the form: "date=month_name DD, YYYY" and "date=(month_name DD, YYYY)"
How can I convert these dates so they are in the form: "date=YYYY-MM-DD"?
I also have some dates preceded by the field name "accessdate=" or no field name, that I would like to convert.
Thanks.
ADDENDUM:
- The month names are the full English month names, e.g. January, February, etc.
- I would only like to convert the dates inside ref tags, i.e. they would be surrounded by other text inside
<ref></ref>.
- I'm open to any language for the scripting. I've done a little bash, javascript & python. But I think awk, sed, perl, etc. would be also fine. Explanations of the code would be appreciated.
Depends on the tool you use.
E.g. with awk & sed you can do something like this:
awk '
# For each month that occurs in the input, emit a sed command that rewrites it.
# Days are assumed to be two digits (DD); \? is a GNU sed extension.
/date=\(?Jan/ {print "s/date=(\\?January \\([0-9][0-9]\\), \\([0-9]\\{4\\}\\))\\?/date=\\2-01-\\1/g"}
/date=\(?Feb/ {print "s/date=(\\?February \\([0-9][0-9]\\), \\([0-9]\\{4\\}\\))\\?/date=\\2-02-\\1/g"}
/date=\(?Mar/ {print "s/date=(\\?March \\([0-9][0-9]\\), \\([0-9]\\{4\\}\\))\\?/date=\\2-03-\\1/g"}
# ...
' INPUT_FILE > tmp.sed
Then you can apply the generated script with
sed -i.ORIG -f tmp.sed INPUT_FILE
Or you can write it in pure awk, by parsing $0.
You can begin with
echo 'date=April 13, 1985' | sed -e 's/January/01/' ... \
-e 's/April/04/' ... -e 's/December/12/' | \
sed 's/\([0-9]*\)[^0-9]\([0-9]*\)[^0-9] \([0-9]*\)$/\3-\1-\2/'
To handle "date=(month_name DD, YYYY)" you can also add sed 's/date=(\([^)]*\))/date=\1/'
to the pipe to strip the parentheses, and so on.
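For comparison, here is a minimal Python sketch of the same single-date conversion. It is not part of the sed approach above. The names to_iso and DATE_RE are made up for illustration, and it assumes the default C locale so that %B matches English month names.

```python
import re
from datetime import datetime

# Optional "(", then "Month DD, YYYY", then ")" only if the "(" was present.
DATE_RE = re.compile(r"(\()?([A-Z][a-z]+ \d{1,2}, \d{4})(?(1)\))")

def to_iso(text):
    """Rewrite every 'Month DD, YYYY' in text as 'YYYY-MM-DD'."""
    def repl(m):
        try:
            # %B parses a full month name (English under the default C locale).
            parsed = datetime.strptime(m.group(2), "%B %d, %Y")
        except ValueError:
            # Not a real date (e.g. "Hello 12, 2000"): leave it untouched.
            return m.group(0)
        return parsed.strftime("%Y-%m-%d")
    return DATE_RE.sub(repl, text)

print(to_iso("date=April 13, 1985"))       # date=1985-04-13
print(to_iso("date=(April 13, 1985)"))     # date=1985-04-13
print(to_iso("accessdate=March 5, 2010"))  # accessdate=2010-03-05
```

Because the field name is not part of the pattern, the same function covers "date=", "accessdate=" and dates with no field name.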
Concerning your addendum: sed works line by line, so it is not enough to handle a <ref></ref>
tag that spans more than one line. You have to use something more powerful, e.g. Python.
re.search() can be used to find <ref> and the matching </ref>. Then re.sub() can be used to
transform what's inside, using regexps similar to those used in sed. This algorithm has to be
enclosed in a while loop to traverse the whole document.
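A rough Python sketch of that idea follows. Instead of an explicit re.search() / while loop, it lets re.sub() with a callback visit every <ref>...</ref> span, which has the same effect. The names MONTHS, REF_RE, DATE_RE and convert_refs are hypothetical, and it assumes ref tags are not nested.

```python
import re

# Full English month names mapped to their numbers.
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# A <ref ...>...</ref> span; DOTALL lets "." cross line breaks.
# The lookbehind skips self-closing <ref ... /> tags (an assumption).
REF_RE = re.compile(r"<ref(?:\s[^>]*)?(?<!/)>.*?</ref>", re.DOTALL)

# "Month DD, YYYY", optionally wrapped in parentheses.
DATE_RE = re.compile(
    r"(?P<paren>\()?\b(?P<month>" + "|".join(MONTHS) + r") "
    r"(?P<day>\d{1,2}), (?P<year>\d{4})(?(paren)\))")

def fix_date(m):
    """Format one matched date as YYYY-MM-DD."""
    return "{}-{:02d}-{:02d}".format(
        m.group("year"), MONTHS[m.group("month")], int(m.group("day")))

def convert_refs(text):
    """Convert dates inside <ref> tags only; other text is left alone."""
    return REF_RE.sub(lambda ref: DATE_RE.sub(fix_date, ref.group(0)), text)

sample = "date=May 1, 2000 <ref>cite\naccessdate=(April 13, 1985)</ref>"
print(convert_refs(sample))
```

To process a file, read it whole (for example with open(path).read()) so that tags spanning several lines are seen as one string.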