Rule-based file parsing
I need to parse a file line by line based on a set of rules.
Here are the requirements.
The file can have multiple lines with different data:
01200344545143554145556524341232131
1120034454514355414555652434123213101200344545143554145556524341232131
2120034454514
The rules look like this:
- if the first byte is "0", write the line to /tmp/record0.dat
- if the first byte is "1", write the line to /tmp/record1.dat
- if the first byte is "2", write the line to /tmp/record2.dat
I'm looking for any language that can do this quickly on very large files (over 2 GB).
I appreciate any help in advance.
Thanks
sed doesn't appear in your list of tags, but I'd use it:
sed -n -e '/^0/w /tmp/record0.dat' \
-e '/^1/w /tmp/record1.dat' \
-e '/^2/w /tmp/record2.dat' "$@"
You could do it in the other languages too, but for conciseness and likely correctness, sed is hard to beat here.
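For inputs of the size mentioned in the question (over 2 GB), it may be worth forcing the C locale so that sed works on plain bytes instead of decoding multibyte characters. With GNU sed in a UTF-8 locale this can help, though the gain depends on your sed and locale. A minimal sketch that runs the same sed program that way on the three sample lines from the question (the temporary input file is only for the demonstration):

```shell
#!/bin/sh
# Put the three sample lines from the question into a temporary input file.
input=$(mktemp)
printf '%s\n' \
  01200344545143554145556524341232131 \
  1120034454514355414555652434123213101200344545143554145556524341232131 \
  2120034454514 > "$input"

# Same sed program as above; LC_ALL=C makes sed treat the input as plain bytes.
# Each -e script ends its "w" file name, so the three outputs stay separate.
LC_ALL=C sed -n -e '/^0/w /tmp/record0.dat' \
               -e '/^1/w /tmp/record1.dat' \
               -e '/^2/w /tmp/record2.dat' "$input"

# Show how many lines ended up in each output file.
wc -l /tmp/record0.dat /tmp/record1.dat /tmp/record2.dat
rm -f "$input"
```

Note that sed creates (and truncates) every `w` file when it starts, even if no line matches that rule.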
This works regardless of the value of the first character, so it scales without adding more rules:
awk '{c=substr($0,1,1); print $0 > ("/tmp/record" c ".dat")}' inputfile.dat
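A quick check of the generic awk approach on two of the question's sample lines, plus a made-up line starting with "3" (hypothetical, not part of the question) to show that a new output file appears without writing a new rule. It uses `substr($0, 1, 1)` because awk string positions start at 1, and parenthesizes the concatenated file name so every awk parses the redirection the same way:

```shell
#!/bin/sh
# Sample lines from the question, plus a hypothetical line starting with 3.
input=$(mktemp)
printf '%s\n' \
  01200344545143554145556524341232131 \
  2120034454514 \
  3999 > "$input"

# Route each line by its first character; substr() positions start at 1.
# The parentheses make the concatenated file name unambiguous.
awk '{ c = substr($0, 1, 1); print $0 > ("/tmp/record" c ".dat") }' "$input"

# /tmp/record3.dat was created without adding a rule for "3".
cat /tmp/record3.dat
rm -f "$input"
```

awk keeps each output file open until it exits (or until `close()` is called), so the input is still read in a single pass; with only a handful of distinct first characters this stays well within open-file limits.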
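Or let awk split each line into single characters (an empty FS is supported by gawk but is not specified by POSIX):
awk -v FS= 'NF{print $0 > ("/tmp/record" $1 ".dat")}' file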
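To see why `$1` is the first character in that one-liner, and what the `NF` pattern does, here is a small sketch of how an empty field separator splits a record. This relies on the non-POSIX empty-FS behavior found in gawk and several other awks, so check yours before relying on it:

```shell
#!/bin/sh
# With an empty field separator, each character becomes its own field:
# NF is the line length and $1 is the first character.
fields=$(printf '%s\n' 2120034454514 | awk -v FS= '{ print NF, $1 }')
echo "$fields"

# An empty line has NF == 0, so an "NF" pattern never selects it; this is
# what keeps blank lines out of a file named /tmp/record.dat.
blank=$(printf '\n' | awk -v FS= 'NF { n++ } END { print n + 0 }')
echo "$blank"
```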