Combining multiple lines into one line
I have an XML file with input like this:
Input:
<abc a="1">
<val>0.25</val>
</abc>
<abc a="2">
<val>0.25</val>
</abc>
<abc a="3">
<val>0.35</val>
</abc>
...
Output:
<abc a="1"><val>0.25</val></abc>
<abc a="2"><val>0.25</val></abc>
<abc a="3"><val>0.35</val></abc>
I have around 200K lines in the input format. How can I quickly convert them into the output format?
In vim you could do this with
:g/<abc/ .,/<\/abc/ join!
Normally :join inserts a space between the joined lines, but the ! suppresses that.
In general I would recommend using a proper XML parsing library in a language like Python, Ruby or Perl for manipulating XML files (I recommend Python+ElementTree), but in this case it is simple enough to get away with using a regex solution.
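A minimal sketch of the Python+ElementTree route, assuming the file is just a flat sequence of sibling elements like the sample above and has no XML declaration. Such a file has no single root element, so the sketch wraps the text in a synthetic <root> element (a name chosen only for illustration) before parsing. It then removes the whitespace-only text left by line breaks and indentation and serializes each top-level element on its own line. The function name collapse_elements is hypothetical.

```python
import xml.etree.ElementTree as ET


def collapse_elements(xml_text):
    """Return one serialized string per top-level element, without inner line breaks."""
    # Assumption: the input is a flat list of sibling elements with no single
    # root and no <?xml ...?> declaration, so a synthetic <root> makes it parseable.
    root = ET.fromstring("<root>" + xml_text + "</root>")
    lines = []
    for elem in root:
        for node in elem.iter():
            # Drop whitespace-only text and tails left over from line breaks
            # and indentation; real content such as "0.25" is kept.
            if node.text is not None and not node.text.strip():
                node.text = None
            if node.tail is not None and not node.tail.strip():
                node.tail = None
        elem.tail = None  # tostring() would otherwise append the element's tail
        lines.append(ET.tostring(elem, encoding="unicode"))
    return lines


if __name__ == "__main__":
    sample = '<abc a="1">\n<val>0.25</val>\n</abc>\n<abc a="2">\n<val>0.25</val>\n</abc>\n'
    for line in collapse_elements(sample):
        print(line)
```

This reads the whole file into memory; if that ever becomes a problem, xml.etree.ElementTree.iterparse is the standard-library option for incremental parsing.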
In Vim:
- position on first line
- qq : start recording macro
- gJgJ : joins next two lines without adding spaces
- j : go down
- q : stop recording
- N@q : N = number of lines (actually around 1/3rd of all lines as they get condensed on the go)
$ awk '
/<abc/ && NR > 1 {print ""}
{sub(/^[ \t]+/, ""); printf "%s", $0}
END {print ""}
' file
<abc a="1"><val>0.25</val></abc>
<abc a="2"><val>0.25</val></abc>
<abc a="3"><val>0.35</val></abc>
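The awk logic (start a new output line whenever a line contains <abc, otherwise keep appending) can be sketched in Python as a streaming generator. Unlike the fixed three-line approaches, it handles records of any length. The function name join_records and the default start marker are illustrative assumptions; each input line has its leading and trailing whitespace stripped.

```python
import io


def join_records(lines, start_marker="<abc"):
    """Yield one joined record per start marker, reading `lines` one at a time."""
    current = []
    for line in lines:
        piece = line.strip()  # drop the newline plus any indentation
        if start_marker in piece and current:
            # A new record begins: emit everything collected so far.
            yield "".join(current)
            current = []
        if piece:
            current.append(piece)
    if current:
        yield "".join(current)  # flush the final record


if __name__ == "__main__":
    data = io.StringIO('<abc a="1">\n<val>0.25</val>\n</abc>\n<abc a="2">\n<val>0.25</val>\n</abc>\n')
    for record in join_records(data):
        print(record)
```

Since it consumes one line at a time, you can pass an open file object directly, and memory use is bounded by the size of a single record rather than the whole file.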
Bash:
while read -r a && read -r b && read -r c; do printf '%s%s%s\n' "$a" "$b" "$c"; done < file.xml
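The Bash loop (like the gJgJ macro and the Perl answers) relies on every record being exactly three lines. Under that assumption, a Python sketch can group the stripped lines in threes with the zip(*[it] * 3) idiom. Note that zip silently drops an incomplete trailing group, so this is only safe when the line count is a multiple of three. The name join_in_threes is hypothetical.

```python
import io


def join_in_threes(lines):
    """Join every three consecutive lines into one, without separators."""
    # Assumption: each record spans exactly three lines.
    stripped = (line.strip() for line in lines)
    # zip() over three references to the same iterator pulls three items
    # per tuple; an incomplete trailing group is discarded.
    for group in zip(*[stripped] * 3):
        yield "".join(group)


if __name__ == "__main__":
    data = io.StringIO('<abc a="3">\n<val>0.35</val>\n</abc>\n')
    for record in join_in_threes(data):
        print(record)
```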
You can record a macro. Basically, I would start with the cursor at the beginning of the first line and press 'qa' (records the macro into register a). Then press Shift-V to begin line-wise visual mode, search for the closing tag with '/<\/abc' followed by Enter, and press 'gJ' to join the selected lines without inserting spaces (Shift-J would put a space between them). Next, move the cursor to the next opening tag, probably with 'j^', and press 'q' to stop recording. You can then replay the recording with '@a', or with a count such as 10000@a. If the tags are different or not directly after each other, you just need to change the searches you use to find the opening and closing tags.
sed '/^<abc/{N;N;s/\n *//g}' file
# remove each newline together with any spaces that follow it
# Result
<abc a="1"><val>0.25</val></abc>
<abc a="2"><val>0.25</val></abc>
<abc a="3"><val>0.35</val></abc>
An inelegant Perl snippet that should do the trick, though not particularly quickly:
cat file | perl -e '
$x=0;
while(<>){
s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/g;
print;
$x++;
if($x==3){
print"\n";
$x=0;
}
}' > output
You can do this:
perl -e '$i=1; while(<>){chomp;$s.=$_;if($i%3==0){$s=~s{>\s+<}{><}g;print "$s\n";$s="";}$i++;}' file
sed '/<abc/,/<\/abc>/{:a;N;s/\n//g;s|<\/abc>|<\/abc>\n|g;H;ta}' file
tr '\n' ' ' < myfile | sed 's|<\/abc>|<\/abc>\n|g;s/[ \t]*<abc/<abc/g;s/>[ \t]*</></g'
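The tr | sed pipeline flattens the whole file and then re-inserts a line break after every </abc>. A rough Python equivalent with the standard re module, assuming the whole file fits in memory and that only whitespace sitting between two tags should be removed. The function name flatten_and_split is illustrative.

```python
import re


def flatten_and_split(text, close_tag="</abc>"):
    """Flatten the document, then start a new line after each closing tag."""
    flat = text.replace("\n", " ")
    # Remove whitespace that sits between two tags, e.g. "> <" becomes "><".
    flat = re.sub(r">\s+<", "><", flat)
    # Put each record back on its own line.
    flat = flat.replace(close_tag, close_tag + "\n")
    # Drop leftover whitespace-only fragments (e.g. from the final newline).
    return [line.strip() for line in flat.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = '<abc a="1">\n<val>0.25</val>\n</abc>\n<abc a="2">\n<val>0.25</val>\n</abc>\n'
    print("\n".join(flatten_and_split(sample)))
```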
This should work in ex mode:
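:%s/\(<abc.*>\)\n\(.*\)\n\(<\/abc>\)/\1\2\3/
This can leave extra spaces (or a tab) in front of the value, but you could remove them depending on what they are, e.g. by also matching \t, \ \ \ \  or simply \s* right after each \n.
What you are searching for here is (pattern1)[line break](pattern2)[line break](pattern3), and you replace it with (pattern1)(pattern2)(pattern3).
Note that in a Vim pattern the line break is written \n; a literal ^M (typed with Ctrl+V Ctrl+M) matches a carriage return instead, so it only makes sense in the replacement part, where it inserts a line break.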
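The same capture-and-rejoin idea (match the opening line, the value line and the closing line as three groups separated by line breaks, then put the groups back together without the breaks) can be sketched with Python's re module. The pattern assumes each record is exactly three lines and that the opening tag starts with <abc; unlike the Vim command, it also absorbs spaces or tabs after each line break. The names RECORD and rejoin_records are hypothetical.

```python
import re

# Three capture groups separated by line breaks; optional spaces/tabs after
# each break absorb indentation. "." does not match "\n" without re.DOTALL.
RECORD = re.compile(r"(<abc[^>]*>)\n[ \t]*(.*)\n[ \t]*(</abc>)")


def rejoin_records(text):
    """Collapse each three-line <abc> record onto a single line."""
    return RECORD.sub(r"\1\2\3", text)


if __name__ == "__main__":
    print(rejoin_records('<abc a="1">\n<val>0.25</val>\n</abc>\n'), end="")
```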