Remove previous lines then join when SED finds expression
I'm trying to join sentences in a document, but some of the sentences have been split apart with an empty line in between. For example:
The dog chased after a ball

that was thrown by its owner.
The ball travelled quite far.
to:
The dog chased after a ball that was thrown by its owner.
The ball travelled quite far.
I was thinking I could search for an empty line and then check whether the next line begins with a lower-case character. If it does, sed would copy that line, remove it and the empty line above it, and then append the copied text to the first half of the broken sentence (sorry for the confusion).
I'm new to sed and tried it with this command:
sed "/$/{:a;N;s/\n\(^[a-z]* .*\)/ \1/;ba}"
But it only does this once, and it only removes the empty line without appending the second half of the broken sentence to the first part.
Please help.
This should do the trick:
sed ':a;$!{N;N};s/\n\n\([a-z]\)/ \1/;ta;P;D' sentences
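For anyone less familiar with sed, the effect of that substitution can be sketched in Python. This is only an illustration of the idea (replace a line break, an empty line, and the start of a lower-case line with a single space), not a translation of the sed script; the function name `join_split_sentences` and the sample text are my own assumptions.

```python
import re


def join_split_sentences(text: str) -> str:
    """Join a line onto the previous one when it follows exactly one
    empty line and starts with a lower-case ASCII letter."""
    # "\n\n" matches the line break plus the empty line; the lookahead
    # checks for the lower-case letter without consuming it.
    return re.sub(r"\n\n(?=[a-z])", " ", text)


# Hypothetical sample based on the sentences in the question.
sample = (
    "The dog chased after a ball\n"
    "\n"
    "that was thrown by its owner.\n"
    "The ball travelled quite far.\n"
)

print(join_split_sentences(sample), end="")
```

Unlike sed, this reads the whole text into memory at once, which is fine for small documents.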
This is the first time I have used sed for such intricate replacements. It took me around 2 hours to come up with something :D I used GNU sed, as I wasn't able to get branching working on a single line on my Mac.
Here is the input content I used for testing:
The dog chased after a ball

that was thrown by its owner.
The ball

travelled quite far.
I took me a while to fix this file.
And now it's

working :)
Then here is the sed command line I came up with:
$ sed -n '/^$/!bstore;/^$/N;s/\n\([a-z]\)/ \1/;tmerge;h;d;:store;H;b;:merge;H;g;s/\n \([a-z]\)/ \1/;p;s/.*//g;h;d' sentences.txt
And here is the output:
$ sed -n '/^$/!bstore;/^$/N;s/\n\([a-z]\)/ \1/;tmerge;h;d;:store;H;b;:merge;H;g;s/\n \([a-z]\)/ \1/;p;s/.*//g;h;d' sentences.txt
The dog chased after a ball that was thrown by its owner.
The ball travelled quite far.
I took me a while to fix this file.
And now it's working :)
You may notice that an empty line is inserted right at the beginning, but I think one can live with that. Please comment on this one if you have mastered sed, as this is just a novice's attempt.
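The bookkeeping here boils down to holding back the previous line until it is known whether the next non-empty line continues it. A rough Python sketch of that idea, line by line, might look like the following. It is not a port of the sed script, and `merge_lines` plus the sample data are hypothetical names chosen for illustration. It assumes a continuation is a line that starts with an ASCII lower-case letter and is preceded by exactly one empty line.

```python
from typing import Iterable, Iterator


def merge_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines, joining a lower-case continuation onto the held line."""
    pending = None  # line held back, similar in spirit to sed's hold space
    blanks = 0      # empty lines seen since `pending` was stored
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            if pending is None:
                yield ""      # leading empty lines pass straight through
            else:
                blanks += 1   # decide later whether to keep or drop it
            continue
        if pending is not None and blanks == 1 and "a" <= line[0] <= "z":
            # Continuation: drop the empty line and join with a space.
            pending = pending + " " + line
        else:
            if pending is not None:
                yield pending
                for _ in range(blanks):
                    yield ""  # these empty lines were real separators
            pending = line
        blanks = 0
    if pending is not None:
        yield pending
        for _ in range(blanks):
            yield ""


# Hypothetical sample: one split sentence, then a genuine empty separator.
sample = [
    "The dog chased after a ball\n",
    "\n",
    "that was thrown by its owner.\n",
    "\n",
    "The ball travelled quite far.\n",
]

for out in merge_lines(sample):
    print(out)
```

Because nothing is printed until a line is known to be complete, this variant does not emit an extra empty line at the start.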
If you have Python, you can try this snippet:
# Join every line that starts with a lower-case letter onto the most
# recent line that starts with an upper-case letter (Python 3).
with open("file") as fh:
    data = fh.readlines()

found_upper = None  # index of the last line starting with an upper-case letter
for n, line in enumerate(data):
    first = line[:1]
    if first.isupper():
        found_upper = n
    elif first.islower() and found_upper is not None:
        # Append the continuation, then blank out its original slot.
        data[found_upper] = data[found_upper].strip() + " " + line
        data[n] = ""

print("".join(data), end="")
Output (I added more scenarios based on your file format):
$ cat file
the start
THE START
The dog chased after a ball
that was thrown by its owner.
My ball travelled quite far
and it smashed the windows
but it didn't cause much damage
THE END
THE FINAL DESTINATION
final
FINAL DESTINATION LAST EPISODE
the final final
$ ./python.py
the start
THE START
The dog chased after a ball that was thrown by its owner.
My ball travelled quite far and it smashed the windows but it didn't cause much damage
THE END
THE FINAL DESTINATION final
FINAL DESTINATION LAST EPISODE the final final