Remove previous lines then join when SED finds expression

I'm trying to join sentences in a document, but some of the sentences have been split apart with an empty line in between. For example:

The dog chased after a ball

that was thrown by its owner.

The ball travelled quite far.

to:

The dog chased after a ball that was thrown by its owner.

The ball travelled quite far.

My idea was to search for an empty line followed by a line that begins with a lowercase character, then remove the empty line and append the lowercase line to the broken sentence above it (sorry for the confusion).

I'm new to sed and tried it with this command:

sed "/$/{:a;N;s/\n\(^[a-z]* .*\)/ \1/;ba}"

But it only works once, and it only removes the empty line; it does not append the second half of the broken sentence to the first part.

Please help.


This should do the trick:

sed ':a;$!{N;N};s/\n\n\([a-z]\)/ \1/;ta;P;D' sentences
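
For comparison, the core substitution can be sketched in Python, assuming the whole file fits in memory. The function name join_split_sentences and the sample text are made up for this illustration. It only mirrors the s/\n\n\([a-z]\)/ \1/ part of the command: a newline, an empty line, and then a lowercase letter are collapsed into a single space.

```python
import re


def join_split_sentences(text):
    # A newline, an empty line, then a lowercase letter means the sentence
    # was split: replace the two newlines with one space. The lookahead
    # leaves the lowercase letter itself untouched.
    return re.sub(r"\n\n(?=[a-z])", " ", text)


sample = (
    "The dog chased after a ball\n"
    "\n"
    "that was thrown by its owner.\n"
    "\n"
    "The ball travelled quite far.\n"
)

print(join_split_sentences(sample), end="")
```

The empty line before "The ball travelled quite far." stays, because the line after it starts with an uppercase letter.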


This is the first time I have used sed for such intricate replacements. It took me around 2 hours to come up with something :D

I used GNU sed, as I couldn't get branching to work on a single line with the sed on my Mac.

Here is the input content I used for testing:

The dog chased after a ball

that was thrown by its owner.

The ball

travelled quite far.
I took me a while to fix this file.
And now it's

working :)

Then here is the sed command line I came up with:

$ sed -n '/^$/!bstore;/^$/N;s/\n\([a-z]\)/ \1/;tmerge;h;d;:store;H;b;:merge;H;g;s/\n \([a-z]\)/ \1/;p;s/.*//g;h;d' sentences.txt

And here is the output:

$ sed -n '/^$/!bstore;/^$/N;s/\n\([a-z]\)/ \1/;tmerge;h;d;:store;H;b;:merge;H;g;s/\n \([a-z]\)/ \1/;p;s/.*//g;h;d' sentences.txt

The dog chased after a ball that was thrown by its owner.

The ball travelled quite far.

I took me a while to fix this file.
And now it's working :)

Note that an empty line is inserted at the very beginning of the output, but I think one can live with that. If you have mastered sed, please comment on this one, as it is only a novice's attempt.
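
For readers who find the hold-space juggling hard to follow, here is a rough Python sketch of the same idea: keep the lines seen so far, and when a lowercase line arrives right after an empty line, drop the empty line and append the text to the sentence before it. This is not a translation of the sed script, and the name merge_continuations is invented for the example. It keeps the lines in a list, so nothing extra is emitted at the start.

```python
def merge_continuations(lines):
    """Return the lines with 'text / empty line / lowercase text' joined."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # A lowercase start right after an empty line continues the
        # sentence that precedes the empty line.
        if line[:1].islower() and len(out) >= 2 and out[-1] == "" and out[-2] != "":
            out.pop()              # drop the empty separator line
            out[-1] += " " + line  # append to the broken sentence
        else:
            out.append(line)
    return out


test_input = [
    "The dog chased after a ball",
    "",
    "that was thrown by its owner.",
    "",
    "The ball",
    "",
    "travelled quite far.",
    "I took me a while to fix this file.",
    "And now it's",
    "",
    "working :)",
]

for merged in merge_continuations(test_input):
    print(merged)
```

Lines that are not part of a split sentence pass through unchanged, so this sketch only removes the empty lines it actually merges across.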


If you have Python, you can try this snippet (Python 3):

# Join each line that starts with a lowercase letter onto the most recent
# line that started with an uppercase letter.
with open("file") as fh:
    data = fh.readlines()

found_upper = None  # index of the last line that starts with an uppercase letter
for n, line in enumerate(data):
    if line[:1].isupper():
        found_upper = n
    elif found_upper is not None and line[:1].islower():
        # Append the continuation to the earlier line and empty its own slot.
        data[found_upper] = data[found_upper].strip() + " " + line
        data[n] = ""

print("".join(data), end="")

Output (with more scenarios added to your file format):

$ cat file
the start
THE START
The dog chased after a ball
that was thrown by its owner.

My ball travelled quite far




and it smashed the windows
but it didn't cause much damage


THE END
THE FINAL DESTINATION
final
FINAL DESTINATION LAST EPISODE
the final final

$ ./python.py
the start
THE START
The dog chased after a ball that was thrown by its owner.

My ball travelled quite far and it smashed the windows but it didn't cause much damage






THE END
THE FINAL DESTINATION final
FINAL DESTINATION LAST EPISODE the final final
