Remove previous lines then join when SED finds expression

2022-12-16 07:42

I'm trying to join sentences in a document, but some of the sentences have been split apart with an empty line in between. For example:

The dog chased after a ball

that was thrown by its owner.

The ball travelled quite far.

should become:

The dog chased after a ball that was thrown by its owner.

The ball travelled quite far.

My idea was to search for an empty line followed by a line that begins with a lowercase character. The command would then remove the empty line and append that lowercase line to the broken sentence above it (sorry for the confusion).

I'm new to sed and tried it with this command:

sed "/$/{:a;N;s/\n\(^[a-z]* .*\)/ \1/;ba}"

But it only does this once, and it only removes the empty line instead of appending the second half of the broken sentence to the first part.

Please help.

This should do the trick:

sed ':a;$!{N;N};s/\n\n\([a-z]\)/ \1/;ta;P;D' sentences
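
If the one-liner is hard to read, here is a rough Python sketch of the same substitution rule: a newline plus an empty line is replaced by a single space whenever the following line starts with a lowercase ASCII letter. The function name join_broken and the sample text are made up for illustration, and like the sed expression it only handles a single empty line between the two halves.

```python
import re


def join_broken(text):
    # "\n\n" = end of the first half plus the empty line; the lookahead
    # keeps the lowercase letter that starts the second half.
    return re.sub(r"\n\n(?=[a-z])", " ", text)


sample = (
    "The dog chased after a ball\n"
    "\n"
    "that was thrown by its owner.\n"
    "\n"
    "The ball travelled quite far.\n"
)
print(join_broken(sample), end="")
```

The empty line before "The ball travelled quite far." is left alone because the next line starts with an uppercase letter.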

This is the first time I have used sed to perform such intricate replacements. It took me around 2 hours to come up with something :D

I used GNU sed, as I wasn't able to get branching to work on a single line on my Mac.

Here is the input content I used for testing:

The dog chased after a ball

that was thrown by its owner.

The ball

travelled quite far.
I took me a while to fix this file.
And now it's

working :)

Then here is the sed command line I came up with:

$ sed -n '/^$/!bstore;/^$/N;s/\n\([a-z]\)/ \1/;tmerge;h;d;:store;H;b;:merge;H;g;s/\n \([a-z]\)/ \1/;p;s/.*//g;h;d' sentences.txt

And here is the output:

$ sed -n '/^$/!bstore;/^$/N;s/\n\([a-z]\)/ \1/;tmerge;h;d;:store;H;b;:merge;H;g;s/\n \([a-z]\)/ \1/;p;s/.*//g;h;d' sentences.txt

The dog chased after a ball that was thrown by its owner.

The ball travelled quite far.

I took me a while to fix this file.
And now it's working :)

Note that an empty line is inserted right at the beginning, but I think one can live with that. Please comment on this one if you have mastered sed, as this is just a novice's attempt.
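
The hold-space bookkeeping is not easy to follow, so here is a rough line-by-line sketch of the same idea in Python: keep the lines seen so far, and when an empty line is followed by a line starting with a lowercase ASCII letter, glue that line onto the previous one. This is not an exact translation of the sed script (it does not emit the leading empty line, for instance), and the name merge_lines is hypothetical.

```python
def merge_lines(lines):
    """Join 'first half / empty line / lowercase second half' triples."""
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        # An empty line followed by a lowercase start marks a broken sentence,
        # provided there is a non-empty line before it to append to.
        if (line == "" and nxt is not None and "a" <= nxt[:1] <= "z"
                and out and out[-1] != ""):
            out[-1] = out[-1] + " " + nxt
            i += 2  # skip both the empty line and the second half
        else:
            out.append(line)
            i += 1
    return out


with_breaks = [
    "The dog chased after a ball",
    "",
    "that was thrown by its owner.",
    "",
    "The ball travelled quite far.",
]
print("\n".join(merge_lines(with_breaks)))
```

Working on a list of lines rather than one big string keeps the logic close to how sed sees the input, one line per cycle.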

If you have Python, you can try this snippet:

import string

with open("file") as fh:
    data = fh.readlines()

# Index of the most recent line that starts with an uppercase letter.
found_upper = None
for n, line in enumerate(data):
    if line[0] in string.ascii_uppercase:
        found_upper = n
    elif found_upper is not None and line[0] in string.ascii_lowercase:
        # Append the lowercase continuation to that uppercase line.
        data[found_upper] = data[found_upper].strip() + " " + line
        data[n] = ""

print("".join(data), end="")

Output (I added more scenarios to your file format):

$ cat file
the start
THE START
The dog chased after a ball
that was thrown by its owner.

My ball travelled quite far




and it smashed the windows
but it didn't cause much damage


THE END
THE FINAL DESTINATION
final
FINAL DESTINATION LAST EPISODE
the final final

$ ./python.py
the start
THE START
The dog chased after a ball that was thrown by its owner.

My ball travelled quite far and it smashed the windows but it didn't cause much damage






THE END
THE FINAL DESTINATION final
FINAL DESTINATION LAST EPISODE the final final
