
SED to keep text between tags over multiple lines

I am very new to sed, so even after looking at examples I am totally at a loss as to how to write the correct code for my need (this one is close, but it does not seem to handle multi-line replacement).

Here is my input.txt

This is a test of splitting...

|firstword|secondwordthirdword fourthwordfifthwordsixthword

This is a test of splitting...

firstword|secondword|thirdword fourthwordfifthwordsixthword

This is a test of splitting...

firstwordsecondword|thirdword| fourthwordfifthwordsixthword

This is a test of splitting...

firstwordsecondwordthirdword |fourthword|fifthwordsixthword

This is a test of splitting...

firstwordsecondwordthirdword fourthword|fifthword|sixthword

This is a test of splitting...

firstwordsecondwordthirdword fourthwordfifthword|sixthword|

What I need to do is remove all text outside of the two "|" and keep the text inside of the two "|"

And then insert a Unicode zero-width-space between each of the words (U+200B)

Resulting in:

firstwordU+200BsecondwordU+200BthirdwordU+200BfourthwordU+200BfifthwordU+200Bsixthword

I tried

sed '\|/d;/|/,$d' input.txt

UPDATE: Which doesn't do much

And

sed -e 's/.*|\([^]]*\)|.*/\1/g' input.txt

Which comes close, but doesn't remove anything from lines that do not contain a "|" (I need to remove everything not contained inside two "|"). And I don't know how to go about adding the zero-width-space between words. But like I said, I don't really know what I am doing.

Any help would be much appreciated.

-Nathan


If you are happy with the results of

sed -e 's/.*|\([^]]*\)|.*/\1/g' input.txt

other than its failure to remove lines that do not contain the delimiters, then just do:

sed -n -e 's/.*|\([^]]*\)|.*/\1/gp' input.txt

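to print only the lines on which the substitution happens (-n suppresses the default output, and the p flag prints each line where the s command matched). Or, you can explicitly delete the unwanted lines: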

sed -e '/|.*|/!d' -e 's/.*|\([^]]*\)|.*/\1/g' input.txt
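
Neither command above handles the second half of the question, joining the extracted words with U+200B. Below is a minimal sketch of one way to do both steps, assuming a POSIX shell with sed and awk available. The helper name extract_and_join is hypothetical, and [^|]* is swapped in for the [^]]* of the original expression so that the captured text can never contain a pipe. The sample input is the one from the question.

```shell
#!/bin/sh
# U+200B (zero-width space) as its UTF-8 bytes, written in octal for portability.
zwsp=$(printf '\342\200\213')

# Hypothetical helper (not part of the original answer): reads text on stdin,
# keeps only the field between the two "|" on each line, drops lines without
# such a field, and joins the fields with U+200B into one output line.
extract_and_join() {
    sed -n 's/.*|\([^|]*\)|.*/\1/p' |
    awk -v sep="$zwsp" 'NR > 1 { printf "%s", sep } { printf "%s", $0 } END { print "" }'
}

# Sample input copied from the question.
result=$(extract_and_join <<'EOF'
This is a test of splitting...
|firstword|secondwordthirdword fourthwordfifthwordsixthword
This is a test of splitting...
firstword|secondword|thirdword fourthwordfifthwordsixthword
This is a test of splitting...
firstwordsecondword|thirdword| fourthwordfifthwordsixthword
This is a test of splitting...
firstwordsecondwordthirdword |fourthword|fifthwordsixthword
This is a test of splitting...
firstwordsecondwordthirdword fourthword|fifthword|sixthword
This is a test of splitting...
firstwordsecondwordthirdword fourthwordfifthword|sixthword|
EOF
)

# The separators are invisible in most terminals; pipe through od -c to see them.
printf '%s\n' "$result"
```

With a real file the same helper would be called as extract_and_join < input.txt. awk does the joining here because a multi-byte separator is awkward to insert portably with sed alone.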