SED to keep text between tags over multiple lines
I am very new to sed, and even after looking at examples I am totally at a loss as to how to write the correct code for my need (this one is close, but it does not seem to handle multi-line replacement).
Here is my input.txt
This is a test of splitting...
|firstword|secondwordthirdword fourthwordfifthwordsixthword
This is a test of splitting...
firstword|secondword|thirdword fourthwordfifthwordsixthword
This is a test of splitting...
firstwordsecondword|thirdword| fourthwordfifthwordsixthword
This is a test of splitting...
firstwordsecondwordthirdword |fourthword|fifthwordsixthword
This is a test of splitting...
firstwordsecondwordthirdword fourthword|fifthword|sixthword
This is a test of splitting...
firstwordsecondwordthirdword fourthwordfifthword|sixthword|
What I need to do is remove all text outside of the two "|" characters and keep the text between them.
And then insert a Unicode zero-width space (U+200B) between each of the words.
Resulting in:
firstwordU+200BsecondwordU+200BthirdwordU+200BfourthwordU+200BfifthwordU+200Bsixthword
I tried
sed '\|/d;/|/,$d' input.txt
UPDATE: Which doesn't do much
And
sed -e 's/.*|\([^]]*\)|.*/\1/g' input.txt
Which comes close, but doesn't remove anything from lines that do not contain a "|" (I need to remove everything not contained inside two "|"). I also don't know how to go about adding the zero-width space between words. But like I said, I don't really know what I am doing.
Any help would be much appreciated.
-Nathan
If you are happy with the results of
sed -e 's/.*|\([^]]*\)|.*/\1/g' input.txt
other than its failure to remove lines that do not contain the delimiters, then just do:
sed -n -e 's/.*|\([^]]*\)|.*/\1/gp' input.txt
to print only the lines on which the substitution happens. Or, you can explicitly delete the unwanted lines:
sed -e '/|.*|/!d' -e 's/.*|\([^]]*\)|.*/\1/g' input.txt
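Neither command above adds the zero-width space yet. Here is a rough sketch of one way to do both steps, not necessarily the only or best one. It assumes a UTF-8 terminal and that the `[^]]` in the pattern was meant to be `[^|]` (any character other than the delimiter). The temporary file and the variable names `input`, `zwsp` and `result` are made up for the example; with your real file you would pass input.txt to sed directly.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
This is a test of splitting...
|firstword|secondwordthirdword fourthwordfifthwordsixthword
This is a test of splitting...
firstword|secondword|thirdword fourthwordfifthwordsixthword
This is a test of splitting...
firstwordsecondword|thirdword| fourthwordfifthwordsixthword
This is a test of splitting...
firstwordsecondwordthirdword |fourthword|fifthwordsixthword
This is a test of splitting...
firstwordsecondwordthirdword fourthword|fifthword|sixthword
This is a test of splitting...
firstwordsecondwordthirdword fourthwordfifthword|sixthword|
EOF

# U+200B encoded as UTF-8 (bytes E2 80 8B), written in octal for POSIX printf.
zwsp=$(printf '\342\200\213')

# Step 1 (sed): -n plus the p flag prints only lines where the substitution
#   succeeds, keeping just the text between the two "|" characters.
# Step 2 (awk): join the surviving lines with the separator instead of newlines.
result=$(sed -n 's/.*|\([^|]*\)|.*/\1/p' "$input" |
  awk -v sep="$zwsp" 'NR > 1 { printf "%s", sep } { printf "%s", $0 } END { print "" }')

printf '%s\n' "$result"
rm -f "$input"
```

Since U+200B is invisible, the output will look like one run-together word in most terminals. Piping it through `od -c` should show the bytes 342 200 213 between the words if the separator made it through.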