Extracting word from file using grep or sed

2023-02-09 18:28

I have a file in the format below:

File                  : \\dvtbbnkapp115\nautilus\030db28a-f241-4054-a0e3-9bfa7e002535.dip was
 processed. 
Entries Found         : 0
Unarchived Documents  : 1 
            File Size : 1 K 

Error : The following line could not be processed.  Bad Document Type.

Error : Marketing and Contact preference change
        update||7000003735||078ef1f3-db6b-46a8-bb0d-c40bb2296ab5.pdf



File                  : \\dvtbbnkapp115\nautilus\078ef1f3-db6b-46a8-bb0d-c40bb2296ab5.dip was
 processed. 
Entries Found         : 0
Unarchived Documents  : 1 
            File Size : 1 K 

Error : The following line could not be processed.  Bad Document Type.

Error : Declined - Bureau Data (process)||7000003723|252204|2f1d71f4-052c-49f1-95cf-9ca9b4268f0c.pdf



File                  : \\dvtbbnkapp115\nautilus\2f1d71f4-052c-49f1-95cf-9ca9b4268f0c.dip was
 processed. 
Entries Found         : 0
Unarchived Documents  : 1 
            File Size : 1 K 

Error : The following line could not be processed.  Bad Document Type.

Error : Unable to call - please
        contact|40640510016710|7000003180||3e6a792f-c136-4a4b-a654-37f4476ccef8.pdf

I need to extract just the PDF file names after the double pipe and write them to a file. I am a novice when it comes to unix/sed/grep commands; I have tried but had no luck. Any ideas or examples I could use to extract the information above?

thanks

Give this a try if you only want PDF filenames that directly follow a double pipe and are the last thing on the line:

sed -n 's/.*||\([^|]*\.pdf\)$/\1/p' inputfile
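Since the question also asks for the names to end up in a file, the output can simply be redirected. The following is a minimal sketch rather than a recipe for the real log: the names `sample.log` and `pdf_names.txt` and the scratch directory are made up, and the sample lines are copied from the question.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, for illustration only.
workdir=$(mktemp -d)
log="$workdir/sample.log"
out="$workdir/pdf_names.txt"

# Error lines in the three shapes shown in the question.
cat > "$log" <<'EOF'
Error : Marketing and Contact preference change
        update||7000003735||078ef1f3-db6b-46a8-bb0d-c40bb2296ab5.pdf
Error : Declined - Bureau Data (process)||7000003723|252204|2f1d71f4-052c-49f1-95cf-9ca9b4268f0c.pdf
Error : Unable to call - please
        contact|40640510016710|7000003180||3e6a792f-c136-4a4b-a654-37f4476ccef8.pdf
EOF

# -n suppresses the default output; the trailing p prints only the lines
# where the substitution succeeded. \. matches a literal dot before "pdf".
sed -n 's/.*||\([^|]*\.pdf\)$/\1/p' "$log" > "$out"

# Show what was written (remove "$workdir" when done).
cat "$out"
```

With this sample, only the first and third names are written, because the second name sits after a single pipe rather than a double one.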

The second PDF filename in your example follows a single pipe character, but there is an earlier set of double pipes on that line. To accommodate both styles of lines, select every line that contains a double pipe and keep what follows the last pipe, assuming the filename itself does not include any pipe characters:

sed -n '/||/s/.*|\([^|]*\.pdf\)$/\1/p' inputfile

If your filenames consist of only hex digits and hyphens, you can be a little more selective like this:

sed -n '/||/s/.*|\([[:xdigit:]-]*\.pdf\)$/\1/p' inputfile
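Which lines count as "after a double pipe" changes the result, so it may be worth trying a pattern on a few sample lines before running it on the real log. The sketch below is an illustration under assumptions: the variable names are made up, the sample lines come from the question, and the second command (select lines containing `||`, keep the text after the last pipe) is one possible way to cover both line styles, not the only one.

```shell
#!/bin/sh
# Sample lines copied from the question; the variable names are made up.
sample='        update||7000003735||078ef1f3-db6b-46a8-bb0d-c40bb2296ab5.pdf
Error : Declined - Bureau Data (process)||7000003723|252204|2f1d71f4-052c-49f1-95cf-9ca9b4268f0c.pdf
        contact|40640510016710|7000003180||3e6a792f-c136-4a4b-a654-37f4476ccef8.pdf'

# Strict: the name must come directly after a double pipe.
strict=$(printf '%s\n' "$sample" | sed -n 's/.*||\([^|]*\.pdf\)$/\1/p')

# Lenient: any line containing a double pipe; keep what follows the last pipe,
# provided it is made of hex digits and hyphens and ends in .pdf.
lenient=$(printf '%s\n' "$sample" | sed -n '/||/s/.*|\([[:xdigit:]-]*\.pdf\)$/\1/p')

printf 'strict:\n%s\n\nlenient:\n%s\n' "$strict" "$lenient"
```

On these three lines the strict form yields two names and the lenient form yields all three.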

If I understood your request correctly, this should do it:

grep -o -E "\|\|[^|]*\.pdf" < input | cut -f 3 -d "|"

grep prints only the part of each line that consists of a double pipe followed by a PDF name. cut then splits that text on the delimiter and selects the n-th field, here the third.
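To see what each stage of that pipeline contributes, the two steps can be run separately. This is a minimal sketch, assuming a `grep` that supports `-o` (GNU and BSD grep do; it is not part of POSIX) and using one sample line from the question; the variable names are made up.

```shell
#!/bin/sh
# One sample line from the question.
line='        update||7000003735||078ef1f3-db6b-46a8-bb0d-c40bb2296ab5.pdf'

# Stage 1: -o prints only the matched text, which still starts with "||".
matched=$(printf '%s\n' "$line" | grep -o -E '\|\|[^|]*\.pdf')

# Stage 2: with "|" as the delimiter, the two leading pipes produce two
# empty fields, so the file name is field 3.
name=$(printf '%s\n' "$matched" | cut -d '|' -f 3)

printf 'grep -o: %s\ncut:     %s\n' "$matched" "$name"
```

The first double pipe on the line is skipped because the text after it runs into another pipe before reaching `.pdf`.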

To get every PDF name that is on a line containing a double pipe (not only the names directly after one):

grep "||" < input | cut -f 5 -d "|" > output
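Field 5 works here because every sample line happens to contain exactly four pipes, and `cut` counts the empty field between two adjacent pipes. If the number of pipes varies, picking the last field instead may be more robust. The sketch below swaps in `awk` for that purpose, which is a different tool from the one used above; the sample lines are from the question and the variable names are made up.

```shell
#!/bin/sh
# Two sample lines from the question with different pipe layouts.
sample='        update||7000003735||078ef1f3-db6b-46a8-bb0d-c40bb2296ab5.pdf
Error : Declined - Bureau Data (process)||7000003723|252204|2f1d71f4-052c-49f1-95cf-9ca9b4268f0c.pdf'

# cut: fields are numbered from 1, and the empty field inside "||" counts.
by_cut=$(printf '%s\n' "$sample" | grep '||' | cut -d '|' -f 5)

# awk: -F sets the field separator; index() looks for a literal "||";
# $NF is the last field, however many fields the line has.
by_awk=$(printf '%s\n' "$sample" | awk -F '|' 'index($0, "||") { print $NF }')

printf 'cut -f 5:\n%s\n\nawk last field:\n%s\n' "$by_cut" "$by_awk"
```

Both commands give the same two names on this sample; they would diverge on a line with more or fewer than four pipes.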

Edit: after seeing the comment I think you wanted something else, so I adjusted the answer. I have kept both commands, since it seems the simple case is the one you need.

This will only extract the filenames that come immediately after a '||' sequence.

grep -o '||[^|]*\.pdf' YOUR_FILE | tr -d '|'

EDIT: I removed the ${...} to make it more readable.

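Why not simply send your input through sed? Select the lines that contain a double pipe and end in pdf, then delete everything up to and including the last double pipe. The pipes are written unescaped because \| means alternation in GNU sed's basic regular expressions:

sed -n -e '/||.*pdf$/ { s/.*||//; p; }' inputfile

On a line where single-pipe fields follow the double pipe, those fields are kept along with the filename.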

Ruby(1.9+)

$ ruby -F'\|\|' -ane 'print $F[-1] if $_["\.pdf"] && !$F[1].include?("|") ' file
078ef1f3-db6b-46a8-bb0d-c40bb2296ab5.pdf
3e6a792f-c136-4a4b-a654-37f4476ccef8.pdf
