Unix shell bash 'one-liner' to isolate all parentheses containing a URL that includes ".mp3"
I'm completely new to this Unix bash stuff — and first question here! Hope you guys can help:)
Problem:
I have a mass of messy web source code (wrapping/unformatted) containing multiple occurrences of:
('http://www.example.com/path/audio.mp3')
Could you please help with a one-liner (sed/awk...) that will isolate these occurrences of parentheses containing a URL that includes ".mp3", clean leading/trailing "()" and " ' " characters, and then print as list (one per line) to an active .txt file.
Note: The one-liner will be used in Automator on Mac as a service/workflow to act on 'selected text.'
Any help would be greatly appreciated as (despite trawling through all the online tuts) I'm completely lost.
Best Regards,
Dave
Using egrep with -o (output only the parts that match) should do the trick. Try something like this:
egrep -o "http://[^'\"]+\.mp3" FILENAME
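If only the URLs wrapped in (' and ') are wanted, the same idea can be tightened a little. The following is a minimal sketch, not a definitive solution: the function name extract_mp3_urls is made up for illustration, and it assumes the URLs start with http:// and contain no quote characters.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption, not from the question):
# reads text on stdin and prints one .mp3 URL per line.
extract_mp3_urls() {
  # Step 1: keep only the pieces that look like ('http://...mp3')
  # Step 2: strip the leading (' and the trailing ')
  grep -oE "\('http://[^'\"]+\.mp3'\)" | sed -e "s/^('//" -e "s/')\$//"
}
```

It could then be used as `extract_mp3_urls < FILENAME >> urls.txt`, where urls.txt is a placeholder for whatever text file should collect the list.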
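Perl, which a Mac should have:
#!/usr/bin/perl
# Print every http://...mp3 URL found on stdin, one per line.
while (<STDIN>)
{
    # Non-greedy, global match so several URLs on one line are all found.
    while (/(http:\/\/[^'"]*?\.mp3)/g)
    {
        print "$1\n";    # double quotes, so \n is a real newline
    }
}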
Try to refine the following:
perl -ne $'while(/\(\'(http:\/\/[\w.\/]+?\.mp3)\'\)/g) { print "$1\n"; }' < input_file > output_file
It reads stdin (here: input_file) one line at a time, looks for every occurrence of a URL in that line, and prints each one to stdout (here: output_file) without the surrounding (' and ').
awk '{print $2}' FS="('|')" < filename
cat filename | tr ')' '\n' | awk '{print $2}' FS="('|')" > output.txt
Just replace filename with the name of the file containing these lines.
OR
echo "your multiline\
text here" | tr ')' '\n' | awk '{print $2}' FS="('|')"
JUST A TRY:
tr ')' '\n' | awk '{print $2}' FS="('|')"
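The tr/awk pipeline prints the second quote-delimited field of every chunk, including blank lines and non-.mp3 URLs. A small sketch of how it might be narrowed to .mp3 links only follows; the function name extract_mp3_urls_awk is invented for illustration, and the approach assumes no URL contains a ")" character.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption): reads stdin,
# prints only the quoted fields that end in .mp3, one per line.
extract_mp3_urls_awk() {
  # tr puts each (...) group on its own line; awk splits on the single
  # quote and prints field 2 only when it ends with ".mp3".
  tr ')' '\n' | awk -F "'" '$2 ~ /\.mp3$/ { print $2 }'
}
```

For example, `extract_mp3_urls_awk < filename > output.txt` would write the filtered list to output.txt.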
This will match the URLs that appear within parentheses and single quotes:
grep -Po "(?<=\(')http.*?mp3(?='\))"
The URLs are output, one per line, without the parentheses or single quotes. The -P option for Perl-compatible regular expressions is available (at least) in GNU and OS X grep versions.