Regex script in Python or Perl
It would really make my work easier if someone could help me write a script in Python or Perl that retrieves, from a given file, all sentences like:
[LANG::...]
where ... stands for any text, for example:
[LANG::Sample text with digits 0123]
and writes each of them to a file on its own line.
Thanks very much for the help.
EDIT:
Thanks for the help; now for something more advanced.
If it finds something like [:ANG:: ...], please write only the ... part, without the brackets and the LANG:: tag.
Thanks, guys, you are awesome :)
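import re

# Read the whole input file (mode 'r'; opening it with 'w' would truncate it)
with open('input.txt', 'r') as f:
    text = f.read()
#text = 'Intro [LANG::First text 1] goes on [LANG::Second text 2] and finishes.'

# Write every non-greedy [LANG::...] match on its own line
with open('output.txt', 'w') as f:
    for match in re.findall(r'\[LANG::.*?\]', text):
        f.write(match + '\n')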
Output (for the sample text in the commented-out line):
[LANG::First text 1]
[LANG::Second text 2]
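The ? after .* is what keeps each match to a single tag. A quick comparison on the same sample string (a minimal illustration, not part of the original answer):

```python
import re

# Sample text from the commented-out line in the answer
text = 'Intro [LANG::First text 1] goes on [LANG::Second text 2] and finishes.'

# Non-greedy: '.*?' stops at the first ']' it reaches, so each tag is a separate match
lazy = re.findall(r'\[LANG::.*?\]', text)

# Greedy: '.*' runs on to the last ']' on the line, so both tags end up in one match
greedy = re.findall(r'\[LANG::.*\]', text)

print(lazy)    # ['[LANG::First text 1]', '[LANG::Second text 2]']
print(greedy)  # ['[LANG::First text 1] goes on [LANG::Second text 2]']
```

Because . does not match a newline by default, a tag split across two lines is not found; passing re.DOTALL as the flags argument of re.findall changes that.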
Second part of the question: if it finds something like [:ANG:: ...], write only the ... part, without the brackets and the LANG:: tag.
Change the last part to:
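with open('output.txt', 'w') as f:
    # '.' matches any single character, so this finds both [LANG::...] and [:ANG::...]
    for match in re.findall(r'\[.ANG::.*?\]', text):
        if match.startswith('[:ANG'):
            # Drop the 7-character '[:ANG::' prefix and the closing ']'
            f.write(match[7:-1] + '\n')
        else:
            f.write(match + '\n')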
Adjust the slice match[7:-1] to your needs.
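Instead of a fixed slice, a capture group can pull out the inner text directly. A minimal sketch, assuming [:ANG:: is meant literally (as in the startswith check above); the PATTERN and extract names are illustrative, not from the original answer:

```python
import re

# Two alternatives in one pattern:
#   group 1 captures the body of a [:ANG:: ...] tag (prefix and brackets dropped)
#   a whole [LANG::...] tag is matched by the second alternative and kept as-is
PATTERN = re.compile(r'\[:ANG::\s*(.*?)\]|\[LANG::.*?\]')

def extract(text):
    """Hypothetical helper: return stripped [:ANG::] bodies and whole [LANG::] tags."""
    results = []
    for m in PATTERN.finditer(text):
        if m.group(1) is not None:
            results.append(m.group(1))  # [:ANG:: ...] -> inner text only
        else:
            results.append(m.group(0))  # [LANG::...] -> the whole tag
    return results

print(extract('a [LANG::Keep me] b [:ANG:: strip me] c'))
# ['[LANG::Keep me]', 'strip me']
```

The \s* also drops the space after :: that match[7:-1] would keep.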
Perl version (prints each whole line that contains a [LANG::...] tag):
perl -lne "print if /\[LANG::.+?\]/;" infile > outfile
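The one-liner above works like grep: it keeps whole lines, not just the tags. A small Python sketch of the difference between filtering lines and extracting matches (the input lines are made up for illustration):

```python
import re

TAG = re.compile(r'\[LANG::.+?\]')

lines = [
    'before [LANG::One] after',
    'no tag on this line',
    '[LANG::Two] and [LANG::Three]',
]

# Like `print if /.../`: keep the whole line when it contains at least one tag
whole_lines = [line for line in lines if TAG.search(line)]

# Extracting matches instead: keep only the tags, one per output line
tags_only = [tag for line in lines for tag in TAG.findall(line)]

print(whole_lines)  # ['before [LANG::One] after', '[LANG::Two] and [LANG::Three]']
print(tags_only)    # ['[LANG::One]', '[LANG::Two]', '[LANG::Three]']
```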
Perl version (edited to get input from file):
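#!/usr/bin/perl
use strict;
use warnings;

open(my $in,  '<', 'input.txt')  or die "Cannot open input.txt: $!";
open(my $out, '>', 'output.txt') or die "Cannot open output.txt: $!";
while ( <$in> ) {
    # Collect every [LANG::...] match on the current line
    my @found = /\[LANG::.*?\]/g;
    print $out "$_\n" for @found;
}
close($out);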
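Like the Perl loop, a line-by-line version only holds one line in memory at a time, at the cost of missing any tag split across a line break. A Python sketch of the same loop; copy_tags is a hypothetical helper, and io.StringIO stands in for the real files:

```python
import io
import re

TAG = re.compile(r'\[LANG::.*?\]')

def copy_tags(src, dst):
    """Hypothetical helper: read src line by line and write each tag to dst."""
    for line in src:
        for tag in TAG.findall(line):
            dst.write(tag + '\n')

# In-memory demo; with real files you would pass
# open('input.txt') and open('output.txt', 'w') instead.
src = io.StringIO('x [LANG::A] y\nz [LANG::B] [LANG::C]\n')
dst = io.StringIO()
copy_tags(src, dst)
print(dst.getvalue(), end='')
```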
Perl
$ perl -nE'say $1 while /\[LANG::([^]]+)\]/g' input.txt >output.txt
Python
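#!/usr/bin/env python3
import fileinput
import re

# Read lines from the files named on the command line (or from stdin)
for line in fileinput.input():
    # Print only the captured text between "[LANG::" and "]"
    for match in re.findall(r'\[LANG::([^]]+)\]', line):
        print(match)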
Usage: $ print-lang input.txt >output.txt
input.txt
井の中の蛙、大海を知らず [LANG::Japanese] a frog in a well cannot conceive of the ocean [LANG::English] терпи казак, атаманом будешь [LANG::Russian] no pain, no gain [LANG::English]
output.txt
Japanese
English
Russian
English
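Both of these last versions use a capture group with the negated class [^]]+, which means "one or more characters that are not ]". Each match therefore stops at the tag's own closing bracket without a non-greedy ?, and only the group's text is returned. Running the pattern over the sample line:

```python
import re

# The sample input line
line = ('井の中の蛙、大海を知らず [LANG::Japanese] a frog in a well cannot conceive '
        'of the ocean [LANG::English] терпи казак, атаманом будешь [LANG::Russian] '
        'no pain, no gain [LANG::English]')

# With exactly one capture group, re.findall returns just the captured text
names = re.findall(r'\[LANG::([^]]+)\]', line)

print('\n'.join(names))
```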