
Regex script in Python or Perl

It would really make my work easier if someone could help me write a script in Python or Perl that retrieves, from a given file, all sentences like:

[LANG::...]
  • ... means anything

for example:

[LANG::Sample text with digits 0123]

and writes them to a file, each on a single line.

Thanks very much for the help.

EDIT:

Thanks for the help, and now something more advanced.

If it finds something like [:ANG:: ...], please write only ... without the brackets and the LANG:: tag.

Thanks guys, you are awesome :)


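import re

# Read the whole input file (mode 'r'; opening it with 'w' would truncate it and make read() fail).
with open('input.txt', 'r') as f:
    text = f.read()
#text = 'Intro [LANG::First text 1] goes on [LANG::Second text 2] and finishes.'

with open('output.txt', 'w') as f:
    # Non-greedy match, so each tag ends at its own closing bracket.
    for match in re.findall(r'\[LANG::.*?\]', text):
        f.write(match+'\n')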

outputs:

[LANG::First text 1]
[LANG::Second text 2]
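
The same pattern can be checked in memory, without touching any files. This is only a minimal sketch: the function name extract_tags is made up for illustration, and the sample string is the one from the commented-out line above.

```python
import re

# Same pattern as in the answer above, compiled once as a raw string.
TAG = re.compile(r'\[LANG::.*?\]')

def extract_tags(text):
    """Return every [LANG::...] occurrence in text, in order of appearance."""
    return TAG.findall(text)

text = 'Intro [LANG::First text 1] goes on [LANG::Second text 2] and finishes.'
tags = extract_tags(text)
print('\n'.join(tags))
```

Keeping the matching in a function like this makes it easy to try the regex on a short string before running it over a real file.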

Second part of the question: if it finds something like [:ANG:: ...] please write only ... without brackets and LANG:: tag.

Change the last part to:

with open('output.txt', 'w') as f:
    for match in re.findall(r'\[.ANG::.*?\]', text):
        if match.startswith('[:ANG'):
            f.write(match[7:-1]+'\n')
        else:
            f.write(match+'\n')

Adjust the slice match[7:-1] to your needs: 7 is the length of the '[:ANG::' prefix, and -1 drops the closing bracket.
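
An alternative to counting characters for the slice is to let the regex capture the tag body. The following is a sketch under some assumptions: the name format_tags and the sample string are invented for illustration, the pattern is narrowed to accept only L or : before ANG (the answer above accepts any character there), and strip() is added to drop the space after the tag, which the slice above would keep.

```python
import re

# Group 1: the character before "ANG" ('L' or ':'); group 2: the tag body.
TAG = re.compile(r'\[([L:])ANG::(.*?)\]')

def format_tags(text):
    """Return [LANG::...] tags unchanged and [:ANG::...] tags as their body only."""
    lines = []
    for m in TAG.finditer(text):
        if m.group(1) == ':':
            # Assumption: surrounding whitespace in the body is not wanted.
            lines.append(m.group(2).strip())
        else:
            lines.append(m.group(0))
    return lines

sample = 'a [LANG::keep me] b [:ANG:: strip me] c'
result = format_tags(sample)
print('\n'.join(result))
```

With a capture group, the code no longer depends on the exact length of the prefix, so changing the tag name does not require recounting the slice index.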


Perl version (a one-liner; note that it prints each whole input line that contains a tag, not the tag alone):

perl -lne "print if /\[LANG::.+?\]/;" infile > outfile


Perl version (edited to get input from file):

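#!/usr/bin/perl

use strict;
use warnings;

open(my $in, '<', 'input.txt') or die "Cannot open input.txt: $!";
open(my $out, '>', 'output.txt') or die "Cannot open output.txt: $!";

# Collect every [LANG::...] tag on the current line and print one per line.
while ( <$in> ) {
    my @found = /\[LANG::.*?\]/g;
    print $out "$_\n" for @found;
}

close($in);
close($out) or die "Cannot close output.txt: $!";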


Perl

$ perl -nE'say $1 while /\[LANG::([^]]+)\]/g' input.txt >output.txt

Python

#!/usr/bin/env python
import fileinput, re

for line in fileinput.input():
    for match in re.findall(r'\[LANG::([^]]+)\]', line):
        print(match)

Usage: $ print-lang input.txt >output.txt

input.txt

井の中の蛙、大海を知らず [LANG::Japanese] a frog in a well cannot conceive 
of the ocean [LANG::English]

терпи казак, атаманом будешь [LANG::Russian] no pain, no gain [LANG::English]

output.txt

Japanese
English
Russian
English
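
The answers in this thread use two slightly different patterns for the tag body, .*? and [^]]+. As a rough sketch of how they differ, the snippet below runs both on a made-up string (the sample and the names LAZY and NEGATED are illustrative only; a capture group is added to the first pattern so the results are comparable).

```python
import re

# Lazy "any character" body, as in the first answer (capture group added here).
LAZY = re.compile(r'\[LANG::(.*?)\]')
# "Anything except a closing bracket" body, as in the last answer.
NEGATED = re.compile(r'\[LANG::([^]]+)\]')

# An empty tag, a normal tag, and a tag whose body contains a line break.
sample = "[LANG::] [LANG::one] [LANG::two\nlines]"

lazy_hits = LAZY.findall(sample)
negated_hits = NEGATED.findall(sample)

print(lazy_hits)
print(negated_hits)
```

The lazy form accepts an empty body but stops at a newline, because . does not match a newline by default. The negated class needs at least one character and does cross newlines. When the input is read line by line, as with fileinput above, the newline difference does not come into play.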