Regex help: capture an entire line if it starts with a 1. or 2.

2022-12-12 20:12

I'm awful at regexes, but would love some help defining a rule that would take this text:

1. Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti.
Tel: 380 7277050 Fax: 0141 959282 E-mail: info@ilcuccio.it www.ilcuccio.it
Accommodation in communal room or tent. French and English spoken. Contact: Cristina Belotti.

2. Apicoltura Leida Barbara, Strada Crevenzolo 21, Viguzzolo, 15058 Alessandria.
Tel: 0131 899166 & 392 9078020 E-mail: barbaraleida@tiscali.it
The farm, situated in the plains, is certified organic (CCPB).

and return the addresses, that is, the rest of the line past [1-9].

Extra points for a coherent explanation that would actually help me learn a tad.

EDIT : I'll show my work as I go, until someone else steps in. Right now I have ^\d+\. which is a startline, digits, period.

In Ruby:

mystring="1. Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti.  \nTel: 380 7277050  Fax: 0141 959282  E-mail: info@ilcuccio.it  www.ilcuccio.it  \nAccommodation in communal room or tent. French and English \nspoken. Contact: Cristina Belotti. \n\n2. Apicoltura Leida Barbara, Strada Crevenzolo 21, Viguzzolo, 15058 Alessandria.  \nTel: 0131 899166 & 392 9078020  E-mail: barbaraleida@tiscali.it \nThe farm, situated in the plains, is certified organic (CCPB).\n\n"

# scan returns a list like [['addr1'], ['addr2'], ['addr3'], ...]
puts mystring.scan(/^\d+\. (.+)$/)

output:

Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti.  
Apicoltura Leida Barbara, Strada Crevenzolo 21, Viguzzolo, 15058 Alessandria.

#!/usr/bin/perl
use strict; use warnings;

my $str = <<'EO_STR';
2. Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti.
Tel: 380 7277050  Fax: 0141 959282  E-mail: info@ilcuccio.it  www.ilcuccio.it
Accommodation in communal room or tent. French and English
spoken. Contact: Cristina Belotti.

3. Apicoltura Leida Barbara, Strada Crevenzolo 21, Viguzzolo, 15058 Alessandria.
Tel: 0131 899166 & 392 9078020  E-mail: barbaraleida@tiscali.it
The farm, situated in the plains, is certified organic (CCPB).
EO_STR

while ( $str =~ /^[0-9]\. ([^.]+)\./mg ) {
    print "$1\n";
}

As I understand it, no . appears within the address itself, so the address is the part between the [0-9]\. prefix and the next period. The expression above therefore captures all non-. characters between that prefix and the terminating period. It uses the m modifier so that ^ matches at the beginning of each line rather than only at the beginning of the string, and the g modifier to go through each match in turn.

If you just wanted to grab all captures:

my @addresses = $str =~ /^[0-9]\. ([^.]+)\./mg;

print $_, "\n" for @addresses;
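For readers who do not use Perl, the same idea can be sketched in Python. This is only an illustration under the same assumption (no period inside the address); `re.MULTILINE` stands in for the m modifier and `findall` for the g modifier in list context. The sample text is shortened from the question.

```python
import re

text = (
    "1. Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti.\n"
    "Tel: 380 7277050  Fax: 0141 959282\n"
    "\n"
    "2. Apicoltura Leida Barbara, Strada Crevenzolo 21, Viguzzolo, 15058 Alessandria.\n"
    "Tel: 0131 899166 & 392 9078020\n"
)

# re.MULTILINE makes ^ match at the start of every line, like Perl's /m.
# findall returns every capture, like Perl's /g in list context.
pattern = re.compile(r"^[0-9]\. ([^.]+)\.", re.MULTILINE)
addresses = pattern.findall(text)

for address in addresses:
    print(address)
```

Because the capture stops at the first period, the trailing period of each address is not part of the result.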

You want something like:

/^[1-9]+\. (.*)$/

^ means to start at the beginning of the line.

[1-9] means any number 1-9, but I think you knew that one.

+ means one or more of the previous item, i.e. the digits 1-9. Note that [1-9]+ will not match a number containing a 0, such as 10; use [0-9]+ or \d+ if you need that.

\. means literally find a .

(.*) should grab anything left in the line and stick in a variable for you to use.

$ means the expression should go to the end of the line.

In Perl you should be able to pull the address out of $1. If the whole text is held in one string, add the m modifier so that ^ and $ apply to each line.
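The token-by-token reading above can be written down directly using Python's verbose regex mode, which allows a comment next to each piece. This is a sketch only; the sample text is made up for illustration, and `match.group(1)` plays the role of Perl's $1.

```python
import re

line_re = re.compile(
    r"""
    ^        # start of a line (because of re.MULTILINE)
    [1-9]+   # one or more digits in the range 1-9
    \.       # a literal period
    [ ]      # a single space (written as a class because VERBOSE ignores bare spaces)
    (.*)     # group 1: everything left on the line
    $        # end of the line
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample, not taken from the question.
sample = "1. First address, Some Town.\nTel: 000 0000000\n2. Second address, Other Town.\n"

captured = [match.group(1) for match in line_re.finditer(sample)]
for address in captured:
    print(address)
```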

^\d+\. (.*?)$

Meaning:

^       At line start
\d+     take one or more digits
\.      followed by a period character and a space
(.*?)   match (and remember) all characters
$       until line end

You can test your regular expressions online at RegExr.
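A lazy group such as (.*?) matches as little as it can, so it only reaches the line end when something after it, such as $, forces it to. A quick Python check of that behaviour (a sketch, using one line from the question):

```python
import re

line = "1. Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti."

unanchored = re.match(r"^\d+\. (.*?)", line)
anchored = re.match(r"^\d+\. (.*?)$", line)

print(repr(unanchored.group(1)))  # lazy group with nothing after it
print(repr(anchored.group(1)))    # lazy group forced to reach the line end
```

The first capture is the empty string; the second is the whole address.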

/^\d+\.\s+(.+)$/

Assert position at the start of the string «^»
Match a single digit 0..9 «\d+»
- Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character "." literally «\.»
Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.) «\s+»
- Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below and capture its match into backreference number 1 «(.+)»
- Match any single character that is not a line break character «.+»
  - Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

I use RegexBuddy for all my regexing. It has excellent help and an easy testing interface to check how your regex will work on some sample text.
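In a regex an unescaped . matches any character, so it is the backslash that makes the period literal. A small Python sketch of the difference, using a line that starts with a postcode rather than a list number (the variable names are just for illustration):

```python
import re

loose = re.compile(r"^\d+.\s+(.+)$")    # "." matches any character
strict = re.compile(r"^\d+\.\s+(.+)$")  # "\." matches only a period

for line in ["1. Il Cuccio, via Ronchi 43/b", "14047 Mombercelli, Asti."]:
    print(line, bool(loose.match(line)), bool(strict.match(line)))
```

The loose pattern accepts the postcode line because the last digit can be consumed by the unescaped dot; the strict pattern rejects it.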

You really have two problems: finding the lines that start with numbers, and extracting the address portion. This little expression should find the lines:

^[[:space:]]*[[:digit:]]*\.[[:space:]]

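The hat ("^") character matches the beginning of the line. This expression finds lines beginning with numbers and a period. It ignores any white space at the beginning. Note that [[:digit:]]* also accepts zero digits, so a line starting with just a period would match too; writing [[:digit:]][[:digit:]]* requires at least one digit.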

The second problem - extracting the address - depends on the tool. For example, this Perl script prints only the address lines:

# perl -ne 'if (m/^\s*\d+\.\s*/) { s/^\s*\d+\.\s*//; print}' test.txt 

Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti.
Apicoltura Leida Barbara, Strada Crevenzolo 21, Viguzzolo, 15058 Alessandria.

The "\s" and "\d" are Perl shorthand for matching whitespace (\s) and digits (\d). It is essentially the same regular expression; it just fits neatly on one line.

I used the expression twice. The first use finds the lines to print. The second is a "substitute" command, which replaces whatever the pattern matched with the replacement text. In this case the replacement is empty, which erases the leading number.
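The same find-then-erase idea can be sketched in Python. Here `subn` reports how many substitutions it made, so a single call both tests for the numbered prefix and removes it. The leading spaces on the last sample line are added only to illustrate the whitespace handling.

```python
import re

prefix = re.compile(r"^\s*\d+\.\s*")

lines = [
    "1. Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti.",
    "Tel: 380 7277050  Fax: 0141 959282",
    "  2. Apicoltura Leida Barbara, Strada Crevenzolo 21, Viguzzolo, 15058 Alessandria.",
]

addresses = []
for line in lines:
    # subn returns the rewritten line plus the number of substitutions made.
    stripped, count = prefix.subn("", line, count=1)
    if count:
        addresses.append(stripped)

print("\n".join(addresses))
```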

What language are you using? There is no need for a regex. Here's an example in Python:

myaddr="""2. Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti.
Tel: 380 7277050  Fax: 0141 959282  E-mail: info@ilcuccio.it  www.ilcuccio.it
Accommodation in communal room or tent. French and English
spoken. Contact: Cristina Belotti.
"""

# First line only, then everything after the first space (drops the "2.")
print(myaddr.split("\n", 1)[0].split(" ", 1)[-1])

It says: split the string on newlines (your sample string has newlines, right?) and take the first element, which is the line holding the address. Then split that line once on a space and drop the first element, which is the number. What remains is the address. No regex needed; it is a simple algorithm you can implement in your favourite language.
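The snippet above handles a single entry. If the text holds several numbered entries, one possible regex-free extension is to check every line; this is a sketch, and the function name `leading_number_lines` is made up for illustration.

```python
def leading_number_lines(text):
    """Return the text after "N. " for each line that starts with a number and a period."""
    results = []
    for line in text.splitlines():
        # partition splits at the first ". " and returns (before, separator, after).
        head, sep, rest = line.partition(". ")
        if sep and head.isdigit():
            results.append(rest)
    return results


sample = (
    "1. Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti.\n"
    "Accommodation in communal room or tent. French and English\n"
    "\n"
    "2. Apicoltura Leida Barbara, Strada Crevenzolo 21, Viguzzolo, 15058 Alessandria.\n"
)

for address in leading_number_lines(sample):
    print(address)
```

Lines such as the "Accommodation ..." one also contain ". ", but the part before it is not purely digits, so they are skipped.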

PHP version:

$str = <<<EOF
2. Il Cuccio, via Ronchi 43/b, 14047 Mombercelli, Asti.
    Tel: 380 7277050  Fax: 0141 959282  E-mail: info@ilcuccio.it  www.ilcuccio.it
    Accommodation in communal room or tent. French and English
    spoken. Contact: Cristina Belotti.
EOF;

$s = explode("\n",$str,2);
$addr = explode(" ",$s[0]);
array_shift($addr);
// implode() takes the separator first; the reversed order no longer works in PHP 8
print "Address is: " . implode(" ", $addr);
