
Is there any way to specify "any character but [aeiou]" in a PHP regular expression?

I am developing a C# Web Service whose responses are always collections of things. Since I am too lazy and do not want to explicitly define collections of things, I implemented a generic class representing a collection of things that can be serialized using XML.

Now, ASP.NET usually gives horrible names to generics, such as CollectionOfOrdenPago (in Spanish, "orden de pago" means "payment order") or PageOfLineaDetalleReporte (in Spanish, "línea de detalle de reporte" means "report detail line"). I wanted to give my collections more sensible names like OrdenesPago ("payment orders") or LineasDetalleReporte ("report detail lines"), so I defined the following method:

internal static string Pluralize(string input)
{
    int i = 0;
    while (++i < input.Length)
        if (!char.IsLower(input[i]))
            break;

    StringBuilder builder = new StringBuilder(input);
    if ("aeiou".IndexOf(input[i - 1]) == -1)
        builder.Insert(i++, 'e');
    builder.Insert(i, 's');

    return builder.ToString();
}

This Web Service is consumed by a PHP website, which I am developing as well. Since pluralizing a noun does not seem to be a good reason to call the C# Web Service, I reimplemented the Pluralize function in PHP:

function pluralize($element) {
    return preg_replace_callback('/^([A-Z][a-z]*)([A-Z]|$)/', function($args) {
        // If the first word ends in a consonant, append "e" first. After that, append "s".
        return preg_replace('/([B-DF-HJ-NP-TV-Z])$/i', '\1e', $args[1]) . "s{$args[2]}";
    }, $element);
}

But I am still not happy. The term [B-DF-HJ-NP-TV-Z] is ugly. As in the C# method, I would like to specify "a character not in [aeiou]" as a term. Is that possible?


Use a negated character class

[^AEIOU]

instead of [B-DF-HJ-NP-TV-Z].

N.B. as per @fireeyedboy's comment, this regex matches non-alphabetic characters as well.
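If the non-alphabetic matches are a concern, one option is to pair a negative lookahead with a letter class, so the term reads "a letter that is not a vowel". The following is a minimal sketch of the asker's pluralize function rewritten that way. It assumes ASCII-only input, and the sample names are taken from the question.

```php
<?php
// Sketch: the asker's pluralize() with the consonant ranges replaced.
// (?![aeiou])[a-z] means "a letter that is not a vowel"; with the /i flag
// it covers both cases. Plain [^aeiou] would also accept digits and
// punctuation, which is the caveat noted above.
function pluralize(string $element): string
{
    return preg_replace_callback(
        '/^([A-Z][a-z]*)([A-Z]|$)/',
        function (array $m): string {
            // If the first word ends in a consonant, append "e" first.
            $word = preg_replace('/((?![aeiou])[a-z])$/i', '\1e', $m[1]);
            // Then append "s" and put back the start of the next word.
            return $word . 's' . $m[2];
        },
        $element
    );
}

echo pluralize('OrdenPago'), "\n";           // OrdenesPago
echo pluralize('LineaDetalleReporte'), "\n"; // LineasDetalleReporte
```

Swapping the inner pattern for '/([^aeiou])$/i' gives the same results for these inputs, since the outer pattern only ever captures ASCII letters.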


Sure. A caret (^) negates a character class:

/[^aeiou]/i
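To see what the negated class accepts, here is a small sketch that tests single characters against it. The helper name is_non_vowel is made up for illustration and is not part of PHP.

```php
<?php
// Sketch: check whether a single character matches [^aeiou] (case-insensitive).
// is_non_vowel() is a hypothetical helper name used only for this example.
function is_non_vowel(string $ch): bool
{
    // preg_match() returns 1 on a match, 0 on no match.
    return preg_match('/^[^aeiou]$/i', $ch) === 1;
}

foreach (['b', 'Z', 'a', 'E', '7', '-'] as $ch) {
    printf("%s => %s\n", $ch, is_non_vowel($ch) ? 'match' : 'no match');
}
// b => match
// Z => match
// a => no match
// E => no match
// 7 => match
// - => match
```

The digit and the hyphen match too, because the class excludes only the five vowels and nothing else.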


First, your string needs to be in Normalization Form D. Otherwise you will miss things like María, Ángeles, Argüelles, and Bogotá. Here’s an example in Perl:

#!/usr/bin/env perl
use utf8;
use strict;
use warnings;    
use Unicode::Normalize qw(NFD NFC);    
binmode(STDOUT, ":utf8") || die;    
my @names = qw(María Ángeles Argüelles Bogotá cáñamo);
for my $orig ("@names", @names) {
    my $nfd = NFD($orig);
    $nfd =~ s/( (?: (?! [aeiou] ) (?= \pL ) \X ) +)/<$1>/xig;
    print NFC($nfd), "\n";
}

When run, that prints out this:

<M>a<r>ía Á<ng>e<l>e<s> A<rg>üe<ll>e<s> <B>o<g>o<t>á <c>á<ñ>a<m>o
<M>a<r>ía
Á<ng>e<l>e<s>
A<rg>üe<ll>e<s>
<B>o<g>o<t>á
<c>á<ñ>a<m>o

I don’t know how to pull in the needed NFD function in PHP, but the rest should be completely transferable once you figure that part out.
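For what it's worth, PHP's intl extension provides a Normalizer class that can do the NFD and NFC steps. Below is a rough port of the Perl example, assuming intl is installed and the input is valid UTF-8. The function name mark_consonants is made up for this sketch.

```php
<?php
// Sketch of the Perl example in PHP. Assumes the intl extension is loaded
// (it provides the Normalizer class) and that the input is valid UTF-8.
// mark_consonants() is a hypothetical name used only for this example.
function mark_consonants(string $text): string
{
    // Decompose so that "í" becomes "i" followed by a combining acute accent.
    $nfd = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($nfd === false) {
        throw new InvalidArgumentException('Input could not be normalized');
    }
    // Same pattern as the Perl version: runs of grapheme clusters (\X) that
    // start with a letter (\pL) other than a, e, i, o, u.
    // Flags: x = ignore pattern whitespace, i = case-insensitive, u = UTF-8.
    $marked = preg_replace(
        '/( (?: (?! [aeiou] ) (?= \pL ) \X )+ )/xiu',
        '<$1>',
        $nfd
    ) ?? $nfd; // fall back to the unmarked text if the regex engine errors

    // Recompose before returning.
    return Normalizer::normalize($marked, Normalizer::FORM_C);
}

foreach (['María', 'Ángeles', 'Argüelles', 'Bogotá', 'cáñamo'] as $name) {
    echo mark_consonants($name), "\n";
}
```

With intl available, this should print the same five per-name lines as the Perl script does.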
