How can I generate URL slugs in Perl?

2023-01-21 13:57

Web frameworks such as Rails and Django have built-in support for "slugs", which are used to generate readable and SEO-friendly URLs:

Slugs in Rails
Slugs in Django

A slug string typically consists only of the characters a-z, 0-9 and - and can hence be written without URL-escaping (think "foo%20bar").

I'm looking for a Perl slug function that given any valid Unicode string will return a slug representation (a-z, 0-9 and -).

A super trivial slug function would be something along the lines of:

$input = lc($input);
$input =~ s/[^a-z0-9-]//g;

However, this implementation would not handle internationalization and accents (I want ë to become e). One way around this would be to enumerate all special cases, but that would not be very elegant. I'm looking for something more well thought out and general.
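To make the limitation concrete, here is a minimal sketch that wraps the two lines above in a sub (the name trivial_slug is made up for this illustration) and feeds it an accented string. It assumes the input is already a decoded character string, which `use utf8` arranges for the literal.

```perl
use strict;
use warnings;
use utf8;   # the string literal below contains non-ASCII characters

# Hypothetical wrapper around the trivial approach shown above.
sub trivial_slug {
    my ($input) = @_;
    $input = lc $input;
    $input =~ s/[^a-z0-9-]//g;   # deletes everything except a-z, 0-9 and -
    return $input;
}

# Accented letters are deleted outright instead of being mapped to
# their base letter, and the spaces between words disappear as well.
print trivial_slug('Zoë likes café au lait'), "\n";   # prints "zolikescafaulait"
```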

My question:

What is the most general/practical way to generate Django/Rails type slugs in Perl? This is how I solved the same problem in Java.

The slugify filter currently used in Django translates (roughly) to the following Perl code:

use Unicode::Normalize;

sub slugify($) {
    my ($input) = @_;

    $input = NFKD($input);         # Normalize (decompose) the Unicode string
    $input =~ tr/\000-\177//cd;    # Strip non-ASCII characters (>127)
    $input =~ s/[^\w\s-]//g;       # Remove all characters that are not word characters (includes _), spaces, or hyphens
    $input =~ s/^\s+|\s+$//g;      # Trim whitespace from both ends
    $input = lc($input);
    $input =~ s/[-\s]+/-/g;        # Replace all occurrences of spaces and hyphens with a single hyphen

    return $input;
}

Since you also want to change accented characters to unaccented ones, throwing in a call to unidecode (defined in Text::Unidecode) before stripping the non-ASCII characters seems to be your best bet (as pointed out by phaylon).

In that case, the function could look like:

use Unicode::Normalize;
use Text::Unidecode;

sub slugify_unidecode($) {
    my ($input) = @_;

    $input = NFC($input);          # Normalize (recompose) the Unicode string
    $input = unidecode($input);    # Convert non-ASCII characters to closest equivalents
    $input =~ s/[^\w\s-]//g;       # Remove all characters that are not word characters (includes _), spaces, or hyphens
    $input =~ s/^\s+|\s+$//g;      # Trim whitespace from both ends
    $input = lc($input);
    $input =~ s/[-\s]+/-/g;        # Replace all occurrences of spaces and hyphens with a single hyphen

    return $input;
}

The former works well for strings that are primarily ASCII, but falls short when the entire string is formed of non-ASCII characters, since they all get stripped out, leaving you with an empty string.

Sample output:

string        | slugify       | slugify_unidecode
-------------------------------------------------
hello world   | hello-world   | hello-world
北亰          |               | bei-jing
liberté       | liberte       | liberte

Note how 北亰 gets slugified to nothing with the Django-inspired implementation. Note also the difference the choice of normalization form makes in that implementation: liberté becomes 'liberte' with NFKD after stripping out the second part of the decomposed character (the combining accent), but would become 'libert' with NFC after stripping out the re-assembled 'é'.
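The NFKD versus NFC difference can be checked with core modules only. The sketch below repeats the Django-style steps but takes the normalization function as a parameter; slugify_with is a hypothetical helper introduced for this comparison, not part of any module, and the input is assumed to be a decoded character string.

```perl
use strict;
use warnings;
use utf8;                                  # literals below are UTF-8 source text
use Unicode::Normalize qw(NFC NFKD);

# Hypothetical helper: the Django-style steps, with the normalization
# form passed in as a code reference so both forms can be compared.
sub slugify_with {
    my ($normalize, $input) = @_;

    $input = $normalize->($input);  # NFKD decomposes, NFC recomposes
    $input =~ tr/\000-\177//cd;     # delete every character outside ASCII
    $input =~ s/[^\w\s-]//g;        # keep word characters, whitespace, hyphens
    $input =~ s/^\s+|\s+$//g;       # trim both ends
    $input = lc $input;
    $input =~ s/[-\s]+/-/g;         # collapse runs of spaces/hyphens
    return $input;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $string ('hello world', 'liberté', '北亰') {
    printf "%s | NFKD: '%s' | NFC: '%s'\n",
        $string,
        slugify_with(\&NFKD, $string),
        slugify_with(\&NFC,  $string);
}
```

With NFKD the accent becomes a separate combining character, so only the accent is deleted; with NFC the whole 'é' is a single non-ASCII character and is deleted with it.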

Are you looking for something like Text::Unidecode?

String::Dirify is used for making slugs in the blogging software Movable Type/Melody.

Adding Text::Unaccent to the beginning of the chain looks like it will do what you want.
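If installing an extra module is not an option, a similar "unaccent first" step can be approximated with the core Unicode::Normalize module: decompose with NFKD, then delete the combining marks. This is a different technique from what Text::Unaccent does internally, and strip_accents is a hypothetical name used only for this sketch.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFKD);

# Hypothetical helper (not the Text::Unaccent API): removes accents by
# decomposing each character and dropping the non-spacing marks.
sub strip_accents {
    my ($input) = @_;
    $input = NFKD($input);     # 'é' becomes 'e' followed by U+0301
    $input =~ s/\p{Mn}+//g;    # drop combining (non-spacing) marks
    return $input;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_accents('liberté'), "\n";   # prints "liberte"
```

Letters that have no Unicode decomposition, such as 'ø' or 'ß', pass through unchanged, as do CJK characters, so this only covers the accent case and not the transliteration that Text::Unidecode provides.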

The most turn-key solution is Text::Slugify, which does what you need. It's a trivial amount of code that provides a slugify function for you.

It relies on Text::Unaccent::PurePerl to remove accents from characters.

