How can I write a regex to match a torrent title format?

I'm trying to match and break up a typical TV torrent title:

MyTV.Show.S09E01.HDTV.XviD

MyTV.Show.S10E02.HDTV.XviD

MyTV.Show.901.HDTV.XviD

MyTV.Show.1102.HDTV.XviD

I'm trying to break these strings up into 3 capture groups for each entry: Title, Season, Episode.

I can handle the first two easily enough:

^([a-zA-Z0-9.]*)\.S([0-9]{1,2})E([0-9]{1,2}).*$

However, for the third and fourth ones it is difficult to separate the season from the episode. If I could work backwards it would be easier. For example, with "901", working backwards I would take the last two digits as the episode number, and anything remaining before that as the season number.

Does anyone have any tips for how I can break these strings up into those relevant capture groups?


Here's what I would use:

(.*?)\.S?(\d{1,2})E?(\d{2})\.(.*)

Has capture groups:

1: Name
2: Season
3: Episode
4: The Rest

Here's some code in C# (courtesy of this post):

using System;
using System.Text.RegularExpressions;

public class Test
{

    public static void Main()
    {
        string s = @"MyTV.Show.S09E01.HDTV.XviD
            MyTV.Show.S10E02.HDTV.XviD
            MyTV.Show.901.HDTV.XviD
            MyTV.Show.1102.HDTV.XviD";

        Extract(s);

    }

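    // Lazy name, optional "S", 1-2 season digits, optional "E", exactly 2 episode digits.
    // Fixing the episode at two digits makes "901" backtrack to season 9, episode 01.
    private static readonly Regex rx = new Regex
        (@"(.*?)\.S?(\d{1,2})E?(\d{2})\.(.*)", RegexOptions.IgnoreCase);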

    static void Extract(string text)
    {
        MatchCollection matches = rx.Matches(text);

        foreach (Match match in matches)
        {
            Console.WriteLine("Name: {0}, Season: {1}, Ep: {2}, Stuff: {3}\n",
                match.Groups[1].ToString().Trim(), match.Groups[2], 
                match.Groups[3], match.Groups[4].ToString().Trim());
        }
    }

}

Produces:

Name: MyTV.Show, Season: 09, Ep: 01, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 10, Ep: 02, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 9, Ep: 01, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 11, Ep: 02, Stuff: HDTV.XviD
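
For comparison, here is a rough Python sketch of the same pattern applied to one title at a time. The function name split_title and the conversion of season and episode to integers are my own additions for illustration, not part of the code above.

```python
import re

# Same pattern as in the C# example; IGNORECASE lets "s09e01" match as well.
PATTERN = re.compile(r"(.*?)\.S?(\d{1,2})E?(\d{2})\.(.*)", re.IGNORECASE)


def split_title(title):
    """Return (name, season, episode, rest), or None if the title does not match.

    split_title is a hypothetical helper name used only for this sketch.
    """
    m = PATTERN.match(title.strip())
    if m is None:
        return None
    name, season, episode, rest = m.groups()
    # int() drops leading zeros, so "09" and "9" both become season 9.
    return name, int(season), int(episode), rest


if __name__ == "__main__":
    for t in [
        "MyTV.Show.S09E01.HDTV.XviD",
        "MyTV.Show.S10E02.HDTV.XviD",
        "MyTV.Show.901.HDTV.XviD",
        "MyTV.Show.1102.HDTV.XviD",
    ]:
        print(split_title(t))
```

The "work backwards" behaviour the question asks for comes from backtracking: on "901" the season group first grabs "90", the two-digit episode group then cannot match, so the season group gives one digit back and ends up as "9" with "01" as the episode.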


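Almost every media file I've ever seen that has come from a torrent had two-digit episodes. With that, you should be able to use ([0-9]{2}) for the episode instead; once the S and E are made optional, the one or two digits left in front of it are the season, and the expression matches all four titles.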

I'd estimate 99.9% of shows are marked with two-digit episodes. If you're trying to write a script to easily label your own shows, I'd go with the two-digit episode assumption and manually rename any mistagged files you come across. If you're trying to write something for public consumption, you probably have many more naming formats to consider. I've seen other applications attempt this in the past, and all of them worked only so-so. It's a hard problem that probably has no single solution.
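
If you do need to cover more than one naming format, one possible approach (only a sketch, in Python) is to try a list of patterns from most explicit to most ambiguous and return nothing when none of them fits, so those files can be renamed by hand. The parse_title name and the "9x01" format are assumptions I added for illustration; only the SxxEyy and bare-digit forms come from the question.

```python
import re

# Tried in order: the explicit SxxEyy form first, the ambiguous bare-digit form last.
# The "9x01" entry is an illustrative extra format, not one taken from the question.
PATTERNS = [
    re.compile(r"^(?P<name>.+?)\.S(?P<season>\d{1,2})E(?P<episode>\d{2})(?:\.|$)", re.IGNORECASE),
    re.compile(r"^(?P<name>.+?)\.(?P<season>\d{1,2})x(?P<episode>\d{2})(?:\.|$)", re.IGNORECASE),
    re.compile(r"^(?P<name>.+?)\.(?P<season>\d{1,2})(?P<episode>\d{2})(?:\.|$)"),
]


def parse_title(title):
    """Return (name, season, episode) from the first pattern that matches, else None.

    parse_title is a hypothetical helper name used only for this sketch.
    """
    for pattern in PATTERNS:
        m = pattern.match(title)
        if m:
            return m.group("name"), int(m.group("season")), int(m.group("episode"))
    return None  # unknown format: leave it for manual renaming


if __name__ == "__main__":
    print(parse_title("MyTV.Show.S09E01.HDTV.XviD"))
    print(parse_title("MyTV.Show.1102.HDTV.XviD"))
    print(parse_title("MyTV.Show.2012.S01E02.HDTV.XviD"))
```

Trying the explicit form first matters for titles that contain a year: "MyTV.Show.2012.S01E02" is read as season 1, episode 2, whereas a single pattern with an optional S and E would stop at "2012" and report season 20, episode 12. The bare-digit pattern still has that weakness when no SxxEyy marker is present.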
