Regular expression help parsing SQLIO output

I've been working on a regular expression to parse the output of a series of SQLIO runs. I've gotten pretty far, but I'm not quite there yet. I'm looking for a 100% regex solution, with no pre-manipulation of the input. Could anyone offer a little guidance with the following regular expression:

.*v(?<SQLIOVersion>\d\.\d).*\n.*\n(?<threads>\d*)\s.*for\s(?<Seconds>\d+).*\n.*using\s(?<clustersize>[0-9]*)KB.*\n.*\n.*size:\s(?<currentfilesize>\d+).*\n.*\n.*\n.*\n.*\s(?<IOs>\d*\.\d*).*\n.*\s(?<MBs>\d*\.\d*).*\n.*\n.*\s(?<MinLatency_ms>\d+).*\n.*\s(?<AvgLatency_ms>\d+).*\n.*\s(?<MaxLatency_ms>\d+).*\n.*\n.*\n\%\:..(?<ms>\d*\s+)*

Here's a snippet of the output (note that the headers change during the SQLIO batch run): File


The problem appears to be here:

    using 8KB random IOs
    buffering set to use hardware disk cache (but not file cache)

After capturing the cluster size, you use .*\n to consume the second line before going on to capture the file size, but sometimes there's a third line:

    using 8KB random IOs
    enabling multiple I/Os per thread with 8 outstanding
    buffering set to use hardware disk cache (but not file cache)

I added (?:.*\n)? to the relevant section of the regex, and now it matches all 36 entries.
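To see why the extra line breaks the match, here is a minimal Python sketch of only the cluster-size/file-size slice of the pattern, run against two header fragments. Assumptions: the fragments are modeled on the lines quoted above, but the "using current size: 2048 MB" line is a hypothetical stand-in for whichever line carries "size:", and Python spells named groups (?P<name>...) rather than .NET's (?<name>...).

```python
import re

# Hypothetical header fragments modeled on the SQLIO lines quoted above.
# The "using current size" line is an assumed stand-in for the "size:" line.
TWO_LINE = (
    "using 8KB random IOs\n"
    "buffering set to use hardware disk cache (but not file cache)\n"
    "using current size: 2048 MB\n"
)
THREE_LINE = (
    "using 8KB random IOs\n"
    "enabling multiple I/Os per thread with 8 outstanding\n"
    "buffering set to use hardware disk cache (but not file cache)\n"
    "using current size: 2048 MB\n"
)

# Slice of the original pattern: exactly one ".*\n" between the IO-size
# line and the file-size line.
STRICT = re.compile(
    r".*using\s(?P<clustersize>[0-9]*)KB.*\n"
    r".*\n"
    r".*size:\s(?P<currentfilesize>\d+)"
)

# Same slice with the optional "(?:.*\n)?" line added after the IO-size line.
FIXED = re.compile(
    r".*using\s(?P<clustersize>[0-9]*)KB.*\n"
    r"(?:.*\n)?"
    r".*\n"
    r".*size:\s(?P<currentfilesize>\d+)"
)

for label, text in (("two-line", TWO_LINE), ("three-line", THREE_LINE)):
    strict = STRICT.search(text)
    fixed = FIXED.search(text)
    print(label,
          "strict:", strict.groupdict() if strict else None,
          "fixed:", fixed.groupdict() if fixed else None)
```

Only the three-line fragment separates the two: STRICT finds nothing there, while FIXED still captures both fields, because when the extra line is absent the engine can backtrack and skip the optional group.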

I know you want to go 100% regex, but have you considered writing the regex in extended format with comments (i.e., IgnorePatternWhitespace mode)? I would also recommend using more literal text in the regex to make it easier to follow. For example,

(?<threads>\d+) threads? reading for (?<Seconds>\d+) secs.*\n

instead of

(?<threads>\d*)\s.*for\s(?<Seconds>\d+).*\n

Unreadable code is unmaintainable code, and regexes need all the help they can get. :-/
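The same advice as a minimal Python sketch: re.VERBOSE is Python's counterpart to .NET's IgnorePatternWhitespace, so the literal-text version of the threads line can carry comments. The sample line is hypothetical, assembled from the literal words in the pattern above. In verbose mode unescaped whitespace is ignored, so literal spaces have to be written as "\ ".

```python
import re

# re.VERBOSE ignores unescaped whitespace and treats '#' as a comment start,
# so the literal spaces in the SQLIO text are written as '\ '.
THREADS_LINE = re.compile(r"""
    (?P<threads>\d+)  \ threads?         # "1 thread" or "8 threads"
    \ reading\ for\   (?P<Seconds>\d+)   # run length
    \ secs            .*\n               # rest of the line
""", re.VERBOSE)

# Hypothetical SQLIO line, assembled from the literal words in the pattern.
m = THREADS_LINE.search("8 threads reading for 120 secs from file testfile.dat\n")
print(m.group("threads"), m.group("Seconds"))
```

Because every word is spelled out, a line that doesn't have the expected wording simply fails to match instead of being swallowed by a loose .* somewhere.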


The hell with counting lines: as long as the order of the fields doesn't change, you can do the following. Oh, and using /x for a big regex helps. ;)

qr§
^sqlio\s+v(?<SQLIOVersion>\d+\.\d+)

(?> # atomic group: once this step matches, don't backtrack into it
.{0,400}? # bounded gap: don't scan so far ahead that we reach the next result
(?<threads>\d+)\s+thread)

(?>.{0,400}?
\b for\s+(?<Seconds>\d+)\s*sec)

(?>.{0,400}?
\b using\s+(?<clustersize>\d+)\s*KB)

(?>.{0,400}?
\b size:\s+(?<currentfilesize>\d+))

(?>.{0,400}?
\b IOs/sec\D*(?<IOs>\d+\.\d+))

(?>.{0,400}?
\b MBs/sec\D*(?<MBs>\d+\.\d+))

(?>.{0,400}?
\b Min_Latency\D*(?<MinLatency_ms>\d+))

(?>.{0,400}?
\b Avg_Latency\D*(?<AvgLatency_ms>\d+))

(?>.{0,400}?
\b Max_Latency\D*(?<MaxLatency_ms>\d+))

(?>.{0,400}?
^\%:\s*(?<ms>(?:\d+\s+)+))

§mixs

Perl's qr operator is used for quoting, with § as the delimiter; the pattern itself is PCRE/Perl syntax.
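For anyone not on Perl, here is a minimal Python port of the keyword-anchored approach above, meant as a sketch rather than a drop-in replacement. Assumptions: the two-run sample is hypothetical (the labels follow the Perl pattern, but the values and exact layout are invented); the atomic groups are dropped so the code also runs on Python versions before 3.11, which is when the re module gained (?>...) support; and the "%:" row keeps the original's ms group name, split into a list.

```python
import re

# Hypothetical two-run sample: labels follow the Perl pattern above,
# but all values and the exact line layout are invented for illustration.
SAMPLE = """\
sqlio v1.5.SG
1 thread reading for 60 secs from file testfile.dat
using 8KB random IOs
buffering set to use hardware disk cache (but not file cache)
using current size: 2048 MB for file: testfile.dat
CUMULATIVE DATA:
throughput metrics:
IOs/sec:   512.45
MBs/sec:     4.00
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 1
Max_Latency(ms): 25
histogram:
ms: 0  1  2  3
%: 45 40 10  5
sqlio v1.5.SG
8 threads reading for 60 secs from file testfile.dat
using 64KB sequential IOs
enabling multiple I/Os per thread with 8 outstanding
buffering set to use hardware disk cache (but not file cache)
using current size: 2048 MB for file: testfile.dat
CUMULATIVE DATA:
throughput metrics:
IOs/sec:   310.20
MBs/sec:    19.39
latency metrics:
Min_Latency(ms): 2
Avg_Latency(ms): 25
Max_Latency(ms): 180
histogram:
ms: 0  1  2  3
%:  0  5 30 65
"""

# Each step skips a bounded, lazy gap and then anchors on a keyword, so the
# number of lines between keywords no longer matters; only their order does.
RUN = re.compile(r"""
    ^sqlio\s+v(?P<SQLIOVersion>\d+\.\d+)
    .{0,400}? (?P<threads>\d+)\s+thread
    .{0,400}? \bfor\s+(?P<Seconds>\d+)\s*sec
    .{0,400}? \busing\s+(?P<clustersize>\d+)\s*KB
    .{0,400}? \bsize:\s+(?P<currentfilesize>\d+)
    .{0,400}? \bIOs/sec\D*(?P<IOs>\d+\.\d+)
    .{0,400}? \bMBs/sec\D*(?P<MBs>\d+\.\d+)
    .{0,400}? \bMin_Latency\D*(?P<MinLatency_ms>\d+)
    .{0,400}? \bAvg_Latency\D*(?P<AvgLatency_ms>\d+)
    .{0,400}? \bMax_Latency\D*(?P<MaxLatency_ms>\d+)
    .{0,400}? ^%:\s*(?P<ms>(?:\d+[ \t]*)+)   # "%:" histogram row, kept on one line
""", re.VERBOSE | re.MULTILINE | re.IGNORECASE | re.DOTALL)


def parse_runs(text):
    """Yield one dict of captured fields per SQLIO run found in text."""
    for match in RUN.finditer(text):
        fields = match.groupdict()
        fields["ms"] = fields["ms"].split()  # "45 40 10  5" -> ["45", "40", "10", "5"]
        yield fields


for run in parse_runs(SAMPLE):
    print(run["threads"], run["clustersize"], run["IOs"], run["MBs"], run["ms"])
```

Without the atomic groups, a run with a missing field is slower to reject, and the engine is free to re-match an earlier field further along; on Python 3.11 or later each step can be wrapped in (?>...) as in the Perl version.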
