Regular expression help parsing SQLIO output
I've been working on a regular expression to parse the output of a series of SQLIO runs. I've gotten pretty far, but I'm not quite there yet. I'm looking for a 100% regex solution with no pre-manipulation of the input. Could anyone offer some guidance on the following regular expression:
.*v(?<SQLIOVersion>\d\.\d).*\n.*\n(?<threads>\d*)\s.*for\s(?<Seconds>\d+).*\n.*using\s(?<clustersize>[0-9]*)KB.*\n.*\n.*size:\s(?<currentfilesize>\d+).*\n.*\n.*\n.*\n.*\s(?<IOs>\d*\.\d*).*\n.*\s(?<MBs>\d*\.\d*).*\n.*\n.*\s(?<MinLatency_ms>\d+).*\n.*\s(?<AvgLatency_ms>\d+).*\n.*\s(?<MaxLatency_ms>\d+).*\n.*\n.*\n\%\:..(?<ms>\d*\s+)*
Here's a snippet of the output - note the headers, which change during the SQLIO batch run: File
The problem appears to be here:
using 8KB random IOs
buffering set to use hardware disk cache (but not file cache)
After capturing the cluster size, you use .*\n
to consume the second line before going on to capture the file size, but sometimes there's a third line:
using 8KB random IOs
enabling multiple I/Os per thread with 8 outstanding
buffering set to use hardware disk cache (but not file cache)
I added (?:.*\n)?
to the relevant section of the regex, and now it matches all 36 entries.
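To see the effect in isolation, here is a minimal sketch in Python. That is not necessarily the asker's engine, so named groups are written (?P<name>...) rather than (?<name>...). It uses only the fragment of the pattern between the cluster size and the file size. The header lines are the ones quoted above; the "using specified size: 20480 MB" line is a made-up stand-in for whichever line carries "size:" in the real output.

```python
import re

# Original fragment: exactly one line between the "using ...KB" line and the size line.
STRICT = re.compile(
    r"using\s(?P<clustersize>[0-9]*)KB.*\n"
    r".*\n"
    r".*size:\s(?P<currentfilesize>\d+)"
)

# Fixed fragment: one extra line is allowed but not required.
TOLERANT = re.compile(
    r"using\s(?P<clustersize>[0-9]*)KB.*\n"
    r"(?:.*\n)?"   # optional "enabling multiple I/Os ..." line
    r".*\n"        # the "buffering set to ..." line
    r".*size:\s(?P<currentfilesize>\d+)"
)

# Hypothetical last line: only the "size: <digits>" part matters here.
SIZE_LINE = "using specified size: 20480 MB\n"

TWO_LINE = (
    "using 8KB random IOs\n"
    "buffering set to use hardware disk cache (but not file cache)\n"
    + SIZE_LINE
)
THREE_LINE = (
    "using 8KB random IOs\n"
    "enabling multiple I/Os per thread with 8 outstanding\n"
    "buffering set to use hardware disk cache (but not file cache)\n"
    + SIZE_LINE
)


def extract(pattern, text):
    """Return (clustersize, currentfilesize), or None if the pattern does not match."""
    m = pattern.search(text)
    return (m.group("clustersize"), m.group("currentfilesize")) if m else None


if __name__ == "__main__":
    for name, text in (("two-line", TWO_LINE), ("three-line", THREE_LINE)):
        print(name, "strict:", extract(STRICT, text), "tolerant:", extract(TOLERANT, text))
```

The strict fragment fails on the three-line header because . does not cross a newline, so the single .*\n can only swallow one of the two intervening lines.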
I know you want to go 100% regex, but have you considered writing the regex in extended format with comments (i.e., IgnorePatternWhitespace mode)? I would also recommend using more literal text in the regex to make it easier to follow. For example,
(?<threads>\d+) threads? reading for (?<Seconds>\d+) secs.*\n
instead of
(?<threads>\d*)\s.*for\s(?<Seconds>\d+).*\n
Unreadable code is unmaintainable code, and regexes need all the help they can get. :-/
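As a rough illustration of the extended-format idea, here is the same threads line written with Python's re.VERBOSE flag, which plays the role of IgnorePatternWhitespace there. This is a sketch, not the asker's code: Python spells named groups (?P<name>...), and the sample lines in the usage part are invented to fit the pattern. Because unescaped whitespace is ignored in this mode, each literal space is written as [ ].

```python
import re

THREADS_LINE = re.compile(
    r"""
    (?P<threads>\d+) [ ] threads? [ ]   # thread count, then "thread" or "threads"
    reading [ ] for [ ]                 # literal text makes the intent obvious
    (?P<Seconds>\d+) [ ] secs           # run duration in seconds
    .*\n                                # discard the rest of the line
    """,
    re.VERBOSE,
)


def parse_threads_line(line):
    """Return (threads, seconds) as ints, or None if the line does not match."""
    m = THREADS_LINE.search(line)
    if m is None:
        return None
    return int(m.group("threads")), int(m.group("Seconds"))


if __name__ == "__main__":
    # Hypothetical sample lines, made up for this example.
    print(parse_threads_line("8 threads reading for 120 secs from file testfile.dat\n"))
    print(parse_threads_line("1 thread reading for 30 secs\n"))
```

Each piece of the pattern gets its own line and comment, so a later change to the output format is much easier to track down.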
The hell with counting lines, as long as the order doesn't change you can do the following. Oh, and using /x for big regex helps. ;)
qr§
^sqlio\s+v(?<SQLIOVersion>\d+\.\d+)
(?> # atomic match, don't backtrack in here once matched
.{0,400}? # don't match so far that we reach the next result
(?<threads>\d+)\s+thread)
(?>.{0,400}?
\b for\s+(?<Seconds>\d+)\s*sec)
(?>.{0,400}?
\b using\s+(?<clustersize>\d+)\s*KB)
(?>.{0,400}?
\b size:\s+(?<currentfilesize>\d+))
(?>.{0,400}?
\b IOs/sec\D*(?<IOs>\d+\.\d+))
(?>.{0,400}?
\b MBs/sec\D*(?<MBs>\d+\.\d+))
(?>.{0,400}?
\b Min_Latency\D*(?<MinLatency_ms>\d+))
(?>.{0,400}?
\b Avg_Latency\D*(?<AvgLatency_ms>\d+))
(?>.{0,400}?
\b Max_Latency\D*(?<MaxLatency_ms>\d+))
(?>.{0,400}?
^\%:\s*(?<ms>(?:\d+\s+)+))
§mixs
This is PCRE/Perl syntax, with qr§...§ used for quoting.
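For readers not on Perl, the same keyword-anchored approach can be sketched in Python. A few assumptions to flag: named groups become (?P<name>...), the atomic groups are left out because Python's re only accepts (?>...) from version 3.11 on, only some of the fields are shown, and the SAMPLE text is invented for the example rather than taken from a real SQLIO run.

```python
import re

# Each field is located by its keyword after a short, lazy gap instead of by
# counting lines. The 400-character cap keeps a gap from running into the next run.
SQLIO_RUN = re.compile(
    r"""
    ^sqlio\s+v(?P<SQLIOVersion>\d+\.\d+)
    .{0,400}? (?P<threads>\d+)\s+thread
    .{0,400}? \b for\s+(?P<Seconds>\d+)\s*sec
    .{0,400}? \b using\s+(?P<clustersize>\d+)\s*KB
    .{0,400}? \b IOs/sec\D*(?P<IOs>\d+\.\d+)
    .{0,400}? \b MBs/sec\D*(?P<MBs>\d+\.\d+)
    """,
    # Same flags as the Perl version: m, i, x, s.
    re.MULTILINE | re.IGNORECASE | re.VERBOSE | re.DOTALL,
)

# Hypothetical input, shaped only to exercise the pattern.
SAMPLE = """\
sqlio v1.5
8 threads reading for 120 secs
using 8KB random IOs
enabling multiple I/Os per thread with 8 outstanding
buffering set to use hardware disk cache (but not file cache)
IOs/sec: 18527.91
MBs/sec: 144.74
"""


def parse_runs(text):
    """Return one dict of captured fields per SQLIO run found in text."""
    return [m.groupdict() for m in SQLIO_RUN.finditer(text)]


if __name__ == "__main__":
    for run in parse_runs(SAMPLE + SAMPLE):
        print(run)
```

Because nothing depends on how many lines sit between two keywords, the optional "enabling multiple I/Os" line needs no special handling here.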