Regular Expression for parsing JSON like text
I have text of the form:
Field1:Value
Field2:Value
Field3:Value
Field1:Value
Field2:Value
Field3:Value
Field1:Value
Field2:Value
Field3:Value
Field1:Value
Field2:Value
Field3:Value
Things to the left of the colon are standard alphabetical characters ([a-zA-Z]) and always start with a capital letter. They can't be anything other than Field1, Field2 or Field3. The value to the right, however, can span multiple lines and can contain any character: [a-zA-Z], white space, $, %, ^, etc. I am looking for a regular expression that could match {Field1:value}{Field2:value}{Field3:value} separately in Tcl.
In general, I'd work by parsing the data first into lines, then assigning to each line an interpretation (e.g., start line or continuation line), then combining the start lines with their following continuations (forming “logical” lines). Only once that was done would I then use an RE to split the key from the value. As a suggestion for format, try having the line be a continuation if it starts with a space. That's dead easy to implement and looks good in a file.
As code:
# Read the data from a file and split into lines
set f [open "filename"]
set lines [split [read $f] "\n"]
close $f

# Recombine into logical lines: a line starting with a space
# continues the previous line
set logicalLines {}
foreach realline $lines {
    if {[regexp {^ (.*)} $realline -> tail]} {
        append current "\n$tail"
    } else {
        if {[info exists current]} {
            lappend logicalLines $current
        }
        set current $realline
    }
}
lappend logicalLines $current ;# Assume at least one line :-)

# Parse the logical lines
foreach line $logicalLines {
    if {[regexp {^([A-Z]\w+):(.*)$} $line -> key value]} {
        # OK, got $key mapping to $value
    } else {
        # It's a bogus line; waaaah!
    }
}
OK, you might have different rules for combining the lines, but by splitting things up into two stages like this, you make your life much easier. Similarly, it's possible to use a tighter test for line validity (replacing ([A-Z]\w+) with (Field[123]) for example) but I'm not convinced it's actually sensible.
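As one illustration of a different combining rule, here is a minimal sketch that keeps the same two-stage idea but assumes the file format cannot be changed: a line starts a new record only if it begins with Field1:, Field2: or Field3:, and every other line is treated as a continuation of the previous value. The proc name parseFields and the sample text are made up for this example, and lassign needs Tcl 8.5 or later.

```tcl
# Hypothetical helper (not part of the answer above): returns a list of
# {key value} pairs. Any line that does not begin with a known field
# name is treated as a continuation of the previous value.
proc parseFields {text} {
    set pairs {}
    set haveCurrent 0
    # Drop trailing newlines so the final value does not gain empty lines
    foreach line [split [string trimright $text "\n"] "\n"] {
        if {[regexp {^(Field[123]):(.*)$} $line -> key value]} {
            # A new record starts here; flush the one being built, if any
            if {$haveCurrent} {
                lappend pairs [list $curKey $curValue]
            }
            set curKey $key
            set curValue $value
            set haveCurrent 1
        } elseif {$haveCurrent} {
            # Continuation line: keep the newline so the value stays multi-line
            append curValue "\n$line"
        }
        # Lines before the first field are silently skipped in this sketch
    }
    if {$haveCurrent} {
        lappend pairs [list $curKey $curValue]
    }
    return $pairs
}

# Made-up sample data: the second value spans two lines
set sample "Field1:alpha\nField2:beta \$ % ^\nmore beta\nField3:gamma"
foreach pair [parseFields $sample] {
    lassign $pair key value
    puts "$key => [string map [list "\n" "\\n"] $value]"
}
```

The catch with this rule is that a continuation line which itself happens to begin with Field1:, Field2: or Field3: will be taken as the start of a new record. The leading-space convention suggested above does not have that ambiguity.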