RegEx in VBA: Break a complex string into multiple tokens?
I am trying to parse a line in an mmCIF protein file into separate tokens using Excel 2000/2003. In the worst case it could look something like this:
token1 token2 "token's 1a',1b'" 'token4"5"' 12 23.2 ? . 'token' tok'en to"ken
Which should become the following tokens:
token1
token2
token's 1a',1b' (note: the double quotes have disappeared)
token4"5" (note: the single quotes have disappeared)
12
23.2
?
.
token (note: the single quotes have disappeared)
tok'en
to"ken
Is it even possible to split this kind of line into tokens with a RegEx?
Nice puzzle. Thanks.
This pattern (aPatt below) separates the tokens, but I can't figure out how to remove the outer quotes.
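tallpaul() produces the following (each match also carries the one leading whitespace character that the pattern consumes):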
token1
token2
"token's 1a',1b'"
'token4"5"'
12
23.2
?
.
'token'
tok'en
to"ken
If you can figure out how to lose the outer quotes, please let us know. This needs a reference to "Microsoft VBScript Regular Expressions" to work.
Option Explicit
''returns a list of matches
Function RegExpTest(patrn, strng)
Dim regEx ' Create variable.
Set regEx = New RegExp ' Create a regular expression.
regEx.Pattern = patrn ' Set pattern.
regEx.IgnoreCase = True ' Set case insensitivity.
regEx.Global = True ' Set global applicability.
Set RegExpTest = regEx.Execute(strng) ' Execute search.
End Function
Function tallpaul() As Boolean
Dim aString As String
Dim aPatt As String
Dim aMatch, aMatches
'' need to pad the string with leading and trailing spaces.
aString = " token1 token2 ""token's 1a',1b'"" 'token4""5""' 12 23.2 ? . 'token' tok'en to""ken "
aPatt = "(\s'[^']+'(?=\s))|(\s""[^""]+""(?=\s))|(\s[\w\?\.]+(?=\s))|(\s\S+(?=\s))"
Set aMatches = RegExpTest(aPatt, aString)
For Each aMatch In aMatches
Debug.Print aMatch.Value
Next
tallpaul = True
End Function
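One way to lose the outer quotes is to post-process each match in VBA rather than in the pattern itself. The sketch below is only an illustration under assumptions, not part of the answer above: StripOuterQuotes and PrintCleanTokens are hypothetical names, and it uses late binding (CreateObject("VBScript.RegExp")) so it does not depend on the library reference. It relies on every alternative in aPatt consuming exactly one leading whitespace character.

```vba
' Hypothetical helper: removes one pair of matching outer quotes, if present.
Function StripOuterQuotes(ByVal tok As String) As String
    Dim q As String
    StripOuterQuotes = tok
    If Len(tok) >= 2 Then
        q = Left$(tok, 1)
        If (q = "'" Or q = """") And Right$(tok, 1) = q Then
            StripOuterQuotes = Mid$(tok, 2, Len(tok) - 2)
        End If
    End If
End Function

' Hypothetical driver: same pattern and test string as tallpaul().
Sub PrintCleanTokens()
    Dim re As Object, m As Object
    Dim aString As String

    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    re.Global = True
    re.Pattern = "(\s'[^']+'(?=\s))|(\s""[^""]+""(?=\s))|(\s[\w\?\.]+(?=\s))|(\s\S+(?=\s))"

    ' Pad with one leading and one trailing space, as the pattern requires.
    aString = " " & "token1 token2 ""token's 1a',1b'"" 'token4""5""' 12 23.2 ? . 'token' tok'en to""ken" & " "

    For Each m In re.Execute(aString)
        ' Each alternative starts with exactly one \s, so drop the first character
        ' before checking for outer quotes.
        Debug.Print StripOuterQuotes(Mid$(m.Value, 2))
    Next m
End Sub
```

Only a quote that appears at both ends of the token is removed, so tok'en and to"ken pass through unchanged.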
Yes, it is possible.
You'll need to reference "Microsoft VBScript Regular Expressions 5.5" in your VBA project, then:
Private Sub REFinder(PatternString As String, StringToTest As String)
    Dim RE As RegExp
    Dim Matches As MatchCollection
    Dim Match As Object      ' each item in Matches is a Match object
    Dim Item As Variant      ' each submatch is returned as a Variant
    Set RE = New RegExp
    With RE
        .Global = True
        .MultiLine = False
        .IgnoreCase = False
        .Pattern = PatternString
    End With
    Set Matches = RE.Execute(StringToTest)
    For Each Match In Matches
        ' Print the match, its zero-based start index and length, and the same text cut from the input
        Debug.Print Match.Value & " ~~~ " & Match.FirstIndex & " - " & Match.Length & " = " & Mid(StringToTest, Match.FirstIndex + 1, Match.Length)
        ' You get a submatch for each capture group (one per alternative if using ORs)
        For Each Item In Match.SubMatches
            Debug.Print "Submatch:" & Item
        Next Item
        Debug.Print
    Next Match
    Set RE = Nothing
    Set Matches = Nothing
    Set Match = Nothing
End Sub
Sub DoIt()
' This simply splits by space...
REFinder "([.^\w]+\s)|(.+$)", "Token1 Token2 65.56"
End Sub
This is obviously just a really simple example, as I'm not very knowledgeable about RegExp. It's more to show you HOW it can be done in VBA (you'd probably also want to do something more useful than Debug.Print with the resulting tokens!). I'll have to leave writing the actual pattern to somebody else, I'm afraid!
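As one possible way of doing something more useful than Debug.Print, the tokens can be collected and returned to the caller. The following is a sketch under assumptions, not a tested mmCIF parser: TokenizeLine and DemoTokenizeLine are hypothetical names, the pattern does not come from either answer, and it assumes a quoted string ends at the first matching quote that is followed by whitespace or the end of the line. It uses capture groups so that the quotes stay outside the SubMatches, and late binding so that no library reference is required.

```vba
' Hypothetical helper: splits one line into tokens, dropping outer quotes.
Function TokenizeLine(ByVal lineText As String) As Collection
    Dim re As Object, m As Object
    Dim tokens As New Collection

    Set re = CreateObject("VBScript.RegExp")   ' late binding
    re.Global = True
    ' Group 1: single-quoted body, group 2: double-quoted body, group 3: bare token.
    re.Pattern = "(?:^|\s)(?:'(.*?)'|""(.*?)""|(\S+))(?=\s|$)"

    For Each m In re.Execute(lineText)
        ' Only one group takes part in each match; the others add nothing
        ' when concatenated, so joining all three yields the token.
        tokens.Add m.SubMatches(0) & m.SubMatches(1) & m.SubMatches(2)
    Next m

    Set TokenizeLine = tokens
End Function

' Hypothetical usage: print the tokens of the worst-case line from the question.
Sub DemoTokenizeLine()
    Dim tok As Variant
    For Each tok In TokenizeLine("token1 token2 ""token's 1a',1b'"" 'token4""5""' 12 23.2 ? . 'token' tok'en to""ken")
        Debug.Print tok
    Next tok
End Sub
```

Because the pattern accepts either the start of the string or whitespace before a token, and either whitespace or the end of the string after it, the line does not need to be padded with spaces first.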
Simon