VBScript regex replace headache
I have a text file that I'm trying to process with VBScript. It looks like this:
111 , , ,Yes ,Yes
222 , , ,Yes ,Yes
333 , , ,Yes ,Yes
444 , , ,Yes ,Yes
555 , , ,Yes ,Yes
666 , , ,Yes ,Yes
What I want is to remove the carriage returns and tabs, commas and 'yes' (or the regex "\t,\t,\t\t,Yes\t,Yes") to give this output:
('111','222','333','444','555','666')
I'm using this code:
Const ForReading = 1
Const ForWriting = 2
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(filePath, ForReading)
strText = objFile.ReadAll
objFile.Close
'chr(010) = line feed chr(013) = carriage return
strNewText = Replace(strText, "\t,\t,\t\t,Yes\t,Yes" & chr(013) & chr(010), "','")
Set objFile = objFSO.OpenTextFile(filePath, ForWriting)
objFile.WriteLine strNewText
objFile.Close
This isn't giving the desired output, however. If I take the "\t,\t,\t\t,Yes\t,Yes" & part out of the Replace call, it removes the carriage returns, which is fine, but I also need the commas, tabs and 'Yes' removed, as well as a (' at the start and a ') at the end. I'm guessing it's the way I've used the regex, but I haven't used much VBScript, so I'm not sure.
Instead of hunting down what you don't want, it's easier and less error-prone to concentrate on what you want:
Dim sExp : sExp = "('111','222','333','444','555','666')"
Dim aLines : aLines = Array( _
"111 , , ,Yes ,Yes" _
, "222 , , ,Yes ,Yes" _
, "333 , , ,Yes ,Yes" _
, "444 , , ,Yes ,Yes" _
, "555 , , ,Yes ,Yes" _
, "666 , , ,Yes ,Yes" _
)
Dim sAll : sAll = Join( aLines, vbCrLf )
WScript.Echo sAll
Dim reCut : Set reCut = New RegExp
reCut.Global = True
reCut.MultiLine = True
reCut.Pattern = "^\d+"
Dim oMTS : Set oMTS = reCut.Execute( sAll )
If 0 = oMTS.Count Then
   WScript.Echo "Bingo A!"
Else
   ReDim aNums( oMTS.Count - 1 )
   Dim nI
   For nI = 0 To UBound( aNums )
       aNums( nI ) = oMTS( nI ).Value
   Next
   Dim sRes : sRes = "('" & Join( aNums, "','" ) & "')"
   If sRes = sExp Then
      WScript.Echo "QED:", sRes
   Else
      WScript.Echo "Bingo B!"
   End If
End If
output:
111 , , ,Yes ,Yes
222 , , ,Yes ,Yes
333 , , ,Yes ,Yes
444 , , ,Yes ,Yes
555 , , ,Yes ,Yes
666 , , ,Yes ,Yes
QED: ('111','222','333','444','555','666')
Annotations:
I use an array to build my string to process (sAll). Your string (strText) comes from a file. So:
Dim sAll : sAll = Join( aLines, vbCrLf )
==>
Dim sAll : sAll = objFile.ReadAll
The string is parsed by a RegExp (reCut). Its pattern ^\d+ looks for a sequence (+) of digits (\d) at the start (^) of a line (not of the whole string; that's why the MultiLine property is set to True). The result of .Execute is a Match collection (oMTS) containing Matches.
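To see what MultiLine changes, here is a minimal sketch that runs the same pattern over a small three-line sample with the property switched off and on. The names sDemo and reDemo and the sample data are made up for illustration. Run with cscript, it should report 1 match with MultiLine off and 3 with it on.

```vbscript
' Three sample lines joined with CR/LF, as in a Windows text file.
Dim sDemo : sDemo = Join( Array( "111 ,x", "222 ,x", "333 ,x" ), vbCrLf )

Dim reDemo : Set reDemo = New RegExp
reDemo.Global  = True
reDemo.Pattern = "^\d+"

' MultiLine off: ^ anchors only at the start of the whole string.
reDemo.MultiLine = False
WScript.Echo "MultiLine = False:", reDemo.Execute( sDemo ).Count

' MultiLine on: ^ also anchors after each line break.
reDemo.MultiLine = True
WScript.Echo "MultiLine = True: ", reDemo.Execute( sDemo ).Count
```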
To make the concatenation of the expected result easier, the values of the Matches are copied to an array (aNums).
The expression "('" & Join( aNums, "','" ) & "')" combines the array's elements using the separator ','. To complete the result, we just need a suitable head (' and tail ').
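Putting the annotations together for the file-based case might look roughly like the sketch below. It assumes the file exists and is not empty (ReadAll raises an error on an empty file). The path in filePath is a hypothetical placeholder, and the result is written back over the same file, as in the question.

```vbscript
Const ForReading = 1
Const ForWriting = 2

' Hypothetical path; replace it with the real file.
Dim filePath : filePath = "C:\temp\numbers.txt"

Dim objFSO : Set objFSO = CreateObject( "Scripting.FileSystemObject" )

' Read the whole file into one string.
Dim objFile : Set objFile = objFSO.OpenTextFile( filePath, ForReading )
Dim sAll : sAll = objFile.ReadAll
objFile.Close

' Collect the run of digits at the start of every line.
Dim reCut : Set reCut = New RegExp
reCut.Global    = True
reCut.MultiLine = True
reCut.Pattern   = "^\d+"
Dim oMTS : Set oMTS = reCut.Execute( sAll )

If 0 = oMTS.Count Then
   WScript.Echo "No leading numbers found; file left unchanged."
Else
   ReDim aNums( oMTS.Count - 1 )
   Dim nI
   For nI = 0 To UBound( aNums )
       aNums( nI ) = oMTS( nI ).Value
   Next
   ' Overwrite the file with the quoted, comma-separated list.
   Set objFile = objFSO.OpenTextFile( filePath, ForWriting )
   objFile.WriteLine "('" & Join( aNums, "','" ) & "')"
   objFile.Close
End If
```

Since the file is overwritten in place, it may be worth testing on a copy first.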
Try this
(.*?)(?:\s*,){3}Yes\s*,Yes\r?
You need to take care of the line breaks; on Regexr, \r was fine. I put the line break into the regex so that I could make it optional with the trailing ?. Otherwise the last row would not be replaced if it does not end with a line break.
Replace it with
'$1',
This leaves an additional comma at the end; at the moment I am not sure how to handle that. $1 is the content of the first capturing group; in your case it should hold the part before the first comma.
See it here on Regexr
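For what it's worth, one possible way to use this pattern from VBScript and deal with the trailing comma is sketched below. It is an untested-in-production sketch under a few assumptions: the input uses the tab layout described in the question, the sample data (three rows) is made up, and the optional line break is widened from \r? to \r?\n? so that no stray line feed is left behind with CR/LF input. With the sample data it should print ('111','222','333').

```vbscript
' Sample input using the tab layout from the question; the result of
' objFile.ReadAll could be used instead.
Dim sTail : sTail = vbTab & "," & vbTab & "," & vbTab & vbTab & ",Yes" & vbTab & ",Yes"
Dim sAll  : sAll  = "111" & sTail & vbCrLf & "222" & sTail & vbCrLf & "333" & sTail

Dim reRow : Set reRow = New RegExp
reRow.Global  = True
' Same pattern as above, with \r? widened to \r?\n? (an assumption
' about CR/LF line endings, not part of the original pattern).
reRow.Pattern = "(.*?)(?:\s*,){3}Yes\s*,Yes\r?\n?"

' Each row becomes '<first column>', including a trailing comma.
Dim sRes : sRes = reRow.Replace( sAll, "'$1'," )

' Drop the trailing comma, then add the parentheses.
If Right( sRes, 1 ) = "," Then sRes = Left( sRes, Len( sRes ) - 1 )
sRes = "(" & sRes & ")"

WScript.Echo sRes
```

Note that the built-in Replace function used in the question matches literal text only, so "\t" there is a backslash followed by a t; the RegExp object's Replace method is what interprets the pattern.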