Excel string manipulation to check data consistency
Background information: - There are nearly 7000 individuals and there is data about their performances in one, two or three test开发者_JS百科s.
Every individual has taken the 1st test (let's call it Test M). Some of those who have taken Test M have also taken Test I, and some of those who have taken Test I have also taken Test B.
For the first two tests (M and I), students can score grades I, II, or III. Depending on the grades they are awarded points -- 3 for grade I, 2 for II, 1 for III.
The last Test B is just a pass or a fail result with no grades. Those passing this test get 1 point, with no points for failure. (Well actually, grades are awarded, but all grades are given a common 1 point).
An amateur has entered data to represent these students and their grades in an Excel file. Problem is, this person has done the worst thing possible - he has developed his own notation and entered all test information in a single cell --- and made my life hell.
The file originally had two text columns, one for individual's id, and the second for test info, if one could call it that.
It's horrible, I know, and I am suffering. In the image, if you see "M-II-2 I-III-1" it means the person got grade II in Test M for 2 points and grade III in Test I for 1 point. Some have taken only one test, some two, and some three.When the file came to me for processing and analyzing the performance of students, I sent it back with instructions to insert 3 additional columns with only the grades for the three tests. The file now looks as follows. Columns C and D represent grades I, II, and III using 1,2 and 3 respectively. Column C is for Test M, column D for Test I. Column E says BA (B Achieved!) if the individual has passed Test B.
Now that you have the above information, let's get to the problem. I don't trust this and want to check whether data in column B matches with data in columns C,D and E.
That is, I want to examine the string in column B and find out whether the figures in columns C,D and E are correct.
All help is really appreciated.
P.S. - I had exported this to MySQL via ODBC and that is why you are seeing those NULLs. I tried doing this in MySQL too, and really will accept a MySQL or an Excel solution, I don't have a preference.
Edit : - See file with sample data
To create a flat file from the original data:
Sub GetData()
Dim cn As Object
Dim rs As Object
Dim strFile As String
Dim strCon As String
Dim strSQL As String
Dim s As String, t As Variant, x As Variant
Dim i As Integer, j As Integer, k As Integer
''This is not the best way to refer to the workbook
''you want, but it is very conveient for notes
''It is probably best to use the name of the workbook.
strFile = ActiveWorkbook.FullName
''Note that if HDR=No, F1,F2 etc are used for column names,
''if HDR=Yes, the names in the first row of the range
''can be used.
''This is the Jet 4 connection string, you can get more
''here : http://www.connectionstrings.com/excel
strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFile _
& ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
''Late binding, so no reference is needed
Set cn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")
cn.Open strCon
strSQL = "SELECT * " _
& "FROM [Sheet1$] "
''Open the recordset for more processing
''Cursor Type: 3, adOpenStatic
''Lock Type: 3, adLockOptimistic
''Not everything can be done with every cursor type and
''lock type. See http://www.w3schools.com/ado/met_rs_open.asp
rs.Open strSQL, cn, 3, 3
''Pick a suitable empty worksheet for the results
With Worksheets("Sheet2")
''Fill headers into the first row of the worksheet
.Cells(1, 1) = "ID"
.Cells(1, 2) = "Exam"
.Cells(1, 3) = "Grade"
.Cells(1, 4) = "Points"
''Working with the recordset ...
''Counter for Fields/Columns in Recordset and worksheet
''Row one is used with titles, so ...
i = 1
Do While Not rs.EOF
''Store the ID to a string (if it is a long,
''change the type) ...
s = rs!ID
t = Split(rs!testinfo, " ")
For j = 0 To UBound(t)
''(Counter)
i = i + 1
.Cells(i, 1) = s
x = Split(t(j), "-")
For k = 0 To UBound(x)
If t(j) = "BA-1" Then
.Cells(i, 2) = "B"
.Cells(i, 3) = "A"
.Cells(i, 4) = 1
Else
.Cells(i, k + 2) = x(k)
End If
Next
Next
''Keep going
rs.MoveNext
Loop
''Finished with the sheet
End With
''Tidy up
rs.Close
Set rs = Nothing
cn.Close
Set cn = Nothing
End Sub
To check the extra columns:
Sub CheckData()
Dim cn As Object
Dim rs As Object
Dim strFile As String
Dim strCon As String
Dim strSQL As String
Dim s As String, t As Variant, x As Variant
Dim i As Integer, j As Integer, k As Integer
Dim BAErr, MErr, IErr
strFile = ActiveWorkbook.FullName
strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFile _
& ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
Set cn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")
cn.Open strCon
strSQL = "SELECT * " _
& "FROM [Sheet1$] "
rs.Open strSQL, cn, 3, 3
Do While Not rs.EOF
t = Split(rs!testinfo, " ")
For j = 0 To UBound(t)
x = Split(t(j), "-")
Select Case x(0)
Case "BA"
If rs![test b] <> "BA" Then
BAErr = BAErr & "," & rs!ID
End If
Case "M"
If String(rs![test m], "I") <> x(1) Then
MErr = MErr & "," & rs!ID
End If
Case "I"
If String(rs![test i], "I") <> x(1) Then
IErr = IErr & "," & rs!ID
End If
End Select
Next
rs.MoveNext
Loop
''Tidy up
rs.Close
Set rs = Nothing
cn.Close
Set cn = Nothing
If BAErr <> "" Then
MsgBox Mid(BAErr, 2), , "B Errors"
End If
If MErr <> "" Then
MsgBox Mid(MErr, 2), , "M Errors"
End If
If IErr <> "" Then
MsgBox Mid(IErr, 2), , "I Errors"
End If
End Sub
精彩评论