开发者

Excel string manipulation to check data consistency

Background information: - There are nearly 7000 individuals and there is data about their performances in one, two or three test开发者_JS百科s.

Every individual has taken the 1st test (let's call it Test M). Some of those who have taken Test M have also taken Test I, and some of those who have taken Test I have also taken Test B.

For the first two tests (M and I), students can score grades I, II, or III. Depending on the grades they are awarded points -- 3 for grade I, 2 for II, 1 for III.

The last Test B is just a pass or a fail result with no grades. Those passing this test get 1 point, with no points for failure. (Well actually, grades are awarded, but all grades are given a common 1 point).

An amateur has entered data to represent these students and their grades in an Excel file. Problem is, this person has done the worst thing possible - he has developed his own notation and entered all test information in a single cell --- and made my life hell.

The file originally had two text columns, one for individual's id, and the second for test info, if one could call it that.

Excel string manipulation to check data consistency

It's horrible, I know, and I am suffering. In the image, if you see "M-II-2 I-III-1" it means the person got grade II in Test M for 2 points and grade III in Test I for 1 point. Some have taken only one test, some two, and some three.

When the file came to me for processing and analyzing the performance of students, I sent it back with instructions to insert 3 additional columns with only the grades for the three tests. The file now looks as follows. Columns C and D represent grades I, II, and III using 1,2 and 3 respectively. Column C is for Test M, column D for Test I. Column E says BA (B Achieved!) if the individual has passed Test B.

Excel string manipulation to check data consistency

Now that you have the above information, let's get to the problem. I don't trust this and want to check whether data in column B matches with data in columns C,D and E.

That is, I want to examine the string in column B and find out whether the figures in columns C,D and E are correct.

All help is really appreciated.

P.S. - I had exported this to MySQL via ODBC and that is why you are seeing those NULLs. I tried doing this in MySQL too, and really will accept a MySQL or an Excel solution, I don't have a preference.

Edit : - See file with sample data


To create a flat file from the original data:

Sub GetData()
    Dim cn As Object
    Dim rs As Object
    Dim strFile As String
    Dim strCon As String
    Dim strSQL As String
    Dim s As String, t As Variant, x As Variant
    Dim i As Integer, j As Integer, k As Integer

    ''This is not the best way to refer to the workbook
    ''you want, but it is very conveient for notes
    ''It is probably best to use the name of the workbook.

    strFile = ActiveWorkbook.FullName

    ''Note that if HDR=No, F1,F2 etc are used for column names,
    ''if HDR=Yes, the names in the first row of the range
    ''can be used.
    ''This is the Jet 4 connection string, you can get more
    ''here : http://www.connectionstrings.com/excel

    strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFile _
        & ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"

    ''Late binding, so no reference is needed

    Set cn = CreateObject("ADODB.Connection")
    Set rs = CreateObject("ADODB.Recordset")


    cn.Open strCon

    strSQL = "SELECT * " _
           & "FROM [Sheet1$] "

    ''Open the recordset for more processing
    ''Cursor Type: 3, adOpenStatic
    ''Lock Type: 3, adLockOptimistic
    ''Not everything can be done with every cursor type and
    ''lock type. See http://www.w3schools.com/ado/met_rs_open.asp

    rs.Open strSQL, cn, 3, 3

    ''Pick a suitable empty worksheet for the results

    With Worksheets("Sheet2")

        ''Fill headers into the first row of the worksheet


        .Cells(1, 1) = "ID"
        .Cells(1, 2) = "Exam"
        .Cells(1, 3) = "Grade"
        .Cells(1, 4) = "Points"

        ''Working with the recordset ...

        ''Counter for Fields/Columns in Recordset and worksheet
        ''Row one is used with titles, so ...
        i = 1

        Do While Not rs.EOF


            ''Store the ID to a string (if it is a long,
            ''change the type) ...

            s = rs!ID

            t = Split(rs!testinfo, " ")

            For j = 0 To UBound(t)
                ''(Counter)
                i = i + 1

                .Cells(i, 1) = s

               x = Split(t(j), "-")

                For k = 0 To UBound(x)
                    If t(j) = "BA-1" Then
                        .Cells(i, 2) = "B"
                        .Cells(i, 3) = "A"
                        .Cells(i, 4) = 1
                    Else
                        .Cells(i, k + 2) = x(k)
                    End If
                Next
            Next


            ''Keep going 
            rs.MoveNext

        Loop

   ''Finished with the sheet
   End With

   ''Tidy up
   rs.Close
   Set rs = Nothing
   cn.Close
   Set cn = Nothing
End Sub

To check the extra columns:

Sub CheckData()
    Dim cn As Object
    Dim rs As Object
    Dim strFile As String
    Dim strCon As String
    Dim strSQL As String
    Dim s As String, t As Variant, x As Variant
    Dim i As Integer, j As Integer, k As Integer
    Dim BAErr, MErr, IErr

    strFile = ActiveWorkbook.FullName

    strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFile _
        & ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"

    Set cn = CreateObject("ADODB.Connection")
    Set rs = CreateObject("ADODB.Recordset")

    cn.Open strCon

    strSQL = "SELECT * " _
           & "FROM [Sheet1$] "

    rs.Open strSQL, cn, 3, 3

    Do While Not rs.EOF

        t = Split(rs!testinfo, " ")

        For j = 0 To UBound(t)
           x = Split(t(j), "-")

           Select Case x(0)
                Case "BA"
                    If rs![test b] <> "BA" Then
                        BAErr = BAErr & "," & rs!ID
                    End If
                Case "M"
                    If String(rs![test m], "I") <> x(1) Then
                        MErr = MErr & "," & rs!ID
                    End If
                Case "I"
                    If String(rs![test i], "I") <> x(1) Then
                        IErr = IErr & "," & rs!ID
                    End If
           End Select
        Next

        rs.MoveNext

    Loop


   ''Tidy up
   rs.Close
   Set rs = Nothing
   cn.Close
   Set cn = Nothing

   If BAErr <> "" Then
      MsgBox Mid(BAErr, 2), , "B Errors"
   End If

   If MErr <> "" Then
      MsgBox Mid(MErr, 2), , "M Errors"
   End If

   If IErr <> "" Then
      MsgBox Mid(IErr, 2), , "I Errors"
   End If

End Sub
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜