Need help thinking about ways to normalize some messy data
I have some data in an Access table. I am well-versed in Access and am using it in this instance for its report-generating qualities and ease of use for the non-tech proficient.开发者_Python百科 This is the only tool at my disposal. The data came to me as an .xls with a lot of junk and stuff basically all over the place. I made a bunch of queries to get rid of the junk and re-organize (which was a lot easier to do in Access than Excel).
However this is where I'm stuck. This is what I have now:
Name | Test Date | Test data
Jane | ----------| --------
-----| 3/10/09 | --------
-----| --------- | 1
-----| --------- | 2
-----| --------- | 3
John | ----------| ------
-----| 3/12/09 | ------
-----| --------- | 3
-----| --------- | 1
-----| --------- | 5
-----| 3/13/09 | ------
-----| --------- | 2
-----| --------- | 5
-----| --------- | 7
This is what I'd like to have:
Name | Test Date | Test data
Jane | 3/10/09 | 1
Jane | 3/10/09 | 2
Jane | 3/10/09 | 3
John | 3/12/09 | 3
John | 3/12/09 | 1
John | 3/12/09 | 5
John | 3/13/09 | 2
John | 3/13/09 | 5
John | 3/13/09 | 7
Some of the names have more than one test under them. How many or which names are impossible to predict, although all tests have exactly 12 rows of data. But you need to SEE the data to know which names and dates go with which data.
I know ideally you would have Names and Test dates in their own table, but I'm trying to make this a one-table deal for ease of use by people who are not me who will be ok with importing and running the queries (which is basically one click using macros) but not much else.
I can make and run a bunch of queries and forms to do this, but I can't seem to figure out which ones. Does anyone have any ideas?
I'm trying to make this a one-table deal for ease of use by people who are not me who will be ok with importing and running the queries.
Perhaps it would be better to have multiple tables to normalize it, and then create views for other people?
If you have a key field that holds the order in which these rows should be, I suggest you step through using VBA and build a new table.
I would recommend transforming it permanently to your preferred form, but even still, it should be possible to do this in a straight Access query using triangular joins to find the highest previous non-blank row. You need to have a row_id to preserve row order, of course - without that the data will be uninterpretable.
SELECT *
FROM tbl AS test_data
INNER JOIN (
SELECT test_data.row_id, MAX(name_data.row_id) AS name_row_id
FROM tbl AS test_data
INNER JOIN tbl AS name_data
ON name_data.row_id < test_data.row_id
AND name_data.name IS NOT NULL -- or whatever your empty columns contain
GROUP BY test_data.row_id
) AS name_row_id
ON name_id.row_id = tbl.row_id
INNER JOIN (
SELECT test_data.row_id, MAX(date_data.row_id) AS date_row_id
FROM tbl AS test_data
INNER JOIN tbl AS date_data
ON date_data.row_id < test_data.row_id
AND date_data.[test date] IS NOT NULL -- or whatever your empty columns contain
GROUP BY test_data.row_id
) AS date_row_id
INNER JOIN tbl AS name_data
ON name_data.row_id = name_row_id.name_row_id
INNER JOIN tbl AS date_data
ON date_data.row_id = date_row_id.date_row_id
WHERE test_data.[test data] IS NOT NULL -- or whatever your empty columns contain
You should definitely isolate the data model from your user's concern. By not doing so you will create a data model that will be very hard to manage, adapt and upgrade. The 'user friendly' considerations can usually be solved easily by giving them access to views instead of having them query the original tables.
If you want to (really) normalize your data, I guess you'll have to go through the following steps:
- create tables
- transfer data
- create view
Create tables
table_person (id_person PK, namePerson)
table_test (id_test PK, id_Person FK, dateTest)
table_result (id_result PK, id_test FK, valueResult)
This data model takes into consideration the 1 to many relations between (a)Table_Person
and Table_test
and (b)Table_test
and Table_result
.
transfer data
Once this is done, you'll have to write some vba code, similar to this one (I assume here that your original table is called table_data, and records are correctly ordered to match your example!):
Dim rsData as DAO.recordset, _
rsperson as DAO.recordset, _
rsTest as DAO.recordset, _
rsResult as DAO.recordset
set rsData = currentDb.openRecordset("Table_Data")
set rsPerson = currentDb.openRecordset("Table_Persone")
set rsTest = currentDb.openRecordset("Table_Test")
set rsResult = currentDb.openRecordset("Table_result")
rsData.moveFirst
Do while not rsData.eof
'person is already known
if isnull(rsData.fields("name") or _
rsData.fields("name") = rsPerson.fields("name") Then
'test reference is already known
if isnull(rsData.fields("test date") or _
rsData.fields("test date") = rsPerson.fields("dateTest") Then
'add new result record
rsResult.addNew
rsResult.fields("id_Result") = ... (your choice of a PK(*))
rsResult.fields("id_Test") = rsTest.fields("id_test")
rsresult.fields("valueResult") = rsData.fields("Test data")
rsResult.update
Else
'add new test record
rsTest.addNew
rsTest.fields("id_Test") = ... (your choice of a PK(*))
rsTest.fields("id_Person") = rsPerson.fields("id_Person")
rsTest.fields("dateTest") = rsData.fields("Test date")
rsTest.update
Endif
Else
'add new person record
rsPerson.addNew
rsPerson.fields("id_Person") = ... (your choice of a PK(*))
rsPerson.fields("namePerson") = rsData.fields("name"
rsPerson.update
Endif
rsData.moveNext
loop
''close your recordsets and you're ok
PK(*): depending on your Primary Key choice, you might not need to generate it via code. If tables are set to have (for example) an autoincrement number as PK, Access will automatically generate the corresponding value at recordset update time.
create view
Your view will allow your users to access the data the 'friendly' way you want:
SELECT table_person.namePerson, table_test.dateTest, table_result.valueResult FROM tbl_person LEFT OUTER JOIN ...
At a minimum you need to have an autonumber column on the table. This is the only way you will be able to end up realting the records. Then you will want to add another column for parent id. YOu will then go through and populate this by finding the next record up with a name inthe name field and putting its autogenrated id number into the field. Then you will need to update the parent record with data in the child records joining on the id field in the parent to the parentid field in the child. All of this assumes the data went into the table inthe correct order which I'm not sure it did without seeing waht the data looked like or how you put it in.
Just for starters, if you have this in a DB, you should have incrementing row_id
to keep the order of records. It may be actually easier to put this into Excel and "copy-down".
Running this VBA code in an Excel spreadsheet formmatted like your first table will produce your second table. Perhaps, export from Access to csv, import into Excel, run the macro then export to csv and re-import into Access.
Try it out, put your sample table above in Sheet1 of an Excel workbook, run the macro and the result will appear in sheet2. If you run the real table replace the constant '16' in the code below with the actual number of rows in the table.
Sub Normalize()
Dim row, row2 As Integer
Dim name, dt, test As String
Dim wname, wdt, wtest As String
row = 2
row2 = 1
While row < 16
name = Sheet1.Cells(row, 1).value
dt = Sheet1.Cells(row, 2).value
test = Sheet1.Cells(row, 3).value
If name <> "" Then
wname = name
wdt = ""
wtest = ""
End If
If dt <> "" Then
wdt = dt
wtest = ""
End If
If test <> "" Then wtest = test
If wname <> "" And wdt <> "" And wtest <> "" Then
Sheet2.Cells(row2, 1).value = wname
Sheet2.Cells(row2, 2).value = wdt
Sheet2.Cells(row2, 3).value = wtest
row2 = row2 + 1
End If
row = row + 1
Wend
End Sub
精彩评论