开发者

How to store data that can be structured or non-structured at the same time?

I have a database with the following table:

PATIENT (PATIENT_ID*, MEDICAL_EXAMINATIONS)

where the field MEDICAL_EXAMINATIONS contains a free-text description of the exams undertaken by the patient.

Recently, it was decided that medical examination can be reported EITHER as free-text (as always) OR in a structured way (divided in exam name, date, results, and so on).

So I thought to change the schema as follow (fields marked with an asterisk compose the key):

PATIENT (PATIENT_ID*, MEDICAL_EXAMINATIONS)
MEDICAL_EXAMINATION (PATIENT_ID*, NUMBER*, NAME, DATE, RESULT)

but I found this solution a little disturbing, because I have the SAME information (the medical examinations) stored in TWO tables. In this case, the result of t开发者_如何学Gohe query "select all medical examinations undertaken by a patient" is not so "elegant".

I don't really know how to express my question, but this situation seems STRANGE to me. I wonder if the problem originates intrinsecally from the specifications (that I can't change) or if there is a better method to model the "two versions" of data.


Personally, I would seperate out the concept of medical examinations completely from the patient into two seperate tables, like so:

PATIENT(PATIENT_ID)
MEDICAL_EXAMINATION(PATIENT_ID,NAME,DATE,RESULT)
MEDICAL_EXAMINATION_NOTES(PATIENT_ID,NOTES)

"Notes" is a rough guess at the table name, there may be a more appropriate name for this based on what the use cases are.

This allows you some added flexibility as you could have multiple "free-form" examinations at some point in the future if you chose.

Selecting out both of these is always going to be troublesome as you have a different data strucuture. You'd probably be limited to the lowest-common-denominator and pull them out as strings if you wanted to get them together, like so:

SELECT 'Name ' + NAME + ', Date ' + DATE + ', Result: ' + RESULT AS EXAM
FROM MEDICAL_EXAMINATION WHERE PATIENT_ID = @PATIENT_ID

UNION ALL

SELECT NOTES AS EXAM FROM MEDICAL_EXAMINATION_NOTES WHERE PATIENT_ID = @PATIENT_ID

Better yet, if this database is backing some sort of business objects, have a seperate class for "free-form" and "structured" examinations and then a common interface that gives a string representation of the medical examination. That way your business layer has the option of treating them seperately or using them together.


I would do:

  • Patient: ID, Name, Date of Birth, Examinations, etc
  • Medical Examination: ID, Patient ID (FK), Name, Date, Result

Think of the Patient.Examination free-text field as basically unprocessed or not-yet transcribed examinations. The idea is that as you transcribe data from the free text field you remove it from there and add it to the other table.

This brings all sorts of error detection and control issues however. Medical transcription is a delicate area (understandably).

Arguably you could normalize this further and describe each possible Examination, give it an ID and other data and then put Examination ID into the Medical Examination entity instead of a simple Name column.

But it all depends on your requirements.


It's not a great situation. One approach that might be a little cleaner would be to leave the medical examination out of the patient table (it doesn't belong there anyway), and have the medical examination table have patient_id, name, date, result, and free_text. If the free_text value for a given row is entered, the others are ignored. It means you can't, for example, make the date a required field in the DB, but it's still better than the current version, I think.

It also would give you a path to transition from worse to better data:

Phase 1: most patients have a single associated medical_examination row with free text that describes multiple exams.

Phase 2: most patients have multiple associated medical_examination rows with free text that describes each separate exam.

Phase 3: most patients have multiple associated medical_examination rows with structured data for each separate exam.


One of the primary rules of relational databases has been expressed by Joe Celko as "One Fact, One Place, One Way" (with "One Time" sometimes added in). Having the data--very important data, from the looks of it--present twice in the database, stored in two very different fashions, is not a good idea. Could you do something like this:

  • If there are key facts that must be present for an examination, create columns for them (as you do for Name, Date, Result)
  • Given that, what else might be included in the descriptions? I'd try to get this presented separately and stored in it's own column (say, Comments)
  • With this, you could build a "standardized" free-text description on the fly, based on the relevant data.

Anything else, and you'll have to sort through two different and potentially disagreeing sources for your information.


You could add the NAME, DATE, RESULT columns to the PATIENT table. If the condition "have either free-form or structured data, but not both" must hold, you can add a trigger that prevents violations of the condition.


This is a difficult problem, and you have several good answers here that I am in the process of upvoting as I interpret them.

My personal path would be to split the free text examinations column out of the patient rows. In most physical models, it's going to be ntext, text or varchar(MAX) or similar, and you don't want it having taking up space in the row and these types usually store their data outside of the row, but in any case, it's good to get it out. Typically, I would with have a 1-1 on the patient. It makes your patient rows smaller and more manageable.

Then I would make a separate table where the data is interpreted, extracted and normalized into columns and rows, many-1 with the patient.

You say that the data is the SAME. If so, the free text is not necessary to retain, and you can use the normalized table of examinations (and even make a view to reconstruct the original "free text")

In reality, I typically would treat the free text as legacy and restrict access to it, and drive all views and updates from the normalized data. If the free text needs to be maintained in sync with the normalized version, there are a number of techniques to handle this like triggers, but they can be extremely messy if individual transactions are allowed to change and the "free text" needs to have some parts altered.


"Recently, it was decided that medical examination can be reported EITHER as free-text (as always) OR in a structured way (divided in exam name, date, results, and so on)."

This strikes me as a non-decision (probably made by non-cognoscenti). From what I can make of it, this so-called "decision" still leaves the information provider the liberty to provide the information in whatever way he wants, structured or non-structured. (Note : my reply doesn't apply if the information provider is forced to make a definitive choice between "structured" and "non-structured".)

"Structured" means that there are "layout rules" (e.g. CSV) and "content rules" (e.g. "examName must be the name of a known course/exam") to which information provider must conform.

But "non-structured" by definition will still keep meaning that "whatever information that information provider provides, that information will always at least satisfy "non-structured", so any information provided is, by definition, always acceptable under the "non-structured" interpretation.

Therefore, this so-called "decision to allow structured too" is of no damn use at all to you. And (taking into account the caveat I mentioned), the logical conclusion is that "not doing anything at all" with this decision (which seems like a totally fake one) is your best option.

No doubt you will think "But I just can't do that". You may be right, if your management is completely insensitive to logically well-founded reasoning.

PS

As to the remark that has been made ""One Fact, One Place, One Way" : Cases such as these might be illustrative of why it is sometimes necessary to relax that (TEMPORARILY !) to "One fact, One place, One OF TWO POSSIBLE ways". Just to facilitate transition-at-the-users'-pace.


What we have here is an example of semi-structured data, one way to deal with this is to use XML data type field for ExamDetails. You could have:

<root>
 <ExamName></ExamName>
 <ExamResult></ExamResult>
 <FreeText></FreeText>
</root>

Not all elements would have to be present in every record. You would use your DB XML features to query the field. All mayor DBs (MS SQl Server, Oracle, DB2) can store and query XML.

Few more notes:
I would have minimum three tables: Patient, Doctor, Exam

TABLE Patient (ID (PK), Name, other patient details...)
TABLE Doctor (ID (PK), Name, other doctor details...)
TABLE Exam (ID (PK), PatientID (FK), DoctorID (FK), Date, ExamDetails XML, more here...)

If both doctor and patient happen to be people (as opposed to vet clinic or house inspection) you could add a table Person and sub-type Patient and Doctor tables to the Person table -- this way it is easy to have a doctor in the clinic as a patient too. For example:

TABLE Person (ID (PK), FirstName, LastName, Phone, Address, other details common to people...)
TABLE Patient (PersonID (PK, FK), ...specific patient details only)
TABLE Doctor (PersonID (PK, FK), ...specific doctor details only)
TABLE Exam (ID (PK), PatientID (FK), DoctorID (FK), Date, ExamDetails XML, more here...)

Because Patient and Doctor are type of person, PersonID should be the same number as the ID in the Person table.

How to store data that can be structured or non-structured at the same time?


You can add column comment to MEDICAL_EXAMINATION table.

It will look like

MEDICAL_EXAMINATION (PATIENT_ID, NAME, DATE, RESULT,comment)

So you can store your unstructured data in comment column.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜