Opinions and advice on database structure

I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:

a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e

And I have a list of categories to break these rows up into, for example:

Original   Cat1  Cat2  Cat3  Cat4  Cat5
---------------------------------------
a:b:c:d:e  a     b     c     d     e

As of right this second, the category names are known, as well as the number of categories to break the data down by. But this might change over time (for instance, categories added/removed...total number of categories changed).

Okay so I'm not really looking for help on how to parse the rows or get data into a db or anything...I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating into variable amount of categories.

Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:

Table: Generated
generated_id        int           - unique id for each row generated
generated_timestamp datetime      - timestamp of when row was generated
last_updated        datetime      - timestamp of when row last updated
generated_method    varchar(6)    - method in which row was generated (manual or auto)
original_string     varchar (255) - the original string

Table: Categories
category_id         int           - unique id for category
category_name       varchar(20)   - name of category

Table: Category_Values
category_map_id     int           - unique id for each value (not sure if I actually need this)
category_id         int           - id value to link to table Categories
generated_id        int           - id value to link to table Generated
category_value      varchar (255) - value for the category

Basically the idea is when I parse a row, I will insert a new entry into table Generated, as well as X entries in table Category_Values, where X is however many categories there currently are. And the category names are stored in another table Categories.
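A minimal sketch of that insert step, assuming Python's built-in sqlite3 module as a stand-in for the real database. The table and column names come from the layout above; the function name `store_row`, the sample categories Cat1 to Cat5, and the simplified column types are assumptions made for illustration only.

```python
import sqlite3

# SQLite stands in for the real database; the DDL is a simplified
# version of the three tables described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Generated (
        generated_id        INTEGER PRIMARY KEY,
        generated_timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
        last_updated        TEXT DEFAULT CURRENT_TIMESTAMP,
        generated_method    TEXT,
        original_string     TEXT
    );
    CREATE TABLE Categories (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE Category_Values (
        category_map_id INTEGER PRIMARY KEY,
        category_id     INTEGER REFERENCES Categories (category_id),
        generated_id    INTEGER REFERENCES Generated (generated_id),
        category_value  TEXT
    );
""")
# Hypothetical sample categories.
conn.executemany(
    "INSERT INTO Categories (category_name) VALUES (?)",
    [(f"Cat{i}",) for i in range(1, 6)],
)


def store_row(conn, original, method="auto"):
    """Insert one Generated row plus one Category_Values row per category."""
    category_ids = [
        row[0]
        for row in conn.execute(
            "SELECT category_id FROM Categories ORDER BY category_id"
        )
    ]
    values = original.split(":")
    if len(values) != len(category_ids):
        raise ValueError("row does not match the current category count")
    with conn:  # one transaction: all inserts are committed, or none are
        cur = conn.execute(
            "INSERT INTO Generated (generated_method, original_string) "
            "VALUES (?, ?)",
            (method, original),
        )
        generated_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO Category_Values "
            "(category_id, generated_id, category_value) VALUES (?, ?, ?)",
            [(cid, generated_id, v) for cid, v in zip(category_ids, values)],
        )
    return generated_id


new_id = store_row(conn, "a:b:c:d:e")
```

Wrapping the parent row and its X child rows in one transaction means a failure halfway through should not leave a Generated entry with only some of its category values.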

What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.

Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure...well...I'm not a sql expert, but I think I should be able to do like

select generated_id from Generated where original_string = '$string'
-- generated_id is put into $id

and then

select * from Category_Values where generated_id = $id

...and then I'll have my data to work with for search results or a form to alter data. I'm fairly certain I can even combine this into one query with a join or something, but I'm not that great with SQL so I don't know how to actually do that. Point is, I know I can do what I need from this db structure, but am I making this harder than it needs to be? Making some obvious noob mistake?
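For what it's worth, the two lookups can probably be combined along these lines. This is only a sketch: it uses Python's sqlite3 module in place of the real database, trims the tables down to the columns the query needs, loads just two sample categories, and the function name `lookup` is made up for the example. The `?` placeholder is used instead of pasting `$string` into the SQL text, which sidesteps quoting problems and SQL injection.

```python
import sqlite3

# Trimmed-down copies of the three tables with a little sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Generated (
        generated_id    INTEGER PRIMARY KEY,
        original_string TEXT
    );
    CREATE TABLE Categories (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE Category_Values (
        category_map_id INTEGER PRIMARY KEY,
        category_id     INTEGER,
        generated_id    INTEGER,
        category_value  TEXT
    );
    INSERT INTO Generated VALUES (1, 'a:b:c:d:e');
    INSERT INTO Categories VALUES (1, 'Cat1'), (2, 'Cat2');
    INSERT INTO Category_Values VALUES (1, 1, 1, 'a'), (2, 2, 1, 'b');
""")


def lookup(conn, original):
    """Return (category_name, category_value) pairs for one original string."""
    sql = """
        SELECT c.category_name, v.category_value
        FROM Generated AS g
        JOIN Category_Values AS v ON v.generated_id = g.generated_id
        JOIN Categories AS c ON c.category_id = v.category_id
        WHERE g.original_string = ?
        ORDER BY c.category_id
    """
    # The value is bound as a parameter, never spliced into the SQL string.
    return conn.execute(sql, (original,)).fetchall()


rows = lookup(conn, "a:b:c:d:e")
```

The same SELECT with two JOINs should carry over to MySQL largely unchanged; only the placeholder style depends on the client library.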


My suggestion:

Table: Generated
id                  int unsigned auto_increment primary key
generated_timestamp timestamp
last_updated        timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP
generated_method    ENUM('manual','auto')
original_string     varchar(255)

Table: Categories
id                  int unsigned auto_increment primary key
category_name       varchar(20)

Table: Category_Values
id                  int unsigned auto_increment primary key
category_id         int unsigned  - same type as Categories.id, needed for the foreign key
generated_id        int unsigned  - same type as Generated.id, needed for the foreign key
category_value      varchar(255)  - value for the category
  FOREIGN KEY `fk_cat` (category_id) REFERENCES Categories (id)
  FOREIGN KEY `fk_gen` (generated_id) REFERENCES Generated (id)

Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html
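
To see what the foreign keys and the ENUM buy you, here is a rough sketch using Python's sqlite3 module rather than MySQL. It is an approximation, not the MySQL DDL: SQLite has no ENUM type, so a CHECK constraint stands in for it, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued. The helper `try_insert` and the sample values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key checks off unless this pragma is set.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Generated (
        id               INTEGER PRIMARY KEY,
        -- CHECK approximates MySQL's ENUM('manual','auto')
        generated_method TEXT CHECK (generated_method IN ('manual', 'auto')),
        original_string  TEXT
    );
    CREATE TABLE Categories (
        id            INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE Category_Values (
        id             INTEGER PRIMARY KEY,
        category_id    INTEGER NOT NULL REFERENCES Categories (id),
        generated_id   INTEGER NOT NULL REFERENCES Generated (id),
        category_value TEXT
    );
    INSERT INTO Generated (id, generated_method, original_string)
        VALUES (1, 'auto', 'a:b:c:d:e');
    INSERT INTO Categories (id, category_name) VALUES (1, 'Cat1');
""")


def try_insert(conn, sql, params):
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


insert_value = (
    "INSERT INTO Category_Values (category_id, generated_id, category_value) "
    "VALUES (?, ?, ?)"
)
ok_known = try_insert(conn, insert_value, (1, 1, "a"))         # both parents exist
ok_unknown_cat = try_insert(conn, insert_value, (99, 1, "z"))  # no category 99
ok_bad_method = try_insert(
    conn,
    "INSERT INTO Generated (generated_method, original_string) VALUES (?, ?)",
    ("batch", "x:y"),                                          # not manual/auto
)
```

In this sketch only the first insert goes through; the database itself refuses a value that points at a missing category, and a method other than manual or auto.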


I think this solution is perfect for what you want to do. The Categories list is now flexible so that you can add new categories or retire old ones (I would recommend thinking long and hard about it before agreeing to delete a category - would you orphan the records or remove them too, etc.).
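
To make the delete question concrete, here is a small sketch of the options expressed as ON DELETE rules on the foreign key. It runs against SQLite through Python's sqlite3 module as a stand-in; the tables are cut down to the relevant columns and the function name `values_left_after_delete` is made up. MySQL's InnoDB accepts the same CASCADE, SET NULL and RESTRICT keywords, though the details are worth checking against the CREATE TABLE documentation.

```python
import sqlite3


def values_left_after_delete(on_delete):
    """Delete category 1 under the given ON DELETE rule; return remaining rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    # on_delete is a fixed keyword chosen by this script, never user input,
    # so formatting it into the DDL is acceptable here.
    conn.executescript(f"""
        CREATE TABLE Categories (
            id            INTEGER PRIMARY KEY,
            category_name TEXT
        );
        CREATE TABLE Category_Values (
            id             INTEGER PRIMARY KEY,
            category_id    INTEGER REFERENCES Categories (id)
                           ON DELETE {on_delete},
            category_value TEXT
        );
        INSERT INTO Categories VALUES (1, 'Cat1'), (2, 'Cat2');
        INSERT INTO Category_Values VALUES (1, 1, 'a'), (2, 2, 'b');
    """)
    with conn:
        conn.execute("DELETE FROM Categories WHERE id = 1")
    return conn.execute(
        "SELECT category_id, category_value FROM Category_Values ORDER BY id"
    ).fetchall()


# "Remove them too": values of the deleted category disappear with it.
removed = values_left_after_delete("CASCADE")

# "Orphan": values stay, but no longer point at any category.
orphaned = values_left_after_delete("SET NULL")

# Third option: refuse to delete a category that still has values.
try:
    values_left_after_delete("RESTRICT")
    refused = False
except sqlite3.IntegrityError:
    refused = True
```

Whichever rule is picked, writing it into the schema means the decision is made once rather than in every script that touches Categories.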

Basically, I'm saying you are right on target. The structure is simple but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).
