开发者

Update MySQL database from XML file

I got 2 tables in mySQL database:

Persons:

id    name     address .....

and Items:

id   person_id  param1  param2 ....

items are referenced by person_id to Persons. There is about 5000 persons and 40,000 items and growing. This data need to be updated every few days from big XML file what look like this:

.....
<person id='100016' lastname='....' firstname='.....' ....  >
<item param1='...' startdate='2006-01-07' enddate='2006-12-09' ... />
<item param1='...' startdate='2007-01-04' enddate='2007-12-08' ... />
<item param1='...' startdate='2008-01-04' enddate='2009-01-03' ... />
<item param1='...' startdate='2009-01-06' enddate='2009-12-31' ... />
</person>
......

Some attributes of person and item can change, new persons, and new items could be added.

What is best way to keep it easy to update ? I do not think that truncating tables and loading them again would most effective way. Shoul开发者_运维知识库d I consider some staging - size of XML file start to have meaning 60MB and growing -> it not effective to have it parsed to SimpleXML and comparing with SQL...

What would be a best practice in situation like this ?

I have mySQL and PHP with Zend Framework as my environment


You could use diffxml on the old and new XML file and then parse the changes in the XML to reflect them into MySQL.

But I think truncating and loading the full file would also work in a reasonable time. And sometimes such a simple solution is the best - even if it is not super fast.


I think the most effective solution would be to simply reprocess the entire file, and for each line perform an INSERT ... ON DUPLICATE KEY UPDATE. You would need to find a value that identifies the user which will never change and create a UNIQUE index (or use the primary key, which looks like it would be the most effective approach in your case as long as the id field is the one that never changes), and put any fields that may change over time into the update section.

INSERT INTO persons SET id='1234', lastname='a', firstname = 'b' ON DUPLICATE KEY UPDATE lastname='a', firstname='b';

As long as the fields are not variable in length (no varchar or test fields in the table) this should allow you to keep records updated quickly without needing to truncate the tables. Truncating would just increase the disk writes for each record.

This approach would also allow you to create statistics - for example you might add a column that records the first time you see a new person, so you know how many people got added in each update process.


Question 1 - What is best way to keep it easy to update ?

Is there a unique reference you can use within the XML? If so then import this and use it to determine if the XML data has already been imported. Failing that you will have to generate the reference yourself.

Having this reference within the XML would be perfect as you can load the XML document using DOMDocument and then use XPath to easily go to the last imported reference and then import again. This would only work if you knew the data wasn't going to change on previous rows.

If it does change then you would need to iterate over it or use the diff tool suggested by Alex.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜