Best method to extract XML data in different DBMS?

I'm writing a PHP-based web application that uses PDO to connect to multiple database backends. Currently they are MySQL 5.1+, SQLite 3.6+ and PostgreSQL 8/9. Other systems could possibly be added later, given they understand my SQL queries well enough or the queries can be transformed by my abstraction layer.

Now I'd like to store log data in the database in XML format, as the structure of these log events depends heavily on the type of the event. I wouldn't want to add a separate column for every possible piece of log information when most columns are NULL most of the time.

But how can I get the data back from the database later? I need a consistent way to extract data from the XML documents in a SELECT query. I know of MySQL's ExtractValue() function and I could add a UDF (user defined function) to SQLite through PDO somehow as well. But I haven't found anything similar for PostgreSQL. And then I don't know which way would be the best to get XML support into all DBMS for use with the same SQL query. Does anybody have a solution or best practices for that already?

Update: Here's an example of such a log record:

Num | Time        | EventId | UserId | Data
  1 | 2011-02-... |       1 |     42 | <data><messageid>123</messageid></data>
  2 | 2011-02-... |       2 |     43 | <data><messageid>123</messageid></data>
  3 | 2011-02-... |      23 |      7 | <data><oldname>006</oldname><newname>007</newname></data>

Where EventId describes the event that happened, like locking or unlocking a message, or renaming a user. I'd possibly want to query for /data/messageid (together with appropriate EventIds that set that XML value) to find all events that tell about a particular message of which I want to plot the moderation history.

These are not big XML documents like entire web pages, just a nicely structured way to keep all those values that I have no dedicated table column for. And it's extensible, so if I come up with a new EventId number, I can store any chunk of data with that event. A condition on an XML value will never be the only condition in my queries. The rows will be narrowed down (largely) by other criteria first, like a time span or a list of EventIds.


If you are striving for database independence, storing XML in your database is going to put quite a burden on your abstraction layer, because every DBMS handles it differently, unfortunately. That isn't to say it can't be done, but you're going to have to do a lot of work researching the XML capabilities (or lack thereof) in each of your supported databases. (And I don't think SQLite has any XML features at all...)
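To give an idea of what that burden looks like, here is a rough sketch of an abstraction-layer helper that emits a different extraction expression per backend. It is written in Python rather than PHP purely for illustration, and the names xml_value_expr and xml_value are made up for this example. It assumes MySQL's ExtractValue(), PostgreSQL's xpath() function (available from 8.3 on when the server is built with libxml support), and a user-defined function registered on the SQLite connection; in PHP the SQLite part would go through PDO::sqliteCreateFunction.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Only plain element paths such as /data/messageid are accepted, because the
# path is embedded in the SQL text as a literal (assumption of this sketch).
_PATH_RE = re.compile(r"(/[A-Za-z_][A-Za-z0-9_.-]*)+")


def xml_value_expr(driver, column, path):
    """Return a SQL expression that extracts the text at `path` from `column`."""
    if not _PATH_RE.fullmatch(path):
        raise ValueError("unsupported path: %r" % path)
    if driver == "mysql":
        return f"ExtractValue({column}, '{path}')"
    if driver == "pgsql":
        # xpath() returns an xml[] array; take the first match as text.
        return f"(xpath('{path}/text()', {column}::xml))[1]::text"
    if driver == "sqlite":
        # Relies on the user-defined function registered below.
        return f"xml_value({column}, '{path}')"
    raise ValueError("unknown driver: %r" % driver)


def xml_value(doc, path):
    """SQLite user function: text of the first element at an absolute path."""
    if doc is None:
        return None
    try:
        root = ET.fromstring(doc)
    except ET.ParseError:
        return None
    parts = path.strip("/").split("/")
    if parts[0] != root.tag:
        return None
    node = root if len(parts) == 1 else root.find("/".join(parts[1:]))
    return None if node is None else node.text


conn = sqlite3.connect(":memory:")
conn.create_function("xml_value", 2, xml_value)
conn.execute(
    "CREATE TABLE log (Num INTEGER PRIMARY KEY, EventId INTEGER, "
    "UserId INTEGER, Data TEXT)"
)
conn.executemany(
    "INSERT INTO log (EventId, UserId, Data) VALUES (?, ?, ?)",
    [
        (1, 42, "<data><messageid>123</messageid></data>"),
        (2, 43, "<data><messageid>123</messageid></data>"),
        (23, 7, "<data><oldname>006</oldname><newname>007</newname></data>"),
    ],
)

expr = xml_value_expr("sqlite", "Data", "/data/messageid")
rows = conn.execute(
    f"SELECT Num, EventId FROM log WHERE EventId IN (1, 2) AND {expr} = ? "
    "ORDER BY Num",
    ("123",),
).fetchall()
```

Even in this small case the backends do not behave identically: ExtractValue() returns an empty string when nothing matches, while the PostgreSQL expression and the SQLite function above yield NULL, so the layer would have to normalize that as well.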

If you absolutely have to use XML, XQuery is pretty much the ideal method for querying raw XML, but none of the three databases you listed supports it. SQL/XML is the standard for querying XML from SQL, but it is not widely or uniformly implemented.

Otherwise, there is really no harm in having a lot of null values in a table. Presumably your application code is going to have to check for null values anyway, whether they come from a database table or from a nonexistent XML element...


I would stay away from DB-provided XML functionality and store any XML as a BLOB. This may mean you have to duplicate some of the data; basically anything you want to query. If you want all of it queryable, you would be better off with a native XML database.
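As a minimal sketch of that duplication (Python with SQLite only for illustration; the log_value table, its column names, and insert_event are hypothetical), the XML is stored opaquely and the values you want to search on are copied into an ordinary side table at insert time, so the lookup is plain SQL that any of the backends should accept:

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE log (
        Num     INTEGER PRIMARY KEY,
        EventId INTEGER NOT NULL,
        UserId  INTEGER NOT NULL,
        Data    BLOB NOT NULL          -- the XML, never parsed by the database
    );
    CREATE TABLE log_value (           -- hypothetical side table of queryable values
        Num        INTEGER NOT NULL REFERENCES log (Num),
        FieldName  TEXT NOT NULL,
        FieldValue TEXT
    );
    CREATE INDEX log_value_lookup ON log_value (FieldName, FieldValue);
    """
)


def insert_event(conn, event_id, user_id, xml_text):
    """Store the XML as a BLOB and copy its top-level values into log_value."""
    root = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO log (EventId, UserId, Data) VALUES (?, ?, ?)",
        (event_id, user_id, xml_text.encode("utf-8")),
    )
    num = cur.lastrowid
    conn.executemany(
        "INSERT INTO log_value (Num, FieldName, FieldValue) VALUES (?, ?, ?)",
        [(num, child.tag, child.text) for child in root],
    )
    return num


insert_event(conn, 1, 42, "<data><messageid>123</messageid></data>")
insert_event(conn, 2, 43, "<data><messageid>123</messageid></data>")
insert_event(conn, 23, 7, "<data><oldname>006</oldname><newname>007</newname></data>")

# The lookup needs no XML support from the database at all.
found = conn.execute(
    "SELECT l.Num, l.EventId FROM log l "
    "JOIN log_value v ON v.Num = l.Num "
    "WHERE v.FieldName = ? AND v.FieldValue = ? ORDER BY l.Num",
    ("messageid", "123"),
).fetchall()
```

This copies every top-level element; copying only the handful of values you actually search on would keep the side table smaller.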

The reason XML extensions are often just eye candy is that even if the server parses the XML, it still needs to send it over the wire, where it is decoded or parsed again. So it is often better to just stream it down as a byte sequence and handle parsing on the client side. The only exception would be if you wanted to use XML-based access methods (XPath or XQuery lookups); but that is where native XML databases are a much better fit than relational DBs ("lipstick on a pig").
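Client-side handling could look roughly like the following sketch (Python with SQLite for illustration; events_for_message is a hypothetical helper). The database narrows the rows by ordinary columns such as EventId, and the application parses the XML of the remaining rows and applies the XML condition itself:

```python
import sqlite3
import xml.etree.ElementTree as ET


def events_for_message(conn, message_id, event_ids):
    """Filter by EventId in SQL, then match /data/messageid in the client."""
    event_ids = list(event_ids)
    if not event_ids:
        return []
    placeholders = ", ".join("?" for _ in event_ids)
    sql = (
        "SELECT Num, EventId, UserId, Data FROM log "
        f"WHERE EventId IN ({placeholders}) ORDER BY Num"
    )
    matches = []
    for num, event_id, user_id, data in conn.execute(sql, event_ids):
        root = ET.fromstring(data)
        # findtext() looks for a direct <messageid> child of the root element.
        if root.findtext("messageid") == str(message_id):
            matches.append((num, event_id, user_id))
    return matches


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (Num INTEGER PRIMARY KEY, EventId INTEGER, "
    "UserId INTEGER, Data TEXT)"
)
conn.executemany(
    "INSERT INTO log (EventId, UserId, Data) VALUES (?, ?, ?)",
    [
        (1, 42, "<data><messageid>123</messageid></data>"),
        (2, 43, "<data><messageid>123</messageid></data>"),
        (23, 7, "<data><oldname>006</oldname><newname>007</newname></data>"),
    ],
)

history = events_for_message(conn, 123, [1, 2])
```

This only pays off when the SQL conditions already cut the result down to a manageable number of rows, since every remaining document is transferred and parsed.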

Finally, instead of combining XML and a relational DB, a more common dual setup is to combine the database with a search index system (like Lucene or Elasticsearch); this gives you powerful free-text search along with storage. The search index is then updated incrementally whenever data in the database is modified.
