Is it "bad" to store XML in a database?
I have heard from several sources that storing XML in a database is "bad", b开发者_运维问答ut I have never seen/heard an actual explanation of why that is. Is it true? If it is true, can you explain why? Moreover, can you tell me what a "good" case for storing XML in a database is?
There are some really stupid answers here - just because a database supports a data type does not mean you should be using it. These things are invariably added in as features because the competition have them, not because they are the right thing to do. Global variables? Triggers? Would anyone like to defend them too just because you can use them and they're there?
If you have multiple attributes, the best way to handle them in a relational database is with a one to many relationship. Parse out your useful data from the XML overhead. You then just store the ID (primary key) of the parent record with each of the rows stored in a second table, one row per attribute. You can have any number of attributes per parent record. It's database design 101, nothing clever. Storing it as unstructured XML just to store a variable number of attributes is not the way to go, it's a sledgehammer to crack a peanut. A one to many relationship between two tables is simpler, easier to understand, much faster to query, much less effort coding, and less storage (which means faster queries). Everyone wins, apart from the storage vendors.
XML is a data transfer protocol; as GolezTrol rightly said, "It is a way to export (and import) data" - i.e.: it is simply an overhead used to facilitate the communication of the structure of the data between different systems. Once received, the tags should be stripped out and the data (and only the data) stored in your database engine of choice, whatever that might be. Not the XML itself. The overhead for XML is ~10x that of the data it's describing. Want to tell your boss why that 100GB of data is occupying 1TB of space on your hyper expensive SAN? Or taking all night to back up over a saturated network link? Or causing performance problems in production? If you don't parse out the data from the now pointless tags, you will just push the problem and ongoing, daily support costs onto operational support for the next ten years. Sloppy, sloppy, sloppy. This keeps vendors like EMC in business.
XML is metadata. Nothing clever, just a schema descriptor. Once it's transferred and parsed it's lost its usefulness and is just clutter that clogs up whatever database you use. Get rid of it, unless you're compulsively addicted to hording yesterday's pointless crappy description metadata, stored many times over. Wake up. It's typical "Emperor's New Clothes" syndrome, stopped being conned by something simple and disposable. It's only metadata and it should not be stored or worshipped, it's junk once it's parsed. And what's better? To parse it once, or to uselessly parse it every time you need data from it? The answer's pretty darned obvious to me.
It's not bad at all. Microsoft SQL Server has an XML data type. One use case for storing XML is a situation we found ourselves in. For each row in a particular table, we needed to store a variable number of attributes related to that row. And the number of these attributes can change over time, and with each row. We found it more efficient to store these attributes, and their values in an XML format. In the future, each time we adjust the number of attributes, we don't need to make schema changes.
Storing XML, JSON, YAML, comma-seperated lists, binary blobs, or anything else in a database is not bad ... per se.
It can indicate a lack of understanding of what a database is for (storing data that is related to other data) and conjures up visions of databases with single column tables called data1
, data2
, etc. ... with each table row holding a +5 MB entry of XML encoded relational data.
On the other hand, there are many valid cases that can be made for such a structure -- rapidly changing configurations might be represented in JSON and stored in a two column table structured like this:
dbo.good_table
ApplicationID (bigint)
Configuration (varchar(max))
The difference between the above table and a table like this:
dbo.bad_table
ApplicationID (bigint)
ApplicationMembers(xml)
Is that good_table
is enabling rapid access to a piece of data (the configuration), while the bad_table
is using the database as an ofttimes expensive (and slow) hard disk.
XML is itself a kind storage format. It is most practically used for transportation of data, because it provides a common mechanic for structuring data. There are fixed rules for reading and writing XML that allow XML data to be read by anyone. Also validations and transformation to other output formats are relatively easy (using xslt). XML, however, is not the best way to store data in. It is time consuming to read XML files and they take up relatively much space. It is best to store your data in a structured manner in your database, and export the data from certain queries to XML if you need them in reports, on a website or to pass them to other parties.
There are XML databases, but they also don't store there data in XML. They merely provide a way to save and load hierarchical data (XML is an hierarchical structure), instead of the standard table structure.
So it is right to say that storing XML content in a blob in a database is generally not the right way to go, but there are always exceptions ofcourse.
XML is -in contrast with what others say here- not a way to display data. It is a way to export (and import) data. It is a logical choice for transportation of data. That is because you are totally flexible in the way that you want it to export, it can easily be transformed to other formats. Like, if you have a webshop, and you want to export prices and productinformation to other parties, you could choose XML. These other parties can write easy rules to transform this data to their needs. Neither party has to know the way there prices are stored on the other side, and neither party has to write a complex tool to parse some hard to read binary that someone else has made up.
Not, it is not.
Actually several databases already have data types for storing XML documents.
I think storing a database would be bad for perhaps speed reasons (parsing etc). However a good case would be that it fits the semi-structured model there are some advantages of this listed here.
It's neither bad or good to do store XML in a DB. You just have to consider your requirements and how the data is used.
If your data is machine produced and consumed, and it is only in XML to transport between apps, then a DB makes sense. At this point, you may also want to look at JSON instead of XML because it is somewhat better (IMO) at encapsulated data transport between two applications.
If your data is human-produced or document-centric, or may be subject to periodic schema change, or is consumed by being read as a document, then it may make more sense to keep in XML. You may also consider some sort of version control if the XML is sufficiently mission-critical or you need a record of changes.
精彩评论