
Document text in XML

I understand the differences between XML and HTML, but one particular aspect is not clear to me. XML is usually described as both a language that describes data and a document markup language. Because of the former definition, XML is often compared to other data-describing formats such as JSON. Because of the latter definition, XML is also often compared to other document-markup languages, such as HTML.

I realize XML can function as both, but if XML is to serve as a document markup language, can document text appear between closing tags, in the same way it can with HTML?

Take the following HTML:

<div>
   Some text, and some <b>more</b> text.
</div>

Ignoring the initial XML declaration, is the above also valid XML? Note that the fragment " text." is not enclosed in any tags: it appears between two closing tags. This is, of course, necessary in a markup language like HTML, where the goal is to format text. But most examples of XML I see use it to describe data, like:

<book>
  <title>Blah blah</title>
  <author>Blah blah</author>
</book>

In the above example, text never appears between closing tags.

So, is text (content) allowed to appear between closing tags in XML?


Yes. That is referred to as "mixed content".

You are correct in noting it as one of the requirements for a document format as opposed to a data format. JSON is probably better as a data format than XML, but because it does not allow mixed content, it cannot replace XML as a document format.
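As a quick check, here is a minimal sketch that parses the fragment from the question with Python's standard-library xml.etree.ElementTree (an illustrative choice; any conforming XML parser should accept it). ElementTree has no separate text-node objects: text before the first child is kept in the parent's .text, and text that follows a child element is kept in that child's .tail. The helper name mixed_content_parts is hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the question: text, a child element, then more text.
FRAGMENT = """<div>
   Some text, and some <b>more</b> text.
</div>"""

def mixed_content_parts(xml_text):
    """Return (text before <b>, text inside <b>, text after <b>)."""
    div = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    b = div.find("b")
    # Text that follows a child element is stored on that child as .tail
    return div.text, b.text, b.tail

before, inside, after = mixed_content_parts(FRAGMENT)
print(repr(before))
print(repr(inside))
print(repr(after))
```

The trailing " text." ends up in b.tail, but it still belongs to the div: it is part of the div's content, just recorded on the preceding sibling.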


This is valid XML:

<div>
   Some text, and some <b>more</b> text.
</div>

The trailing " text." is still enclosed in the div element.

Breakdown:

 Some text, and some  - Text node within div
 <b>more</b>          - b element within div (with own text node)
  text.               - Text node within div

These are all sibling nodes.
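The same breakdown can be reproduced with a DOM parser. A small sketch, assuming Python's xml.dom.minidom; the helper describe_children is hypothetical, and the fragment is written on one line so that no extra whitespace-only text nodes appear among the children.

```python
from xml.dom import minidom

FRAGMENT = "<div>Some text, and some <b>more</b> text.</div>"

def describe_children(xml_text):
    """List (kind, content) for each child node of the root element."""
    root = minidom.parseString(xml_text).documentElement
    parts = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            parts.append(("text", node.data))       # character data
        elif node.nodeType == node.ELEMENT_NODE:
            parts.append(("element", node.tagName))  # nested element
    return parts

for kind, value in describe_children(FRAGMENT):
    print(kind, repr(value))
```

All three entries are children of the same div, which is what makes them siblings.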


Ignoring the initial XML Declaration, is the above also valid XML?

Yes, it is still enclosed within the div element.

A useful explanation of this can be found on W3Schools:

Text is always stored in text nodes. A common error in DOM processing is to navigate to an element node and expect it to contain the text. However, even the simplest element node has a text node under it. For example, in <year>2005</year>, there is an element node (year), and a text node under it, which contains the text (2005).

So, in your example, there is a text node for " text." under the div element.
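To see the W3Schools point in practice, here is a minimal sketch using Python's xml.dom.minidom (an assumption; the quote describes the DOM in general). The year element node has no text value of its own; 2005 sits in a child text node.

```python
from xml.dom import minidom

doc = minidom.parseString("<year>2005</year>")
year = doc.documentElement

# The element node itself carries no text: its nodeValue is None.
print(year.nodeType == year.ELEMENT_NODE, year.nodeValue)

# The text "2005" lives in a separate text node under <year>.
text_node = year.firstChild
print(text_node.nodeType == text_node.TEXT_NODE, text_node.data)
```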


According to "XML for the World Wide Web" by Elizabeth Castro (2001), the answer is yes, with a special construct called a CDATA section.

To prevent a parser from reading the HTML as XML, you could enclose the example above within CDATA like this:

<element>
    <![CDATA[
<div>
Some text, and some <b>more</b> text.
</div>
]]>
</element>

The <![CDATA[ marker stops the text from being parsed as markup until the closing ]]> is reached.
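A sketch of what the CDATA section does, using Python's xml.etree.ElementTree as an illustrative parser: the HTML inside <![CDATA[ ... ]]> comes back as one string of character data, so the parser sees no div or b elements inside <element>.

```python
import xml.etree.ElementTree as ET

DOC = """<element>
    <![CDATA[
<div>
Some text, and some <b>more</b> text.
</div>
]]>
</element>"""

element = ET.fromstring(DOC)

# No child elements: the markup inside CDATA was not parsed.
print(len(element))

# The HTML is available verbatim as the element's text.
print(element.text.strip())
```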
