XML, S-Expressions, and overlapping scope... What's it called?

I was reading "XML is not S-Expressions". XML scoping is fairly strict, as is S-expression scoping. And in every programming language I've seen, you can't have the following:

<b>BOLD <i>BOTH </b>ITALIC</i> == BOLD BOTH ITALIC

It's not even expressible with S-Expressions:

(bold "BOLD" (italic "BOTH" ) "ITALIC" ) == :(

Does any programming language support this kind of "overlapping" scoping? Could there be any practical use for it?


Overlapping markup structures have many practical uses. Consider, for example, applications of concurrent markup for text analysis in the humanities. The International Workshop on Markup of Overlapping Structures noted that:

Overlapping structures are ubiquitous, appearing in applications of textual markup as varied as aircraft maintenance manuals and ancient scriptural and liturgical works. The “overlap issue” raises its ugly head whenever text encoding looks beyond the snapshot view of a particular hierarchy to represent and process multiple concurrent aspects of a text, including features that reflect the text’s evolution across multiple versions and variants whether typographic or presentational, structural, annotational or referential, taxonomic or topical.

Overlap is a problem in texts as diverse as technical documents and product manuals (versioning), legal codes (effectivity), literary works (prosodic versus dramatic structure, rhetorical structures, annotation), sacred texts (chapter plus verse reference versus sentence structure and commentary), and language corpora (multiple layers of linguistic annotation).

The Text Encoding Initiative (TEI) publishes Guidelines to handle non-nesting information and provides an XML syntax for overlap. They stated in 2004 that:

[N]o solution has yet been suggested which combines all the desirable attributes of formal simplicity, capacity to represent all occurring or imaginable kinds of structures, suitability for formal or mechanical validation, and clear identity with the notations needed for simpler cases (i.e. cases where the textual features do nest properly).

Some options to handle overlapping structures include:

SGML has a CONCUR feature that can be used to support overlapping structures, although Goldfarb (the author of the standard) writes: "I therefore recommend that CONCUR not be used to create multiple logical views of a document."

GODDAG provides a data structure for representing documents with overlapping structures.

XCONCUR is an experimental markup language with the major goal to provide a convenient method to express concurrent hierarchies in an XML-like fashion.
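
Independently of any of the formats above, a program can sidestep the nesting problem by keeping the annotations apart from the text as character ranges (often called standoff annotation). The following is only a minimal sketch of that idea applied to the example from the question; the `Span` type, the offsets, and the helper names are assumptions made for illustration and are not taken from TEI, GODDAG, or XCONCUR.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    style: str


TEXT = "BOLD BOTH ITALIC"

# Hypothetical annotations for <b>BOLD <i>BOTH </b>ITALIC</i>.
SPANS = [
    Span(0, 10, "bold"),    # covers "BOLD BOTH "
    Span(5, 16, "italic"),  # covers "BOTH ITALIC"
]


def styles_at(spans, offset):
    """Return the set of styles covering the character at `offset`."""
    return {s.style for s in spans if s.start <= offset < s.end}


def overlaps(a, b):
    """True when two spans cross without one containing the other."""
    return (a.start < b.start < a.end < b.end
            or b.start < a.start < b.end < a.end)
```

Because each range is independent, nothing forces one annotation to close before another, which is exactly what the tag syntax cannot express.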


There is probably no programming language whose formal definition supports overlapping scopes. While technically possible, it would make the implementation more complex than necessary. It would also make the language more ambiguous, because it would accept as valid what was very likely meant to be a mistake.

The only practical use I can think of right now is that it takes less typing and reads more intuitively, just as writing attributes in markup feels more intuitive without unnecessary quotes, as in <foo id=45 /> instead of <foo id="45" />.

I think that enforcing nested structures makes for more efficient processing, too. By enforcing nested structures, the parser can push and pop nodes onto a single stack to keep track of the list of open nodes. With overlapped scopes, you'd need an ordered list of open scopes that you'd have to append to whenever you come across a begin-new-scope token, and then scan each time you come across an end-scope token to see which open scope is most likely to be the one it closes.
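
The difference between the two bookkeeping strategies described above can be sketched roughly as follows. This is an illustration only: the token format (pairs of `"open"`/`"close"` and a name) and both function names are assumptions, and the overlapping variant simply picks the most recently opened scope with a matching name.

```python
def check_nested(tokens):
    """Strict nesting: a close token must match the top of the stack."""
    stack = []
    for kind, name in tokens:
        if kind == "open":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


def check_overlapping(tokens):
    """Overlap allowed: a close token may match any currently open scope."""
    open_scopes = []  # ordered list of open scopes, oldest first
    for kind, name in tokens:
        if kind == "open":
            open_scopes.append(name)
        else:
            # Scan backwards for the most recently opened matching scope.
            for i in range(len(open_scopes) - 1, -1, -1):
                if open_scopes[i] == name:
                    del open_scopes[i]
                    break
            else:
                return False  # nothing open with that name
    return not open_scopes


# <b> <i> </b> </i> from the question
EXAMPLE = [("open", "b"), ("open", "i"), ("close", "b"), ("close", "i")]
```

With these definitions, `check_nested(EXAMPLE)` rejects the input while `check_overlapping(EXAMPLE)` accepts it; the price is the backwards scan on every close token instead of a single pop.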

Although no programming languages support overlapping scopes, there are HTML parsers that support it as part of their error-recovery algorithms, including the ones in all major browsers.
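
To give a feel for what such recovery amounts to, here is a deliberately simplified sketch that rewrites overlapping tags as properly nested ones by closing and reopening the inner tags. It is not the algorithm browsers actually implement (the HTML specification's recovery rules are far more involved); the token format and the `renest` name are assumptions for illustration.

```python
def renest(tokens):
    """Rewrite a token stream with overlapping tags as nested markup.

    tokens: list of ("open", name), ("close", name) or ("text", string).
    """
    out = []
    stack = []  # currently open tag names, outermost first
    for kind, value in tokens:
        if kind == "text":
            out.append(value)
        elif kind == "open":
            stack.append(value)
            out.append(f"<{value}>")
        elif value in stack:
            # Find the most recently opened tag with this name.
            idx = len(stack) - 1 - stack[::-1].index(value)
            reopened = stack[idx + 1:]
            # Close it and everything opened after it, innermost first...
            for name in reversed(stack[idx:]):
                out.append(f"</{name}>")
            del stack[idx:]
            # ...then reopen the tags that were closed only to keep nesting valid.
            for name in reopened:
                stack.append(name)
                out.append(f"<{name}>")
        # A close tag with no matching open tag is silently dropped.
    # Close anything still open at the end of the input.
    for name in reversed(stack):
        out.append(f"</{name}>")
    return "".join(out)


tokens = [
    ("open", "b"), ("text", "BOLD "),
    ("open", "i"), ("text", "BOTH "),
    ("close", "b"), ("text", "ITALIC"),
    ("close", "i"),
]
print(renest(tokens))  # <b>BOLD <i>BOTH </i></b><i>ITALIC</i>
```

The output nests correctly and still renders as BOLD BOTH ITALIC with the intended styles, at the cost of splitting the italic element in two.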

Also, the switch statement in C allows for constructs that look something like overlapping scopes, as in Duff's Device:

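/* Assumes count > 0. `to` is deliberately not incremented: in the
   original device it pointed at a memory-mapped output register. */
int n = (count + 7) / 8;   /* number of passes through the do-while loop */
switch (count % 8)
  {
   case 0:  do { *to = *from++;
   case 7:       *to = *from++;
   case 6:       *to = *from++;
   case 5:       *to = *from++;
   case 4:       *to = *from++;
   case 3:       *to = *from++;
   case 2:       *to = *from++;
   case 1:       *to = *from++;
               } while (--n > 0);
  }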

So, in theory, a programming language could give scopes in general similar semantics to allow these kinds of optimization tricks when needed, but readability would be very low.

The goto statement, along with break and continue in some languages, also lets you structure programs so that they behave as if scopes overlapped:

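BOLD: while (bold)
 { styles.add(bold);
   print "BOLD";

   while (italic)
    { styles.add(italic);
      print "BOTH";
      break BOLD;   // leaves the bold loop while the italic style is still active
    }
 }

// italic continued: only the bold style is dropped here
styles.remove(bold);
print "ITALIC";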