开发者

What was Tim Sweeney thinking? (How does this C++ parser work?)

Tim Sweeney of Epic MegaGames is the lead developer for Unreal and a programming language geek. Many years ago posted the following screen shot to VoodooExtreme:

What was Tim Sweeney thinking? (How does this C++ parser work?)

As a C++ programmer and Sweeney fan, I was captivated by this. It shows generic C++ code that implements some kind of scripting language where that language itself seems to be generic in the sense that it can define its own grammar.

Mr. Sweeney never explained himself. :-)

开发者_JAVA技巧

It's rare to see this level of template programming, but you do see it from time to time when people want to push the compiler to generate great code or because they want to create generic code (for example, Modern C++ Design).

Tim seems to be using it to create a grammar in Parser.cpp - you can see what look like prioritized binary operators. If that is the case, then why does Test.ae look like it's also defining a grammar?

Obviously this is a puzzle that needs to be solved. Victory goes to the answer with a working version of this code, or the most plausible explanation, or to Tim Sweeney himself if he posts an answer. :-)


I asked Mr. Sweeney via email and received this answer:

The C++ code is using template classes I wrote that implement parser combinators. The idea is to start with some basic parsers, such as literals where PLit<'A'> parses a literal character 'A', PAny<> parses any character but fails if at the end of fail, PEof fails unless we're not at the end of the file, etc. And then we combine them into arbitrary trees using combinators like PAnd which parses a then b, and succeeds only if both succeed -- else it fails with the parse point unmoved. And if it succeeds, the result is a struct containing two fields, one for the result of a, and one for the result of b. And so on.

The implementation is messy in C++ for a number of reasons, including that templates don't support arbitrary variadic parameters, and without first-class lambdas we can't easily process the results inline with the parser.

Here are some snippets of the template code, from which you can probably figure out the details of the framework.

// Parses a literal item.
UBOOL LiteralEvaluate (UClass* C,class FParseInBase& In,class FParseOutBase& Out)
{
 FParseInMark M(In);
 NNode* e = In.GetNextSource();
 if (e && e->IsA(C))
 {
  Out.Callback(e);
  return 1;
 }
 M.Restore(In);
 return 0;
}
// Optional item of the specified type.
template <class U> struct Optional
{
 static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
 {
  U::Evaluate(In,Out);
  return 1;
 }
};
// Ignore items by absorbing them; retains boolean logic but not callback.
template <class T> struct Ignore
{
 static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
 {
  return T::Evaluate(In,GIgnore);
 }
};
// Zero or more items of the specified type.
template <class T> struct ZeroOrMore
{
 static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
 {
  while (T::Evaluate(In,Out));
  return 1;
 }
};
// One or more items of the specified type.
template <class T> struct OneOrMore
{
 static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
 {
  for( INT i=0; T::Evaluate(In,Out); i++ );
  return i>0;
 }
};
// Always fails.
struct RFalse
{
 static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
 {
  return 0;
 }
};
// Always succeeds.
struct RTrue
{
 static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
 {
  return 1;
 }
};
// Parses the first matching items of the specified subtypes of T.
template <class A,class B=RFalse,class C=RFalse,class D=RFalse > struct Or
{
 static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
 {
  return A::Evaluate(In,Out) || B::Evaluate(In,Out) || C::Evaluate(In,Out) || D::Evaluate(In,Out);
 }
};
// Parses all the specified items.
template <class A,class B=RTrue,class C=RTrue,class D=RTrue> struct And
{
 static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
 {
  FParseInMark Mark(In);
  Conjunction<NNode> Q;
  if( A::Evaluate(In,Q) && B::Evaluate(In,Q) && C::Evaluate(In,Q) && D::Evaluate(In,Q) )
  {
   Q.Forward(Out);
   return 1;
  }
  Mark.Restore(In);
  return 0;
 }
};
// A separated list.
template <class A,class B> class SeparatedList : public Or<And<A,B,SeparatedList>,A> {};
// Integer comparison.
template <INT A,INT B> struct IsAtLeast
{
 static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
 {
  return A>=B;
 }
};

That Test.ae was an experimental scripting language I was implementing in 1999-2001 -- that color scheme was trendy back then, I swear. :-)

The code shown defines metadata for the language constructs. The language went a long way down the Smalltalk "everything is an object" path, dealing with first-class metaclasses and related issues, but I ultimately abandoned it when I became familiar with advanced type systems work in Haskell, Cayenne, Coq, and other languages.

Nowadays -

I'm not a fan of implementing parsers or compilers in C++, since the code tends to be bloated by 70-80% compared to a similar implementation in a modern functional language like Haskell. Read up on Haskell parser combinators for more detail -- the resulting simplicity and directness is exemplary and it's done in a rigorous, type-safe way.


Can't tell for sure, but the C++ code kinda sorta looks like Spirit, a C++ parser generator that makes extensive use of templates. Test.ae looks like it's metaprogramming (defining language details in the language itself), which is harder to do in C++ (templates are a start, but is error prone and ugly) than it would be in some other target language (e.g., UnrealScript, which is what I assume test.ae is written in).

So - it looks like Parser.cpp defines the base grammar for UnrealScript (using Spirit), and Test.ae is defining extensions to UnrealScript.


I don't know what Sweeney did, and I'll assume that other answers about using Spirit are in, uh, the right spirit. I have no experience with Spirit templates, but my understanding is that if you define a complex grammar with it, it becomes pretty difficult to handle (as well as slow to compile). Other people's actual experience should be used to guage the truth of this.

There are other ways to implement extensions to C++, e.g., using program transformations and extendible grammars. See this SO answer on augmenting the C++ grammar itself with arbitrary extensions, where very complex extensions are possible and were in fact used.

Template metaprogramming generates interesting code where the templates are specifically invoked. Using program transformations you can generate arbitrarily interesting code at any point in the program, e.g, its as if the "templates" (additional syntax) changes the semantics any way you think is useful.


The screenshot is obviously from an MSVC 6.0 or earlier time frame, which did not really appreciate complex templates (and certainly did not support partial template specialization). I've not used spirit. From these screenshots, it's impossible to tell what Sweeney is truly doing beyond defining what looks to be a DSL in Test.ae.

The only full C++ statements you can see are in Parser.cpp - and they don't tell you much of anything except he's declaring 3 types. You really can't tell much of anything - too much is obscured by the 'Test.ae' window.


It is similar to YARD/Biscuit library

http://p-stade.sourceforge.net/biscuit/index.html

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜