What are the lengths/limits of the C preprocessor as a language creation tool? Where can I learn more about these?
In his FAQ, Bjarne Stroustrup says:
To build [Cfront, the first C++ compiler], I first used C to write a "C with Classes"-to-C preprocessor. "C with Classes" was a C dialect that became the immediate ancestor to C++... I then wrote the first version of Cfront in "C with Classes".
When I read this, it piqued my interest in the C preprocessor. I'd seen its macro capabilities as suitable for simplifying common expressions but hadn't thought about its ability to significantly add to syntax and semantics on the level that I imagine bringing classes to C took.
So now I have some questions on my mind:
Are there other examples of this approach to bootstrapping a language off of C?
Is the source to Stroustrup's original work available anywhere?
Where could I learn more about the specifics of utilizing this technique?
What are the lengths/limits of that approach? Could one, say, create a set of preprocessor macros that let someone write in something significantly Lisp/Scheme-like?
Note that Stroustrup is not saying that he used the C preprocessor (cpp) to create C with Classes; he didn't. He wrote his own preprocessor in C. And Cfront was a true compiler, not a preprocessor at all. The C preprocessor is in fact deeply unsuitable for language development, as it has no parsing abilities whatsoever.
For an example of the kind of monstrosity of a "language" you can create using the C preprocessor, have a look at this header file:
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh/mac.h
It's from the source code of the original Unix shell written by Steve Bourne, and it aims to turn C into an Algol-like language. Here is an example of what a piece of code looks like when using it:
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh/args.c
That looks kind of bizarre, but it is still C. It may look like a different language, but because it is implemented in the preprocessor, there's no syntactic support for it, e.g.
WHILE foo
DO
SWITCH
....
ENDSW
OD
is all very fine and compiles nicely, but so does
WHILE foo
DO
SWITCH
....
OD
ENDSW
The C preprocessor isn't what you're looking for. It can't add syntax and semantics like Cfront did.
Cfront was an actual compiler that translated C with Classes, later C++, to C. It was a preprocessor only in that it ran before the C compiler. I used a program called f2c once to translate FORTRAN 77 code to C code. It worked on the same principle.
There are languages like Common Lisp with enough macro power to add new syntax and semantics, and languages like Forth where the system is flexible enough to accommodate changes, but that won't work for most languages.
As mentioned by others, C++ was not created using the C preprocessor (CPP).
That said, you can do insane things with the CPP and recursion; I'm pretty sure it's Turing Complete. The libraries I'm about to link to use a lot of ugly tricks to get interesting behavior. Although you can build a kind of elegance on top, many might consider it a Turing Tarpit.
For a gentler introduction to this stuff, try Cloak.
To go deeper, look at
- Boost -- "cross platform", but uglier for it; part of popular C++ library
- Chaos -- followup by Boost-pp guy, but supports only C99-compliant tools, ergo more elegant
- Order -- from what I can tell, a Lisp-like language, inspired by Chaos, built on pure CPP
E.g. with Order or Chaos, you can write a recursive fibonacci sequence generator in pure CPP.
I think Objective-C started out the same way. It was a preprocessor that built some C code that was then passed to a C compiler. But it was not THE C preprocessor in the sense of #define FOO; it ran as an additional step before or after the standard C preprocessor. The result of any number of preprocessor steps can then be sent to the C compiler.
It sounds like his "C with Classes"-to-C preprocessor was not the same thing as the standard C preprocessor, since he speaks specifically of writing this preprocessor himself.
The C preprocessor is very limited. It can be used to make shorthands for common expressions, but that's about it. When you attempt to define new language constructs with it, it rapidly becomes more cumbersome and brittle.
I suggest you start with the GCC Macros documentation which provides quite a bit of interesting information about the GCC implementation of the C Preprocessor.
Clay Bridges in his answer provides a couple of examples of using the C Preprocessor. The one about the Order language is interesting and comes with a number of examples. The author of Order does bring up one issue that they ran into: C Preprocessor implementations may not fully implement more recent standards.
In general, using the C Preprocessor to develop some kind of a bastardized language, such as what Steve Bourne did when writing the Bourne Shell for Unix, is an activity that I would consider suitable grounds for rendition followed by multiple waterboarding sessions.
The main thing to remember about the C Preprocessor is that it is manipulating text tokens. So the C Preprocessor will allow quite a bit of tinkering with syntax. For instance the following macro, which compiles with Visual Studio 2005 without errors, shows the unintuitive text manipulation possible.
#define TESTOP(a,x,y,op,z) a (x) op (y); z
void f(void)
{
    int i = 0, j = 5;
    TESTOP( ,i,j,+=, );
    TESTOP( ,i,(j + 2),+=, );
    TESTOP({,i,(j + 2),+=,});
}
However you do need to understand and work around some of the limitations of the C Preprocessor when pushing the boundaries. See the GCC topic Macro Pitfalls for some of the issues to consider.
And you can use the C Preprocessor as a general macro and text preprocessor that targets some tool other than the C compiler. For instance the older imake utility for build automation used the C Preprocessor to provide an extensive macro facility.
Where I have seen the C Preprocessor used most effectively was in simplifying complex code and declarations.
One case that I have seen was using the C Preprocessor to provide a state machine language which was used to create the data structures and data to describe a state machine. The resulting data structures were then used as an argument to a state machine function. This allowed multiple different state machine procedures to be written in the C Preprocessor defined language with the state machine processing done by a single function.
Microsoft, in their Microsoft Foundation Classes (MFC), used the C Preprocessor to hide quite a few of the messaging details of MFC. Once you get used to it, something like the following is reasonably easy to read. Since the Visual Studio IDE had tools to generate and modify the code using the macros, it was pretty straightforward for the programmer.
BEGIN_MESSAGE_MAP(CFrameworkWndDoc, CWindowDocument)
//{{AFX_MSG_MAP(CFrameworkWndDoc)
ON_WM_CHAR()
ON_WM_TIMER()
ON_MESSAGE(WU_EVS_DFLT_LOAD, OnDefaultWinLoad)
ON_MESSAGE(WU_EVS_POPUP_WINDOW, OnPopupWindowByName)
ON_MESSAGE(WU_EVS_POPDOWN_WINDOW, OnPopdownWindowByName)
ON_MESSAGE(WM_APP_CONNENGINE_MSG_RCVD, OnConnEngineMsgRcvd)
ON_MESSAGE(WM_APP_XMLMSG_MSG_RCVD, OnXmlMsgRcvd)
ON_MESSAGE(WM_APP_BIOMETRIC_MSG_RCVD, OnBiometricMsgRcvd)
ON_MESSAGE(WM_APP_SHUTDOWN_MSG, OnShutdownMsgRcvd)
ON_MESSAGE(WM_POWERBROADCAST, OnPowerMsgRcvd)
ON_MESSAGE(WM_APP_SHOW_HIDE_GROUP, OnShowHideGroupMsgRcvd)
//}}AFX_MSG_MAP
END_MESSAGE_MAP()
Especially when you see what the macros used look like:
#define BEGIN_MESSAGE_MAP(theClass, baseClass) \
    PTM_WARNING_DISABLE \
    const AFX_MSGMAP* theClass::GetMessageMap() const \
        { return GetThisMessageMap(); } \
    const AFX_MSGMAP* PASCAL theClass::GetThisMessageMap() \
    { \
        typedef theClass ThisClass; \
        typedef baseClass TheBaseClass; \
        static const AFX_MSGMAP_ENTRY _messageEntries[] = \
        {

#define END_MESSAGE_MAP() \
        {0, 0, 0, 0, AfxSig_end, (AFX_PMSG)0 } \
    }; \
        static const AFX_MSGMAP messageMap = \
        { &TheBaseClass::GetThisMessageMap, &_messageEntries[0] }; \
        return &messageMap; \
    } \
    PTM_WARNING_RESTORE

// for Windows messages
#define ON_MESSAGE(message, memberFxn) \
    { message, 0, 0, 0, AfxSig_lwl, \
        (AFX_PMSG)(AFX_PMSGW) \
        (static_cast< LRESULT (AFX_MSG_CALL CWnd::*)(WPARAM, LPARAM) > \
        (memberFxn)) },

#define ON_WM_TIMER() \
    { WM_TIMER, 0, 0, 0, AfxSig_vw, \
        (AFX_PMSG)(AFX_PMSGW) \
        (static_cast< void (AFX_MSG_CALL CWnd::*)(UINT_PTR) > ( &ThisClass :: OnTimer)) },