What separates a "private language WTF" from merely bad library/API/DSL design?

Some of the most entertaining Daily WTF stories feature private languages run amok. However, domain-specific languages can be quite powerful and seem to be increasingly popular. And of course we can't program at all without good libraries, but as the adage goes, "library design is language design".

Nobody likes a bad API, but is the difference just one of degree, or are the WTFs a completely different species? Obviously this is subjective, so I made it a community wiki post. (The Stackoverflow co-founders famously had different opinions about whether one particular in-house language was even a WTF or not.)

My own hunch is that it's the attempt at generality that makes the WTF come out, but I'd like to see what other people think.

(This question was triggered by reading the comments to an answer by JaredPar to this question: https://stackoverflow.com/questions/901320/anti-joel-test/901361#901361)

(To clarify a little more, the term "private language" is often used with negative connotations, whereas "DSL" or "library" are neutral. What lines, if any, does an "in-house" tool cross on the way to being derided as a horrible "private language" besides just the usual things that might make it a bad tool? This doesn't have to be about a language; it could be a library or framework.)

FINAL EDIT: I've accepted Roger Pate's answer of "In essence? Nothing." because I think it's actually correct for the question I was asking. I'd like to highlight Aaronaught's answer about DSLs, though, because I think it's particularly good. Thanks.


I don't build a lot of DSLs, but I've had a bit of experience with them and I believe that there is a general answer for this, although the truth is that every situation is different.

  • Remember that the acronym DSL stands for Domain-Specific Language.

A DSL stops being useful when it is no longer specific. I believe that the majority of DSL horror stories (AKA "private languages") revolve around DSLs that simply try to do too many things. In some cases they may even try to be Turing-complete, at which point they're not much more than dysfunctional programming languages.

I'm including some real-life examples below the fold; skip to the end for the tl;dr version.


One example from my own experience is that of a messaging system between devices, or between a PC and an external device. If you imagine an object-oriented API, you might end up with code that looks like this:

public abstract class Message
{
    public byte[] GetBytes()
    {
        // Frame layout: [0xFF][payload length][payload...][checksum]
        int length = DataLength;
        byte[] result = new byte[length + 3];
        result[0] = 0xFF;
        result[1] = (byte)length;
        WriteMessageData(result, 2);
        result[result.Length - 1] = GetChecksum(result, 0,
            result.Length - 2);
        return result;
    }

    // Number of payload bytes that WriteMessageData will write
    protected abstract int DataLength { get; }

    protected abstract void WriteMessageData(byte[] buffer, int offset);
}

Don't get too hung up on the specifics of this, or how beautiful the code is(n't). The idea is that we have, I don't know, 30 different types of messages to send that are all completely different but share some common functionality, like a content length header and a checksum. Now we have to start building the messages:

public class AddMessage : Message
{
    private const byte id = 0x9F;

    // One ID byte followed by two 32-bit integers
    protected override int DataLength => 9;

    protected override void WriteMessageData(byte[] buffer, int offset)
    {
        buffer[offset] = id;
        MessageUtil.WriteInt32(buffer, offset + 1, Num1);
        MessageUtil.WriteInt32(buffer, offset + 5, Num2);
    }

    public int Num1 { get; set; }
    public int Num2 { get; set; }
}

Again, don't think too hard about the details of the message. It doesn't really matter what it does. The point is, we had to write a class for it. We had to override some functionality. We didn't have to write a lot of code, but we had to write some code. I don't know about you, but the thought of writing 30 of these little one-off classes seems mind-numbing to me.
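`MessageUtil.WriteInt32` and `GetChecksum` are used above but never shown. Here is a minimal sketch of what such helpers might look like, assuming little-endian integers and a one-byte additive checksum; both are assumptions for illustration, since a real device protocol would dictate its own byte order and checksum. `GetChecksum` is placed on `MessageUtil` here for convenience, and `Frame` is a hypothetical addition that assembles the same `[0xFF][length][payload][checksum]` layout the `Message` base class builds:

```csharp
using System;

// Hypothetical helpers: the example calls WriteInt32 and GetChecksum
// but never defines them, so the wire format here is assumed.
public static class MessageUtil
{
    // Writes a 32-bit integer in little-endian byte order (assumed).
    public static void WriteInt32(byte[] buffer, int offset, int value)
    {
        buffer[offset] = (byte)(value & 0xFF);
        buffer[offset + 1] = (byte)((value >> 8) & 0xFF);
        buffer[offset + 2] = (byte)((value >> 16) & 0xFF);
        buffer[offset + 3] = (byte)((value >> 24) & 0xFF);
    }

    // Assumed checksum: low byte of the sum of `count` bytes from `offset`.
    public static byte GetChecksum(byte[] buffer, int offset, int count)
    {
        int sum = 0;
        for (int i = offset; i < offset + count; i++)
        {
            sum += buffer[i];
        }
        return (byte)(sum & 0xFF);
    }

    // Hypothetical: builds [0xFF][length][payload...][checksum].
    public static byte[] Frame(byte[] payload)
    {
        if (payload.Length > byte.MaxValue)
        {
            throw new ArgumentException("Payload too long for a one-byte length field.");
        }

        byte[] frame = new byte[payload.Length + 3];
        frame[0] = 0xFF;
        frame[1] = (byte)payload.Length;
        Array.Copy(payload, 0, frame, 2, payload.Length);
        // The checksum covers every byte that precedes it.
        frame[frame.Length - 1] = GetChecksum(frame, 0, frame.Length - 1);
        return frame;
    }
}
```

For an `AddMessage` with `Num1 = 3` and `Num2 = 4`, the 9-byte payload is the ID byte `0x9F` followed by two 4-byte integers, so the framed message is 12 bytes long.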

But we're not even done yet. We have to create the message, send it, and receive the result:

public int Add(int num1, int num2)
{
    AddMessage msg = new AddMessage();
    msg.Num1 = num1;
    msg.Num2 = num2;
    MessagingSystem.SendMessage(msg);
    AddResultMessage result = MessagingSystem.Receive<AddResultMessage>();
    if (result == null)
    {
        throw new InvalidResultException("AddResultMessage");
    }
    return result.Sum;
}

Blah blah blah, whatever. This is kind of a best-case scenario. We're exposing a convenient little API but we have to keep writing these classes and methods to do it. As the number of messages grows to 10, 20, 50, 100, 1000... it starts to become a little ridiculous.

Wouldn't it be nice if instead of writing all of this boilerplate, we could just write down some "message definitions" somewhere?

Message(Add)
    Send: Num1 int, Num2 int
    Receive: Sum int

Message(Multiply)
    Send: Num1 int, Num2 int
    Receive: Product int

Message(Divide)
    Send: Dividend int, Divisor int
    Receive: Quotient int, Remainder int

OK, sure, you can define this in a data file and use some kludgy code where most of the validation and actual logic happens at runtime. But what we really want is to compile this data into something we can actually write code against, compile an application against, get compile-time type safety and testability. We want to go directly from the code above to the code below without doing any additional work:

MyMessagingSystem ms = new MyMessagingSystem();
int sum = ms.Add(3, 4).Sum;
int product = ms.Multiply(5, 6).Product;
DivideResult result = ms.Divide(10, 5);  // Contains Quotient and Remainder properties

Now if we wave our hands a little and forget about how the DSL is compiled (and it's not really that difficult, I've done it), we've eliminated about 20 lines of tedious error-prone OO code in favour of about 3 lines of easy-to-understand DSL code.
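The answer waves its hands over how the DSL gets compiled. As a rough sketch of the first step, here is a parser for the message-definition format, assuming (hypothetically) that every definition starts with a `Message(Name)` line followed by optional `Send:` and `Receive:` lines listing comma-separated `Name type` pairs. `FieldDef`, `MessageDef`, and `MessageDefParser` are illustrative names, not part of any existing tool:

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// One field in a message definition, e.g. "Num1 int".
public record FieldDef(string Name, string Type);

// One parsed definition: a name plus its request and response fields.
public record MessageDef(string Name, List<FieldDef> Send, List<FieldDef> Receive);

public static class MessageDefParser
{
    public static List<MessageDef> Parse(string text)
    {
        const string prefix = "Message(";
        var result = new List<MessageDef>();
        MessageDef? current = null;

        foreach (string raw in text.Split('\n'))
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;

            if (line.StartsWith(prefix, StringComparison.Ordinal)
                && line.EndsWith(")", StringComparison.Ordinal))
            {
                // "Message(Add)" -> "Add"
                string name = line.Substring(prefix.Length, line.Length - prefix.Length - 1);
                current = new MessageDef(name, new List<FieldDef>(), new List<FieldDef>());
                result.Add(current);
            }
            else if (current != null && line.StartsWith("Send:", StringComparison.Ordinal))
            {
                current.Send.AddRange(ParseFields(line.Substring("Send:".Length)));
            }
            else if (current != null && line.StartsWith("Receive:", StringComparison.Ordinal))
            {
                current.Receive.AddRange(ParseFields(line.Substring("Receive:".Length)));
            }
            else
            {
                throw new FormatException($"Unexpected line: '{line}'");
            }
        }
        return result;
    }

    // Parses "Num1 int, Num2 int" into field definitions.
    private static IEnumerable<FieldDef> ParseFields(string list)
    {
        foreach (string part in list.Split(','))
        {
            string[] tokens = part.Trim().Split(' ', StringSplitOptions.RemoveEmptyEntries);
            if (tokens.Length != 2)
            {
                throw new FormatException($"Expected 'Name type', got '{part.Trim()}'");
            }
            yield return new FieldDef(tokens[0], tokens[1]);
        }
    }
}
```

The grammar has nothing to parse except names and types, which is what keeps both the parser and the language small.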

I've worked on a project like this. There were a lot of messages. It took a little while to perfect the DSL and code generation, but once it was done, it saved me hours - no, days of effort, of writing and debugging tedious useless code that just does the same thing over and over again.

So why is this (in my opinion) a "good" DSL? Because it's specific. It does exactly one thing: It defines the format of a series of similar but still independent messages that I want to be able to generate strongly-typed classes for.
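Turning parsed definitions into something you can compile against could be as simple as emitting C# source text. A minimal sketch, assuming each message's fields arrive as `(Name, Type)` pairs; `MessageCodeGenerator` is an illustrative name, and the emitted code calls a hypothetical runtime helper, `SendAndReceive<T>`, which is not shown and is not part of any real library:

```csharp
using System;
using System.Linq;
using System.Text;

public static class MessageCodeGenerator
{
    // Emits a strongly-typed result class plus a send method for one message.
    // The emitted body calls SendAndReceive<T>, a hypothetical runtime helper.
    public static string Generate(
        string name,
        (string Name, string Type)[] send,
        (string Name, string Type)[] receive)
    {
        var sb = new StringBuilder();
        string resultType = name + "Result";

        sb.AppendLine($"public sealed class {resultType}");
        sb.AppendLine("{");
        foreach (var field in receive)
        {
            sb.AppendLine($"    public {field.Type} {field.Name} {{ get; set; }}");
        }
        sb.AppendLine("}");
        sb.AppendLine();

        // Parameters use camelCase versions of the field names: Num1 -> num1.
        string parameters = string.Join(", ", send.Select(f => $"{f.Type} {Camel(f.Name)}"));
        // The message name goes first so a message with no inputs still emits valid code.
        string arguments = string.Join(", ",
            new[] { $"\"{name}\"" }.Concat(send.Select(f => Camel(f.Name))));

        sb.AppendLine($"public {resultType} {name}({parameters})");
        sb.AppendLine("{");
        sb.AppendLine($"    return SendAndReceive<{resultType}>({arguments});");
        sb.AppendLine("}");
        return sb.ToString();
    }

    private static string Camel(string s) =>
        s.Length == 0 ? s : char.ToLowerInvariant(s[0]) + s.Substring(1);
}
```

Given a `Divide` message with two `int` inputs and `Quotient`/`Remainder` outputs, it emits a `DivideResult` class with those two properties plus a matching `Divide(...)` method. Because the output is ordinary source code that goes through the normal compiler, callers get compile-time type checking instead of runtime lookups against a data file.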

A key aspect of this DSL is that there is no user-defined logic. It's defining a very narrow aspect of the overall application, specifically, what goes in a message, and some send/receive pairings. It says nothing about how to encode the message or how to send it. It says nothing about the semantics of a message or what order specific messages should be sent in. It says nothing about the valid values for any given message field or how errors should be handled.

Of course, all of these aforementioned "additional features" can be implemented in the DSL; but the more we add, the more we take away. The more complicated the language becomes, the less "domain-specific" it really is. A lousy DSL (again, IMO) looks like this:

Event: PaymentReceived(Payment)
    Validation:
        Condition: Amount > 0, "Invalid payment amount"
        Condition: Date > Today - 7d, "Cannot backdate > 7 days"
    Actions:
        Update: Account(AccountID)
            SetProperty: Account.Balance, Account.Balance - Payment.Amount
            SetProperty: Account.LastPaymentDate, Payment.Date
        Notify: Billing
            Template: PaymentReceived.xlt
                Field: CustName, CustomerName
                Field: PaymentAmount, Amount
                Field: PaymentDate, Date

And so on and so forth; I'm not going to belabor the point. This looks deceptively simple and seductively powerful. Hey, look how easy it is to change the validation!

But is it easy? Is it really? Tomorrow, some manager determines that some customers never have money in the bank; their cheques always bounce and we want to reject payments of that type from them. Easy, just add a flag, right? But how do we add this type of validation? We have to look up some piece of information about the customer, and as it stands, the Validation grammar is only equipped to handle validation on the Payment itself. So we have to come up with some sort of hackish update to the DSL in order to accommodate it:

Event: PaymentReceived(Payment)
    Validation:
        Condition: All(
            PaymentType = Cheque,
            Account(Payment.AccountID).DelinquentFlag = False
        ), "Cheques no longer allowed for this customer"

Cute, although some people who have experienced this before are probably already starting to get that "uh oh..." feeling. The next day, the manager says: Hey, validation's working great, but we want a notification to get sent for this as well.

Well, we didn't really build conditional notifications into the DSL, but I guess we can add them:

        Notify: Management
        Condition: All(
            PaymentType = Cheque,
            Account(Payment.AccountID).DelinquentFlag = False
        )
            Template: DelinquentCheque.xlt
                Field: CustName, CustomerName,
                ...

What's going on here? This "simple" set of conditions and actions is starting to look pretty ugly. Not only that, but we're now copying and pasting. We're trying to handle these complex conditions in areas that were never designed to handle them and the DSL really has no facility for re-use.

But that's not the whole story. What's the real problem here?

The real problem is that this DSL is describing a complex process. It doesn't read like a collection of attributes, it reads like a set of instructions, and we already have a tool for writing general-purpose instructions: it's called a programming language. I'll leave the details as an exercise for the reader, but it should be pretty obvious at this point that after a few more "revisions" to our spec above, it's probably going to be easier to just rewrite it in a normal, general-purpose language.
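For comparison, here is roughly what the `PaymentReceived` rules might look like as ordinary C#. This is a sketch under stated assumptions: `Payment`, `Account`, `PaymentType`, `PaymentProcessor`, and the `notify` callback (recipient, template name, template fields) are hypothetical stand-ins for whatever domain model and notification service an application actually has. The delinquent-cheque rule is written once, as a named method, and reused by both the validation and the notification, which is the kind of reuse the DSL had no facility for:

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical domain types standing in for the ones the DSL refers to.
public enum PaymentType { Cash, Cheque, Card }

public sealed record Payment(int AccountId, decimal Amount, DateTime Date, PaymentType Type);

public sealed class Account
{
    public int Id { get; set; }
    public string CustomerName { get; set; } = "";
    public decimal Balance { get; set; }
    public DateTime? LastPaymentDate { get; set; }
    public bool DelinquentFlag { get; set; }
}

public sealed class PaymentProcessor
{
    private readonly Func<int, Account> _findAccount;
    // Hypothetical notification hook: (recipient, template, template fields).
    private readonly Action<string, string, IDictionary<string, object>> _notify;

    public PaymentProcessor(
        Func<int, Account> findAccount,
        Action<string, string, IDictionary<string, object>> notify)
    {
        _findAccount = findAccount;
        _notify = notify;
    }

    // Written once, reused by validation and notification alike.
    private static bool IsDelinquentCheque(Payment payment, Account account) =>
        payment.Type == PaymentType.Cheque && account.DelinquentFlag;

    // Returns null if the payment was applied, otherwise the rejection reason.
    public string? OnPaymentReceived(Payment payment, DateTime today)
    {
        if (payment.Amount <= 0)
            return "Invalid payment amount";
        if (payment.Date <= today.AddDays(-7))
            return "Cannot backdate > 7 days";

        Account account = _findAccount(payment.AccountId);
        if (IsDelinquentCheque(payment, account))
        {
            _notify("Management", "DelinquentCheque.xlt",
                new Dictionary<string, object> { ["CustName"] = account.CustomerName });
            return "Cheques no longer allowed for this customer";
        }

        account.Balance -= payment.Amount;
        account.LastPaymentDate = payment.Date;

        _notify("Billing", "PaymentReceived.xlt", new Dictionary<string, object>
        {
            ["CustName"] = account.CustomerName,
            ["PaymentAmount"] = payment.Amount,
            ["PaymentDate"] = payment.Date,
        });
        return null;
    }
}
```

The next rule the manager asks for becomes a few more lines in `OnPaymentReceived` (or another small named predicate), with no grammar to extend.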

The other real problem is that this DSL appears to be intended for consumption by non-technical users, not programmers, and yet it will eventually become far too complicated for anyone but a programmer to maintain. Processes aren't simple. That's why people hire us to analyze and code them and work out all the little inconsistencies. From what I've seen and read, DSLs intended for use by non-technical users usually end up not only not being used by said users, but also being very difficult for programmers to maintain, because they aren't sophisticated enough for the kinds of things a programmer needs to do.

Sure, this example above is technically a "domain specific language", but it's not adding any value over just having a well-documented domain model and API. It's mashing together all sorts of different concepts and massively violating the principle of cohesion. Every time we need to add a new feature, we need to start screwing around with the DSL grammar instead of just adding a few lines of code. It's really making our lives harder, not easier. The "generic business process language" seems destined to grow and grow until it becomes a pale imitation of Turbo Pascal 1.0.


tl;dr version:

So, what makes a Domain-Specific Language a "WTF?" In my experience, it's:

  • Not being truly domain-specific. The design appears to employ a very liberal definition of either "domain", "specific", or both.

  • Targeting end-users rather than developers. It's very tempting to think of a DSL as a "front-end" API and many DSL tutorials even seem to hint at this being an appropriate use case. Maybe it is, but if so, I personally haven't witnessed it.

  • Defining an abstract process. DSLs only really work for process definitions when the potential conditions and actions are part of a rigidly-defined superset. Most business processes are not like this at all; they are full of highly complex conditional and/or sequential logic. They reflect the fickle, impulsive thoughts and behaviour of humans, not the concrete specifications of a computer system.

  • Adding programming idioms. If you find yourself even thinking about concepts like loops, subroutines, inheritance, that sort of thing, it's time to take a step back and ask what the DSL is really accomplishing.

Wow, that was a lot of writing. Congratulations to anyone who got this far!


In essence? Nothing. (I've understood you as asking "what's the difference between a bad language and a bad library/API" rather than "what's the difference between a bad language/library/API and a good language/library/API".)

A large enough library ends up being used like a dialect of the language, especially in certain languages (consider MFC, Qt, GTK, ...).

A library below that size threshold either isn't used often enough, or by enough people, for anyone to get around to labeling it, or it's just silly to describe as an API, even when that term is technically true. (Imagine a library with exactly one function, or one with no functions that just serves as a collection of useful types, like <stdint.h>.)

The only distinction you can draw is that a library/API which doesn't try to break the language won't be called a "private language" or "DSL". For example, Qt requires (does it still? it used to do so very strongly) a special preprocessor to add its extensions to C++, and it exceeds "dialect" status at that point. MFC also walked that same line using macros.


I think it has to do with the principle of least surprise. Well-designed DSLs and APIs do exactly what you expect them to (or do so a large percentage of the time). If you are using a good API (and you are clever and experienced), you will find yourself saying, "It should have a built-in way of doing such and such", and lo and behold, the API developer was thinking the same thing. Terrible APIs / private languages have unexpected or inconsistent behavior and make easy things difficult.
