How to write good Unit Tests in Functional Programming

I'm using functions instead of classes, and I find that I can't tell whether a function that another function relies on is a dependency that should be individually unit-tested, or an internal implementation detail that should not be. How can you tell which one it is?

A little context: I'm writing a very simple Lisp interpreter which has an eval() function. It's going to have a lot of responsibilities (too many, actually), such as evaluating symbols differently from lists (everything else evaluates to itself). When evaluating symbols, it has its own complex workflow (environment lookup), and when evaluating lists it's even more complicated, since the list can be a macro, function, or special form, each of which has its own complex workflow and set of responsibilities.

I can't tell if my eval_symbol() and eval_list() functions should be considered internal implementation details of eval() which should be tested through eval()'s own unit tests, or genuine dependencies in their own right which should be unit-tested independently of eval()'s unit tests.


A significant motivation for the "unit test" concept is to control the combinatorial explosion of required test cases. Let's look at the examples of eval, eval_symbol and eval_list.

In the case of eval_symbol, we will want to test contingencies where the symbol's binding is:

  • missing (i.e. the symbol is unbound)

  • in the global environment

  • directly within the current environment

  • inherited from a containing environment

  • shadowing another binding

  • ... and so on

In the case of eval_list, we will want to test (among other things) what happens when the list's function position contains a symbol with:

  • no function or macro binding

  • a function binding

  • a macro binding

eval_list will invoke eval_symbol whenever it needs a symbol's binding (assuming a LISP-1, that is). Let's say that there are S test cases for eval_symbol and L symbol-related test cases for eval_list. If we test each of these functions separately, we could get away with roughly S + L symbol-related test cases. However, if we wish to treat eval_list as a black box and to test it exhaustively without any knowledge that it uses eval_symbol internally, then we are faced with S × L symbol-related test cases (e.g. global function binding, global macro binding, local function binding, local macro binding, inherited function binding, inherited macro binding, and so on). With, say, S = 5 and L = 3, that is 15 combined cases instead of 8 separate ones. eval is even worse: as a black box the number of combinations can become incredibly large -- hence the term combinatorial explosion.

So, we are faced with a choice of theoretical purity versus actual practicality. There is no doubt that a comprehensive set of test cases that exercises only the "public API" (in this case, eval) gives the greatest confidence that there are no bugs. After all, by exercising every possible combination we may turn up subtle integration bugs. However, the number of such combinations may be so prohibitively large as to preclude such testing. Not to mention that the programmer will probably make mistakes (or go insane) reviewing vast numbers of test cases that only differ in subtle ways. By unit-testing the smaller internal components, one can vastly reduce the number of required test cases while still retaining a high level of confidence in the results -- a practical solution.

So, I think the guideline for identifying the granularity of unit testing is this: if the number of test cases is uncomfortably large, start looking for smaller units to test.

In the case at hand, I would absolutely advocate testing eval, eval-list and eval-symbol as separate units precisely because of the combinatorial explosion. When writing the tests for eval-list, you can rely upon eval-symbol being rock solid and confine your attention to the functionality that eval-list adds in its own right. There are likely other testable units within eval-list as well, such as eval-function, eval-macro, eval-lambda, eval-arglist and so on.
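
To make this concrete, here is a minimal sketch in Python of how the two suites divide the work (pytest-style; every name below, from Environment to eval_, is a hypothetical toy stand-in rather than your real interpreter):

import pytest

# Toy stand-ins so this sketch runs; all names here (Environment,
# eval_symbol, eval_list, eval_) are hypothetical, not your real API.
class Environment:
    def __init__(self, parent=None):
        self.bindings, self.parent = {}, parent

    def define(self, name, value):
        self.bindings[name] = value

    def lookup(self, name):
        if name in self.bindings:
            return self.bindings[name]
        if self.parent is not None:
            return self.parent.lookup(name)
        raise KeyError(name)

def eval_symbol(name, env):
    return env.lookup(name)

def eval_list(form, env):
    func = eval_symbol(form[0], env)   # delegates all symbol lookup
    return func([eval_(arg, env) for arg in form[1:]])

def eval_(expr, env):
    if isinstance(expr, str):          # symbols are bare strings here
        return eval_symbol(expr, env)
    if isinstance(expr, list):
        return eval_list(expr, env)
    return expr                        # everything else self-evaluates

# The S side: eval_symbol's contingencies are exercised once, in isolation.
def test_symbol_unbound():
    with pytest.raises(KeyError):
        eval_symbol("x", Environment())

def test_symbol_shadowing():
    outer = Environment(); outer.define("x", 1)
    inner = Environment(parent=outer); inner.define("x", 2)
    assert eval_symbol("x", inner) == 2

# The L side: eval_list's tests use ONE known-good binding and confine
# themselves to what eval_list adds (here, function application).
def test_list_applies_function_binding():
    env = Environment(); env.define("inc", lambda args: args[0] + 1)
    assert eval_list(["inc", 41], env) == 42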


My advice is quite simple: "Start somewhere!"

  • If you see a name of some def (or defun) that looks like it might be fragile, well, you probably want to test it, don't you?
  • If you're having some trouble trying to figure out how your client code can interface with some other code unit, well, you probably want to write some tests somewhere that let you create examples of how to properly use that function.
  • If some function seems sensitive to data values, well, you might want to write some tests that not only verify it can handle any reasonable inputs properly, but also specifically exercise boundary conditions and odd or unusual data inputs (see the sketch after this list).
  • Whatever seems bug-prone should have tests.
  • Whatever seems unclear should have tests.
  • Whatever seems complicated should have tests.
  • Whatever seems important should have tests.
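
For the data-sensitivity bullet above, boundary tests typically look like this (a pytest-style sketch; the toy tokenize is a hypothetical stand-in for whatever data-sensitive function you are probing):

def tokenize(source):
    # toy reader, just enough to show the shape of boundary tests
    return source.replace("(", " ( ").replace(")", " ) ").split()

def test_empty_input():
    assert tokenize("") == []

def test_whitespace_only():
    assert tokenize(" \t\n ") == []

def test_single_atom():
    assert tokenize("42") == ["42"]

def test_deeply_nested_parens():
    depth = 1000                       # stress an extreme boundary
    source = "(" * depth + ")" * depth
    tokens = tokenize(source)
    assert tokens.count("(") == tokens.count(")") == depth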

Later, you can go about increasing your coverage to 100%. But you'll find that you will probably get 80% of your real results from the first 20% of your unit-test coding (the "Law of the Critical Few", better known as the 80/20 rule).

So, to review the main point of my humble approach, "Start somewhere!"

Regarding the last part of your question, I would recommend thinking about any possible recursion, and about any additional reuse by "client" functions that you or subsequent developers might create in the future that would also call eval_symbol() or eval_list().

Regarding recursion: the functional programming style uses it a lot, and it can be difficult to get right, especially for those of us who come from procedural or object-oriented programming, where recursion is rarely encountered. The best way to get recursion right is to target any recursive features precisely with unit tests, making certain that all possible recursive use cases are validated.
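
For example, give each base case and the recursive case its own test, and include at least one deep input to expose stack-depth problems early (a sketch; the toy nested_sum stands in for the recursion inside a real eval):

def nested_sum(expr):
    if isinstance(expr, int):                 # base case: a number
        return expr
    return sum(nested_sum(e) for e in expr)  # recursive case: a list

def test_base_case():
    assert nested_sum(5) == 5

def test_flat_list():
    assert nested_sum([1, 2, 3]) == 6

def test_nested_lists():
    assert nested_sum([1, [2, [3, [4]]]]) == 10

def test_empty_list():
    assert nested_sum([]) == 0

def test_deep_nesting():
    deep = 0
    for _ in range(300):                      # deep, but within stack limits
        deep = [deep]
    assert nested_sum(deep) == 0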

Regarding reuse, if your functions are likely to be invoked by anything other than a single use by your eval() function, they should probably be treated as genuine dependencies that deserve independent unit tests.

As a final hint, the term "unit" has a technical definition in the domain of unit testing: "the smallest piece of software that can be tested in isolation." That is a very old, fundamental definition that may quickly clarify your situation for you.


This is somewhat orthogonal to the content of your question, but directly addresses the question posed in the title.

Idiomatic functional programming involves mostly side effect-free pieces of code, which makes unit testing easier in general. Defining a unit test typically involves asserting a logical property about the function under test, rather than building large amounts of fragile scaffolding just to establish a suitable test environment.

As an example, let's say we're testing the extendEnv and lookupEnv functions as part of an interpreter. A good unit test for these functions would check that if we extend an environment twice with the same variable bound to different values, only the most recent value is returned by lookupEnv.

In Haskell, a test for this property might look like:

-- the later binding of "x" (to 5) should shadow the earlier one (to 6)
test :: Bool
test =
  let env = extendEnv "x" 5 (extendEnv "x" 6 emptyEnv)
  in lookupEnv env "x" == Just 5

This test gives us some assurance, and doesn't require any setup or teardown other than creating the env value that we're interested in testing. However, the values under test are very specific. This only tests one particular environment, so a subtle bug could easily slip by. We'd rather make a more general statement: for all variables x and values v and w, an environment env extended with x bound to v after x was bound to w satisfies lookupEnv env x == Just v.

In general, we need a formal proof (perhaps mechanized with a proof assistant like Coq, Agda, or Isabelle) in order to show that a property like this holds. However, we can get much closer than specifying test values by using QuickCheck, a library available for most functional languages that generates large amounts of arbitrary test input for properties we define as boolean functions:

-- the outer (later) extension binds x to v, so lookup must find v
prop_test x v w env' =
  let env = extendEnv x v (extendEnv x w env')
  in lookupEnv env x == Just v

At the prompt, we can have QuickCheck generate arbitrary inputs to this function, and see whether it remains true for all of them:

*Main> quickCheck prop_test
+++ OK, passed 100 tests.
*Main> quickCheckWith (stdArgs { maxSuccess = 1000 }) prop_test
+++ OK, passed 1000 tests.

QuickCheck uses some very nice (and extensible) magic to produce these arbitrary values, but it's functional programming that makes having those values useful. By making side effects the exception (sorry) rather than the rule, unit testing becomes less of a task of manually specifying test cases, and more a matter of asserting generalized properties about the behavior of your functions.

This process will surprise you frequently. Reasoning at this level gives your mind extra chances to notice flaws in your design, making it more likely that you'll catch errors before you even run your code.


I'm not really aware of any particular rule of thumb for this. But it seems like you should be asking yourself two questions:

  1. Can you define the purpose of eval_symbol and eval_list without needing to say "part of the implementation of eval"?
  2. If you see a test fail for eval, would it be useful to see whether any tests for eval_symbol and eval_list also fail?

If the answer to either of those is yes, I would test them separately.


A few months ago I wrote a simple "almost Lisp" interpreter in Python for an assignment. I designed it using the Interpreter design pattern and unit-tested the evaluation code. Then I added the printing and parsing code and transformed the test fixtures from abstract-syntax representation (objects) to concrete-syntax strings. Part of the assignment was to program simple recursive list-processing functions, so I added them as functional tests.

To answer your question in general, the rules are pretty much the same as for OO. You should have all your public functions covered. In OO, public methods are part of a class or an interface; in functional programming you most often have visibility control based around modules (similar to interfaces). Ideally, you would have full coverage for all functions, but if this isn't possible, consider a TDD approach: start by writing tests for what you know you need, and implement against them. Auxiliary functions will be a result of refactoring, and since you wrote tests for everything important beforehand, if the tests still pass after refactoring, you are done and can write the next test (iterate).
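
For instance, in Python the module-as-interface idea might look like this (a sketch under assumed names; the leading underscore is the usual Python convention for module-private helpers):

# interpreter.py -- sketch only, all names hypothetical. The module
# boundary plays the role of an OO interface: clients import eval_,
# so eval_ is the surface the unit tests must cover.

def eval_(expr, env):
    # public entry point: the module's tested surface
    if isinstance(expr, list):
        return _eval_list(expr, env)   # private helper born in a refactor
    return expr

def _eval_list(expr, env):
    # covered through eval_'s tests; give it a suite of its own only
    # when its case count grows uncomfortably large
    head, *args = expr
    return env[head]([eval_(a, env) for a in args])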

Good luck!
