Help with TDD approach to a real world problem: linker
I'm trying to learn TDD. I've seen examples and discussions about how it's easy to TDD a coffee vending machine firmware from smallest possible functionality up. These examples are either primitive or very well thought-out, it's hard to tell right away. But here's a real world problem.
Linker.
A linker, at its simplest, reads one object file, does magic, and writes one executable file. I don't think I can simplify it further. I do believe the linker design may be evolved, but I have absolutely no idea where to start. Any ideas on how to approach this?
Well, probably the whole linker is too big a problem fo开发者_如何学Pythonr the first unit test. I can envision some rough structure beforehand. What a linker does is:
- Represents an object file as a collection of segments. Segments contain code, data, symbol definitions and references, debug information etc.
- Builds a reference graph and decides which segments to keep.
- Packs remaining segments into a contiguous address space according to some rules.
- Relocates references.
My main problem is with bullet 1. 2, 3, and 4 basically take a regular data structure and convert it into a platform-dependent mess based on some configuration. I can design that, and the design looks feasible. But 1, it should pick a platform-dependent mess, in one of the several supported formats, and convert it into a regular structure.
The task looks generic enough. It happens everywhere you need to support multiple input formats, be it image processing, document processing, you name it. Is it possible to TDD ? It seems like either test is too simple and I easily hack it to green, or it's a bit more complex and I need to implement the whole object/image/document format reader which is a lot of code. And there is no middle ground.
First, have a look at "Growing Object Oriented Software Guided By Tests" by Freeman & Pryce.
Now, my attempt to answer a difficult question in a few lines.
TDD does require you to think (i.e. design) what you're going to do. You have to:
- Think in small steps. Very small steps.
- Write a short test, to prove that the next small piece of behaviour works.
- Run the test to show that it fails
- Do the simplest thing possible to get the test to pass
- Refactor ruthlessly to remove duplication and improve the structure of the code
- Run the test(s) again to make sure it all still works
- Go back to 1.
An initial idea (design) of how your linker might be structured will guide your initial tests. The tests will enforce a modular design (because each test is only testing a single behaviour, and there should be minimal dependencies on other code you've written).
As you proceed you may find your ideas change. The tests you've already written will allow you to refactor with confidence.
The tests should be simple. It is easy to 'hack' a single test to green. But after each 'hack' you refactor. If you see the need for a new class or algorithm during the refactoring, then write tests to drive out its interface. Make sure that the tests only ever test a single behaviour by keeping your modules loosely coupled (dependency injection, abstract base classes, interfaces, function pointers etc.) and use fakes, stubs and mocks to isolate the code under test from the rest of your system.
Finally use 'customer' tests to ensure that you have delivered functional features.
It's a difficult change in mind-set, but a lot of fun and very rewarding. Honest.
You're right, a linker seems a bit bigger than a 'unit' to me, and TDD does not excuse you from sitting down and thinking about how you're going to break down your problem into units. The Sudoku saga is a good illustration of what goes wrong if you don't think first!
Concentrating on your point 1, you have already described a good collection of units (of functionality) by listing the kinds of things that can appear in segments, and hinting that you need to support multiple formats. Why not start by dealing with a simple case like, say, a file containing just a data segment in the binary format of your development platform? You could simply hard-code the file as a binary array in your test, and then check that it interprets just that correctly. Then pick another simple case, and test for that. Keep going.
Now the magic bit is that pretty soon you'll see repeated structures in your code and in your tests, and because you've got tests you can be quite aggressive about refactoring it away. I suspect this is the bit that you haven't experienced yet, because you say "It seems like either test is too simple and I easily hack it to green, or it's a bit more complex and I need to implement the whole object/image/document format reader which is a lot of code. And there is no middle ground." The point is that you should hack them all to green, but as you're doing that you are also searching out the patterns in your hacks.
I wrote a (very simple) compiler in this fashion, and it mostly worked quite well. For each syntactic construction, I wrote the smallest program that I could think of which used it in some observable way, and had the test compile the program and check that it worked as expected. I used a proper parser generator as you can't plausibly TDD your way into one of them (you need to use a little forethought!) After about three cycles, it became obvious that I was repeating the code to walk the syntax tree, so that was refactored into something like a Visitor.
I also had larger-scale acceptance tests, but in the end I don't think these caught much that the unit tests didn't.
This is all very possible.
A sample from the top of my head is NHAML.
This is ASP.NET ViewEngine that converts plain text to the .NET native code.
You can have a look at source code and see how it is tested.
I guess what I do is come up with layers and blocks and sub-divide to the point where I might be thinking about code and then start writing tests.
I think your tests should be quite simple: it's not the individual tests that are the power of TDD but the sum of the tests.
One of the principles I follow is that a method should fit on a screen - when that's the case, the tests are usually simple enough.
Your design should allow you to mock out lower layers so that you're only testing one layer.
TDD is about specification, not test.
From your simplest spec of a linker, your TDD test has just to check whether an executable file has been created during the linker magic if you feed it with an object file.
Then you write a linker that makes your test succeed, e.g.:
- check whether input file is an object file
- if so, generate a "Hello World!" executable (note that your spec didn't specify that different object files would produce different executables)
Then you refine your spec and your TDD (these are your four bullets).
As long as you can write a specification you can write TDD test cases.
精彩评论