How to Test Functions w/ Complex Data Interactions
Currently, I am working on system that does quite a bit of reporting-style functions that consumes many different data points and transforms them into larger, sometimes flattened outputs. Most of my app is built upon a variation of the repository pattern. Due to this, I have a suite of mock-repositories开发者_运维知识库 that I use for testing scenarios. The problem that I am running into is that the interaction between these data points is so complex that it is quickly become a maintenance nightmare to maintain the "mock data". Here is a mock example:
public class SomeReportingEntity
{
private IProductRepo ProductRepo;
private IManagerRepo ManagerRepo;
private ILocationRepo LocationRepo;
private IOrdersService OrdersService;
private IEmployeeRepo EmployeeRepo;
public ReportingEntity(IProductRepo ipr, IManagerRepo imr, ILocationRepo ilr, IOrdersService ios,
IEmployeeRepo ier){
//Load these to private vars...
}
//This is the function that I want to test...
public SomeReportingEntity GetManagerSalesByRegionReport()
{
//Make a complex join on all sub collections. These
//sub collections are all under test individually.
var MangerSalesByRegionItems = From x in ProductRepo.CurrentProducts()
Join y in OrdersService.FutureOrders() On ...
Join z in EmployeeRepo.ActiveEmployees() On ...
Join a in LocationRepo.GetAllRegions() On ...
Join b In ManagerRepo.GetActiveManagers On ...
Select new SomeReportingEntity() With { ... }
return MangerSalesByRegionItems.ToList();
}
}
Admittedly, this is a very contrived example but the basic idea that I want to emphasize is that I have several repositories that I am joining and I need to create many tests to ensure that this complex query does as expected. Due to the fact that the joining operations are so complex, it makes the mock data VERY difficult to keep in line - especially as I have to add more associations and test additional points. In addition, I need to be able to enter specific record states into the mocks (such as an employee lacking an assigned manager) to verify that query handles those situations appropriately.
So here are my questions:
- What is the best way to "mock" this data so that it is not such a matinenance nightmare? I have had many people suggest building an in-memory database to support this.
- Am I really suffering from an architecture issue here? In reporting scenarios, I find myself in this pattern quite a bit where I take many disassociated data points and merge them into a new, hybrid entity. With the onset of Linq, it is very easy to do and has high clarity of intent, but sometimes it feels like I am cheating a little.
The first thing you want to do is make centralized object that knows how to retrieve the data for different repositories. Since this is reporting only, it's easier because you don't have to worry about change tracking.
From a logistical standpoint, one thing I would consider is making a local database to hold the remote data (update periodically using agents). This would remove some of the issues of calling remote services and aggregating their data on the fly. You would also be able to pre-process some of the data at the start.
When I use the repository pattern, I couple it with the Unit Of Work pattern. The Unit of Work is the guy that does all the legwork for you. Theoretically, your UoW could bring in the data from the multiple services and present it to the repositories based on configuration.
For testing, you can use the InMemoryUnitOfWork to provide all the data in one single place.
I've been working on data-heavy project myself. What has worked for us is to use the Repository itself to hydrate objects and then serialize them to XML. We pull the XML file into our test project and use that as the starting point for our automated tests. It's nice because it ensures that your mock data looks like real data.
Our tests tend to look like this...
var object1 = XmlUtil.LoadObject1("filename1");
var object2 = XmlUtil.LoadObject2("filename2");
var result = SomeConverter.Convert(object1, object2);
Assert("somevalue", result.Property1);
If you need to do inline lookups, you can add a mock repository that would provide the same level of dependency injection.
The downside of this approach is if the data schema changes. Sometimes, a test can become obsolete if the data schema has changed. If your schema is still under a lot of flux, I would keep your automated test small until the schema settles down. Focus on unit tests until you know that the schema is relaitively stable.
You have to decide exactly what you want to test.
One way to do this might be to pretend you're using TDD. Pretend that your GetManagerSalesByRegionReport
method does not exist (or actually delete it). You'll have to:
- Write a failing unit test. What's the simplest thing for it to test: that you can call the method and that it doesn't throw an exception when there's nothing wrong with the data.
- You'll need to create the method, empty. It should return
void
since your test doesn't need it to return anything. - Your test should now pass.
- Add a test to ensure that a
List
of the appropriate type is returned, even if none of the sub-repositories have data. - You'll have to change the method to return your list type, and you'll have to change it to return
null
. Your test will still fail, so change it to return an emptyList
and it will pass.
What's left? Those are INNER
joins, so you won't get any data back unless all the repositories contain at least one row. So, test for that: create a test where each repo contains one row and ensure the returned list contains the appropriate number of rows. Then, test for the appropriate properties per returned row. Then test that no data is returned if any of the repos contain no rows.
Then, maybe test what happens if some of the repos contain more than one row.
Then, I don't know what would be left to test.
精彩评论