database, requests, performance, cache

I need some input on how to design a database layer.

In my application I have a List of T. Each T holds information from multiple database tables.

There are of course multiple ways to do this. Two ways that I can think of are:

chatty database layer and cacheable:

List<SomeX> list = new List<SomeX>();
foreach(...) {
    list.Add(new SomeX() {
        prop1 = dataRow["someId1"],
        prop2 = GetSomeValueFromCacheOrDb(dataRow["someId2"])
    });
}

The problem that I see with the above is that if we want a list of 500 items, it could potentially make 500 database requests, each with its own network latency. Another problem is that the users could have been deleted after we got the list from the database but before we try to get them from cache/db, which means that we will have null problems that we have to handle manually. The good thing is that it's highly cacheable.

non chatty but not cacheable:

List<SomeX> list = new List<SomeX>();
foreach(...) {
    list.Add(new SomeX() {
        prop1 = dataRow["someId1"],
        prop2 = dataRow["someValue"]
    });
}

The problem that I see with the above is that it's hard to cache, since potentially all users have unique lists. The other problem is that there will be a lot of joins, which could result in a lot of reads against the database. The good thing is that we know for sure that all information exists after the query is run (inner join etc.).

not so chatty, but still cacheable:

A third option could be to first loop through the data rows, and collect all necessary someId2 and then make one more database request to get all the SomeId2 values.
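A minimal sketch of that third option, assuming a hypothetical Row/SomeX shape, a per-id cache, and a fetchMany delegate that stands in for a single "WHERE id IN (...)" query. None of these names come from the original code; they only illustrate the shape of the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchLookup
{
    // Hypothetical row shape: SomeId1 is used directly, SomeId2 needs a lookup.
    public record Row(int SomeId1, int SomeId2);
    public record SomeX(int Prop1, string Prop2);

    // fetchMany is an assumption: it stands in for ONE query that returns
    // the values for all requested ids. cache is any per-id cache.
    public static List<SomeX> Build(
        IReadOnlyList<Row> rows,
        IDictionary<int, string> cache,
        Func<IReadOnlyCollection<int>, IDictionary<int, string>> fetchMany)
    {
        // Collect the distinct ids that the cache cannot answer.
        var missing = rows.Select(r => r.SomeId2)
                          .Distinct()
                          .Where(id => !cache.ContainsKey(id))
                          .ToList();

        // One round trip for all of them, instead of one per row.
        if (missing.Count > 0)
        {
            foreach (var pair in fetchMany(missing))
                cache[pair.Key] = pair.Value;
        }

        // Rows whose id no longer exists are skipped instead of carrying a null.
        var list = new List<SomeX>();
        foreach (var row in rows)
        {
            if (cache.TryGetValue(row.SomeId2, out var value))
                list.Add(new SomeX(row.SomeId1, value));
        }
        return list;
    }
}
```

With 500 rows this makes at most two database requests (the list query plus one batch lookup), and the per-id values stay cacheable across users. Skipping missing ids is only one possible policy; a placeholder value would work as well.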


"The problem that I see with the above is that if we want a list of 500 items, it could potentially make 500 database requests. With all the network latency and that."

True. It could also create unnecessary contention and consume server resources maintaining locks as you iterate over a query.

"Another problem is that the users could have been deleted after we got the list from the database but before we are trying to get it from cache/db, which means that we will have null-problems."

If I take that quote, then this quote:

"The good thing is that it's highly cacheable."

Is not true, because you've cached stale data. So strike off the only advantage so far.

But to directly answer your question, the most efficient design, which seems to be what you are asking, is to use the database for what it is good for, enforcing ACID compliance and various constraints, most notably pk's and fk's, but also for returning aggregated answers to cut down on round trips and wasted cycles on the app side.

This means you either put SQL into your app code, which has been ruled to be Infinite Bad Taste by the Code Thought Police, or go to sprocs. Either one works. Putting the code into the App makes it more maintainable, but you'll never be invited to any more elegant OOP parties.


Some suggestions:

SQL is a set-based language, so don't design things around iterating in loops. Even in stored procedures, I still see cursors now and then where a set-based query would solve the issue. So always try to get the information with one query. Sometimes this isn't possible, but in the majority of cases it is. If your schema has many tables, you can also design views so that the information you need can be pulled with one statement.

Use proxies. Let's say I have an object with 50 properties. At first you display a list of objects to the user. In this case, I would create a proxy with the most important properties and display that to the user, maybe two or three important ones like name, ID, etc. This cuts down on the amount of information sent initially. When the user actually wants to edit or change the object, make a second query to get the "full" object. Only get what you need. This is especially important over the web when serializing XML between the layers.
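The proxy idea might look roughly like this. Customer, CustomerSummary and the two delegates are made up for illustration; listQuery stands in for a narrow query that selects only the displayed columns, and loadFull for the second query by primary key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProxyExample
{
    // Hypothetical full object; imagine many more properties here.
    public record Customer(int Id, string Name, string Address, string Notes);

    // Proxy with only the columns the list screen displays.
    public record CustomerSummary(int Id, string Name);

    // listQuery is an assumption: it stands in for a narrow "SELECT Id, Name ..." query.
    public static List<CustomerSummary> LoadList(
        Func<IEnumerable<(int Id, string Name)>> listQuery)
        => listQuery().Select(r => new CustomerSummary(r.Id, r.Name)).ToList();

    // Called only when the user opens one item for editing; loadFull stands in
    // for the second, wide query by primary key.
    public static Customer Open(CustomerSummary summary, Func<int, Customer> loadFull)
        => loadFull(summary.Id);
}
```

The list screen only ever pays for two columns per row, and the wide row is fetched for the single item the user actually opens.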

Come up with a paging strategy. Most systems work fine until they get a lot of data, and then a query grinds to a halt because it is returning thousands of rows. Page early and often. If you are building a web application, paging directly in the database will probably perform best, because only the paged data is sent between the layers.
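As a sketch of paging in the database, assuming SQL Server 2012 or later (OFFSET/FETCH syntax) and a hypothetical Items table with Id and Name columns, the app side only has to turn a page number into two parameters:

```csharp
using System;

public static class Paging
{
    // Builds a parameterized page query plus the values for its two parameters.
    // The table and column names are hypothetical.
    public static (string Sql, int Offset, int PageSize) PageQuery(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        const string sql =
            "SELECT Id, Name FROM Items " +
            "ORDER BY Id " + // paging needs a stable sort order
            "OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY";

        // Page numbers are 1-based: page 1 starts at offset 0.
        return (sql, (pageNumber - 1) * pageSize, pageSize);
    }
}
```

Other databases use different syntax for the same idea (for example LIMIT/OFFSET), so the SQL string would need adjusting there.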

Data caching depends on the data. For highly volatile data (changing all the time) caching isn't worth it. But for semi-volatile or non-volatile data, caching can be worth it, though you have to manage the cache, either directly or indirectly if you are using a built-in framework.

A good place to use a cache is, say, a zip codes table. Certainly, those don't change that often, and you could cache them to boost performance if you had a zip code drop-down in your application. This is just an example, but caching IMO depends on the type of data.
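For reference data like that, a hand-rolled cache can be very small. The following is only a sketch under assumptions: ReferenceDataCache is a made-up class, load stands in for the query that reads the table, and a simple time-to-live is used as the invalidation policy. A framework cache would serve the same purpose.

```csharp
using System;
using System.Collections.Generic;

public sealed class ReferenceDataCache<T>
{
    private readonly Func<IReadOnlyList<T>> _load; // stands in for the database query
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _now;          // injectable clock, handy for testing
    private readonly object _gate = new object();
    private IReadOnlyList<T> _items;
    private DateTime _loadedAt;

    public ReferenceDataCache(Func<IReadOnlyList<T>> load, TimeSpan ttl, Func<DateTime> now = null)
    {
        _load = load;
        _ttl = ttl;
        _now = now ?? (() => DateTime.UtcNow);
    }

    public IReadOnlyList<T> Get()
    {
        lock (_gate)
        {
            // Reload on first use or once the cached copy is older than the TTL.
            if (_items == null || _now() - _loadedAt >= _ttl)
            {
                _items = _load();
                _loadedAt = _now();
            }
            return _items;
        }
    }
}
```

The TTL is the knob that matches the volatility of the data: long for something like zip codes, short or no cache at all for data that changes constantly.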
