开发者

Splitting list into multiple lists to gain a speedup?

Let's say my list is about 1,000,000 entries long. To acess an item, the time would be O(500,000), which seems very long to me.

What happens when I split the list up into multiple lists ? Let's look at an example:

Splitting the list into 10 parts, I'd have a list as follows:

splitted_list = [
    [list with 100,000 entries],
    [list with 100,000 entries],
    [list with 100,000 entries],
    [list with 100,000 entries],
    [list with 100,000 entries],
    [list with 100,000 entries],
    [list with 100,000 entries],
    [list with 100,000 entries],
    [list with 100,000 entries],
    [list with 100,000 entries]
]

The time to acess an item would be O(5) + O(50,000) = O(50,005) and gives a speedup about 1000% !

When spl开发者_如何学JAVAitting the original list about it's root, which is 1000 in this case, this would give us a list containg 1000 lists having another 1000 entries.

splitted_list = [
    [list with 1000 entries],
    [list with 1000 entries],
    [list with 1000 entries],
    [list with 1000 entries],
    ...
]

Now take a look at the time to acess an item:

O(500) + O(500) = O(1000)
O(1000) < O(50,005) < O(500,000)

This is the optimal speed up about 1000 times ! Incredible to belive, I think, so my question:

Does this apply in practice, too, or is this just theory ?


Get item by index from list is O(1) whatever the size of the list.


The answer to your question is that you are thinking of linked lists, in which each element holds a pointer to the next. These have O(n) indexing, since the only way to get the nth element is to walk through the list from the beginning.

Your idea is related to various data structures, of which the closest is probably a skip list. This is a data structure based on a linked list but with "highways" of nodes that skip ahead more than one element of the list. The advantage is that you can run down the highway to get to the middle of the list, and then drop down to the "slower lanes" when you need individual-element precision, giving O(log n) indexing efficiency -- the same as a binary tree. Of course, the downside is that it's more complicated (and slower) to perform other linked-list operations such as random inserts.

Python lists, however, are implemented under-the-hood as dynamically growing arrays. These have O(1) indexing, since to get the third element you can simply add three (units) to the memory address of the first element, without having to traverse all elements in between.

You may be interested in the Wikipedia article on data structures.


I am assuming that you are talking about finding an element in the lists.

If you are talking about splitting a sorted list into many sorted lists with pointers to their heads, congratulations, you have almost discovered the B-tree.

If those lists are really arrays (i.e. you have constant-time random access), you can also do a binary search.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜