A data structure problem

2023-01-08 04:49 问答作者：

Given a sequence of integers, there are a number of queries. Each query has a range [l, r], and you are to find the median of the given range [l, r]

The number of queries can be as large as 100,000 The length of the sequence can be as large as 100,000

I wonder if there is any data structure can support such query

My solution:

I consult my partner today and he tells to use partition tree.

We can build a partition tree in nlog(n) time and answer each query in log(n) time

The partition tree actually is the process of merge sort, but for each node in the tree, it saves the number of integers that go to the left subtree. Thus, we can use this information to deal with the query.

here is my code:

This program is to find the x in a given interval [l, r], that minimize the following equation.

alt text http://acm.tju.edu.cn/toj/3556_01.jpg

Explanation:

seq saves the sequence

pos saves the position after sort

ind saves the index

cntL saves the number of integers that go to the left tree in a given range

#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;
#define N 100008
typedef long long LL;
int n, m, seq[N], ind[N], pos[N], next[N];
int cntL[20][N];
LL sum[20][N], sumL, subSum[N];

void build(int l, int r, int head, int dep)
{
    if (l == r)
    {
        cntL[dep][l] = cntL[dep][l-1];
        sum[dep][l] = sum[dep][l-1];
        return ;
    }
    int mid = (l+r)>>1;
    int hl = 0, hr = 0, tl = 0, tr = 0;
    for (int i = head, j = l; i != -1; i = next[i], j++)
    {
        cntL[dep][j] = cntL[dep][j-1];
        sum[dep][j] = sum[dep][j-1];
        if (pos[i] <= mid)
        {
            next[tl] = i;
            tl = i;
            if (hl == 0) hl = i;
            cntL[dep][j]++;
            sum[dep][j] += seq[i];
        }
        else
        {
            next[tr] = i;
            tr = i;
            if (hr == 0) hr = i;
        }
    }
    next[tl] = -1;
    next[tr] = -1;
    build(l, mid, hl, dep+1);
    build(mid+1, r, hr, dep+1);
}

int query(int left, int right, int ql, int qr, int kth, int dep)
{
    if (left == right)
    {
        return ind[left];
    }
    int mid = (left+right)>>1;
    if (cntL[dep][qr] - cntL[dep][ql-1] >= kth)
    {
        return query(left, mid, left+cntL[dep][ql-1]-cntL[dep][left-1], left+cntL[dep][qr]-cntL[dep][left-1]-1, kth, dep+1);
    }
    else
    {
        sumL += sum[dep][qr]-sum[dep][ql-1];
        return query(mid+1, right, mid+1+ql-left-(cntL[dep][ql-1]-cntL[dep][left-1]), mid+qr+1-left-(cntL[dep][qr]-cntL[dep][left-1]), \
                kth-(cntL[dep][qr]-cntL[dep][ql-1]), dep+1);
    }
}

inline int cmp(int x, int y)
{
    return seq[x] < seq[y];
}

int main()
{
    int ca, t, i, j, middle, ql, qr, id, tot;
    LL ans;
    scanf("%d", &ca);
    for (t = 1; t <= ca; t++)
    {
        scanf("%d", &n);
        subSum[0] = 0;
        for (i = 1; i <= n; i++) 
        {
            scanf("%d", seq+i);
            ind[i] = i;
            subSum[i] = subSum[i-1]+seq[i];
        }
        sort(ind+1, ind+1+n, cmp);
        for (i = 1; i <= n; i++)
        {
            pos[ind[i]] = i;
            next[i] = i+1;
        }
        next[n] = -1;
        build(1, n, 1, 0);
        printf("Case #%d:\n", t);
        scanf("%d", &m);
        while (m--)
        {
            scanf("%d%d", &ql, &qr);
            ql++, qr++;
            middle = (qr-ql+2)/2;
            sumL= 0;
            id = query(1, n, ql, qr, middle, 0);
            ans = subSum[qr]-sub开发者_JAVA百科Sum[ql-1]-sumL;
            tot = qr-ql+1;
            ans = ans-(tot-middle+1)*1ll*seq[id]+(middle-1)*1ll*seq[id]-sumL;
            printf("%lld\n", ans);
        }
        puts("");
    }
}

This is called the Range Median Query problem. The following paper might be relevant: Towards Optimal Range Medians. (Free link, thanks to belisarius).

From the abstract of the paper:

We consider the following problem: Given an unsorted array of n elements, and a sequence of intervals in the array, compute the median in each of the subarrays defined by the intervals. We describe a simple algorithm which needs O(nlogk+klogn) time to answer k such median queries. This improves previous algorithms by a logarithmic factor and matches a comparison lower bound for k=O(n). The space complexity of our simple algorithm is O(nlogn) in the pointer machine model, and O(n) in the RAM model. In the latter model, a more involved O(n) space data structure can be constructed in O(nlogn) time where the time per query is reduced to O(logn/loglogn). We also give efficient dynamic variants of both data structures, achieving O(log^2n) query time using O(nlogn) space in the comparison model and O((logn/loglogn)^2) query time using O(nlogn/loglogn) space in the RAM model, and show that in the cell-probe model, any data structure which supports updates in O(log^O(1)n) time must have Ω(logn/loglogn) query time.

Our approach naturally generalizes to higher-dimensional range median problems, where element positions and query ranges are multidimensional—it reduces a range median query to a logarithmic number of range counting queries.

Of course, you could preprocess the whole array in O(n^3) time (or perhaps even O(n^2logn) time) and O(n^2) space to be able to return the median in O(1) time.

Additional constraints might help simplify the solution. For instance, do we know that r-l will lesser than a known constant? etc...

继续阅读：algorithm data-structures

A data structure problem

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？