How to create a stored function updating rows in Postgres?
I have used Postgres with my Django project for some time now but I never needed to use stored functions. It is very important for me to find the most efficient solution for the following problem:
I have a table, which contains the following columns: number | last_update | growth_per_second
An开发者_StackOverflowd I need an efficient solution to update the number based on the last_update and the growth factor, and set the last_update value to current time. I will probably have 100, maybe 150k rows. I need to update all rows in the same time, if possible, but if it will take too long I can split it in smaller parts.
Store what you can't calculate quickly.
Are you sure you need to maintain this information? If so, can you cache it if querying it is slow? You're setting yourself up for massive table thrash by trying to keep this information consistent in the database.
First if you want to go this route, start with the PostgreSQL documentation on server programming, then come back with a question based on what you have tried. You will want to get familiar with this area anyway because depending on what you are doing....
Now, assuming your data is all inserts and no updates, I would not store this information in your database directly. If it is a smallish amount of information you will end up with index scans anyway and if you are returning a smallish result set you should be able to calculate this quickly.
Instead I would do this: have your last_update column be a foreign key to the same table. Suppose your table looks like this:
CREATE TABLE hits (
id bigserial primary key,
number_hits bigint not null,
last_update_id bigint references hits(id),
....
);
Then I would create the following functions. Note the caveats below.
CREATE FUNCTION last_update(hits) RETURNS hits IMMUTABLE LANGUAGE SQL AS $$
SELECT * FROM hits WHERE id = $1.last_update_id;
$$;
This function allows you, on a small result set, to traverse to the last update record. Note the immutable designation here is only safe if you are guaranteeing that there are no updates or deletions on the hits table. If you do these, then you should change it to stable, and you lose the ability to index output. If you make this guarantee and then must do an update, then you MUST rebuild any indexes that use this (reindex table hits), and this may take a while....
From there, we can:
CREATE FUNCTION growth(hits) RETURNS numeric immutable language sql as $$
SELECT CASE WHEN ($1.last_update).number_hits = 0 THEN NULL
ELSE $1.number_hits / ($1.last_update).number_hits
END;
$$;
Then we can:
SELECT h.growth -- or alternatively growth(h)
FROM hits
WHERE id = 12345;
And it will automatically calculate it. If we want to search on growth, we can index the output:
CREATE INDEX hits_growth_idx ON hits (growth(hits));
This will precalculate for searching purposes. This way if you want to do a:
SELECT * FROM hits WHERE growth = 1;
It can use an index scan on predefined values.
Of course you can use the same techniques to precalculate and store, but this approach is more flexible and if you have to work with a large result set, you can always self-join once, and calculate that way, bypassing your functions.
精彩评论