How does this not make varchar2 inefficient?

Suppose I have a table with a column name of type varchar(20), and I store a row with name = 'abcdef'.

INSERT INTO tab(id, name) values(12, 'abcdef');

How is the memory allocation for name done in this case?

There are two ways I can think of:

a)

20 bytes are allocated but only 6 are used. In this case varchar2 has no significant advantage over char in terms of memory allocation.

b)

Only 6 bytes are allocated. If this is the case, and I added a couple more rows after this one,

INSERT INTO tab(id, name) values(13, 'yyyy');
INSERT INTO tab(id, name) values(14, 'zzzz');

and then I do an UPDATE,

UPDATE tab SET name = 'abcdefghijkl' WHERE id = 12;

Where does the DBMS get the extra 6 bytes it needs? The next 6 bytes may not be free (if only 6 were allocated initially, the bytes that follow might already have been allotted to something else).

Is there any way other than shifting the row out to a new place? Even shifting would be a problem for index-organized tables (it might be okay for heap-organized tables).


There may be variations depending on the RDBMS you are using, but generally:

Only the actual data that you store in a varchar field is allocated. The declared size is only the maximum allowed; it is not how much is allocated.
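
For illustration (assuming Oracle, given the VARCHAR2 in the question title), VSIZE reports the number of bytes actually stored for a value, not the declared maximum:

SELECT name, VSIZE(name) FROM tab WHERE id = 12;
-- abcdef, 6   (only 6 bytes stored in a single-byte character set, not 20)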

I think that goes for char fields too, on some systems. Variable-size data types are handled efficiently enough that there is no longer any gain in allocating the maximum.

If you update a record so that it needs more space, the records inside the same allocation block are moved down, and if the records no longer fit in the block, another block is allocated and the records are distributed between the blocks. That means records are contiguous inside the allocation blocks, but the blocks don't have to be contiguous on disk.


It certainly doesn't allocate more space than needed; that would defeat the point of using a variable-length type.

In the case you mention, I would think the rows below would have to be moved down on the page; perhaps this is optimized somehow. I don't know the exact details, so perhaps someone else can comment further.


This is probably heavily database dependent.

A couple of points though: databases that use MVCC don't actually update data on disk or in the memory cache. They insert a new row with the updated data and mark the old row as deleted from a certain transaction onward. After a while the deleted row is no longer visible to any transaction and its space is reclaimed.
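
For example, in PostgreSQL (an MVCC database), you can watch this happen through the ctid system column, which shows the physical location of the current row version:

-- Assuming PostgreSQL: ctid is the (block, offset) of the current row version.
SELECT ctid, id, name FROM tab WHERE id = 12;    -- e.g. (0,1)

UPDATE tab SET name = 'abcdefghijkl' WHERE id = 12;

SELECT ctid, id, name FROM tab WHERE id = 12;    -- e.g. (0,4): a new row version was written;
                                                 -- the old one remains until VACUUM reclaims it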

As for the storage layout, a value is usually stored as 1-4 bytes of header + data (+ padding).

In the case of char, the data is padded to the declared length. In the case of varchar or text, the header stores the length of the data that follows.
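
As a quick illustration (assuming Oracle here, since the question title mentions VARCHAR2), DUMP shows the stored type, length, and bytes of a value:

SELECT DUMP(name) FROM tab WHERE id = 12;
-- Typ=1 Len=6: 97,98,99,100,101,102   (6 data bytes; the length is tracked alongside them)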


Edit: For some reason I thought this was tagged Microsoft SQL Server. I think the answer is still relevant, though.

That's why the official recommendation is

  • Use char when the sizes of the column data entries are consistent.
  • Use varchar when the sizes of the column data entries vary considerably.
  • Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.

It's a trade-off you need to consider when designing your table structure. You would probably need to factor in the frequency of updates vs. reads too.
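
As a rough sketch of that recommendation (a hypothetical SQL Server table; the column names are made up for illustration):

CREATE TABLE customer (
    id           int          NOT NULL PRIMARY KEY,
    country_code char(2)      NOT NULL,  -- entries are always the same size
    name         varchar(100) NULL,      -- entry sizes vary considerably
    notes        varchar(max) NULL       -- sizes vary and may exceed 8,000 bytes
);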

It's worth noting that for char, a NULL value still uses the full storage space. There is an add-in for Management Studio called SQL Internals Viewer that lets you easily see how your rows are stored.


Given the VARCHAR2 in the question title, I assume your question is focused on Oracle. In Oracle, you can reserve space for row expansion within a data block using the PCTFREE clause. That can help mitigate the effects of updates making rows longer.
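
For example (a sketch; the table definition is assumed from the question), this keeps 30% of each block free for future row growth instead of the default 10%:

CREATE TABLE tab (
    id   NUMBER PRIMARY KEY,
    name VARCHAR2(20)
) PCTFREE 30;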

However, if Oracle doesn't have enough free space within the block to write the row back, it does what is called row migration; it leaves the original address on disk alone (so it doesn't necessarily need to update indexes), but instead of storing the data in the original location, it stores a pointer to the row's new address.

This can cause performance problems where a table is heavily accessed through indexes and a significant number of rows have migrated, as it adds extra I/O to satisfy queries.
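
If you suspect this is happening, one way to check (a sketch; the CHAINED_ROWS table is created by Oracle's utlchain.sql script):

-- Populate CHAIN_CNT in the data dictionary, then read it back.
ANALYZE TABLE tab COMPUTE STATISTICS;
SELECT table_name, chain_cnt FROM user_tables WHERE table_name = 'TAB';

-- Or list the affected rows individually.
ANALYZE TABLE tab LIST CHAINED ROWS INTO chained_rows;
SELECT head_rowid FROM chained_rows WHERE table_name = 'TAB';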
