About Oracle: SYS

Wednesday, December 9, 2009

SYS_GUID()

A colleague of mine sent me this link, and wanted me to have a look at point 5:

Choose sys_guid() instead of sequences for populating meaningless PK-columns.
Traditional sequences are used for populating an ID-column.
Sequences have several disadvantages:
• Sequences have to be created, maintained and deployed.
• Sequences need PL/SQL-code to be fetched (the ‘select into’ construction is obsolete in 11g, but defaulting a column with sequence.nextval still does not work).
• When you introduce triggers to fetch the .nextval value, you introduce significant undesired performance penalties
• When moving your data from data from one environment to another (e.g. copy PRODUCTION data to ACCEPTANCE), you have to synchronize the .LASTVAL of all sequences.
• Fetching sequences can be a performance-critical in big environment (bulk fetch in Data warehouse for instance) which has to be fixed by increasing the CACHE property
Use guid's to overcome the disadvantages of sequences:
create table employees
( guid raw(32) default sys_guid() not null primary key);

I knew about the SYS_GUID() function, but didn't realize you can use it as a default value for a primary key column. That sure is convenient. Where other RDBMS all have some sort of autonumber/identity column to automatically fill an ID column, Oracle uses sequences, which is a little more cumbersome to implement. However, I found the story not really balanced. Before addressing each point and mentioning some of the disadvantages of SYS_GUID, let's see how it works. Using sequences, a simple setup can be like this:

rwijk@ORA11GR1> create sequence seq cache 1000
  2  /

Sequence created.

Here I create the sequence with "cache 1000", because I want a session to fetch 1000 numbers at once, instead of one each time. This reduces the overhead for Oracle to maintain the SYS.SEQ$ table. Gaps in the numbers might be large this way, but this shouldn't bother you. If it does bother you, you likely shouldn't be using a sequence, as they are never gap-free. A table is created without any reference to the sequence:

rwijk@ORA11GR1> create table t_seq
  2  ( id     number(9) primary key
  3  , filler varchar2(1000)
  4  )
  5  /

Table created.

To fill the id column with the sequence value, I prefer to just use the nextval function directly in my api. Another option is to use database triggers, but I'm on a mission to avoid those whenever possible. The next procedure represents such an api:

rwijk@ORA11GR1> create procedure p_seq
  2  is
  3  begin
  4    insert into t_seq (id,filler)
  5     select seq.nextval
  6          , lpad('*',1000,'*')
  7       from dual
  8    connect by level <= 20000
  9    ;
 10  end;
 11  /

Procedure created.

Using the SYS_GUID() function, a similar setup would look like this:

rwijk@ORA11GR1> create table t_sysguid
  2  ( id     raw(16) default sys_guid() primary key
  3  , filler varchar2(1000)
  4  )
  5  /

Table created.

rwijk@ORA11GR1> create procedure p_sysguid
  2  is
  3  begin
  4    insert into t_sysguid (filler)
  5     select lpad('*',1000,'*')
  6       from dual
  7    connect by level <= 20000
  8    ;
  9  end;
 10  /

Procedure created.

One object less (the sequence), and you don't need to bother about the ID column in the insert statement. The mentioned disadvantages of sequences per point:

Sequences have to be created, maintained and deployed
While obviously true, I don't think it is a major point. A sequence never comes alone, so we are already creating, maintaining and deploying database objects. But it adds up a little indeed.

Sequences need PL/SQL-code to be fetched (the ‘select into’ construction is obsolete in 11g, but defaulting a column with sequence.nextval still does not work)
You don't necessarily need PL/SQL to use a sequence, as shown above. So this is untrue. Although it is true that you cannot default a column with sequence.nextval.

When you introduce triggers to fetch the .nextval value, you introduce significant undesired performance penalties
So don't introduce database triggers to fetch the .nextval value :-)

When moving your data from data from one environment to another (e.g. copy PRODUCTION data to ACCEPTANCE), you have to synchronize the .LASTVAL of all sequences
This is true. Very minor disadvantage though, in my opinion. If we refresh the acceptance database, we don't only refresh the data, but also the objects itself.

Fetching sequences can be a performance-critical in big environment (bulk fetch in Data warehouse for instance) which has to be fixed by increasing the CACHE property.
The solution to the possible problem is already presented here.

No advantages of sequences over SYS_GUID are mentioned, so here are some of mine:

A RAW(16) column takes up more space than a NUMBER column.
The SYS_GUID is a RAW(16) taking up 16 bytes, whereas as an ID populated by a sequences typically is a NUMBER(6), NUMBER(9) or NUMBER(12). A reasonable average is NUMBER(9) which takes up only 4 bytes. When you have foreign key columns in your table, this effect is multiplied.

Ad-hoc querying has become more cumbersome.
Which one would you rather type. This:

select id from t_sysguid where id = '48000DA3C76E41A789C7E8925A91E28A'

Or this:

select id from t_seq where id = 1234

You can no longer see the order of inserts by the ID column.
When using the cache property and multiple sessions, this is neither necessarily true for sequence populated ID columns, but there is a strong correlation between the number and the time it was populated. This "information" is not there with SYS_GUID() populated columns.

Performance seems to be slightly less when using SYS_GUID()
The results of my test varied a lot. But generally it favoured the sequence approach. Here is the output of one of the more representative tests:

rwijk@ORA11GR1> begin
  2    runstats_pkg.rs_start;
  3    p_seq;
  4    runstats_pkg.rs_middle;
  5    p_sysguid;
  6    runstats_pkg.rs_stop(1000);
  7  end;
  8  /
Run1 draaide in 1670 hsecs
Run2 draaide in 2220 hsecs
Run1 draaide in 75.23% van de tijd

Naam                                                      Run1        Run2    Verschil
STAT.recursive calls                                     1,162          49      -1,113
LATCH.enqueue hash chains                                1,626         508      -1,118
STAT.hot buffers moved to head of LRU                    1,213          75      -1,138
STAT.dirty buffers inspected                             5,613       3,266      -2,347
STAT.free buffer requested                               6,215       3,482      -2,733
STAT.free buffer inspected                               6,678       3,577      -3,101
STAT.redo subscn max counts                              4,147         753      -3,394
STAT.heap block compress                                 3,678       7,119       3,441
LATCH.cache buffer handles                               4,922           0      -4,922
STAT.Heap Segment Array Inserts                         12,512       7,156      -5,356
STAT.HSC Heap Segment Block Changes                     12,582       7,156      -5,426
STAT.calls to get snapshot scn: kcmgss                   6,752         152      -6,600
LATCH.checkpoint queue latch                            17,879       9,818      -8,061
LATCH.cache buffers lru chain                           27,793      13,845     -13,948
LATCH.object queue header operation                     40,308      25,913     -14,395
STAT.consistent gets                                    26,810       7,302     -19,508
STAT.consistent gets from cache                         26,810       7,302     -19,508
STAT.consistent gets from cache (fastpath)              19,687         145     -19,542
STAT.redo entries                                       42,552      71,492      28,940
STAT.db block gets from cache (fastpath)                21,080      57,202      36,122
STAT.session logical reads                              98,920     144,900      45,980
LATCH.simulator hash latch                               5,752      56,028      50,276
STAT.db block gets from cache                           72,110     137,598      65,488
STAT.db block gets                                      72,110     137,598      65,488
STAT.db block changes                                   63,379     129,357      65,978
LATCH.sequence cache                                   150,100           0    -150,100
LATCH.cache buffers chains                             396,737     572,383     175,646
STAT.undo change vector size                         2,564,000   4,921,872   2,357,872
STAT.physical read total bytes                      17,776,640  23,355,392   5,578,752
STAT.physical read bytes                            17,776,640  23,355,392   5,578,752
STAT.cell physical IO interconnect bytes            17,776,640  23,355,392   5,578,752
STAT.physical IO disk bytes                         17,776,640  23,355,392   5,578,752
STAT.redo size                                      59,361,376  66,097,504   6,736,128

Run1 latches totaal versus run2 -- verschil en percentage
Run1      Run2  Verschil     Pct
679,069   707,672    28,603  95.96%

PL/SQL procedure successfully completed.

So while I am certainly not saying no to SYS_GUID(), I thought there was a little more to the story than the all-positive points in the link.

If you know some more advantages or disadvantages from one over the other, please say so in a comment.

15 comments:

SydOracleDecember 9, 2009 at 12:57 AM
Indexes ! Sequence generated values will always increment and be added at the high end of the index.
SYS_GUID values will be all over the place. On the plus side, that may reduce contention for hot blocks , but on the minus side your index inserts will be all over the place. All the arguments for/against reverse key indexes would apply.
ReplyDelete
Replies
oraclenerdDecember 9, 2009 at 4:12 AM
I've had a few people trying to convince me to use SYS_GUIDs, I'm just not buying it yet. I'm not convinced as to any real or perceived advantage.

"You can no longer see the order of inserts by the ID column."
You probably know this, but I thought it worth mentioning, on a RAC system you can't count on the sequence numbers to be in order. That quickly stopped me from using the ID for any type of ordering.

chet
ReplyDelete
Replies
Tim...December 9, 2009 at 7:54 AM
Hi.

I've used SYS_GUID in a project and it worked fine. Actually, the project in question used a combination of SYG_GUID and a Java stored procedure to generate UUIDs.

http://www.oracle-base.com/articles/9i/UUID9i.php

I wouldn't say either method (sequences or GUIDs) are a clear winner. As always, both methods have their pros and cons.

If you know you will be transferring data between systems, then avoiding sequences is actually a very big plus.

Cheers

Tim...
ReplyDelete
Replies
NigelDecember 9, 2009 at 1:03 PM
I can vouch for comparative slowness of guid over sequences; see Unique IDs for Multi-master Replication.

Regards Nigel
ReplyDelete
Replies
UnknownDecember 9, 2009 at 1:20 PM
Hi Rob,

Interesting post! There also must be a reasons why GUIDs are used in for example BPEL and ESB schemas.
A quick performance comparison can be found here.
Other links on the subject.

Regards,
Peter Paul
ReplyDelete
Replies
Rob van WijkDecember 9, 2009 at 9:33 PM
Excellent comments, all. Thanks for the extra (dis)advantages, twists (I forgot about RAC!), thoughts and the links!

Regards,
Rob.
ReplyDelete
Replies
Connor McDonaldDecember 23, 2009 at 1:32 PM
You *can* guarantee sequence order in a RAC system (but you do incur some cross-instance traffic to do so).

Cheers
Connor
ReplyDelete
Replies
Rob van WijkDecember 23, 2009 at 2:48 PM
Thanks Connor.

I just looked up some information and found this article about sequences and RAC: http://www.pythian.com/news/383/sequences-in-oracle-10g-rac/.
ReplyDelete
Replies
Rohan DeshmukhJanuary 29, 2010 at 11:31 AM
Is the SYS_GUID is incremental. I.e. can it's output be in sequencial order? or its just a unique string?
ReplyDelete
Replies
NigelJanuary 29, 2010 at 11:35 AM
@Rohan: it's not sequential, it's hust a string:

SQL> select sys_guid() from dual;

SYS_GUID()
--------------------------------
9B1085FF611A4518A3211A495E36B703

SQL> /

SYS_GUID()
--------------------------------
9BAE7CE712D24D90A289861086B450C7

SQL>

HTH, Nigel
ReplyDelete
Replies
NigelJanuary 29, 2010 at 11:37 AM
I should add, it isn't monotonic either; next two selects are:

SQL> /

SYS_GUID()
--------------------------------
A72FA8729A38444B82AEDEB875E6DF47

SQL> /

SYS_GUID()
--------------------------------
886B805C723340F3BD37F2C2A322ECC3

SQL>

Regards
ReplyDelete
Replies
NihimonJune 13, 2012 at 10:58 PM
FYI, I just had opposite results testing 1 million inserts with RAW(16) GUID vs NUMBER(16) sequence-based key.

About 13 seconds (on multiple runs) for GUID.
About 30 seconds (on multiple runs) for Sequence.

Connected to Oracle Database 11g Enterprise Edition Release 11.1.0.7.0

drop table mat;
create table mat (
mat_id raw(16) default sys_guid() primary key,
mat_val number
);
insert into mat (
mat_val
)
select level as mat_val
from dual
connect by level < 1000000
;
commit;

drop table mat;
drop sequence mat_seq;
create table mat (
mat_id number(16) primary key,
mat_val number
);
create sequence mat_seq;
insert into mat (
mat_id,
mat_val
)
select mat_seq.nextval as mat_id,
level as mat_val
from dual
connect by level < 1000000
;
commit;
ReplyDelete
Replies
SeanNovember 7, 2012 at 7:29 PM
Another con to sequences in a web environment is security. Sequence IDs can be easily guessed and allow access to data in nearby rows in the database.
ReplyDelete
Replies