Monday, August 11, 2008

Implementing entity rules

Previous post in this series: Implementing attribute and tuple rules



Attribute rules and tuple rules were fairly easy. The fun part starts with the entity rules and - in an upcoming post - the inter-entity rules. According to the picture above, entity rules come in two flavours: unique identifier rules and other entity rules. However, since the other entity rules can be implemented in two clearly distinct ways, I will describe a separate subclass, which I will call aggregation entity rules. But first things first:


Unique Identifier Rules

The implementation of rules belonging to this subclass is trivial at first glance: a unique constraint and the accompanying unique index. However, not all unique identifier rules are simply bound to one column or a set of columns. But with the help of function-based unique indexes you can implement the more complex ones as well. An example:

Business rule: A customer can only have one non-processed order

rwijk@ORA11G> create table orders
2 ( id number(9)
3 , customer_id number(9)
4 , processed_indicator varchar2(1)
5 , constraint inv_processed_domain_ck check (processed_indicator in ('Y','N'))
6 )
7 /

Table created.

rwijk@ORA11G> create unique index non_processed_orders_uk on orders
2 ( decode(processed_indicator,'N',customer_id)
3 )
4 /

Index created.

rwijk@ORA11G> insert into orders values (1, 101, 'Y')
2 /

1 row created.

rwijk@ORA11G> insert into orders values (2, 101, 'Y')
2 /

1 row created.

rwijk@ORA11G> insert into orders values (3, 101, 'N')
2 /

1 row created.

rwijk@ORA11G> insert into orders values (4, 101, 'N')
2 /
insert into orders values (4, 101, 'N')
*
ERROR at line 1:
ORA-00001: unique constraint (RWIJK.NON_PROCESSED_ORDERS_UK) violated


rwijk@ORA11G> insert into orders values (4, 101, 'Y')
2 /

1 row created.


It is even possible to create a deterministic function and use it in a unique function-based index. So this subclass can be implemented entirely declaratively.
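Outside the database, the idea behind this index - map the rows you want to constrain to a key and everything else to NULL, which a single-column function-based unique index ignores - can be sketched in a few lines of Python. This is purely an illustration of the mechanism, not code from the post:

```python
def index_key(customer_id, processed_indicator):
    # Mirrors decode(processed_indicator, 'N', customer_id): non-processed
    # orders map to their customer_id, all other orders map to NULL (None).
    return customer_id if processed_indicator == 'N' else None

def violates_unique(rows):
    # A unique index ignores entirely-NULL keys, so only 'N' rows compete.
    seen = set()
    for customer_id, indicator in rows:
        key = index_key(customer_id, indicator)
        if key is None:
            continue
        if key in seen:
            return True
        seen.add(key)
    return False

rows = [(101, 'Y'), (101, 'Y'), (101, 'N')]
print(violates_unique(rows))                  # False: one non-processed order
print(violates_unique(rows + [(101, 'N')]))   # True: second 'N' for customer 101
```

Any number of processed ('Y') orders per customer is allowed, because they all map to NULL and never collide.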


Aggregation rules

All rules formerly belonging to the other entity rules subclass that check some aggregated value will be called aggregation rules here. The "No more than 20 departments are allowed" example mentioned in the picture is an aggregation rule. These rules can be implemented by a check constraint on an on-commit refreshable materialized view. This means they will be validated at transaction level instead of at statement level. This is just perfect for this kind of rule, because for an aggregation rule to stay valid, you'll sometimes have to modify more than one row. Here is an example:

Business rule: The sum of all percentages per month for a customer has to be 100%.

rwijk@ORA11G> create table payment_percentages
2 ( customer_id number(9)
3 , month number(2,0)
4 , percentage number(3,0)
5 )
6 /

Table created.

rwijk@ORA11G> create materialized view log on payment_percentages
2 with rowid (customer_id,percentage) including new values
3 /

Materialized view log created.

rwijk@ORA11G> create materialized view mv
2 refresh fast on commit
3 as
4 select customer_id
5 , sum(percentage) total_percentage
6 , count(*) count_all
7 , count(percentage) count_percentage
8 from payment_percentages
9 group by customer_id
10 /

Materialized view created.

rwijk@ORA11G> alter materialized view mv
2 add constraint mv_total_percentage_100_ck
3 check (total_percentage = 100)
4 /

Materialized view altered.

rwijk@ORA11G> desc mv
Name Null? Type
----------------------------------------- -------- ----------------------------
CUSTOMER_ID NUMBER(9)
TOTAL_PERCENTAGE NUMBER
COUNT_ALL NUMBER
COUNT_PERCENTAGE NUMBER

And that's all there is to it. Note that some extra aggregates were added to the materialized view. This is mandatory, as can be read here in the documentation about restrictions on fast refresh of materialized views with aggregates. The materialized view log will capture all changes, the materialized view will be refreshed at commit time, and the check constraint will automatically validate the new data. Here is an example of how to deal with inserting and updating:

rwijk@ORA11G> insert into payment_percentages
2 select 1
3 , level
4 , 10
5 from dual
6 connect by level <= 8
7 /

8 rows created.

rwijk@ORA11G> insert into payment_percentages
2 select 1
3 , 8+ level
4 , 5
5 from dual
6 connect by level <= 4
7 /

4 rows created.

rwijk@ORA11G> commit
2 /

Commit complete.


This commit is the moment the check constraint mv_total_percentage_100_ck on the materialized view gets validated. The sum of all percentages is calculated and it is 100%, so no error is raised.

rwijk@ORA11G> select customer_id
2 , month
3 , sum(percentage)
4 from payment_percentages
5 group by rollup ((customer_id,month))
6 order by month
7 /

CUSTOMER_ID MONTH SUM(PERCENTAGE)
----------- ---------- ---------------
1 1 10
1 2 10
1 3 10
1 4 10
1 5 10
1 6 10
1 7 10
1 8 10
1 9 5
1 10 5
1 11 5
1 12 5
100

13 rows selected.

rwijk@ORA11G> update payment_percentages
2 set percentage = 15
3 where customer_id = 1
4 and month = 11
5 /

1 row updated.

rwijk@ORA11G> commit
2 /
commit
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-02290: check constraint (RWIJK.MV_TOTAL_PERCENTAGE_100_CK) violated


But now the sum is more than 100%, so the above error is raised. To successfully change a few values, several update statements within a single transaction are needed, like this:

rwijk@ORA11G> update payment_percentages
2 set percentage = 15
3 where customer_id = 1
4 and month = 11
5 /

1 row updated.

rwijk@ORA11G> update payment_percentages
2 set percentage = 8
3 where customer_id = 1
4 and month = 2
5 /

1 row updated.

rwijk@ORA11G> update payment_percentages
2 set percentage = 2
3 where customer_id = 1
4 and month = 5
5 /

1 row updated.

rwijk@ORA11G> commit
2 /

Commit complete.

rwijk@ORA11G> select customer_id
2 , month
3 , sum(percentage)
4 from payment_percentages
5 group by rollup ((customer_id,month))
6 order by month
7 /

CUSTOMER_ID MONTH SUM(PERCENTAGE)
----------- ---------- ---------------
1 1 10
1 2 8
1 3 10
1 4 10
1 5 2
1 6 10
1 7 10
1 8 10
1 9 5
1 10 5
1 11 15
1 12 5
100

13 rows selected.

And the sum is 100% again.
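To see why commit-time validation is essential here, it helps to model the mechanism outside the database. The following Python sketch (class and method names are mine, purely illustrative) buffers changes and validates the per-customer sum only at "commit", so compensating updates within one transaction are allowed:

```python
class PaymentPercentages:
    """Toy model of a check constraint validated at commit time,
    like the one on the on-commit refreshable materialized view."""

    def __init__(self):
        self.rows = {}      # (customer_id, month) -> committed percentage
        self.pending = {}   # uncommitted changes of the current transaction

    def update(self, customer_id, month, percentage):
        self.pending[(customer_id, month)] = percentage

    def commit(self):
        merged = {**self.rows, **self.pending}
        totals = {}
        for (customer_id, _), pct in merged.items():
            totals[customer_id] = totals.get(customer_id, 0) + pct
        bad = [c for c, total in totals.items() if total != 100]
        if bad:
            # comparable to ORA-02290 raised at materialized view refresh
            raise ValueError("sum of percentages <> 100 for customers %s" % bad)
        self.rows = merged
        self.pending = {}

t = PaymentPercentages()
for month in range(1, 9):
    t.update(1, month, 10)
for month in range(9, 13):
    t.update(1, month, 5)
t.commit()              # 8 * 10 + 4 * 5 = 100: succeeds
t.update(1, 11, 15)
t.update(1, 2, 8)
t.update(1, 5, 2)
t.commit()              # net change is zero, still 100: succeeds
```

A statement-level check would have rejected the first of the three compensating updates, which is exactly why transaction-level validation fits aggregation rules.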


Other entity rules

Because of the split-off of the aggregation rules, this subclass has become a little smaller, but it still represents a good proportion of all business rules. The best-known example of an other entity rule is an overlap check on several periods (startdate and enddate) belonging to the same group. This subclass is one we cannot solve declaratively, or at least I don't know of a way. So we have to make a choice here: implement them using database triggers or implement them inside your api.

Business rule: The periods of one customer are not allowed to overlap.

Implementing other entity rules with database triggers

You can see an example of a period overlap check - amidst the implementation of two other entity rules - in this post about a test with RuleGen. Look for the "check_period_overlap" procedure. As you can see, the trickier parts are serializing access by customer_id and circumventing the mutating table error. Trickier, but standard. The advantage of using database triggers to enforce your other entity rules is that once they are in place, you are covered. And although I generally think that database triggers are evil, static data constraint rules are the one place where they can be acceptable, if you are willing to lose some performance.
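The serialization step in that trigger-based approach hinges on turning a customer_id into a user lock id. A rough Python analogue of the dbms_utility.get_hash_value call (the hash function here is a stand-in I chose; Oracle's is different, but any deterministic hash into a fixed range works):

```python
import zlib

def lock_id(customer_id, base=1, hash_size=2 ** 30):
    # Stand-in for dbms_utility.get_hash_value: deterministically hash the
    # lock name into [base, base + hash_size) to allocate a user lock id.
    name = 'contracts_pkg.lock_customer' + str(customer_id)
    return base + (zlib.crc32(name.encode()) % hash_size)

print(lock_id(101) == lock_id(101))   # True: same customer, same lock id
```

Two customers may occasionally hash to the same lock id; such a collision only causes needless serialization between them, never incorrect validation, which is why a hash of this kind is acceptable.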

Implementing other entity rules inside your api

If you are not willing to compromise on performance, then you should take this approach. In the previously mentioned post you'll also see an example of validating this rule using an api; there it is implemented as part of that one api procedure. But what to do when more than one api procedure can violate the business rule? You'll want to prevent duplicate code, but you'll also want the validation to be optimized for single row online inserts and updates, as well as for batch inserts and updates at night. An answer to this problem is to code a business rule validation procedure only once and have it accept a table of values/records by creating a SQL type. The SQL type will be the input parameter of the validation procedure. The single row update procedure will call it with a table containing one value/record, and the batch update procedure will deliver many more records. An example:

rwijk@ORA11G> create table subscriptions
2 ( id number(9) primary key
3 , customer_id number(9) not null
4 , startdate date not null
5 , enddate date
6 , amount number(6)
7 , unique (customer_id,startdate)
8 )
9 /

Table created.

rwijk@ORA11G> insert /*+ append */ into subscriptions
2 select level
3 , ceil(level/2)
4 , case mod(level,2) when 0 then date '2006-01-01' else date '2007-01-01' end
5 , case mod(level,2) when 0 then date '2007-01-01' else date '2008-01-01' end
6 , 0
7 from dual
8 connect by level <= 10000
9 /

10000 rows created.

rwijk@ORA11G> create type tt_customer_ids is table of number(9);
2 /

Type created.

rwijk@ORA11G> create package subscriptions_api
2 as
3 procedure single_row_update
4 ( p_id in number
5 , p_startdate in date
6 , p_enddate in date
7 );
8 procedure batch_update
9 ;
10 end subscriptions_api;
11 /

Package created.

rwijk@ORA11G> create package body subscriptions_api
2 as
3 procedure lock_customers(p_customer_ids tt_customer_ids)
4 is
5 l_lock_dummy integer;
6 begin
7 for i in 1 .. p_customer_ids.count
8 loop
9 l_lock_dummy := dbms_lock.request
10 ( release_on_commit => true
11 , id => dbms_utility.get_hash_value
12 ( name => 'contracts_pkg.lock_customer' || to_char(p_customer_ids(i))
13 , base => 1
14 , hash_size => power(2,30)
15 )
16 );
17 end loop;
18 end lock_customers
19 ;
20 function exists_overlap(p_customer_ids tt_customer_ids) return boolean
21 is
22 l_customer_id_dummy subscriptions.customer_id%type;
23 begin
24 select customer_id
25 into l_customer_id_dummy
26 from ( select s.customer_id
27 , sign
28 ( lead(s.startdate,1,date '9999-12-31')
29 over
30 (partition by s.customer_id order by s.startdate)
31 - nvl(s.enddate,date '9999-12-31')
32 ) sgn
33 from subscriptions s
34 where s.customer_id in
35 ( select /*+ cardinality (c 1) */
36 column_value
37 from table(p_customer_ids) c
38 )
39 )
40 where sgn = -1
41 ;
42 return true;
43 exception
44 when no_data_found then
45 return false;
46 when too_many_rows then
47 return true;
48 end exists_overlap
49 ;
50 procedure check_period_overlap (p_customer_ids tt_customer_ids)
51 is
52 begin
53 if exists_overlap(p_customer_ids)
54 then
55 raise_application_error
56 ( -20001
57 , 'Periods of subscription of one customer cannot overlap.'
58 );
59 end if;
60 end check_period_overlap
61 ;
62 procedure single_row_update
63 ( p_id in number
64 , p_startdate in date
65 , p_enddate in date
66 )
67 is
68 l_customer_ids tt_customer_ids;
69 begin
70 update subscriptions
71 set startdate = p_startdate
72 , enddate = p_enddate
73 where id = p_id
74 returning customer_id bulk collect into l_customer_ids
75 ;
76 lock_customers(l_customer_ids);
77 check_period_overlap(l_customer_ids)
78 ;
79 end single_row_update
80 ;
81 procedure batch_update
82 is
83 l_customer_ids tt_customer_ids;
84 begin
85 update subscriptions
86 set enddate = enddate + interval '1' year
87 where amount = 0
88 returning customer_id bulk collect into l_customer_ids
89 ;
90 lock_customers(l_customer_ids);
91 check_period_overlap(l_customer_ids)
92 ;
93 end batch_update
94 ;
95 end subscriptions_api;
96 /

Package body created.

rwijk@ORA11G> begin
2 subscriptions_api.single_row_update
3 ( p_id => 1234
4 , p_startdate => date '2006-01-01'
5 , p_enddate => date '2008-01-01'
6 );
7 end;
8 /
begin
*
ERROR at line 1:
ORA-20001: Periods of subscription of one customer cannot overlap.
ORA-06512: at "RWIJK.SUBSCRIPTIONS_API", line 55
ORA-06512: at "RWIJK.SUBSCRIPTIONS_API", line 77
ORA-06512: at line 2


rwijk@ORA11G> begin
2 subscriptions_api.batch_update;
3 end;
4 /
begin
*
ERROR at line 1:
ORA-20001: Periods of subscription of one customer cannot overlap.
ORA-06512: at "RWIJK.SUBSCRIPTIONS_API", line 55
ORA-06512: at "RWIJK.SUBSCRIPTIONS_API", line 91
ORA-06512: at line 2

A few things to note:
  1. All records of the affected customers are validated in this implementation, which is good when only a few records per customer are expected. When lots of records per customer are expected, this will be less feasible.

  2. The cardinality hint is used so the optimizer does not assume the default blocksize-based cardinality (8K here), but 1. The resulting plan will therefore favour the use of an index on customer_id (and startdate).

  3. You can extend this example by using error logging and capturing all records that violate the business rule, and even report all customer_id's and periods in the error message.

  4. When implementing several business rules in your api, you will want to use only one SQL type, so the resulting type will have to accommodate all columns needed by all business rules.

  5. I do not have much experience with this method yet, but I will use and test it in the near future. If anything noteworthy occurs, it will be on this blog of course.
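For intuition, the heart of the exists_overlap query - order each customer's periods by startdate and compare every period's enddate (NULL treated as the maximum date) with the next startdate - can be sketched in Python. This is a simplified model of the set-based check, not a replacement for it:

```python
from datetime import date

MAXDATE = date(9999, 12, 31)  # mirrors nvl(enddate, date '9999-12-31')

def exists_overlap(periods_by_customer):
    # periods_by_customer: {customer_id: [(startdate, enddate_or_None), ...]}
    # Mirrors: sign(lead(startdate) over (partition by customer_id
    #               order by startdate) - nvl(enddate, maxdate))
    for periods in periods_by_customer.values():
        ordered = sorted(periods, key=lambda p: p[0])
        for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
            if start2 < (end1 or MAXDATE):  # sgn = -1 in the SQL version
                return True
    return False

ok = {617: [(date(2005, 1, 1), date(2005, 7, 1)),
            (date(2006, 1, 1), date(2007, 1, 1))]}
bad = {617: [(date(2005, 1, 1), date(2006, 1, 1)),
             (date(2005, 7, 1), date(2007, 1, 1))]}
print(exists_overlap(ok))    # False
print(exists_overlap(bad))   # True
```

Note that a period starting exactly on the previous enddate is not flagged, matching the SQL where only sgn = -1 counts as an overlap.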


Next post: Implementing inter-entity rules

6 comments:

  1. I discovered your blog via The Database Programming blog, and have followed it ever since. Thanks for your carefully written posts. My only complaint is that Oracle's messages are in Dutch (I suppose). Can it be configured to output in English?

    ReplyDelete
  2. I had a comment about the "other entity rules"...

    It would not surprise me if you were aware of this already and chose not to bring it up for the sake of clarity, but...

    DBMS_LOCK will not protect serializable transactions against updates or inserts by other transactions that committed after the serializable transaction started. This is an issue for any approach based on DBMS_LOCK, whether implemented in triggers or an API.

    Consider the following sequence of events:

    1. Transaction A starts with serializable isolation level.

    2. Transaction B starts.

    3. Transaction B acquires a DBMS_LOCK and makes a change to the data.

    4. Transaction B commits. This releases the DBMS_LOCK (assuming you used "release_on_commit => true").

    5. Transaction A acquires the same DBMS_LOCK and makes a conflicting change to the data. Transaction A can acquire the lock because Transaction B already released it. Transaction A can make a conflicting change because Transaction A can't "see" the change that Transaction B made.

    6. Transaction A commits.

    To illustrate (sorry for the lack of formatting), let's take the row in your example:

    SQL> select * from subscriptions where id = 1234
    2 /

    ID CUSTOMER_ID STARTDATE ENDDATE AMOUNT
    ---------- ----------- --------- --------- ----------
    1234 617 01-JAN-06 01-JAN-07 0

    Now, let's insert another row for this customer for January 1 to July 1, 2005.

    SQL> insert into subscriptions
    2 values(10001, 617, date '2005-01-01', date '2005-07-01', 0)
    3 /

    1 row created.

    SQL> commit
    2 /

    Commit complete.

    Now, let's start a serializable transaction (let's call it A).

    SQL> set transaction isolation level serializable
    2 /

    Now, in a different transaction (let's call it B), let's move the start date for the row with ID 1234 back to July 1, 2005.

    SQL> begin
    2 subscriptions_api.single_row_update
    3 ( p_id => 1234
    4 , p_startdate => date '2005-07-01'
    5 , p_enddate => date '2007-01-01'
    6 );
    7 end;
    8 /

    PL/SQL procedure successfully completed.

    SQL> commit
    2 /

    Now, in Transaction A, let's move the end date for the new row forward to January 1, 2006.

    SQL> begin
    2 subscriptions_api.single_row_update
    3 ( p_id => 10001
    4 , p_startdate => date '2005-01-01'
    5 , p_enddate => date '2006-01-01'
    6 );
    7 end;
    8 /

    PL/SQL procedure successfully completed.

    SQL> commit
    2 /

    Commit complete.

    The two rows we updated now overlap.

    SQL> select * from subscriptions where customer_id = 617
    2 /

    ID CUSTOMER_ID STARTDATE ENDDATE AMOUNT
    ---------- ----------- --------- --------- ----------
    10001 617 01-JAN-05 01-JAN-06 0
    1234 617 01-JUL-05 01-JAN-07 0
    1233 617 01-JAN-07 01-JAN-08 0


    It won't completely solve the problem to use "release_on_commit => false" either. With "release_on_commit => false", the lock will still be released when Transaction B's session ends, so the problem will still exist if this happens before Transaction A attempts the update. If Transaction B's session does not end, it will eventually need to release the locks it's holding in order to avoid excessive resource consumption and serialization, and potentially deadlocks.

    I realize that the use of serializable isolation level is not that common. Still, it might be worth modifying the code to at least raise an error if a serializable transaction attempts to acquire the lock.

    Incidentally, SELECT...FOR UPDATE doesn't have the same problem with serializable transactions (although it has its own limitations), in that it will raise an "ORA-08177: can't serialize access for this transaction" if a serializable transaction tries to lock a row that was locked by another transaction that committed after the serializable transaction started. Still, you need to have a physical "parent" row (say, in a CUSTOMERS table) to lock.

    For some discussion of the issue, see:

    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:474221407101
    http://technology.amis.nl/blog/?p=1375


    Hope this helps.

    ReplyDelete
  3. Bruno,

    Thanks for your comment.

    I agree that English error messages and feedback messages would be more appropriate for this blog. However, I prefer Dutch messages for everything else I do on my databases.

    I have just experimented a bit, and it seems that if I change my NLS_LANG registry setting prior to opening my database, this results in English messages. Now I only need to have the discipline to do that prior to executing the script that leads to the blog output. I'll probably make a .bat file in the near future to do just that.

    In the meantime you'll have to google the ORA-codes for the messages in my older posts, I'm afraid.

    ReplyDelete
  4. Brian,

    Yes I was already aware of this, thanks to your comments on Lucas' AMIS-post. And indeed I decided not to bring it up because it is so rare.

    However, I did not read the entire AskTom thread before, and I also wasn't aware of how a serializable transaction can be detected. So I just did some catching up.

    After all the reading, my first conclusion is that we shouldn't try to make it foolproof in serializable transactions and - like you said - we should just throw an error after detecting that the transaction is in serializable mode.

    It's a very interesting topic, and I'd like to thank you very much for bringing this up. Meanwhile we will just ignore this phenomenon, like we all do in all production systems :-)

    Regards,
    Rob.

    ReplyDelete
  5. Rob, why do you do a two column index?

    rwijk@ORA11G> create unique index non_processed_orders_uk on orders
    2 ( decode(processed_indicator,'N',customer_id)
    3 , decode(processed_indicator,'N','N')
    4 )
    5 /

    One column should be sufficient

    rwijk@ORA11G> create unique index non_processed_orders_uk on orders
    2 ( decode(processed_indicator,'N',customer_id) )
    3 /

    Alternatively:

    rwijk@ORA11G> create unique index non_processed_orders_uk on orders
    2 ( case processed_indicator when 'N' then customer_id else null end )
    3 /

    Did I overlook something?

    Greetings

    ReplyDelete
  6. Robert,

    No, you did not overlook anything. You are right, the second column is completely unnecessary. I've edited the post accordingly. Thanks!

    Regards,
    Rob.

    ReplyDelete