Wednesday, April 17, 2013

OGh APEX World 2013

Last Tuesday, the fourth OGh APEX World day took place, again at Figi Zeist. Several people have already written about this day:

Here is a short writeup about my experiences that day.

For me, this year was a little different. In the first three editions of the OGh APEX day, Learco Brizzi, Marti Koppelmans and myself "just" invite the speakers and put together the program. This year, I was also a speaker and Ciber had a demo booth which was my base location for that day.

No changes, fortunately, at the keynote. Each year, we are very grateful that the APEX product development team sends a member to our event to deliver a presentation, mostly about upcoming features. This year, David Peake, the product manager of APEX himself, delivered the keynote. His talk consisted of two parts. The first part was about how APEX will fit in the new 12c pluggable database architecture and the second part was about several 5.0 features that may or may not make it to the final product. For more details about this keynote, I'll not repeat Christian Rokitta, but simply refer you to his blogpost.

After the keynote, we had programmed a total of 15 sessions in 3 tracks. The first track were customer cases. A track designed to learn how other companies use APEX in their organizations. This year we had lots of interesting cases to choose from, of which unfortunately only 5 could be selected. The second track was the international track, where all sessions were in English, especially for our foreign guests, so they too can have a day full of interesting sessions. The third track was an allround technical track, in Dutch, with sessions for beginners as well as expert developers. It was my impression that during each of the five slots, the 300 participants were spread out pretty evenly across the three tracks.

The first session I chose was Carsten Czarski's "APEX Hosting In The Own Company: apex.mycompany.com". He had a very lively story about what you need to setup when hosting an APEX development environment in your own company. This is a great way to spread the adoption of APEX: if the development environment is in place, the applications will soon follow. Especially the importance and usage of the resource manager was well explained.

After the break, I presented my paper about professional software development using APEX. On the main stage, due to a pretty high number of preregistrations. This meant a fantastic theater-style room, but also meant two spotlights shining in my face for 50 minutes. Last time I did this talk, at UKOUG, I was not too happy with how it went. This time, I had made several improvements, focused much more on why each step is so important, which made the story much better, in my opinion. I had lots of in-depth conversations afterwards, which showed me the topic is indeed an important one where people are interested in. At the beginning of the session I also did a little poll to see whether the people in the audience just put the large APEX export file under version control, or the individual components, and whether they have self-contained development environments. The results were only a few raised hands in these cases. No hands at all for one-step builds and continuous integration. So there's still a lot to win almost everywhere :-)!

The third session was the customer case "CB Goes Responsive". CB Logistics is the client I'm currently working for and I've participated at the project that was presented here, so obviously I liked to see this one. Chris Amelink and Mark Rooijakkers from CB and Stijn van Raes from iAdvise explained how we created a responsive web application with APEX and Twitter Bootstrap. The business case was a wish of the publishers of e-books to be able to see the number of sales, one day after the sales were made. It's very interesting for them to see if any promotional actions have a measurable effect and this application does that.

My fourth session was Roel Hartman's session "My First Mobi" where he showed how to start with mobile development. Not every demo went as smooth as Roel would have wanted, but he still managed to get his points across. And in case you missed it and want to know more about building mobile web applications with APEX, you can find his (and Christian's and David's) book here.

The last session of the day for me was Christian Rokitta's session about PhoneGap. This session was on top my list as I had only heard from PhoneGap and was very curious to see how it works. With PhoneGap you can build hybrid apps, using HTML, CSS and Javascript and still use the abilities of a native app. A nice demonstration was when he took a photo from the audience with his tablet and uploaded it into the database.

Again I enjoyed a day full of APEX content and speaking with everyone. I'm already looking forward to a fifth edition.

Wednesday, March 20, 2013

Paper "Professional Software Development using APEX"

As announced on Twitter yesterday, my paper titled "Professional Software Development Using Oracle Application Express" has been put online. I'm copying the first two paragraphs of the summary here, so you can decide if you want to read the rest as well:

Software development involves much more than just producing lines of code. It is also about version control, deployment to other environments, integrating code and unit testing. It is appropriate to the profession of software development to have a framework in place that handles all of this, in order for the developers to focus on the creative and fun part of their job: producing excellent and well-designed code. Without such a framework, you need to be cautious and deployments become more difficult and more error-prone. Which means more time and money needlessly spent and a development process which is less fun than it should be. Fortunately, these problems are well known and solutions are already widely adapted. However, in database application development in general and in APEX development specific, these practices are not so common, unfortunately. It is our belief that database and APEX development should not be treated different and deserve a similar framework. So that’s what we set out to do.

This paper describes how we develop new APEX applications in such a way that most of the effort does gets spent on actually developing the code. If you can take advantage of version control, if you can build and deploy your entire application in one step, can make a daily build to continuously integrate all developed code, and can make sure your developers have their own self-contained development environment, then the benefits are many. You’ll experience less errors, higher productivity and seamless software delivery from your team. Refactoring code becomes painless since you can easily be assured the changes won’t break the application. Overall quality goes up, ensuring a much longer lifetime of the application.


You can download the PDF here.

Tuesday, January 29, 2013

Dummy output parameters

Yesterday I encountered a code snippet which taught me something I did not think was possible. But it is, as I'll show in this blogpost. It's not spectacular in any way, just convenient at most.

When you need a function or procedure to retrieve some value, you'll start examining the existing code base if a function already exists that does the job. Probably you won't find an exact match, but there is this procedure with retrieves a lot of values including the value you're interested in. Sounds familiar? You now have two choices depending on whether you want to minimize code or execution time.

If you want to minimize execution time, you don't want to reuse this procedure as it does more work than you need, and you'll end up writing your own specific function.

If you want to minimize code, you'll end up reusing the procedure and ignore all output parameters you're not interested in. It's this last variant I will be talking about in the rest of this post. Here is an example of the api procedure I want to reuse:

SQL> create or replace procedure api
  2  ( p1 out number
  3  , p2 out number
  4  , p3 out number
  5  )
  6  as
  7  begin
  8    p1 := 1;
  9    p2 := 2;
 10    p3 := 3;
 11  end api;
 12  /

Procedure created.

And this is how I would reuse this code:

SQL> declare
  2    l_number_i_am_interested_in number;
  3    l_p2_dummy                  number;
  4    l_p3_dummy                  number;
  5  begin
  6    api
  7    ( p1 => l_number_i_am_interested_in
  8    , p2 => l_p2_dummy
  9    , p3 => l_p3_dummy
 10    );
 11  end;
 12  /

PL/SQL procedure successfully completed.

For each output parameter I would declare a variable with the suffix _dummy to make my intent clear that I'm not going to really use these variables. The code snippet I saw, was of this form:

SQL> declare
  2    l_number_i_am_interested_in number;
  3    l_dummy                     number;
  4  begin
  5    api
  6    ( p1 => l_number_i_am_interested_in
  7    , p2 => l_dummy
  8    , p3 => l_dummy
  9    );
 10  end;
 11  /

PL/SQL procedure successfully completed.

Just one variable for the two dummy output parameters. I thought this could not work, but clearly it does. What value is assigned to l_dummy? The actual parameters l_dummy and l_dummy get a value assigned equal to the final values of the formal parameters p2 and p3. So l_dummy will get value 2 and l_dummy will get value 3. The order of assigning these values will now determine the real value of l_dummy. Fortunately, in this situation, it doesn't matter which value is assigned to the l_dummy variable, since we're not interested in it. But of course I'm a bit curious, so:

SQL> declare
  2    l_number_i_am_interested_in number;
  3    l_dummy                     number;
  4  begin
  5    api
  6    ( p1 => l_number_i_am_interested_in
  7    , p2 => l_dummy
  8    , p3 => l_dummy
  9    );
 10    dbms_output.put_line(l_dummy);
 11  end;
 12  /
3

PL/SQL procedure successfully completed.

It appears to be the last value in this case. So after the invocation of api ends, p1 gets the value 1 assigned, then l_dummy will get the value 2 assigned, and then -overwriting the previous step- l_dummy will get the value 3 assigned. This is confirmed when extending the example with a lot more dummy output parameters:

SQL> create or replace procedure api
  2  ( p1 out number
  3  , p2 out number
  4  , p3 out number
  5  , p4 out number
  6  , p5 out number
  7  , p6 out number
  8  , p7 out number
  9  , p8 out number
 10  , p9 out number
 11  )
 12  as
 13  begin
 14    p1 := 1;
 15    p2 := 2;
 16    p3 := 3;
 17    p4 := 4;
 18    p5 := 5;
 19    p6 := 6;
 20    p7 := 7;
 21    p8 := 8;
 22    p9 := 9;
 23  end api;
 24  /

Procedure created.

SQL> declare
  2    l_number_i_am_interested_in number;
  3    l_dummy                     number;
  4  begin
  5    api
  6    ( p1 => l_number_i_am_interested_in
  7    , p2 => l_dummy
  8    , p3 => l_dummy
  9    , p4 => l_dummy
 10    , p5 => l_dummy
 11    , p6 => l_dummy
 12    , p7 => l_dummy
 13    , p8 => l_dummy
 14    , p9 => l_dummy
 15    );
 16    dbms_output.put_line(l_dummy);
 17  end;
 18  /
9

PL/SQL procedure successfully completed.

Less variables declared makes the code cleaner in my opinion, so I like this newly learned construct.


UPDATE

And in case you mix the order of the actual parameters, it's still the last formal parameter that gets the value:

SQL> declare
  2    l_number_i_am_interested_in number;
  3    l_dummy                     number;
  4  begin
  5    api
  6    ( p1 => l_number_i_am_interested_in
  7    , p2 => l_dummy
  8    , p3 => l_dummy
  9    , p7 => l_dummy
 10    , p8 => l_dummy
 11    , p9 => l_dummy
 12    , p4 => l_dummy
 13    , p5 => l_dummy
 14    , p6 => l_dummy
 15    );
 16    dbms_output.put_line(l_dummy);
 17  end;
 18  /
9

PL/SQL procedure successfully completed.

Saturday, December 22, 2012

UKOUG 2012

My third time visiting the annual UKOUG conference in Birmingham started all wrong. At Schiphol Airport, the usual luggage check routine took place: laptop out of the suitcase, wallet/keys/belt apart, toothpaste apart. And afterwards putting everything back in. But I forgot to close the wheeled suitcase and when putting it on the ground, my MacBook Pro fell out. A quick inspection revealed that it still worked, but the screen just stayed black. When I arrived at the hotel, I also noticed that my international power adapter set included everything except the British one. The hotel didn't have a spare one and the local store did not sell adapters. Fortunately, Roel Hartman offered his MacBook for my presentation at Wednesday, and I could borrow a British power adapter from Luc Bors, who was leaving Birmingham monday afternoon. And when I told the story to Hans Forbrich, he also offered me a set. And then I learned that this year's speakers gift was .... an international power adapter set.

So, back to the conference itself. Again, it was full of excellent presentations. These were the presentations I chose to hear:

Monday:
Opening Keynote by Dermot O'Kelly and Andrew Sutherland
Keynote by Tom Kyte: Oracle's Latest Generation of Database Technology
David Peake: Oracle APEX 4.2 Unplugged
Dimitri Gielis: Moving to the APEX Listener
Hilary Farrell: RESTful Web Services in Oracle Application Express 4.2
Roel Hartman: Pump Up the Volume! The APEX Data Loader Inside Out
Martin Corry with funny anecdotes about rugby and team play.

Tuesday:
Anthony Rayner: Build a Great User Experience with Oracle Application Express
David Peake: Deploying and Developing Application Express with Oracle Pluggable Databases
John Scott: Apex Error Handling Enhancements
Tanel Poder: Exadata Performance Method
James Morle: Building a Winning Oracle Database Architecture
John Scott: Oracle APEX: Websockets (or When Push Comes to Shove)
Aino Andriessen: Deploy with Joy: Using Hudson to Build and Deploy your ADF Fusion Application
Tom Kyte: What's New in Oracle Database Application Development

Wednesday:
Me: Professional Software Development Using APEX
Tony Hasler: The MODEL Clause Explained
Anthony Rayner: Building Mobile Web Applications with Oracle Application Express
Paul Broughton: APEX: Why Not Google It?


Although most presentations were very good, the one that stood out, for me, was John Scott's Websockets presentation. With illustrative and very entertaining demo's, he not only showed how Websockets work, but especially what's possible if you let your imagination run wild. If you ever get the chance to see this presentation, please do yourself a favour.

My own presentation was on Wednesday morning, called "Professional Software Development Using APEX". It is a talk about how to do version control, parallel development, one-step builds, daily builds and unit tests with APEX. In a small room, Hall 7b, but well attended with 40-50 people. Using Roel's MacBook Air, the presentation itself went fine, but I was not entirely satisfied. Although I have some experience with presenting, this time I was more nervous than usual. Probably due to the changes I had made the weekend before and not being able to prepare those last-minute changes well enough because of not having access to my MacBook Pro. And it showed. All presenters talk about a topic they are passionate about, but you need to get that passion across. And nerves do exactly the opposite. Usually, my presentations score a tiny bit above conference average, this time my presentation scored a bit below. Still good (above 4), by the way, especially considering the other amazing presentations out there. The benefit is that I think I've learned a lot more from this experience than usual and I'm looking forward to improve on the next occasion.

On the social front, I visited the ACE dinner at Sunday evening, organized by Debra Lilley (thanks Debra!). Had some nice conversations at the dinner table with Sten Vesterli, Michael Abbey, Killian Evers and Piet de Visser among others. Next to the presentations, one of the benefits of such a conference is that you get to meet a lot of Oracle friends again. And you meet some new ones. This year for example, I had the pleasure of meeting Timo Raitalaakso, also known as "rafu", Finnish SQL guru, who came to me at the end of my own session. Together with Tuomas Pystynen and Jacco Landlust, we were on the same plane back home.

After the conference, I took my MacBook Pro for repair and I got it back last weekend. So if you are wondering why I waited almost three weeks with this post: that's the reason. Next year, the conference will be split up to get an even more technology focussed event in Manchester. Hopefully I'll get an abstract accepted again. For this year, a big "thank you" to the UKOUG team for organizing a great event, developing a good event app and even responding to tweets.

Monday, November 12, 2012

Ciber knowledge session November 28

On Wednesday evening November 28, my colleague Marcel Hoefs and I will both do a one-hour knowledge session at Ciber Nieuwegein. What's new about it, is that the knowledge session is not only for Ciber colleagues, but for anybody who would like to attend. Both sessions will be in Dutch, so for the remainder of this post, I'll switch to Dutch and copy Marcel's invitation text:

Graag nodig ik jullie uit voor de 2e Oracle kennissessie van dit jaar woensdag 28 november Ciber kantoor Nieuwegein vanaf 17:30u.

Rob van Wijk:
Professioneel software ontwikkelen met APEX
 
We hebben in de Oracle SL een APEXSoFa opgetuigd. In deze sessie hoor je hoe we de database en APEX geconfigureerd hebben samen met Subversion, Hudson, APEXExport en APEXExportSplitter en SQL*Developer's UtUtil. Hiermee doen we software configuratiebeheer, bouwen we dagelijks opnieuw de gehele applicatie op in een enkele stap, ontwikkelen we parallel in onze eigen APEX werkruimtes en database schema's, integreren we ons werk continu, en integreren we componenttesten in dit proces.

Marcel Hoefs: 
Anydata  

Ooit op zoek geweest naar een Oracle datatype waarmee je alles op kan slaan zonder de karakteristieken van de opgeslagen data kwijt te raken? Dan is anydata het zelf beschrijvende datatype dat standaard in de Oracle database hiervoor beschikbaar is. In mijn presentatie zal ik wat dieper in gaan op de mogelijkheden en onmogelijkheden van dit datatype. Tevens wat praktijkervaring en een usecase hiervoor.

Aanmelden, i.v.m. broodjes, uiterlijk dinsdag 27 november bij replace('rob van wijk',' ','.') || '@ciber.com'

Saturday, September 15, 2012

Keep clause

You may have seen an aggregate function like this in SQL queries:

max(value) keep (dense_rank first order by mydate)
or this analytic variant:
max(value) keep (dense_rank last order by mydate) over (partition by relation_nr)
Unfortunately, when you start searching for the "keep" clause, you won't find anything in the Oracle documentation (and hopefully because of this blogpost, people will now have a reference). Of course Oracle documents such functions. You only have to know that they are called FIRST and LAST in the SQL Language Reference.

Even though these functions were already introduced in version 9, I've seen lots of code that could have used these functions, but didn't. And that's a pity because it's a wasted opportunity to write shorter and faster code. The common use case I'm talking about is when you have a detail table with a validity period. Typically with a column startdate, and optionally an enddate. For such a table, you often have to know the values of the currently valid row. An example: suppose we have a table RELATIONS and for each relation we want to know his address at a certain point in time:
SQL> create table relations
  2  ( id   number       not null primary key
  3  , name varchar2(30) not null
  4  )
  5  /

Table created.

SQL> insert into relations
  2  select 1, 'Oracle Nederland' from dual union all
  3  select 2, 'Ciber Nederland' from dual
  4  /

2 rows created.

SQL> create table relation_addresses
  2  ( relation_id number       not null
  3  , startdate   date         not null
  4  , address     varchar2(30) not null
  5  , postal_code varchar2(6)  not null
  6  , city        varchar2(30) not null
  7  , constraint ra_pk primary key (relation_id,startdate)
  8  , constraint ra_r_fk foreign key (relation_id) references relations(id)
  9  )
 10  /

Table created.

SQL> insert into relation_addresses
  2  select 1, date '1995-01-01', 'Rijnzathe 6', '3454PV', 'De Meern' from dual union all
  3  select 1, date '2011-01-01', 'Hertogswetering 163-167', '3543AS', 'Utrecht' from dual union all
  4  select 2, date '2000-01-01', 'Frankrijkstraat 128', '5622AH', 'Eindhoven' from dual union all
  5  select 2, date '2006-01-01', 'Meerkollaan 15', '5613BS', 'Eindhoven' from dual union all
  6  select 2, date '2010-01-01', 'Burgemeester Burgerslaan 40b', '5245NH', 'Den Bosch' from dual union all
  7  select 2, date '2015-01-01', 'Archimedesbaan 16', '3439ME', 'Nieuwegein' from dual
  8  /

6 rows created.

SQL> begin
  2    dbms_stats.gather_table_stats(user,'relations');
  3    dbms_stats.gather_table_stats(user,'relation_addresses');
  4  end;
  5  /

PL/SQL procedure successfully completed.
Relation "Oracle Nederland" has two addresses, and its current address being at the Hertogswetering. And fictively, relation "Ciber Nederland" has four addresses. The current address is the Den Bosch one. And I've also recorded a future address in Nieuwegein. Note that, in real life, the latter three are all Ciber offices currently in use. To get the active relation addresses on October 1st, 2012, I can use this query:
SQL> var REFERENCE_DATE varchar2(10)
SQL> exec :REFERENCE_DATE:='2012-10-01'

PL/SQL procedure successfully completed.

SQL> select ra.relation_id
  2       , max(ra.startdate) startdate
  3    from relation_addresses ra
  4   where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  5   group by ra.relation_id
  6  /

RELATION_ID STARTDATE
----------- -------------------
          1 01-01-2011 00:00:00
          2 01-01-2010 00:00:00

2 rows selected.
But what if I want to retrieve the current address belonging to these rows? In fact, this is frequently being asked in Oracle forums. Prior to Oracle8, you would have used a query like below:
SQL> select ra.relation_id
  2       , ra.startdate
  3       , ra.address
  4       , ra.postal_code
  5       , ra.city
  6    from relation_addresses ra
  7   where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  8     and not exists
  9         ( select 'a relation_address with a more recent startdate'
 10             from relation_addresses ra2
 11            where ra2.relation_id = ra.relation_id
 12              and ra2.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
 13              and ra2.startdate > ra.startdate
 14         )
 15  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 01-01-2011 00:00:00 Hertogswetering 163-167        3543AS Utrecht
          2 01-01-2010 00:00:00 Burgemeester Burgerslaan 40b   5245NH Den Bosch

2 rows selected.
This uses a correlated subquery accessing the table (or index belonging to) table RELATION_ADDRESSES twice. Which can be prevented from Oracle8 onwards by using an analytic function:
SQL> select relation_id
  2       , startdate
  3       , address
  4       , postal_code
  5       , city
  6    from ( select ra.relation_id
  7                , ra.startdate
  8                , ra.address
  9                , ra.postal_code
 10                , ra.city
 11                , row_number() over (partition by ra.relation_id order by ra.startdate desc) rn
 12             from relation_addresses ra
 13            where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
 14         )
 15   where rn = 1
 16  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 01-01-2011 00:00:00 Hertogswetering 163-167        3543AS Utrecht
          2 01-01-2010 00:00:00 Burgemeester Burgerslaan 40b   5245NH Den Bosch

2 rows selected.
Here you compute the row_number when you partition the result set per relation_id ordered by startdate in descending order. Meaning the most recent date starting before the reference date, gets row_number 1 assigned per relation_id. By using an inline view, we can filter on the outcome of the analytic function, and only select the rows with row_number 1. In forums, you'll see this solution often being adviced. Compared to the correlated subquery, this query selects only once from table RELATION_ADDRESSES. However, you can do even better by just adding three "keep clause" functions to the original query:
SQL> select ra.relation_id
  2       , max(ra.startdate) startdate
  3       , max(ra.address) keep (dense_rank last order by ra.startdate) address
  4       , max(ra.postal_code) keep (dense_rank last order by ra.startdate) postal_code
  5       , max(ra.city) keep (dense_rank last order by ra.startdate) city
  6    from relation_addresses ra
  7   where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  8   group by ra.relation_id
  9  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 01-01-2011 00:00:00 Hertogswetering 163-167        3543AS Utrecht
          2 01-01-2010 00:00:00 Burgemeester Burgerslaan 40b   5245NH Den Bosch

2 rows selected.
The three extra aggregate functions all do a "dense_rank last order by startdate", meaning "sort the rows by startdate, and pick only those rows which have the most recent startdate". If you have more rows with the same startdate, the max function at the start tells Oracle to pick the value with the maximum address/postal_code/city. However, (relation_id,startdate) is unique, so ties are impossible and thus the max function is a dummy. I also could have used min.

The query is shorter and -to me- clearer at first glance. However, the main reason for my enthusiasm for the aggregate functions FIRST and LAST is because it's just faster. To show this, let's execute those queries against a table with 300,000 rows, 100,000 relations with 3 addresses each:
SQL> create table relations
  2  ( id   number       not null primary key
  3  , name varchar2(30) not null
  4  )
  5  /

Table created.

SQL> create table relation_addresses
  2  ( relation_id number       not null
  3  , startdate   date         not null
  4  , address     varchar2(30) not null
  5  , postal_code varchar2(6)  not null
  6  , city        varchar2(30) not null
  7  , constraint ra_pk primary key (relation_id,startdate)
  8  , constraint ra_r_fk foreign key (relation_id) references relations(id)
  9  )
 10  /

Table created.

SQL> insert into relations
  2   select level
  3        , dbms_random.string('a',30)
  4     from dual
  5  connect by level <= 100000
  6  /

100000 rows created.

SQL> insert into relation_addresses
  2   select 1 + mod(level-1,100000)
  3        , date '2013-01-01' - numtodsinterval(level,'hour')
  4        , dbms_random.string('a',30)
  5        , dbms_random.string('a',6)
  6        , dbms_random.string('a',30)
  7     from dual
  8  connect by level <= 300000
  9  /

300000 rows created.

SQL> begin
  2    dbms_stats.gather_table_stats
  3    ( user
  4    , 'relations'
  5    , cascade=>true
  6    , method_opt=>'FOR ALL INDEXED COLUMNS SIZE 254'
  7    , estimate_percent=>100
  8    );
  9    dbms_stats.gather_table_stats
 10    ( user
 11    , 'relation_addresses'
 12    , cascade=>true
 13    , method_opt=>'FOR ALL INDEXED COLUMNS SIZE 254'
 14    , estimate_percent=>100
 15    );
 16  end;
 17  /

PL/SQL procedure successfully completed.
Note that I created histograms with 254 buckets just to make the optimizer see that it should full scan the table, despite the "startdate <= :REFERENCE_DATE" predicate. This next query should give a clue what's in the table:
SQL> select *
  2    from relation_addresses
  3   where relation_id in (1,2,99999,100000)
  4   order by relation_id
  5       , startdate
  6  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 09-03-1990 15:00:00 tKgXePxuAIdhFBNJLIRRjodrlJzGOl vPIAbL pNkbFHTJPrVuDIYLxsCfUfetBsKJIE
          1 05-08-2001 07:00:00 LybVzfpzoQzXjpCAdkSZrkYrwUtZtL cWJwFe IczTRyjITWCJIOErccfITVvsqRVyMF
          1 31-12-2012 23:00:00 lNEwsdYhbwdqRxHTSCTCykgICxiXKL oXzHQF YfyKFmiboCWfmNLjVLZoKmUDoMFaDu
          2 09-03-1990 14:00:00 svOylQPkbyfympSXRMeyudfFErFvlO MLFdpG LTtAKdrpUmCwFgqEmoKxnUtWecwgcV
          2 05-08-2001 06:00:00 BsRCUviBiLHaAEjyRVnIedRAWzuVSe DlBlZW ErQmCkDgNDTMOdZzceFYrMXnZmmjxg
          2 31-12-2012 22:00:00 wqdFdXoBdmmCooLtGfWOMKukIMrDlI geRRHz DaPpWHOOdWgbjLaRkxfFDUIPgVgvEt
      99999 12-10-1978 01:00:00 FsXOjUdNIgjjGjnWpJjTTscbcuqsxa PdhVtm qOskmLwRlngSEihmlpYhmNHhvtrpBc
      99999 09-03-1990 17:00:00 sqoKYNeDntZtAUSmSDMtIQZloTSVeD uGPszi GIDctptEomcGzYGYhUGhKHgDRZJCmY
      99999 05-08-2001 09:00:00 fhHGwuGPIHSOaKdjDvDcqTzsbHZzqR tpaLAP rVYCmijzqJmhlnZZLXkHpgFmLAEiTS
     100000 12-10-1978 00:00:00 WwxfHcVfkFfItgcXfjPnKTiATlHjao nSOjSn vZNRsRySNPlmQKgCJjcpiEOhQIxzoy
     100000 09-03-1990 16:00:00 cGcVPMsFyxCBrnsZtMYBnaAflXiNff NVKRIr SseFWkWyUDgaPpbxdmENdLjurGbJPK
     100000 05-08-2001 08:00:00 dRfCmqdmbhcmaMvyYBpewPsFBCVdlG BMQWLY YPaAGnKKUkfdnAeAyLYeUBfXwezsEo

12 rows selected.
So there are a couple of rows that are filtered because they're in the future, but for most rows, the latest row is the current one. This is the plan of the first query with the correlated subquery:
SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d6p5uh67h65yb, child number 0
-------------------------------------
select ra.relation_id      , ra.startdate      , ra.address      ,
ra.postal_code      , ra.city   from relation_addresses ra  where
ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')    and not exists
       ( select 'a relation_address with a more recent startdate'
     from relation_addresses ra2           where ra2.relation_id =
ra.relation_id             and ra2.startdate <=
to_date(:REFERENCE_DATE,'yyyy-mm-dd')             and ra2.startdate >
ra.startdate        )

Plan hash value: 3749094337

---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |      1 |        |    100K|00:00:00.66 |   15071 |   3681 |
|*  1 |  HASH JOIN RIGHT ANTI |                    |      1 |   2978 |    100K|00:00:00.66 |   15071 |   3681 |
|*  2 |   INDEX FAST FULL SCAN| RA_PK              |      1 |    297K|    297K|00:00:00.05 |    1240 |     35 |
|*  3 |   TABLE ACCESS FULL   | RELATION_ADDRESSES |      1 |    297K|    297K|00:00:00.12 |   13831 |   3646 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("RA2"."RELATION_ID"="RA"."RELATION_ID")
       filter("RA2"."STARTDATE">"RA"."STARTDATE")
   2 - filter("RA2"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))
   3 - filter("RA"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))


30 rows selected.
A HASH JOIN ANTI for the not exists, and a total of .66 seconds. Next, the plan for the query with the analytic row_number function:
SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1zd4wqtxkc2vz, child number 0
-------------------------------------
select relation_id      , startdate      , address      , postal_code
   , city   from ( select ra.relation_id               , ra.startdate
            , ra.address               , ra.postal_code               ,
ra.city               , row_number() over (partition by ra.relation_id
order by ra.startdate desc) rn            from relation_addresses ra
       where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  )  where rn = 1

Plan hash value: 2795878473

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                    |      1 |        |    100K|00:00:00.97 |    7238 |   3646 |
|*  1 |  VIEW                    |                    |      1 |    297K|    100K|00:00:00.97 |    7238 |   3646 |
|*  2 |   WINDOW SORT PUSHED RANK|                    |      1 |    297K|    200K|00:00:00.93 |    7238 |   3646 |
|*  3 |    TABLE ACCESS FULL     | RELATION_ADDRESSES |      1 |    297K|    297K|00:00:00.09 |    7238 |   3646 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "RA"."RELATION_ID" ORDER BY
              INTERNAL_FUNCTION("RA"."STARTDATE") DESC )<=1)
   3 - filter("RA"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))


29 rows selected.
Note that this query takes longer than the correlated subquery above: .97 seconds versus .66 seconds. The HASH JOIN ANTI took .49 seconds (.66 - .05 -.12) where computing the ROW_NUMBER took .84 seconds (.93 - .09). So here, on my laptop, I have avoided .05 seconds for the INDEX FAST FULL SCAN, but spend .35 (.84 - .49) seconds more for the computation. Likely, when I/O is more expensive than on my laptop, the time of the first query will go up and the times will be closer to each other. Now the keep clause variant:
SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  dcw8tyyqtu2kk, child number 0
-------------------------------------
select ra.relation_id      , max(ra.startdate) startdate      ,
max(ra.address) keep (dense_rank last order by ra.startdate) address
  , max(ra.postal_code) keep (dense_rank last order by ra.startdate)
postal_code      , max(ra.city) keep (dense_rank last order by
ra.startdate) city   from relation_addresses ra  where ra.startdate <=
to_date(:REFERENCE_DATE,'yyyy-mm-dd')  group by ra.relation_id

Plan hash value: 2324030966

------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                    |      1 |        |    100K|00:00:00.55 |    7238 |   3646 |
|   1 |  SORT GROUP BY     |                    |      1 |    100K|    100K|00:00:00.55 |    7238 |   3646 |
|*  2 |   TABLE ACCESS FULL| RELATION_ADDRESSES |      1 |    297K|    297K|00:00:00.09 |    7238 |   3646 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("RA"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))


24 rows selected.
The shortest query, the shortest plan and the fastest execution. The SORT GROUP BY immediately reduces the number of intermediate rows from 297K to 100K, whereas the WINDOW SORT PUSHED RANK had to compute the row_number for all 297K rows.

PS: this topic and much more is covered in an upcoming Live Virtual Seminar for Oracle University on October 2nd

Saturday, May 19, 2012

Much Ado About Nothing?

I was reading this presentation PDF of Hugh Darwen recently, called How To Handle Missing Information Without Using NULL. Several great thinkers and founders of the relational theory consider NULL as the thing that should not be. For example, one slide in the above mentioned PDF is titled SQL's Nulls Are A Disaster. And I found a paper with the amusing title The Final Null In The Coffin.

I can understand the critique. The introduction of NULL leads to three valued logic, which makes programs much more complex and harder to prove correct. All database professionals likely have been bitten by NULLs several times during their career, myself included. And a NULL can have several interpretations. By using NULL, you are not making clear what is meant. If the value for column hair_colour is NULL, does it mean the person is bald? Or do you know the person has hair, but you just don't know what colour? Or can the person be bald or have hair, but you just don't know which one applies? Or is the person in the midst of a hair colouring exercise and you only temporarily don't know the colour? If you're creative, I'm sure you can come up with other interpretations as well.

On the other hand, the theorists don't have to build database applications for end users who like reasonable response times, and I do. Avoiding nulls at all cost typically leads to a data model that has more tables than needed, requiring more joins and therefore making queries slower. So I have to make a trade off. In general I try to avoid nullable columns as much as possible, for example by chosing subtype implementations instead of supertype implementations, and by modelling entity subtypes in the first place, but I will never let it noticeably slow down my application. At my current job, I'm making a data model right now. Having read all use cases, I know how the data will be used and so I know where in the model there is room to avoid an extra nullable column. One thing I'll never voluntarily do though, is make up strange outlier values just to get rid of the null.

Any way, I was curious to see how Hugh Darwen handles missing information without using nulls. In his paper, he has a concise example, which I'll translate to Oracle syntax in this blogpost to see what practically needs to happen to avoid nulls in his example. He starts with this table:

SQL> select *
  2    from pers_info
  3  /

        ID NAME       JOB            SALARY
---------- ---------- ---------- ----------
      1234 Anne       Lawyer         100000
      1235 Boris      Banker
      1236 Cindy                      70000
      1237 Davinder

4 rows selected.
Which contains four NULL values. The meaning of those NULL values can't be seen from this table, but this is what they are meant to be:
  • Boris earns something, but we don't know how much
  • Cindy does some job, but we don't know what it is
  • Davinder doesn't have a job
  • Davinder doesn't have a salary
So he applies a technique called vertical decomposition and on top of those results horizontal decomposition, to arrive at the seven tables below, where everything has a clear meaning.
SQL> select *
  2    from called
  3  /

        ID NAME
---------- --------
      1234 Anne
      1235 Boris
      1236 Cindy
      1237 Davinder

4 rows selected.

SQL> select *
  2    from does_job
  3  /

        ID JOB
---------- ------
      1234 Lawyer
      1235 Banker

2 rows selected.

SQL> select *
  2    from job_unk
  3  /

        ID
----------
      1236

1 row selected.

SQL> select *
  2    from unemployed
  3  /

        ID
----------
      1237

1 row selected.

SQL> select *
  2    from earns
  3  /

        ID     SALARY
---------- ----------
      1234     100000
      1236      70000

2 rows selected.

SQL> select *
  2    from salary_unk
  3  /

        ID
----------
      1235

1 row selected.

SQL> select *
  2    from unsalaried
  3  /

        ID
----------
      1237

1 row selected.
Here we achieved a data model where every NULL has been banned out.

Now what if we'd like to simulate a query against the PERS_INFO table? Darwen uses this expression to transform the seven tables back to the PERS_INFO table:
WITH (EXTEND JOB_UNK ADD ‘Job unknown’ AS Job_info) AS T1,
     (EXTEND UNEMPLOYED ADD ‘Unemployed’ AS Job_info) AS T2,
     (DOES_JOB RENAME (Job AS Job_info)) AS T3,
     (EXTEND SALARY_UNK ADD ‘Salary unknown’ AS Sal_info) AS T4,
     (EXTEND UNSALARIED ADD ‘Unsalaried’ AS Sal_info) AS T5,
     (EXTEND EARNS ADD CHAR(Salary) AS Sal_info) AS T6,
     (T6 { ALL BUT Salary }) AS T7,
     (UNION ( T1, T2, T3 )) AS T8,
     (UNION ( T4, T5, T7 )) AS T9,
     (JOIN ( CALLED, T8, T9 )) AS PERS_INFO :
PERS_INFO
Translated to Oracle syntax, this becomes:
SQL> with t1 as
  2  ( select id
  3         , 'Job unknown' as job_info
  4      from job_unk
  5  )
  6  , t2 as
  7  ( select id
  8         , 'Unemployed' as job_info
  9      from unemployed
 10  )
 11  , t3 as
 12  ( select id
 13         , job as job_info
 14      from does_job
 15  )
 16  , t4 as
 17  ( select id
 18         , 'Salary unknown' as sal_info
 19      from salary_unk
 20  )
 21  , t5 as
 22  ( select id
 23         , 'Unsalaried' as sal_info
 24      from unsalaried
 25  )
 26  , t6 as
 27  ( select id
 28         , salary
 29         , to_char(salary,'fm999G999') as sal_info
 30      from earns
 31  )
 32  , t7 as
 33  ( select id
 34         , sal_info
 35      from t6
 36  )
 37  , t8 as
 38  ( select id
 39         , job_info
 40      from t1
 41     union all
 42    select id
 43         , job_info
 44      from t2
 45     union all
 46    select id
 47         , job_info
 48      from t3
 49  )
 50  , t9 as
 51  ( select id
 52         , sal_info
 53      from t4
 54     union all
 55    select id
 56         , sal_info
 57      from t5
 58     union all
 59    select id
 60         , sal_info
 61      from t7
 62  )
 63  , pers_info as
 64  ( select c.id
 65         , c.name
 66         , j.job_info
 67         , s.sal_info
 68      from called c
 69           inner join t8 j on (c.id = j.id)
 70           inner join t9 s on (c.id = s.id)
 71  )
 72  select *
 73    from pers_info
 74  /

        ID NAME     JOB_INFO    SAL_INFO
---------- -------- ----------- --------------
      1235 Boris    Banker      Salary unknown
      1237 Davinder Unemployed  Unsalaried
      1234 Anne     Lawyer      100,000
      1236 Cindy    Job unknown 70,000

4 rows selected.
Very elaborate, but the optimizer does a great job at simplifying the query under the covers, as can be seen in this execution plan:
SQL> select *
  2    from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  3  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bmrtdy0jad18p, child number 0
-------------------------------------
with t1 as ( select id        , 'Job unknown' as job_info     from
job_unk ) , t2 as ( select id        , 'Unemployed' as job_info
from unemployed ) , t3 as ( select id        , job as job_info     from
does_job ) , t4 as ( select id        , 'Salary unknown' as sal_info
 from salary_unk ) , t5 as ( select id        , 'Unsalaried' as
sal_info     from unsalaried ) , t6 as ( select id        , salary
  , to_char(salary,'fm999G999') as sal_info     from earns ) , t7 as (
select id        , sal_info     from t6 ) , t8 as ( select id        ,
job_info     from t1    union all   select id        , job_info
from t2    union all   select id        , job_info     from t3 ) , t9
as ( select id        , sal_info     from t4    union all   select id
     , sal_info     from t5    union all   select id        , sal_info
   from t7 ) , pers_info as ( select c.id        , c.name        ,
j.job_info        , s.sal_info     from called c          inner join t8
j on (c.id = j.id)

Plan hash value: 583520090

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |      1 |        |      4 |00:00:00.01 |      14 |       |       |          |
|*  1 |  HASH JOIN            |            |      1 |      4 |      4 |00:00:00.01 |      14 |  1011K|  1011K|  550K (0)|
|*  2 |   HASH JOIN           |            |      1 |      4 |      4 |00:00:00.01 |       8 |  1180K|  1180K|  548K (0)|
|   3 |    TABLE ACCESS FULL  | CALLED     |      1 |      4 |      4 |00:00:00.01 |       2 |       |       |          |
|   4 |    VIEW               |            |      1 |      4 |      4 |00:00:00.01 |       6 |       |       |          |
|   5 |     UNION-ALL         |            |      1 |        |      4 |00:00:00.01 |       6 |       |       |          |
|   6 |      TABLE ACCESS FULL| JOB_UNK    |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   7 |      TABLE ACCESS FULL| UNEMPLOYED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   8 |      TABLE ACCESS FULL| DOES_JOB   |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|   9 |   VIEW                |            |      1 |      4 |      4 |00:00:00.01 |       6 |       |       |          |
|  10 |    UNION-ALL          |            |      1 |        |      4 |00:00:00.01 |       6 |       |       |          |
|  11 |     TABLE ACCESS FULL | SALARY_UNK |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  12 |     TABLE ACCESS FULL | UNSALARIED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  13 |     TABLE ACCESS FULL | EARNS      |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C"."ID"="S"."ID")
   2 - access("C"."ID"="J"."ID")


45 rows selected.
If I had to build the PERS_INFO table back again with a query myself, I'd use this shorter query with six left outer joins:
SQL> select c.id
  2       , c.name
  3       , coalesce(j.job,nvl2(ju.id,'Job unknown',null),nvl2(ue.id,'Unemployed',null)) job_info
  4       , coalesce(to_char(e.salary,'fm999G999'),nvl2(su.id,'Salary unknown',null),nvl2(us.id,'Unsalaried',null)) salary_info
  5    from called c
  6         left outer join does_job j on (c.id = j.id)
  7         left outer join job_unk ju on (c.id = ju.id)
  8         left outer join unemployed ue on (c.id = ue.id)
  9         left outer join earns e on (c.id = e.id)
 10         left outer join salary_unk su on (c.id = su.id)
 11         left outer join unsalaried us on (c.id = us.id)
 12  /

        ID NAME     JOB_INFO    SALARY_INFO
---------- -------- ----------- --------------
      1234 Anne     Lawyer      100,000
      1236 Cindy    Job unknown 70,000
      1235 Boris    Banker      Salary unknown
      1237 Davinder Unemployed  Unsalaried

4 rows selected.
Although, as you can see below, the plan doesn't really improve:
SQL> select *
  2    from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  3  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6x45b27mvpb1m, child number 0
-------------------------------------
select c.id      , c.name      , coalesce(j.job,nvl2(ju.id,'Job
unknown',null),nvl2(ue.id,'Unemployed',null)) job_info      ,
coalesce(to_char(e.salary,'fm999G999'),nvl2(su.id,'Salary
unknown',null),nvl2(us.id,'Unsalaried',null)) salary_info   from called
c        left outer join does_job j on (c.id = j.id)        left outer
join job_unk ju on (c.id = ju.id)        left outer join unemployed ue
on (c.id = ue.id)        left outer join earns e on (c.id = e.id)
 left outer join salary_unk su on (c.id = su.id)        left outer join
unsalaried us on (c.id = us.id)

Plan hash value: 3398518218

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |      1 |        |      4 |00:00:00.01 |      15 |       |       |          |
|*  1 |  HASH JOIN OUTER        |            |      1 |      4 |      4 |00:00:00.01 |      15 |   955K|   955K|  528K (0)|
|*  2 |   HASH JOIN OUTER       |            |      1 |      4 |      4 |00:00:00.01 |      12 |  1000K|  1000K|  523K (0)|
|*  3 |    HASH JOIN OUTER      |            |      1 |      4 |      4 |00:00:00.01 |      10 |  1035K|  1035K|  536K (0)|
|*  4 |     HASH JOIN OUTER     |            |      1 |      4 |      4 |00:00:00.01 |       8 |  1063K|  1063K|  536K (0)|
|*  5 |      HASH JOIN OUTER    |            |      1 |      4 |      4 |00:00:00.01 |       6 |  1114K|  1114K|  537K (0)|
|*  6 |       HASH JOIN OUTER   |            |      1 |      4 |      4 |00:00:00.01 |       4 |  1180K|  1180K|  538K (0)|
|   7 |        TABLE ACCESS FULL| CALLED     |      1 |      4 |      4 |00:00:00.01 |       2 |       |       |          |
|   8 |        TABLE ACCESS FULL| JOB_UNK    |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   9 |       TABLE ACCESS FULL | UNEMPLOYED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  10 |      TABLE ACCESS FULL  | SALARY_UNK |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  11 |     TABLE ACCESS FULL   | UNSALARIED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  12 |    TABLE ACCESS FULL    | DOES_JOB   |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|  13 |   TABLE ACCESS FULL     | EARNS      |      1 |      2 |      2 |00:00:00.01 |       3 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C"."ID"="E"."ID")
   2 - access("C"."ID"="J"."ID")
   3 - access("C"."ID"="US"."ID")
   4 - access("C"."ID"="SU"."ID")
   5 - access("C"."ID"="UE"."ID")
   6 - access("C"."ID"="JU"."ID")


43 rows selected.
But the two plans above are really complex, compared with a simple query against the PERS_INFO table with nullable columns:
SQL> select *
  2    from pers_info
  3  /

        ID NAME       JOB            SALARY
---------- ---------- ---------- ----------
      1234 Anne       Lawyer         100000
      1235 Boris      Banker
      1236 Cindy                      70000
      1237 Davinder

4 rows selected.

SQL> select *
  2    from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  3  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  016x9f106gj27, child number 1
-------------------------------------
select *   from pers_info

Plan hash value: 1584579034

-----------------------------------------------------------------------------------------
| Id  | Operation         | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |      1 |        |      4 |00:00:00.01 |       7 |
|   1 |  TABLE ACCESS FULL| PERS_INFO |      1 |      4 |      4 |00:00:00.01 |       7 |
-----------------------------------------------------------------------------------------


13 rows selected.
If queries like this are not very frequent in your database, you might want to take this extra work for granted and avoid the NULL. But you need to consider something else as well: the new schema requires much more constraints. Using just the PERS_INFO table, a single primary key constraint on the Id column is all you need. But for the new model, Darwen describes 9, but really 15 constraints:
  1. No two CALLED rows have the same Id. (Primary key)
  2. Every row in CALLED has a matching row in either DOES_JOB, JOB_UNK, or UNEMPLOYED.
  3. No row in DOES_JOB has a matching row in JOB_UNK.
  4. No row in DOES_JOB has a matching row in UNEMPLOYED.
  5. No row in JOB_UNK has a matching row in UNEMPLOYED.
  6. Every row in DOES_JOB has a matching row in CALLED. (Foreign key)
  7. Every row in JOB_UNK has a matching row in CALLED. (Foreign key)
  8. Every row in UNEMPLOYED has a matching row in CALLED. (Foreign key)
  9. Constraints 2 through 8 repeated, mutatis mutandis, for CALLED with respect to EARNS, SALARY_UNK and UNSALARIED.
Implementing constraint 1 is easy:
SQL> alter table called add primary key (id)
  2  /

Table altered.
And so are constraints 6, 7 and 8:
SQL>alter table does_job add foreign key (id) references called (id)
  2  /

Table altered.

SQL> alter table job_unk add foreign key (id) references called (id)
  2  /

Table altered.

SQL> alter table unemployed add foreign key (id) references called (id)
  2  /

Table altered.
But constraint 2 says that the Id in table CALLED is a foreign distributed key. And constraints 3, 4 and 5 say the Id's of tables DOES_JOB, JOB_UNK and UNEMPLOYED are a distributed key. Oracle doesn't have declarative support for distributed keys or for foreign distributed keys. We could write database trigger code to implement this, which is very hard to do correct or we could use the materialized view trick to have the condition validated at the end of a transaction, instead of at the end of the statement, which also has its downsides. And such deferred constraint checking is explicitly ruled out by The Third Manifesto as well. Nevertheless, here is how it can be done.

The distributed key (constraints 3, 4 and 5):
SQL> create materialized view log on does_job with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on job_unk with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on unemployed with rowid
  2  /

Materialized view log created.

SQL> create materialized view distributed_key_vw
  2    refresh fast on commit
  3  as
  4  select d.rowid rid
  5       , d.id    id
  6       , 'D'     umarker
  7    from does_job d
  8   union all
  9  select j.rowid
 10       , j.id
 11       , 'J'
 12    from job_unk j
 13   union all
 14  select u.rowid
 15       , u.id
 16       , 'U'
 17    from unemployed u
 18  /

Materialized view created.

SQL> alter table distributed_key_vw
  2    add constraint distributed_key_check
  3    primary key (id)
  4  /

Table altered.
And to show that the distributed key implementation works:
SQL> insert into job_unk values (1234)
  2  /

1 row created.

SQL> commit
  2  /
commit
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view "RWIJK"."DISTRIBUTED_KEY_VW"
ORA-00001: unique constraint (RWIJK.DISTRIBUTED_KEY_CHECK) violated
And the foreign distributed key ("Every row in CALLED has a matching row in either DOES_JOB, JOB_UNK, or UNEMPLOYED.") can be implemented like this:
SQL> create materialized view log on does_job with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on job_unk with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on unemployed with rowid
  2  /

Materialized view log created.

SQL> create materialized view foreign_distributed_key_vw
  2    refresh fast on commit
  3  as
  4  select c.rowid  c_rowid
  5       , dj.rowid dj_rowid
  6       , ju.rowid ju_rowid
  7       , ue.rowid ue_rowid
  8       , c.id     id
  9       , dj.id    dj_id
 10       , ju.id    ju_id
 11       , ue.id    ue_id
 12    from called c
 13       , does_job dj
 14       , job_unk ju
 15       , unemployed ue
 16   where c.id = dj.id (+)
 17     and c.id = ju.id (+)
 18     and c.id = ue.id (+)
 19  /

Materialized view created.

SQL> alter table foreign_distributed_key_vw
  2    add constraint foreign_distributed_key_check
  3    check (coalesce(dj_id,ju_id,ue_id) is not null)
  4  /

Table altered.
And some proof that this implementation works:
SQL> insert into called values (1238,'Elise')
  2  /

1 row created.

SQL> commit
  2  /
commit
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-02290: check constraint (RWIJK.FOREIGN_DISTRIBUTED_KEY_CHECK) violated
Would I go through the extra trouble of an implementation with 6 more tables, 14 extra constraints and worse performance like above? It depends. It depends on how often the data is queried, and on how often it is updated concurrently. And on whether the distinction between the possible multiple meanings of NULL is relevant in my case. And whether I have sufficient extra time to implement it. Using Oracle, probably most often, I won't.




PS: For all Dutch DBA's: here is a symposium you don't want to miss.