Wednesday, December 17, 2008

Plan a schedule with the SQL model clause

My DBA colleague Ronald Rood asked an interesting question on OTN's SQL and PL/SQL forum. I'll repeat the question here. First the setup of the table:

rwijk@ORA11GR1> create table sschedule (
2 item varchar2(10)
3 , days varchar2(30)
4 , fixed_start_time varchar2(5)
5 , minutes number
6 )
7 /

Tabel is aangemaakt.

rwijk@ORA11GR1> insert into sschedule (item,days, fixed_start_time, minutes)
2 select 'lunch', 'mon,tue,wed,thu,fri','12:00',60 from dual union all
3 select 'a', 'tue,fri','10:00',60 from dual union all
4 select 'b', 'mon,wed',null, 120 from dual union all
5 select 'opening','mon,tue,wed,thu,fri','08:55', 5 from dual union all
6 select 'close','mon,tue,wed,thu','20:00', 5 from dual union all
7 select 'close','fri','16:00', 5 from dual union all
8 select 'c', 'mon,wed,fri',null, 20 from dual union all
9 select 'd', 'mon,wed,fri',null, 20 from dual union all
10 select 'diner','mon,tue,wed,thu','18:00', 60 from dual union all
11 select 'e','tue,thu,fri',null, 40 from dual union all
12 select 'keynote', 'mon','09:00', 120 from dual union all
13 select 'bye', 'fri','14:00', 120 from dual union all
14 select 'f','tue,thu,fri',null, 40 from dual union all
15 select 'g', 'mon,wed',null, 120 from dual
16 /

14 rijen zijn aangemaakt.

rwijk@ORA11GR1> select * from sschedule
2 /

ITEM DAYS FIXED MINUTES
---------- ------------------------------ ----- ----------
lunch mon,tue,wed,thu,fri 12:00 60
a tue,fri 10:00 60
b mon,wed 120
opening mon,tue,wed,thu,fri 08:55 5
close mon,tue,wed,thu 20:00 5
close fri 16:00 5
c mon,wed,fri 20
d mon,wed,fri 20
diner mon,tue,wed,thu 18:00 60
e tue,thu,fri 40
keynote mon 09:00 120
bye fri 14:00 120
f tue,thu,fri 40
g mon,wed 120

14 rijen zijn geselecteerd.

And the accompanying question:
"I have a table that contains a list of items that are to be discussed in a meeting, all taking place in the same room. For some meetings a guest speaker is invited, those meetings have a fixed start time. All meetings have a fixed duration in minutes. The days all start with opening and end with close. How can we generate the start times of the items that have no fixed start time set?

for friday this gives:
ITEM    FIXED MINUTES
------- ----- -------
opening 08:55 5
a 10:00 60
lunch 12:00 60
bye 14:00 120
close 16:00 5
c 20
d 20
e 40
f 40

e can start at 09:00 to fill the gap between opening and a.
c can start at 09:40 to fill the gap between e and a.
f can start at 11:00 to fill the gap between a and lunch.
d can start at 11:40 to fill the gap between f and lunch.

more combinations are very legal, as long as the fixed_start_times are honoured.
How can this list be generated in 1 sql?
When a day becomes overbooked the items that fall must get something like 'does not fit' to signal it's too busy that day."


In real life I would never implement a SQL only solution for such a problem, because the resulting SQL will be quite hard to maintain, even when properly documented. But it sure is fun to do it using only SQL.

If you clicked the link in the beginning of the post, you might have seen a solution already. In this post I will try to explain how I solved this problem which may look very difficult at first. I think it demonstrates the power of the SQL model clause very well.

The first thing to do here, is to normalize the table, by splitting the comma separated string into separate rows, one row for each day. Without such a normalized set, a solution would be much harder. To split the string into several rows, I use the technique that resembles the one described in this post, particularly the SQL model clause variant. This query does this part of the job:

rwijk@ORA11GR1> select item
2 , cast(days as varchar2(3)) day
3 , to_date(fixed_start_time,'hh24:mi') start_time
4 , minutes
5 , to_date(fixed_start_time,'hh24:mi')
6 + numtodsinterval(minutes,'minute') end_time
7 from sschedule
8 model
9 return updated rows
10 partition by (item,fixed_start_time,minutes)
11 dimension by (0 i)
12 measures (days)
13 ( days [for i from 1 to regexp_count(days[0],',') + 1 increment 1]
14 = regexp_substr(days[0],'[^,]+',1,cv(i))
15 )
16 order by decode(day,'mon',1,'tue',2,'wed',3,'thu',4,'fri',5)
17 , start_time
18 /

ITEM DAY START_TIME MINUTES END_TIME
---------- --- ------------------- ---------- -------------------
opening mon 01-12-2008 08:55:00 5 01-12-2008 09:00:00
keynote mon 01-12-2008 09:00:00 120 01-12-2008 11:00:00
lunch mon 01-12-2008 12:00:00 60 01-12-2008 13:00:00
diner mon 01-12-2008 18:00:00 60 01-12-2008 19:00:00
close mon 01-12-2008 20:00:00 5 01-12-2008 20:05:00
b mon 120
d mon 20
c mon 20
g mon 120
opening tue 01-12-2008 08:55:00 5 01-12-2008 09:00:00
a tue 01-12-2008 10:00:00 60 01-12-2008 11:00:00
lunch tue 01-12-2008 12:00:00 60 01-12-2008 13:00:00
diner tue 01-12-2008 18:00:00 60 01-12-2008 19:00:00
close tue 01-12-2008 20:00:00 5 01-12-2008 20:05:00
f tue 40
e tue 40
opening wed 01-12-2008 08:55:00 5 01-12-2008 09:00:00
lunch wed 01-12-2008 12:00:00 60 01-12-2008 13:00:00
diner wed 01-12-2008 18:00:00 60 01-12-2008 19:00:00
close wed 01-12-2008 20:00:00 5 01-12-2008 20:05:00
b wed 120
g wed 120
d wed 20
c wed 20
opening thu 01-12-2008 08:55:00 5 01-12-2008 09:00:00
lunch thu 01-12-2008 12:00:00 60 01-12-2008 13:00:00
diner thu 01-12-2008 18:00:00 60 01-12-2008 19:00:00
close thu 01-12-2008 20:00:00 5 01-12-2008 20:05:00
e thu 40
f thu 40
opening fri 01-12-2008 08:55:00 5 01-12-2008 09:00:00
a fri 01-12-2008 10:00:00 60 01-12-2008 11:00:00
lunch fri 01-12-2008 12:00:00 60 01-12-2008 13:00:00
bye fri 01-12-2008 14:00:00 120 01-12-2008 16:00:00
close fri 01-12-2008 16:00:00 5 01-12-2008 16:05:00
c fri 20
d fri 20
e fri 40
f fri 40

39 rijen zijn geselecteerd.

This query puts every row in its own partition, and indexes this row with dimension i set to 0. These rows are the original ones, which we won't return, because of the use of the keyword RETURN UPDATED ROWS. The model creates new cells with dimension values from 1 and upwards. In each partition the number of commas is calculated with regexp_count(days[0],','). The number of commas plus one is the number of rows that have to be generated. For each new row, regexp_substr(days[0],'[^,]+',1,cv(i)) gives the cv(i)-th element in the string.

The less trickier part of the query is to change some datatypes: days is set to varchar2(3) for a prettier layout only, and the start_time and end_time should be dates for easier calculating. You see that a default date is chosen, being the first day of the current month. This part of the date is irrelevant. Using intervals (numtodsinterval) I can now easily add the number of minutes with the start_time to calculate the end_time.

This was the easy part :-). Next challenge is to process all rows with an empty start_time and allocate a time slot to them. The question does not describe an algorithm how to do this, so I choose an easy one: process all rows with an empty start time and start with the longest one. For each of these rows, assign the row to the largest gap in time, at the beginning of the interval. A few challenges here: determining the gaps, adjusting the gaps after an item has been assigned in a gap and detecting when an item doesn't fit in any gap.

For these challenges I put up another model, partitioned by day. The rows inside each partition are indexed by a number with the row_number analytic function, the rows with an empty start_time first. Like this I have a nice dimension I can iterate over AND I can use the UNTIL clause to only iterate over the ones with a empty start_time. For this I have to introduce the cnt measure, that counts the number of empty start_times in a partition. So in each partition there will be as many iterations as there are cells with empty start_times.

Let's first show the resulting query and then explain what's going on:

rwijk@ORA11GR1> with schedule_normalized as
2 ( select item
3 , cast(days as varchar2(3)) day
4 , to_date(fixed_start_time,'hh24:mi') start_time
5 , minutes
6 , to_date(fixed_start_time,'hh24:mi')
7 + numtodsinterval(minutes,'minute') end_time
8 from sschedule
9 model
10 return updated rows
11 partition by (item,fixed_start_time,minutes)
12 dimension by (0 i)
13 measures (days)
14 ( days [for i from 1 to regexp_count(days[0],',') + 1 increment 1]
15 = regexp_substr(days[0],'[^,]+',1,cv(i))
16 )
17 )
18 select item
19 , day
20 , to_char(st,'hh24:mi') start_time
21 , to_char(et,'hh24:mi') end_time
22 , minutes
23 , fit
24 from schedule_normalized
25 model
26 partition by (day)
27 dimension by
28 ( row_number() over
29 (partition by day order by start_time nulls first, minutes desc) rn
30 )
31 measures
32 ( item
33 , start_time st
34 , minutes
35 , end_time et
36 , lead(start_time) over
37 (partition by day order by start_time) next_start_time
38 , count(nvl2(start_time,null,1)) over (partition by day) cnt
39 , 0 rn_with_largest_gap
40 , row_number() over
41 (partition by day order by start_time nulls first, minutes desc) rnm
42 , cast(null as varchar2(11)) fit
43 )
44 rules iterate(1000) until (iteration_number + 1 = cnt[1])
45 ( rn_with_largest_gap[1]
46 = max(rnm) keep
47 (dense_rank last order by next_start_time - et nulls first)[any]
48 , fit[iteration_number+1]
49 = case
50 when max(next_start_time - et)[any]
51 < minutes[iteration_number+1]/24/60
52 then 'doesn''t fit'
53 end
54 , st[iteration_number+1]
55 = case
56 when fit[iteration_number+1] is null
57 then et[rn_with_largest_gap[1]]
58 end
59 , et[iteration_number+1]
60 = case
61 when fit[iteration_number+1] is null
62 then et[rn_with_largest_gap[1]]
63 + numtodsinterval(minutes[iteration_number+1],'minute')
64 end
65 , next_start_time[iteration_number+1]
66 = case
67 when fit[iteration_number+1] is null
68 then next_start_time[rn_with_largest_gap[1]]
69 end
70 , next_start_time[rn_with_largest_gap[1]]
71 = case
72 when fit[iteration_number+1] is not null
73 then next_start_time[rn_with_largest_gap[1]]
74 end
75 )
76 order by decode(day,'mon',1,'tue',2,'wed',3,'thu',4,'fri',5)
77 , start_time
78 /

ITEM DAY START END_T MINUTES FIT
---------- --- ----- ----- ---------- -----------
opening mon 08:55 09:00 5
keynote mon 09:00 11:00 120
d mon 11:00 11:20 20
lunch mon 12:00 13:00 60
g mon 13:00 15:00 120
b mon 15:00 17:00 120
diner mon 18:00 19:00 60
c mon 19:00 19:20 20
close mon 20:00 20:05 5
opening tue 08:55 09:00 5
a tue 10:00 11:00 60
lunch tue 12:00 13:00 60
f tue 13:00 13:40 40
e tue 13:40 14:20 40
diner tue 18:00 19:00 60
close tue 20:00 20:05 5
opening wed 08:55 09:00 5
g wed 09:00 11:00 120
lunch wed 12:00 13:00 60
b wed 13:00 15:00 120
d wed 15:00 15:20 20
c wed 15:20 15:40 20
diner wed 18:00 19:00 60
close wed 20:00 20:05 5
opening thu 08:55 09:00 5
lunch thu 12:00 13:00 60
e thu 13:00 13:40 40
f thu 13:40 14:20 40
diner thu 18:00 19:00 60
close thu 20:00 20:05 5
opening fri 08:55 09:00 5
d fri 09:00 09:20 20
c fri 09:20 09:40 20
a fri 10:00 11:00 60
f fri 11:00 11:40 40
lunch fri 12:00 13:00 60
e fri 13:00 13:40 40
bye fri 14:00 16:00 120
close fri 16:00 16:05 5

39 rijen zijn geselecteerd.

In the with clause you see the normalized query discussed earlier. The part I'm discussing here, starts at line 18. For each iteration I have to choose which of the rows has the largest open time slot attached at the end. For this the auxiliary measure next_start_time is introduced, calculated with the lead analytic function. The index of the row with the largest open time slot is stored in the auxiliary measure rn_with_largest_gap. Each iteration starts with calculating this value again. All subsequent rules of the models use this rn_with_largest_gap measure. Rules 3 and 4 calculate the start_time and end_time of the rows that started out with an empty start_time. The last two rules adjust the next_start_time measures to the new situation: since the new allocated time slot is adjacent to the existing time slot, the next_start_time of the original one is set to null, and the next_start_time of the new time slot is the original next_start_time.

Last part of the solution is to addition of the fit measure. With this measure and the second rule, I can check whether the largest open time slot is large enough for the item. If it is not large enough, I put in a text "doesn't fit" in this cell. All subsequent rules effectively do nothing when a non null value is encountered in this cell. With the current data, everything fits, but if you click on the link to the original question, you'll see a situation where two rows don't fit.

If you have not given up and have read up through here, you maybe agree with me that this is another nice example of a complex algorithm that can be solved with only SQL. Although in this category, nothing beats this one of course. But remember kids: don't try this at work ;-)

2 comments:

  1. Rob,

    I am exhausted to death unserstanding this...:)
    Looking at these examples, I really wonder what is the objective behind introducing MODEL clause, if, as you said, it may not be practical to use them in production. While MODEL clause literally opens hundreads of opportunities to use SQL to solve vast set of problems, it seems many implementations are close to rocket-science for "normal" (or average) developers to understand.
    So when would you suggest using MODEL clause in real life ?

    ReplyDelete
  2. Hi Narendra,

    I totally understand and agree. It's only an example to show that using the SQL model clause opens up so many new possibilities. And it's fun if you like SQL. But it's hardly serious.

    Your question when I would suggest using the SQL model clause in real life, is the exact subject of the third part of the SQL Model Clause Tutorial. I finished this article for a Dutch paper in July, and it is to be published any time soon - this month I was told. Immediately after that, I will put a translated version on this blog. So I'll have to ask for your patience here.

    Regards,
    Rob.

    ReplyDelete