Sunday, October 6, 2013

Distributing tables evenly into groups using the SQL Model Clause

My colleague Ronald Rood recently had a nice SQL challenge for me. He had to perform an export of all the tables in a schema the old-fashioned way and wanted to manually parallelize the operation. For that to work, all tables need to be assigned to a group.

For the parallelization to work, the groups need to be balanced. If, say, all large tables end up in one group, the parallelization won't help much, because the total time wouldn't be reduced much. So the question is how to distribute the tables evenly, such that all groups have approximately the same total size in bytes. Another requirement, which came later, was to also take the total number of tables into account and distribute the number of tables evenly as well. The schema contained 90,000 tables from a standard software application, most of which were empty.

Ronald supplied a create table script for me to test with, simulating the contents of his dba_tables dictionary view, which is the source for his query. The test table contains 14,999 records. Here is the table, along with a select statement that gives you an idea of what's in it.

SQL> desc mytables
 Name                                                   Null?    Type         
 ------------------------------------------------------ -------- --------------
 TABLE_NAME                                                      VARCHAR2(30)
 NUM_ROWS                                                        NUMBER
 AVG_ROW_LEN                                                     NUMBER

SQL> select rownum
  2       , nvl(num_rows,0) * nvl(avg_row_len,0) bytes
  3    from mytables
  4   order by bytes desc
  5  /

         ROWNUM           BYTES
--------------- ---------------
              1     29387074200
              2     28471135400
              3     23974896400
              4     18589737600
              5     17953177900
              6     17014479300
              7     10880526800
              8      8832810600
              9      8372641700
             10      7888944000
             11      7527559700
             12      6314082900
             13      5814484500
             14      5452809600
             15      5194260000
             16      5160845400
             17      4323377800
             18      4245004800
             19      4226310600
             20      4196381000
...

           2256               4
           2257               4
           2258               3
           2259               3
           2260               3
           2261               3
           2262               3
           2263               2
           2264               2
           2265               2
           2266               2
           2267               2
           2268               2
           2269               2
           2270               2
           2271               2
           2272               2
           2273               0
           2274               0
           2275               0
           2276               0
...

          14995               0
          14996               0
          14997               0
          14998               0
          14999               0

14999 rows selected.


So there are a few large tables, the largest one being approximately 29GB, and most of the tables, 12,727 of them, are empty.
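
For anyone who wants to set up a similar test against their own schema, a minimal sketch of how such a table could be filled from the data dictionary might look like this (the owner name is just a placeholder; num_rows and avg_row_len come from optimizer statistics, so make sure statistics have been gathered):

create table mytables as
  select table_name
       , num_rows
       , avg_row_len
    from dba_tables
   where owner = 'APP_OWNER'  -- hypothetical schema name
/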

The algorithm I chose to distribute the tables over N groups is the simplest one I could think of, and goes schematically like this:
  • order all tables by size in descending order
  • place the N largest tables in groups 1 .. N
  • iterate over the remaining tables with a non-empty size, in descending order of size, and add each table to the first group encountered whose current size is below the running average group size
  • iterate over the empty tables and add each table to the first group encountered whose number of tables is below the average number of tables per group
This algorithm isn't perfect, but for a large number of tables it's pretty good, and certainly good enough for this one-off task. A procedural sketch of these steps follows.
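
The sketch below is only an illustration, not part of the actual solution: it codes the greedy steps in PL/SQL, with the number of groups hard-coded to 10, and it merely computes the assignments without storing them anywhere.

declare
  c_groups    constant pls_integer := 10;    -- number of groups, for illustration
  type num_tab is table of number index by pls_integer;
  l_grp_bytes num_tab;                       -- current total bytes per group
  l_grp_cnt   num_tab;                       -- current number of tables per group
  l_running   number      := 0;              -- total bytes assigned so far
  l_seq       pls_integer := 0;              -- tables processed so far
  l_grp       pls_integer;
begin
  for i in 1 .. c_groups loop
    l_grp_bytes(i) := 0;
    l_grp_cnt(i)   := 0;
  end loop;
  for r in ( select table_name
                  , nvl(num_rows,0) * nvl(avg_row_len,0) bytes
               from mytables
              order by 2 desc )
  loop
    l_seq := l_seq + 1;
    if l_seq <= c_groups then
      l_grp := l_seq;                        -- the N largest tables seed groups 1 .. N
    elsif r.bytes > 0 then
      l_grp := c_groups;                     -- non-empty: first group below the running average size
      for i in 1 .. c_groups loop
        if l_grp_bytes(i) <= l_running / c_groups then
          l_grp := i;
          exit;
        end if;
      end loop;
    else
      l_grp := c_groups;                     -- empty: first group below the average table count
      for i in 1 .. c_groups loop
        if l_grp_cnt(i) <= l_seq / c_groups then
          l_grp := i;
          exit;
        end if;
      end loop;
    end if;
    l_grp_bytes(l_grp) := l_grp_bytes(l_grp) + r.bytes;
    l_grp_cnt(l_grp)   := l_grp_cnt(l_grp) + 1;
    l_running          := l_running + r.bytes;
    -- here you would store the pair (r.table_name, l_grp)
  end loop;
end;
/

The model clause statement further down does essentially the same thing, but in a single SQL statement.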

Below is the SQL statement I used, with lines 57-62 being different just to show the distribution better. The original statement would just contain a "select * from t" there.

SQL> var NUMBER_OF_GROUPS number

SQL> exec :NUMBER_OF_GROUPS := 10

PL/SQL procedure successfully completed.

SQL> with t as
  2  ( select table_name
  3         , orig_bytes
  4         , grp
  5      from ( select table_name
  6                  , nvl(num_rows,0) * nvl(avg_row_len,0) bytes
  7                  , count(*) over () cnt
  8               from mytables
  9           )
 10     model
 11           dimension by (row_number() over (order by bytes desc) i)
 12           measures
 13           ( table_name
 14           , bytes
 15           , bytes orig_bytes
 16           , row_number() over (order by bytes desc) grp
 17           , cnt
 18           , sum(bytes) over (order by bytes desc) running_sum
 19           , 1 aantal
 20           )
 21           rules iterate (100000) until (iteration_number+1 >= cnt[1] - :NUMBER_OF_GROUPS)
 22           ( grp[:NUMBER_OF_GROUPS+1+iteration_number]
 23             = case
 24               when bytes[:NUMBER_OF_GROUPS+1+iteration_number] > 0 then
 25                 case
 26                 when bytes[1] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then 1
 27                 when bytes[2] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then 2
 28                 when bytes[3] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then 3
 29                 when bytes[4] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then 4
 30                 when bytes[5] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then 5
 31                 when bytes[6] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then 6
 32                 when bytes[7] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then 7
 33                 when bytes[8] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then 8
 34                 when bytes[9] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then 9
 35                 when bytes[10] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then 10
 36                 end
 37               else  -- empty tables
 38                 case
 39                 when aantal[1] <= (:NUMBER_OF_GROUPS+1+iteration_number) / :NUMBER_OF_GROUPS then 1
 40                 when aantal[2] <= (:NUMBER_OF_GROUPS+1+iteration_number) / :NUMBER_OF_GROUPS then 2
 41                 when aantal[3] <= (:NUMBER_OF_GROUPS+1+iteration_number) / :NUMBER_OF_GROUPS then 3
 42                 when aantal[4] <= (:NUMBER_OF_GROUPS+1+iteration_number) / :NUMBER_OF_GROUPS then 4
 43                 when aantal[5] <= (:NUMBER_OF_GROUPS+1+iteration_number) / :NUMBER_OF_GROUPS then 5
 44                 when aantal[6] <= (:NUMBER_OF_GROUPS+1+iteration_number) / :NUMBER_OF_GROUPS then 6
 45                 when aantal[7] <= (:NUMBER_OF_GROUPS+1+iteration_number) / :NUMBER_OF_GROUPS then 7
 46                 when aantal[8] <= (:NUMBER_OF_GROUPS+1+iteration_number) / :NUMBER_OF_GROUPS then 8
 47                 when aantal[9] <= (:NUMBER_OF_GROUPS+1+iteration_number) / :NUMBER_OF_GROUPS then 9
 48                 when aantal[10] <= (:NUMBER_OF_GROUPS+1+iteration_number) / :NUMBER_OF_GROUPS then 10
 49                 end
 50               end
 51           , bytes[grp[:NUMBER_OF_GROUPS+1+iteration_number]]
 52             = bytes[cv()] + bytes[:NUMBER_OF_GROUPS+1+iteration_number]
 53           , aantal[grp[:NUMBER_OF_GROUPS+1+iteration_number]]
 54             = aantal[cv()] + 1
 55           )
 56  )
 57  select grp
 58       , sum(orig_bytes)
 59       , count(*)
 60    from t
 61   group by grp
 62   order by grp
 63  /


            GRP SUM(ORIG_BYTES)        COUNT(*)
--------------- --------------- ---------------
              1     29387074200            1500
              2     28471135400            1500
              3     27978114708            1500
              4     27978114707            1500
              5     27978114707            1500
              6     27978114708            1500
              7     27978114706            1500
              8     27978114706            1500
              9     27978114707            1500
             10     26076134516            1499

10 rows selected.

A few explanatory remarks. I needed some extra measures to help with the calculation of the average size of the groups (a small query that shows these helper values on their own follows the list):
  • the running sum of bytes
The analytic function "sum(bytes) over (order by bytes desc)" calculates this running sum. Divided by :NUMBER_OF_GROUPS, it gives the running average group size needed in the algorithm.
  • the current size of all groups
The bytes "array" contains 14,999 elements, but the first NUMBER_OF_GROUPS (10) elements are reused to hold the intermediate group sizes. Likewise, the first 10 elements of the aantal measure hold the intermediate number of tables per group.
  • the total number of elements
This one is calculated by a simple "count(*) over ()" and is needed in the UNTIL clause to stop the iteration when there are no more elements.
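
For reference, these helper values can be inspected on their own, outside the model clause, with a query along these lines against the same mytables test table (it just recombines the expressions from the statement above):

select row_number() over (order by nvl(num_rows,0) * nvl(avg_row_len,0) desc) i
     , nvl(num_rows,0) * nvl(avg_row_len,0) bytes
     , sum(nvl(num_rows,0) * nvl(avg_row_len,0))
         over (order by nvl(num_rows,0) * nvl(avg_row_len,0) desc) running_sum
     , count(*) over () cnt
  from mytables
 order by bytes desc
/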

The whole statement is a tad ugly because the case expression has 10 repeating WHEN..THEN constructions, but it does the job, and it allows you to play with the NUMBER_OF_GROUPS variable, which needs to be between 1 and 10. Still, it is another nice use case for the SQL model clause, and it might come in handy for other DBAs or developers who need a single query to distribute their data evenly.
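
If you ever need more than 10 groups, the repeating branches do not have to be typed by hand. A hypothetical sketch that prints them with dbms_output (25 branches here, purely as an example):

set serveroutput on
begin
  for i in 1 .. 25 loop
    -- print one WHEN..THEN branch per possible group
    dbms_output.put_line
    ( 'when bytes[' || i || '] <= running_sum[cv()-1] / :NUMBER_OF_GROUPS then ' || i );
  end loop;
end;
/

The same loop, with aantal instead of bytes, would generate the branches for the empty tables.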

UPDATE:
Of course, it turned out the problem isn't new.
Here is a blog post from Brendan Furey about the same subject.
And here is the AskTom thread he refers to, with several model clause queries and one using recursive subquery factoring.

2 comments:

  1. Hello Rob,

    Very nice SELECT :) :)

    I just asked myself whether using ROWNUM in a SELECT with "ORDER BY bytes DESC" will
    ensure that the MODEL iterations do indeed process the rows in "bytes DESC" order,
    since ROWNUM values are assigned *BEFORE* the "ORDER BY bytes" happens.

    So, isn't it necessary to use ROW_NUMBER() OVER(ORDER BY bytes DESC)
    instead of ROWNUM in the inner SELECT?

    Of course, this is the case only if processing the tables in descending size order
    by the MODEL iterations is indeed essential for the entire algorithm.

    Thanks a lot & Best Regards,
    Iudith Mentzel
    Oracle developer & PL/SQL Challenge player
    Haifa, Israel

  2. Hello iudith,

    By processing the tables in descending size order, you considerably raise the odds that the distribution will be spread more evenly. So yes, it is quite essential.

    For my colleague, I used a statement that had a (select rownum ... from (select ... order by bytes desc)), but for this blog post I thought that tidying up my statement and losing a subquery would not affect the outcome. I was wrong about that, but just by accident the outcome wasn't affected, because the rows in the table happened to be pre-sorted.

    So thank you very much for pointing it out. I've corrected the SQL statement.

    Regards,
    Rob.
