Thursday, March 19, 2009

Choosing between SQL and PL/SQL

If you prefer to read in German, then you can find a translation here.


On my post Calculating probabilities with N throws of a die, I received a comment by Narendra saying:

I hope you are not serious about your last statement with some comments I'm sure it's not that hard to maintain....:)

But I was serious. However, it's a sentiment I hear a lot: after my presentation Do more with SQL I received similar comments claiming PL/SQL would be easier in some cases. And according to Iggy Fernandez, Steven Feuerstein has said here:

Some people can perform seeming miracles with straight SQL, but the statements can end up looking like pretzels created by someone who is experimenting with hallucinogens.

And I don't get it.

Well, I can understand why that would be a first reaction when seeing some undocumented long piece of SQL containing some of the newer SQL constructs. But when thinking a bit longer about the subject, I don't think it's a totally fair reaction. Here's why.

  1. When faced with a challenging problem, most people tend to resort to the language they are most comfortable with. I know I do. For example, I'm way better with PL/SQL than I am with Java. So when faced with a hard algorithm, I'll always use PL/SQL. And I bet a Java programmer reasons the other way round. So when saying that straight SQL is harder to maintain than PL/SQL, I guess you are really saying that your PL/SQL skills are very good, but your SQL skills are, well, somewhat less than very good. That's no problem at all, since you will still be able to build applications effectively. But I don't think the language itself is to blame, it's the skills of the people talking that language.


  2. In production code, I see PL/SQL code more often being documented with comments than SQL code. But every piece of production code that isn't straightforward should be documented. Why does SQL code rarely contain comments? Probably because most SQL statements in production are of the basic SELECT ... FROM ... WHERE kind. And when used to not commenting those easy SQL statements, the harder ones are almost automatically lacking comments as well. When you start adding comments for SQL code, as well as for your PL/SQL code, then it will be a reason less for why you may find SQL code harder to grasp.


  3. In PL/SQL you are able to split up a complex task into many simple tasks. Each task is then performed by a single function or a procedure. And those functions and procedures have clear names, making the code self documenting. This is called modularizing your code and you've probably been taught about this subject in school already. Complex SQL used to be just a giant piece of text lacking this ability, but from Oracle9i onwards, Oracle gave us the WITH-clause (also known as subquery factoring). With this clause you can give a meaningful name to each subpiece of SQL. Thus, the same kind of modularization we achieve with PL/SQL, is possible with SQL since 9i as well.

And so I think SQL is equally readable, just a little more compact. And often faster due to less context switching. And more often correct because it's one read consistent query instead of several queries taking place at different times.

An example using the code from Calculating probabilities with N throws of a die, of how it could look like in production. Difference is that the results of that query are inserted into a table. I made both variants "production like" by documenting them well. First a package using SQL:

rwijk@ORA11GR1> create package probabilities_sql
2 as
3 --
4 -- The procedure "calculate" calculates all probabilities with
5 -- p_number_of_dies throws of a die. The results of the calculation
6 -- are inserted into the table probabilities.
7 --
8 procedure calculate (p_number_of_throws in number)
9 ;
10 end probabilities_sql;
11 /

Package is aangemaakt.

rwijk@ORA11GR1> create package body probabilities_sql
2 as
3 procedure calculate (p_number_of_throws in number)
4 is
5 begin
6 insert into probabilities
7 ( sum_of_dies
8 , percentage
9 )
10 with number_of_die_faces as (select count(*) cnt from die)
11 , all_probabilities as
12 ( select sum_value
13 , prob
14 , i
15 from --
16 -- Generate as many rows as there are possible combinations of the
17 -- dies. This equals: power(,p_number_of_throws).
18 -- For example: with a traditional die (6 faces) and 3 throws, there
19 -- are power(6,3) = 216 rows with a l-value running from 1 until 216.
20 --
21 ( select level l
22 from number_of_die_faces
23 connect by level <= power(cnt,p_number_of_throws)
24 )
25 , number_of_die_faces
26 model
27 --
28 -- A reference model to be able to quickly lookup the face_value
29 -- and probability when provided a face_id
30 --
31 reference r on (select face_id, face_value, probability from die)
32 dimension by (face_id)
33 measures (face_value,probability)
34 main m
35 --
36 -- Each combination is in a different partition.
37 -- Which means it is easy to parallellize if necessary.
38 --
39 partition by (l rn, cnt)
40 dimension by (0 i)
41 measures (0 die_face_id, 0 sum_value, 1 prob, l remainder)
42 --
43 -- Iterate as many times as there are throws of the die.
44 --
45 rules iterate (1000) until (iteration_number+1=p_number_of_throws)
46 --
47 -- For each throw of the die, calculate the face_id, remainder, the
48 -- sum and probability. For the sum and probability, the reference
49 -- model is used as a lookup. Each iteration overwrites the previous
50 -- one.
51 --
52 ( die_face_id[0] = 1 + mod(remainder[0]-1,cv(cnt))
53 , remainder[0] = ceil((remainder[0] - die_face_id[0] + 1) / cv(cnt))
54 , sum_value[0] = sum_value[0] + face_value[die_face_id[0]]
55 , prob[0] = prob[0] * probability[die_face_id[0]]
56 )
57 )
58 --
59 -- All probabilities of each possible combination are now calculated.
60 -- Now, sum them all up per sum of all face_values.
61 --
62 select sum_value
63 , sum(prob)
64 from all_probabilities
65 group by sum_value
66 ;
67 end calculate;
68 end probabilities_sql;
69 /

Package-body is aangemaakt.


And a package doing it the PL/SQL way, using the same idea:

rwijk@ORA11GR1> create package probabilities_plsql
2 as
3 --
4 -- The procedure "calculate" calculates all probabilities with
5 -- p_number_of_dies throws of a die. The results of the calculation
6 -- are inserted into the table probabilities.
7 --
8 procedure calculate (p_number_of_throws in number)
9 ;
10 end probabilities_plsql;
11 /

Package is aangemaakt.

rwijk@ORA11GR1> create package body probabilities_plsql
2 as
3 g_number_of_die_faces number(4)
4 ;
5 procedure initialization
6 --
7 -- Calculate the number of die faces (6 in case of a traditional die) only once.
8 --
9 is
10 begin
11 select count(*)
12 into g_number_of_die_faces
13 from die
14 ;
15 end initialization
16 ;
17 function face_value
18 ( p_face_id in die.face_id%type
19 ) return die.face_value%type result_cache relies_on (die)
20 --
21 -- A lookup function returning the face_value of a given face_id.
22 -- This function is called multiple times for the same face_id's and
23 -- is therefore optimized by the result_cache hint.
24 --
25 is
26 l_face_value die.face_value%type;
27 begin
28 select face_value
29 into l_face_value
30 from die
31 where face_id = p_face_id
32 ;
33 return l_face_value;
34 end face_value
35 ;
36 function probability
37 ( p_face_id in die.face_id%type
38 ) return die.probability%type result_cache relies_on (die)
39 --
40 -- A lookup function returning the probability of a given face_id.
41 -- This function is called multiple times for the same face_id's and
42 -- is therefore optimized by the result_cache hint.
43 --
44 is
45 l_probability die.probability%type;
46 begin
47 select probability
48 into l_probability
49 from die
50 where face_id = p_face_id
51 ;
52 return l_probability;
53 end probability
54 ;
55 procedure calculate (p_number_of_throws in number)
56 is
57 l_die_face_id die.face_id%type;
58 l_remainder number(10);
59 l_sum probabilities.sum_of_dies%type;
60 l_probability probabilities.percentage%type
61 ;
62 type ta_probabilities is table of probabilities%rowtype index by pls_integer;
63 a_probabilities ta_probabilities;
64 begin
65 --
66 -- Loop as many times as there are possible combinations of the
67 -- dies. This number equals: power(,p_number_of_throws).
68 -- For example: with a traditional die (6 faces) and 3 throws, there
69 -- are power(6,3) = 216 iterations.
70 --
71 for i in 1 .. power(g_number_of_die_faces,p_number_of_throws)
72 loop
73 l_remainder := i;
74 l_sum := 0;
75 l_probability := 1;
76 --
77 -- For each combination, iterate over all throws of each individual die,
78 -- and calculate the face_id of that die (using l_die_face_id and
79 -- l_remainder) and use that face_id to calculate the sum of the die
80 -- face values and the probability.
81 --
82 for j in 1 .. p_number_of_throws
83 loop
84 l_die_face_id := 1 + mod(l_remainder-1, g_number_of_die_faces);
85 l_remainder := ceil((l_remainder-l_die_face_id+1)/g_number_of_die_faces);
86 l_sum := l_sum + face_value(l_die_face_id);
87 l_probability := l_probability * probability(l_die_face_id);
88 end loop;
89 --
90 -- Sum up all the probabilities with the same sum.
91 --
92 a_probabilities(l_sum).sum_of_dies := l_sum;
93 a_probabilities(l_sum).percentage :=
94 nvl(a_probabilities(l_sum).percentage,0) + l_probability
95 ;
96 end loop;
97 --
98 -- Bulk insert all calculated probabilities into the table PROBABILIIES.
99 --
100 forall i in indices of a_probabilities
101 insert into probabilities
102 values a_probabilities(i)
103 ;
104 end calculate
105 ;
106 begin
107 initialization;
108 end probabilities_plsql;
109 /

Package-body is aangemaakt.


Note that this algorithm is not data intensive and the PL/SQL variant here is actually faster than the SQL variant due to 11g's result cache, but that's not the point here. The point is of course readability. Do you really think the SQL variant is much more complex than the PL/SQL variant? I'd love to hear your thoughts about this subject, whether you agree or not.



UPDATE



Three nice follow-up blog entries:

by Chen Shapira

by Laurent Schneider

by H.Tonguç Yılmaz

16 comments:

  1. Rob,
    excellent post.
    The pl/sql variant is much more comfortable to me. You are absolutely right in your first point:

    "So when saying that straight SQL is harder to maintain than PL/SQL, I guess you are really saying that your PL/SQL skills are very good, but your SQL skills are, well, somewhat less than very good."

    I am not comfortable with the model clause (yet) and that is the reason I find the sql variant hard the read.
    Unfortunately I can not blame you for the fact that you do understand this model clause ;-)

    So in my opinion you can absolutely use the model clause unless (as you said) you document it.

    ReplyDelete
  2. Hey Rob,
    Even if I am a big fan of Model, I would ***DEFINITELY*** use plsql or java to solve this issue.

    Why? Because it will be way more performant! And also because, frankly, model is unmaintenable, almost unsupported (did you ever try to chase bugs with model with metalink), completly unknown.

    I definitely second Steven Feuerstein : I do take hallucinogen before solving riddles with SQL ;)

    ReplyDelete
  3. well, maybe it was a bit hard words against model, model is fine for the purpose it has been designed for, worksheet functionality...

    ReplyDelete
  4. Rob,

    I completely agree with all of the 3 reasons that you mentioned. And these are the exact reasons, especially the first, for my comment earlier. In our profession, we generally work in teams. Now I think it is almost always the case that not every team member has same skill-level, especially in SQL. When we write code, we should also consider whether the code will be "maintainable". Now I know it is a relative term, but majority of "PL/SQL developers" are always more comfortable with amending PL/SQL code rather than pretzel-looking SQLs. Now, I agree that it is not a good execuse to not write SQL, but unfortunately, it is a reality. Even you mentioned in one of your posts about MODEL clause, that it should not be used in production. While commenting may help, I guess at the end of the day, it is left to individual teams to decide when a SQL-approach starts looking like a pretzel and hence should be replaced by PL/SQL code.
    As they say, Never say NEVER, never say ALWAYS

    ReplyDelete
  5. Good post.

    IMHO
    All you need is solve your problem (task) well. All another - method. The main thing - what do we mean saying "well"?

    ReplyDelete
  6. Thanks Erik, Laurent, Narendra and Mironm all very much for sharing your opinion here.

    Some reactions:

    Laurent: you certainly cannot be accused of being less skilled in SQL, so it's interesting to see you'd use PL/SQL. And do you have the same opinion if it wasn't a model clause, but - say - a hierarchical query or a query containing advanced XML-functions and/or regular expressions?

    By the way: now I really know what they mean with SQL junkie ;-)

    Narendra: It's indeed a reality that PL/SQL skills are more abundant in a team most of the time. But as an Oracle developer I think you should strive to be skilled in both. For example: in 2005 I had never written a SQL model clause query. But if I had to maintain one back then, I'd not say "what an unmaintainable mess, let's rewrite it to PL/SQL", but rather "oops, I don't know this construct yet, I have to learn about it as fast as I can". So I wouldn't let current knowledge prevail that much, but try to leverage the power of both languages.

    And in the above example, I think the SQL model clause is still relatively simple, especially when properly documented. I would not use it in production when the model is really difficult, as in that post you refer to, or in the famous sudoku solver.

    ReplyDelete
  7. > advanced XML-functions
    Oh yes! it will just degrade your mind to use XML functions instead of PL/SQL.

    I just had a session on XML this week and, when someone asked if it is performant, I showed that SELECT * FROM XMLTABLE('1 to 10') takes about 10 seconds to complete!

    I love the challenge and congratulate you on your solutions. But please: use hierarchical queries for hierarchies (EMPNO=PRIOR MGR), use model for interrows calculation (worksheet) and xml functions to process XML documents.

    About solving Chen obfuscation context or Iggy sql competition with Model, keep on blogging your solutions!

    :)

    ReplyDelete
  8. Hi Laurent,

    I guess we have to agree to disagree.

    If some SQL construct is going to help me to achieve my desired result set in a performant and readable way, I'm going to use that and add sufficient comments for my successor.

    And if that means using a model clause to do string aggregation, or a hierarchical query to split comma separated string or whatever, then that's fine with me.

    To quote Greg Rahn:

    "You should be writing your SQL using the constructs that are provided. Often times very complex data transformations can be done with just SQL. Leverage this power."

    Regards,
    Rob.

    ReplyDelete
  9. Rob,

    I completely agree with you. Even I am of the opinion that if I am writing the same code, say using 8i in the past, and same code, say 10g today, I am almost certainly not evolving. I am happy that my code (and thought process) today looks much better that the one in 2005. But I still find at least 10 "Oracle Database Developers" who will start (and end) with using PL/SQL to solve problems. The problem that I see is most of the time "managers" just take the approach of getting work done rather than getting it done the smart way.
    Chen Shapira has also shared some interesting arguments and I must admit I did not think about the debugging aspect here.

    ReplyDelete
  10. Rob,

    You will make your code a Rob-code. With lots of comments, your successor will maybe understand how it works, but will he be able to modify it ... What about code reusability?

    It is great to use new features to enhance the performance. However I do not support integrating a new syntax because it exists unless it provides a ***significant*** advantage

    ReplyDelete
  11. Rob,

    I'd just like to add that most (all?) of the modularizations that you can do with WITH since 9i can be done in 8i as well: you just need to use an inline view for this. Actually, since Oracle's subquery factoring is not recursive (as MS SQL Server's Common Table Expressions) the difference between it and inline views are just syntactical and performance wise (different plans) - unless I missed something.

    Cheers

    robert

    ReplyDelete
  12. Robert,

    That's true. Inline views show some similarities with the with-clause, but also differences:

    You can't place them at the top for easy reading/referencing. And it's cumbersome to give the inline view a readable name since you will be using that same name as an alias.

    And using subquery factoring, the cost based optimizer has the extra possibility to materialize the set.

    Regards,
    Rob.

    ReplyDelete
  13. > You can't place them at the top for easy reading/referencing.

    That's the syntactical difference I meant.

    > And it's cumbersome to give the inline view a readable name since you will be using that same name as an alias.

    I hadn't though of that - although this can be easily mitigated by having appropriate comments. :-)

    > And using subquery factoring, the cost based optimizer has the extra possibility to materialize the set.

    I didn't know that. In fact, when I used SQF plans I got were worse than with inline views. Might be because of the particular type of problems I was trying to solve. Although, ideally the CBO should come up with an optimal plan regardless of IV or SQF - since they are semantically equivalent as far as I can see. It's different with SQL Server's CTE as I've pointed out.

    Thanks for the education!

    robert

    ReplyDelete
  14. I think there is a large inherent component of the suitability of various languages for various tasks, well beyond simple familiarity issues.

    In a nutshell, set based languages are going to be a lot easier for solving set based problems.

    You get the pretzel logic when you try to use set based language for problems that are complicated to express in a set that the language comprehends. I used to consider SQL puzzles de facto proofs of this. I may yet be convinced otherwise.

    You have to understand, I started on relational databases before SQL was a standard, and have never been convinced that business analysis easily translates to predicate caculus, or that SQL does either. So I use little SQL and less PL, choosing neither.

    word: citheea

    ReplyDelete
  15. Hi Joel,

    Well said. Maybe this book can convince you that business analysis easily translates to predicate calculus:
    http://www.amazon.com/Applied-Mathematics-Database-Professionals-Experts/dp/1590597451/ref=pd_sxp_f_pt

    Although I have to admit I don't usually use predicate calculus when solving some problem. Chris Date's seminar will probably show you how horribly wrong that is :-)

    Regards,
    Rob.

    ReplyDelete
  16. nice piece of information, I had come to know about your internet site from my friend vinay, delhi,i have read atleast 12 posts of yours by now, and let me tell you, your website gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new post, once again hats off to you! Thanx a ton once again, Regards, sql and plsql difference

    ReplyDelete