Monday, May 31, 2010

Exponential Moving Average

There was a nice question on OTN today about whether there is a standard Oracle function to calculate the exponential moving average. The answer is that there is no such function, but with the model clause, you can calculate it very easy. And it's a great example of what I mean with "variable number of calculations based on calculated values", written in my third part of the model clause tutorial.

Before today, I didn't even know what an exponential moving average was exactly. You can read more about it here on Wikipedia or here with a nice example. From the first link:

An exponential moving average (EMA), applies weighting factors which decrease exponentially. The weighting for each older data point decreases exponentially, giving much more importance to recent observations while still not discarding older observations entirely.


From the second link:

The formula for calculating an Exponential Moving Average (EMA) is:

X = (K * (C - P)) + P

Where:

X = Current EMA (i.e. EMA to be calculated)

C = Current original data value

K = Smoothing Constant

P = Previous EMA

(The first EMA in the range to be calculated is arbitrary and can be the corresponding original data value or, often, a Simple Moving Average value.

Where:

K = Smoothing Constant = 2 / (1 + n)


And this formula is followed by an example which I extended a bit, using this table:

rwijk@TEST10> create table sales (product,month,amount)
2 as
3 select 'A', date '2009-01-01', 10 from dual union all
4 select 'A', date '2009-02-01', 15 from dual union all
5 select 'A', date '2009-03-01', 17 from dual union all
6 select 'A', date '2009-04-01', 20 from dual union all
7 select 'A', date '2009-05-01', 22 from dual union all
8 select 'A', date '2009-06-01', 20 from dual union all
9 select 'A', date '2009-07-01', 25 from dual union all
10 select 'A', date '2009-08-01', 27 from dual union all
11 select 'A', date '2009-09-01', 30 from dual union all
12 select 'A', date '2009-10-01', 35 from dual union all
13 select 'A', date '2009-11-01', 37 from dual union all
14 select 'A', date '2009-12-01', 40 from dual union all
15 select 'B', date '2009-01-01', 0 from dual union all
16 select 'B', date '2009-02-01', 50 from dual union all
17 select 'B', date '2009-03-01', 10 from dual union all
18 select 'B', date '2009-04-01', 40 from dual union all
19 select 'B', date '2009-05-01', 15 from dual union all
20 select 'B', date '2009-06-01', 35 from dual union all
21 select 'B', date '2009-07-01', 30 from dual union all
22 select 'B', date '2009-08-01', 30 from dual union all
23 select 'B', date '2009-09-01', 20 from dual union all
24 select 'B', date '2009-10-01', 20 from dual union all
25 select 'B', date '2009-11-01', 20 from dual union all
26 select 'B', date '2009-12-01', 20 from dual
27 /

Table created.


The records from product A match the example in the link. I made up the numbers from product B. Here is the model clause query that implements the formula. Note how the formula directly translate into the one and only rule of the model clause. The "smoothing constant" K is set to .5, based on a window of values (n) equalling 3.

rwijk@TEST10> select product
2 , month
3 , amount
4 , round(ema,3) "Exponential Moving Average"
5 from sales
6 model
7 partition by (product)
8 dimension by (month)
9 measures (amount,0 ema)
10 ( ema[any] order by month
11 = nvl2
12 ( ema[add_months(cv(),-1)]
13 , ( .5 * (amount[cv()] - ema[add_months(cv(),-1)])) + ema[add_months(cv(),-1)]
14 , amount[cv()]
15 )
16 )
17 order by product
18 , month
19 /

PRO MONTH AMOUNT Exponential Moving Average
--- ------------------- ---------- --------------------------
A 01-01-2009 00:00:00 10 10
A 01-02-2009 00:00:00 15 12,5
A 01-03-2009 00:00:00 17 14,75
A 01-04-2009 00:00:00 20 17,375
A 01-05-2009 00:00:00 22 19,688
A 01-06-2009 00:00:00 20 19,844
A 01-07-2009 00:00:00 25 22,422
A 01-08-2009 00:00:00 27 24,711
A 01-09-2009 00:00:00 30 27,355
A 01-10-2009 00:00:00 35 31,178
A 01-11-2009 00:00:00 37 34,089
A 01-12-2009 00:00:00 40 37,044
B 01-01-2009 00:00:00 0 0
B 01-02-2009 00:00:00 50 25
B 01-03-2009 00:00:00 10 17,5
B 01-04-2009 00:00:00 40 28,75
B 01-05-2009 00:00:00 15 21,875
B 01-06-2009 00:00:00 35 28,438
B 01-07-2009 00:00:00 30 29,219
B 01-08-2009 00:00:00 30 29,609
B 01-09-2009 00:00:00 20 24,805
B 01-10-2009 00:00:00 20 22,402
B 01-11-2009 00:00:00 20 21,201
B 01-12-2009 00:00:00 20 20,601

24 rows selected.


Challenge: try this without the model clause and see if you can come up with something more comprehensive ...

5 comments:

  1. 11.2 features in use

    with dat as (
    select 'A' product , date '2009-01-01' month, 10 amount from dual union all
    select 'A', date '2009-02-01', 15 from dual union all
    select 'A', date '2009-03-01', 17 from dual union all
    select 'A', date '2009-04-01', 20 from dual union all
    select 'A', date '2009-05-01', 22 from dual union all
    select 'A', date '2009-06-01', 20 from dual union all
    select 'A', date '2009-07-01', 25 from dual union all
    select 'A', date '2009-08-01', 27 from dual union all
    select 'A', date '2009-09-01', 30 from dual union all
    select 'A', date '2009-10-01', 35 from dual union all
    select 'A', date '2009-11-01', 37 from dual union all
    select 'A', date '2009-12-01', 40 from dual union all
    select 'B', date '2009-01-01', 0 from dual union all
    select 'B', date '2009-02-01', 50 from dual union all
    select 'B', date '2009-03-01', 10 from dual union all
    select 'B', date '2009-04-01', 40 from dual union all
    select 'B', date '2009-05-01', 15 from dual union all
    select 'B', date '2009-06-01', 35 from dual union all
    select 'B', date '2009-07-01', 30 from dual union all
    select 'B', date '2009-08-01', 30 from dual union all
    select 'B', date '2009-09-01', 20 from dual union all
    select 'B', date '2009-10-01', 20 from dual union all
    select 'B', date '2009-11-01', 20 from dual union all
    select 'B', date '2009-12-01', 20 from dual
    ), rns as (
    select dat.*
    , row_number() over (partition by product order by month) rn
    -- , 2/(1+count(*) over (partition by product)) k
    , 0.5 k
    from dat
    ), res(product, month, amount, rn, x ) as (
    select r.product, r.month, r.amount, r.rn, r.amount x
    from rns r
    where rn = 1
    union all
    select ns.product, ns.month, ns.amount, ns.rn, ns.k*(ns.amount - es.x) + es.x x
    from rns ns, res es
    where ns.rn = es.rn + 1
    and ns.product = es.product
    ) select product, month, amount, rn, round(x,3) EMA
    from res
    order by product,month
    ;

    ReplyDelete
  2. Hi Rob,

    after computing the closed form I came up with the following code that if more like an obfuscation than anything comprehensive.
    The idea is to create running multiples using a string concatenation and the xml-eval-functionality.
    The closed forms of the special cases only need running sums.


    There's a general case and two special cases that are much easier:

    General case:

    with t1 as (
    select product,month,amount, amount ci, row_number() over (partition by product order by month) rn,
    --2 / (1 + row_number() over (partition by product order by month)) ki
    0.5 ki
    from sales
    ), t2 as (
    select product,month,amount, (case when rn = 1 then 1 else ki end * ci) ai, case when rn = 1 then 1 else (1 - ki) end bi from t1
    ), t3 as (
    SELECT product,MONTH,amount,
    ai,
    xmlquery(REPLACE(wm_concat(bi) over(PARTITION BY product ORDER BY MONTH rows BETWEEN unbounded preceding AND CURRENT ROW), ',', '*') RETURNING content).getnumberval() mi
    FROM t2
    ), t4 as (
    select product, month, amount, mi, (ai / mi) xi from t3
    )
    SELECT product,MONTH,amount,
    round(mi * SUM(xi) over(PARTITION BY product ORDER BY MONTH rows BETWEEN unbounded preceding AND CURRENT ROW), 3) ema
    FROM t4;


    Special case K = 0.5:

    with t1 as (
    select product,month,amount, row_number() over (partition by product order by month) rn, amount * power(2, nvl(nullif(row_number() over (partition by product order by month) - 1, 0), 1)) ci
    from sales
    )
    select product, month, amount, round(sum(ci) over (partition by product order by month rows between unbounded preceding and current row) / power(2, rn), 3) ema
    from t1;

    Special case K = 2 / (1 + i):

    with t1 as (
    select product,month,amount, row_number() over (partition by product order by month) rn, amount * row_number() over (partition by product order by month) ci
    from sales
    )
    select product, month, amount, round(sum(ci) over (partition by product order by month rows between unbounded preceding and current row) * 2 / (rn * (rn + 1)), 3) ema
    from t1;

    I'll post the proof of the closed form if anybody is interested in it.

    ReplyDelete
  3. Hi Raymond,

    This is a great example of "fun with SQL" :-)

    A combination of XMLQuery, the undocumented wm_concat, and analytic functions with the windowing clause. I like it. Although it is not as comprehensive as the model clause variant and the Rafu's recursive with one, as you said yourself.

    And sure, I'd like to see the proof of the closed form.

    Regards,
    Rob.

    ReplyDelete
  4. I tackled another question: how to optimize the smoothing constant?

    SELECT k -- smoothing constant
    , mse -- mean square error
    FROM (
    SELECT *
    FROM sales
    MODEL
    DIMENSION
    BY ( product
    , ROW_NUMBER() OVER
    (PARTITION BY product
    ORDER BY month ASC ) rn
    )
    MEASURES
    ( amount -- sales amount
    , month -- month
    , 0 AS C
    , 0 AS P
    , 0 AS X
    , 0 AS SE -- squared error
    -- ****************************
    -- work row and attributes
    -- a) work row is product = 'X', rn = 1
    -- b) work attributes are as follows:
    , 0 AS SSE -- sum SE for all products/months
    , 0 AS MSE -- mean SSE for all products/months
    , 0 AS k -- for all products/months
    , 0 AS PreMSE -- prior k's MSE for all products/months
    , 0 AS diff -- between current MSE and prior
    , 0.1 AS delta -- initial increment
    , 0 AS prior_pt -- initial starting point
    -- ****************************
    )
    RULES ITERATE (99)
    UNTIL (abs(diff['A',1]) <= 0.00010)
    ( C[any,rn] = amount[cv(),cv()]
    , K['A',1] = prior_pt['A',1]
    + delta['A',1]
    , X[any,rn]
    ORDER BY product, rn ASC
    = COALESCE(
    K['A',1] * C[cv(),cv()]
    + (1-K['A',1]) * X[cv(),cv()-1]
    ,
    C[cv(),cv()]
    )
    , P[product,rn]
    = X[cv(),cv()-1]
    , SE[product,rn]
    = POWER(C[cv(),cv()]
    -X[cv(),cv()-1]
    ,2)
    , SSE['A',1] = SUM(SE)[any,any]
    , MSE['A',1] = SUM(SE)[any,any] / 24
    , diff['A',1]
    = CASE iteration_number
    WHEN 0 then NULL
    ELSE preMSE['A',1]
    -MSE['A',1]
    END
    , preMSE['A',1] = MSE['A',1]
    , delta['A',1]
    = CASE WHEN diff['A',1] < 0
    THEN -abs(delta['A',1]/2)
    ELSE abs(delta['A',1])
    END
    , prior_pt['A',1]
    = K['A',1]
    )
    )
    where product = 'A' and rn = 1
    /
    K MSE
    ---------- ----------
    .599999237 174.016094

    ReplyDelete