ProfileDeferred ProcessingBlog Tools Help

Blog


    August 05

    YAMC

    /*

    YAMC

     

    Yet Another Median Calculation

     

    SQL Server 2005 gives us yet another way to calculate a median. The secret is the ROW_NUMBER() function, it can be used to find the "middle" row. ROW_NUMBER() is a bit special because an alias for it cannot be used in a predicate in the same query. Either a CTE or sub-query can be used to get around this.

    The basic idea is to order the data on the value for which you need to calculate a median, then select the row whose ROW_NUMBER() is half the size of the result of the query. There is a problem though, if the number of rows is even, there is no single "middle" row.

    The options here are to pick a value between the "middle" rows or just pick one of the rows. The value between the two "middle" values is called the financial median. If the first of the two "middle" rows is used it is  called the statistical median.

    */

    -- this is a very simple example table

    -- we will use as a source of data for

    -- median calculations

    create table data

    (

    id int Primary Key,

    value

    float

    )

    -- see then end of this note for a helper

    -- script that will fill the table with

    -- some random data.

     

    /*

    Now we look at some solutions using CTE's

    */

    -- this calculates the financial median

    with

    positions

    as

    (

    select

    (1 + Count(*)) / 2 as mid,

    1-(count(*) % 2) as even

    from

    data

    ),

    rows

    as

    (

    select value,

    row_number() over (order by value) as rn

    from

    data

    )

    select

    AVG(value) from rows JOIN positions on

    rn in (positions.mid, positions.mid + positions.even)

     

    /*

    The first CTE, positions, calculates the a couple of aggregates on our sample data.

    position.mid is the number of the middle row if the number of rows is odd, otherwise it is the number of the first middle row.

    position.even is a marker that is one if the number of rows is even and zero otherwise.

    The second CTE, rows, calculates the row number of each value in our sample data when ordered by value. The ROW_NUMBER() is aliased as rows.rn. You might be tempted use this alias in a predicate of this CTE to shortcut the calculation of the median, but an alias for the ROW_NUMBER() function cannot be used in a predicate of the expression that defines it.

    The final expression selects the average value of the values from the positions CTE whose row number is either position.mid or position.mid + position.even. Note that when the number of rows in data is odd both position.mid and position.mid + position.even will be the same value.

    The result of averaging one or two rows will be the financial median for the values in the data table.

    Another way to calculate the median, which can be used in versions of SQL Server prior to 2005, is to move the sorted values into a temporary table that has an identity column. The identity column is, in effect, the same as the ROW_NUMBER(). Then, as the ROW_NUMBER based solution does, pick the "middle" row

    */

    -- temp table median calculation of median

    declare

    @positions TABLE

    (id int identity, value float)

    insert

    into @positions

    select value from data order by value

    DECLARE @mid int

    DECLARE @even int

    SELECT @mid = (1 + count(*))/2,

    @even = 1 - (count(*) % 2)

    from @positions

    select

    AVG(value) from @positions

    where id in (@mid, @mid+@even)

     

    /*

    Is there any advantage to using the ROW_NUMBER based solution? If you compare the actual execution plan of the two calculations run in a single batch, the ROW_NUMBER based one has an insignificant advantage, 48% vs. 52% of the overall plan on my test system.

    The ROW_NUMBER solution does have one nice feature though, it is a SQL expression and the temporary table solution is a batch, not an expression. You can use this fact to improve the relative performance of the ROW_NUMBER based calculation. If you know you are going to do this median calculation very often you can add an index to the value column of the data table. This, of course, is not for free, you are making a space/time tradeoff by spending space to cut down time.

    */

    -- index to speed up ROW_NUMBER

    -- based median calculation

    create

    index value_indx on data(value)

    /*

    With this index in place you will find a notable difference between the execution plans for the ROW_NUMBER and temporary table calculations, the ROW_NUMBER calculation takes up only 37% of the overall plan on my test system.

    The reason for this is that when there is no index that sorts the values the ROW_NUMBER calculation must build a temporary table itself and sort it, just like temporary table calculation does. But the temporary table solution always builds the temporary table, even if there is a index available to sort it. The ROW_NUMBER solution mearly has to walk over the index to find the middle value.

    Calculating the statistical mean, the one that always uses one of the values from the table, takes a bit more work, but not much.

    */

    -- calculation of statistical mean

    with

    positions

    as

    (

    select (1 + Count(*)) / 2 as mid,

    1-(count(*) % 2) as even

    from

    data

    ),

    rows

    as

    (

    select

    value,

    row_number() over (order by value) as rn

    from

    data

    )

    select

    TOP 1 value from rows JOIN positions on

    rn in (positions.mid, positions.mid + positions.even)

    ORDER

    BY rn

     

    /*

    For the statistical mean you must sort the results by row number and pick the first one. If you want the last one then do an DESC sort.

    This calculation isn't a real aggregate, but you can get some of the effect of an aggregate by wrapping it inside of an inline table valued user defined function. Note that because the ROW_NUMBER calculation is a SQL expression you can do this. The temporary table solution can only be wrapped into a multi-line user defined function.

    */

    -- median TVF

    -- note that this function

    -- always returns a table

    -- with a single row

    CREATE

    FUNCTION dataMedian()

    returns

    table

    return

    with

    positions

    as

    (

    select (1 + Count(*)) / 2 as mid,

    1-(count(*) % 2) as even

    from

    data

    ),

    rows

    as

    (

    select

    value,

    row_number() over (order by value) as rn

    from

    data

    )

    select

    AVG(value) as median from rows JOIN positions on

    rn in (positions.mid, positions.mid + positions.even)

     

    /*

    So if you now want to get all the values in the data table that are greater than the median you can do something like:

    */

    Select id, value from data JOIN dbo.dataMedian()

    ON

    value > median

    /*

    So that is yet another median calculation for SQL Server. In some cases it will may provide better performance than using a temporary table, but at the cost of space. It not worse that using a temporary table and gives you the option of making the space/time tradeoff whenever you feel it is necessary.

     

    */

     

    -- helper script to fill sample TABLE data

    set

    nocount on

    declare

    @index int

    set

    @index = 20001

    while

    @index > 0

    BEGIN

    Insert

    into data values (@index, RAND() * 1000)

    set

    @index = @index - 1

    END

     

    Comments (4)

    Please wait...
    Sorry, the comment you entered is too long. Please shorten it.
    You didn't enter anything. Please try again.
    Sorry, we can't add your comment right now. Please try again later.
    To add a comment, you need permission from your parent. Ask for permission
    Your parent has turned off comments.
    Sorry, we can't delete your comment right now. Please try again later.
    You've exceeded the maximum number of comments that can be left in one day. Please try again in 24 hours.
    Your account has had the ability to leave comments disabled because our systems indicate that you may be spamming other users. If you believe that your account has been disabled in error please contact Windows Live support.
    Complete the security check below to finish leaving your comment.
    The characters you type in the security check must match the characters in the picture or audio.

    To add a comment, sign in with your Windows Live ID (if you use Hotmail, Messenger, or Xbox LIVE, you have a Windows Live ID). Sign in


    Don't have a Windows Live ID? Sign up

    No namewrote:
    http://www.batteryfast.com/sony/pcga-bp2e.htm sony pcga-bp2e battery
    http://www.batteryfast.com/sony/vgp-bp2ea.htm sony vgp-bp2ea battery
    http://www.batteryfast.com/sony/pcg-gr100.htm sony pcg-gr100 battery
    http://www.batteryfast.com/sony/pcg-gr150.htm sony pcg-gr150 battery
    http://www.batteryfast.com/sony/pcg-gr250.htm sony pcg-gr250 battery
    http://www.batteryfast.com/sony/pcg-gr270.htm sony pcg-gr270 battery
    http://www.batteryfast.com/toshiba/b404.htm toshiba b404 battery
    http://www.batteryfast.com/toshiba/ba1405.htm toshiba ba1405 battery
    http://www.batteryfast.com/toshiba/pa2487.htm toshiba pa2487 battery
    http://www.batteryfast.com/toshiba/pa2487urg.htm toshiba pa2487urg battery
    http://www.batteryfast.com/toshiba/pa3107u-1brs.htm toshiba pa3107u-1brs battery
    http://www.batteryfast.com/toshiba/1800.htm toshiba 1800 battery
    http://www.batteryfast.com/toshiba/4000.htm toshiba 4000 battery
    http://www.batteryfast.com/toshiba/pa3128u.htm toshiba pa3128u battery
    http://www.batteryfast.com/toshiba/pa3191u.htm toshiba pa3191u battery
    http://www.batteryfast.com/toshiba/portege-4000.htm toshiba portege 4000 battery
    http://www.batteryfast.com/toshiba/portege-m200.htm toshiba portege m200 battery
    http://www.batteryfast.com/toshiba/portege-m205.htm toshiba portege m205 battery
    http://www.batteryfast.com/toshiba/portege-m400.htm toshiba portege m400 battery
    http://www.batteryfast.com/toshiba/pa3128u-grey.htm toshiba pa3128u grey battery
    http://www.batteryfast.com/toshiba/pa3191u-grey.htm toshiba pa3191u grey battery
    http://www.batteryfast.com/toshiba/te2000-grey.htm toshiba te2000 grey battery
    http://www.batteryfast.com/toshiba/pa3383u.htm toshiba pa3383u battery
    http://www.batteryfast.com/toshiba/pa3383.htm toshiba pa3383 battery
    http://www.batteryfast.com/toshiba/p30.htm toshiba p30 battery
    http://www.batteryfast.com/toshiba/a70.htm toshiba a70 battery
    http://www.batteryfast.com/toshiba/satellite-a70.htm toshiba satellite a70 battery
    http://www.batteryfast.com/toshiba/satellite-a75.htm toshiba satellite a75 battery
    http://www.batteryfast.com/toshiba/satellite-p30.htm toshiba satellite p30 battery
    http://www.batteryfast.com/toshiba/satellite-p35.htm toshiba satellite p35 battery
    http://www.batteryfast.com/toshiba/pa3431u.htm toshiba pa3431u battery
    http://www.batteryfast.com/toshiba/satellite-m60.htm toshiba satellite m60 battery
    http://www.batteryfast.com/toshiba/satellite-m65.htm toshiba satellite m65 battery
    http://www.batteryfast.com/uniwill/un223.htm uniwill un223 battery
    http://www.batteryfast.com/uniwill/223-3s4000-f1p1.htm uniwill 223-3s4000-f1p1 battery
    http://www.batteryfast.com/uniwill/223-3s4000-s1p1.htm uniwill 223-3s4000-s1p1 battery
    http://www.batteryfast.com/uniwill/bat-243s1.htm uniwill bat-243s1 battery
    http://www.batteryfast.com/uniwill/un243.htm uniwill un243 battery
    http://www.batteryfast.com/uniwill/un244.htm uniwill un244 battery
    http://www.batteryfast.com/uniwill/255-3s4400-g1l1.htm uniwill 255-3s4400-g1l1 battery
    http://www.batteryfast.com/uniwill/un255.htm uniwill un255 battery
    http://www.batteryfast.com/uniwill/un258.htm uniwill un258 battery
    http://www.batteryfast.com/uniwill/258-4s4400-s1p1.htm uniwill 258-4s4400-s1p1 battery
    http://www.batteryfast.com/uniwill/258-4s4400-s2m1.htm uniwill 258-4s4400-s2m1 battery
    http://www.batteryfast.com/hp/mini-1000.htm hp mini 1000 battery
    July 21
    Nov. 1
    No namewrote:

    Hi,Do you have used LCDs, second hand LCDs, used flat screens and used LCD monitors? Please go here:www.sstar-hk.com(Southern Stars).We are constantly buying re-usable LCD panels and working for LCD recycling.The re-usable panels go through strictly designed process of categorizing, checking, testing, repairing and refurbishing before they are re-used to make remanufactured LCD displays and TV sets.Due to our recent breakthrough in testing and repairing technology of LCD, we can improve the value for your LCD panels. website:www.sstar-hk.com[chcfdhadeigdfbh]

    Oct. 25
    No namewrote:

    Hi,Do you need advertising displays, digital signages, ad players, advertisement player and LCD advertisings? Please go Here:www.amberdigital.com.hk(Amberdigital).we have explored and developed the international market with professionalism. We have built a widespread marketing network, and set up a capable management team dedicated to provide beyond-expectation services to our customers. amberdigital Contact Us

    website:www.amberdigital.com.hk
    alibaba:amberdigital.en.alibaba.com[ibhdcfhdhcjch]

    Sept. 30

    Trackbacks

    The trackback URL for this entry is:
    http://dsullivan.spaces.live.com/blog/cns!F253C6CDC8EC1EAC!115.trak
    Weblogs that reference this entry
    • None