I took a cursor-based solution and took a stab at writing its set-based equivalent as a stored procedure. I then timed each process with Query Profiler by executing each 20 consecutive times in Query Analyzer (i.e. EXECUTE sp_CalculateRewards and EXECUTE sp_CalculateRewards2). With respect to data size and row counts: of the approximately 12,000 rows in the customer table, only 85 records meet the criteria for processing.
Perhaps my set-based solution is way off, but the cursor-based solution consistently ran in the 500ms range (avg 525ms), while my set-based one ran (progressively better) in cycles of roughly 4000ms, 2000ms, 600ms, then 4000ms again, repeating this pattern over the 20 executions.
My two questions are these. First, does my set-based approach seem reasonably correct, or is there a better way? Second, has anyone seen the same anomaly in the timings, and is there a reason for it?
Thanks for your input!
The cursor-based solution: delete all records from the reporting table; open a cursor over those customers in the customer table that have an enrollment date; for each customer, calculate sales for the 365 days prior to the enrollment date and then year-to-date sales since the enrollment date; build an INSERT statement and EXECUTE it, inserting the customer_id, customer_name, and respective sales figures into the reporting table; and continue for each customer_id in the cursor result set.
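In outline (simplified, with the per-customer aggregation and dynamic INSERT elided, and the cursor/variable names made up for illustration), the cursor version looks like this:

```sql
DECLARE @customer_id INT, @enrollment_date DATETIME

DECLARE cust_cursor CURSOR FOR
    SELECT customer_id, cert_date
    FROM Customer
    WHERE ISDATE(cert_date) = 1

OPEN cust_cursor
FETCH NEXT FROM cust_cursor INTO @customer_id, @enrollment_date

WHILE @@FETCH_STATUS = 0
BEGIN
    -- calculate prior-365-day sales and year-to-date sales for @customer_id,
    -- then build and EXECUTE an INSERT into the reporting table
    FETCH NEXT FROM cust_cursor INTO @customer_id, @enrollment_date
END

CLOSE cust_cursor
DEALLOCATE cust_cursor
```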
The set-based version:
DELETE FROM RewardsTable
INSERT RewardsTable (customer_id, customer_name, enrollment_date, sales_base, ytd_sales, percent_to_goal) SELECT customer_id, customer_name, cert_date, 0, 0, 0 FROM Customer WHERE ISDATE(cert_date) = 1
SELECT rt.customer_id, CASE WHEN SUM(inv.invoice_amount) IS NULL THEN 0 ELSE SUM(inv.invoice_amount) END [sales_base] INTO #rt_sales_base FROM RewardsTable rt LEFT OUTER JOIN invoice_view inv WITH (NOLOCK) ON rt.customer_id = inv.customer_id AND inv.invoice_date >= rt.enrollment_date - 365 AND inv.invoice_date <= rt.enrollment_date GROUP BY rt.customer_id
SELECT rt.customer_id, CASE WHEN SUM(inv.invoice_amount) IS NULL THEN 0 ELSE SUM(inv.invoice_amount) END [ytd_sales] INTO #rt_ytd_sales FROM RewardsTable rt LEFT OUTER JOIN invoice_view inv WITH (NOLOCK) ON rt.customer_id = inv.customer_id AND inv.invoice_date >= rt.enrollment_date AND inv.invoice_date <= rt.enrollment_date + 365 GROUP BY rt.customer_id
UPDATE RewardsTable SET sales_base = tmp.sales_base FROM RewardsTable INNER JOIN #rt_sales_base tmp ON RewardsTable.customer_id = tmp.customer_id
UPDATE RewardsTable SET ytd_sales = tmp.ytd_sales FROM RewardsTable INNER JOIN #rt_ytd_sales tmp ON RewardsTable.customer_id = tmp.customer_id
DROP TABLE #rt_sales_base
DROP TABLE #rt_ytd_sales
I did some additional testing. I populated an enrollment_date for all customers, such that all ~12,000 customer records would be processed, and noticed that the cursor-based stored procedure was no longer faster than the set-based one. So I now have a third question. At some point the dataset became large enough that the set-based execution plan overtook the cursor-based solution. Is it fair to say that (if you're not worried about running in a non-MS SQL environment) cursors hold a performance advantage on smaller sets of data?
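For completeness, the same set-based logic could be collapsed into a single UPDATE using correlated subqueries, avoiding the temp tables entirely; a sketch, assuming the same tables and columns as above:

```sql
UPDATE rt SET
    sales_base = ISNULL((SELECT SUM(inv.invoice_amount)
                         FROM invoice_view inv
                         WHERE inv.customer_id = rt.customer_id
                           AND inv.invoice_date >= rt.enrollment_date - 365
                           AND inv.invoice_date <= rt.enrollment_date), 0),
    ytd_sales  = ISNULL((SELECT SUM(inv.invoice_amount)
                         FROM invoice_view inv
                         WHERE inv.customer_id = rt.customer_id
                           AND inv.invoice_date >= rt.enrollment_date
                           AND inv.invoice_date <= rt.enrollment_date + 365), 0)
FROM RewardsTable rt
```

I have not timed this variant, so I can't say whether it would change the pattern above.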