Wednesday, March 7, 2012

performance of bulk insert

Hi,
I have a database that holds information about customers; each customer's data is held in many referenced tables. In one stored procedure I need to insert or update many customers. I have two options: bulk insert of all the customers' data in one transaction, or insert each customer in a separate transaction. My question is: which approach is considered correct, and which gives better performance, assuming that the number of inserted/updated customers is very large? Thank you

How many are you looking at to insert or update?

If it is 2,000 to 5,000 at a time, our DBA lets us do that. If it gets larger than that, she usually likes us to do a cursor for the update or insert. Cursors do not lock the tables as heavily, so the larger updates hurt performance less for the other processes running on the database. If you run the process off-hours and do not have to worry about other processes as much, you can use bulk inserts and updates for larger numbers of rows.
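Roughly, that cursor-based approach might look like the sketch below. The staging and target tables (dbo.CustomerStaging, dbo.Customers) and their columns are made up for illustration, and a real version would touch all of a customer's referenced tables in each iteration, not just one.

-- Row-by-row upsert via a cursor; each row is its own short transaction,
-- so locks are held only briefly. All object names here are hypothetical.
DECLARE @CustomerId INT, @Name NVARCHAR(100);

DECLARE cust_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT CustomerId, Name FROM dbo.CustomerStaging;

OPEN cust_cursor;
FETCH NEXT FROM cust_cursor INTO @CustomerId, @Name;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRANSACTION;

    IF EXISTS (SELECT 1 FROM dbo.Customers WHERE CustomerId = @CustomerId)
        UPDATE dbo.Customers SET Name = @Name WHERE CustomerId = @CustomerId;
    ELSE
        INSERT INTO dbo.Customers (CustomerId, Name) VALUES (@CustomerId, @Name);

    COMMIT TRANSACTION;

    FETCH NEXT FROM cust_cursor INTO @CustomerId, @Name;
END

CLOSE cust_cursor;
DEALLOCATE cust_cursor;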
"My question is: which approach is considered correct, and which gives better performance, assuming that the number of inserted/updated customers is very large?"

The answer is ... it depends.

More properly, since you have identified that the customer data exists in several tables (which implies relationships between them), you probably need to enforce the ACID properties, which means transactional consistency for each customer insertion or update.
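As a rough illustration of that unit of work, one customer's write wrapped in a single transaction could look like this. The procedure, table, and column names are assumptions, not the poster's actual schema; SET XACT_ABORT ON simply ensures any runtime error rolls the whole transaction back.

-- One customer's rows across a parent table and a referencing table,
-- committed or rolled back as a unit. All names here are hypothetical.
CREATE PROCEDURE dbo.InsertCustomer
    @CustomerId  INT,
    @Name        NVARCHAR(100),
    @AddressLine NVARCHAR(200),
    @City        NVARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- any runtime error aborts and rolls back the transaction

    BEGIN TRANSACTION;

    INSERT INTO dbo.Customers (CustomerId, Name)
    VALUES (@CustomerId, @Name);

    INSERT INTO dbo.CustomerAddresses (CustomerId, AddressLine, City)
    VALUES (@CustomerId, @AddressLine, @City);

    COMMIT TRANSACTION;
END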

Now the question becomes ... how much locking can you, your applications, and your users live with? You stated that "... the number of inserted/updated customers is very large ...". How large is very large? How high is up?

If you use one transaction for each customer, you only hold the locks for the brief time needed to maintain data and row consistency across the related tables. You could easily identify a failure for a single customer, and write or flag it for follow-up. Your process will probably result in row-level locking (unless you start splitting data and/or index pages). However, if "very large" is on the order of millions of entries across several tables, your insert/update could take hours and a lot of system resources to complete.
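One way to get that per-customer isolation with failure flagging is sketched below, using assumed names: a staging table with a Processed flag, a hypothetical dbo.SaveCustomerFromStaging procedure that writes one customer's rows, and an error table for follow-up.

-- Process staged customers one at a time; a failure is logged and skipped
-- instead of rolling back everyone else. All object names are hypothetical.
DECLARE @CustomerId INT;

WHILE 1 = 1
BEGIN
    SELECT TOP (1) @CustomerId = CustomerId
    FROM dbo.CustomerStaging
    WHERE Processed = 0
    ORDER BY CustomerId;

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to process

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Hypothetical procedure that writes this customer's rows
        -- across the related tables (along the lines of the earlier sketch).
        EXEC dbo.SaveCustomerFromStaging @CustomerId = @CustomerId;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;

        -- Flag only the failing customer for follow-up; the run continues.
        INSERT INTO dbo.CustomerLoadErrors (CustomerId, ErrorMessage, LoggedAt)
        VALUES (@CustomerId, ERROR_MESSAGE(), SYSUTCDATETIME());
    END CATCH;

    UPDATE dbo.CustomerStaging
    SET Processed = 1
    WHERE CustomerId = @CustomerId;
END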

If you try to batch them all together, you will find that tracking a single customer's failure becomes much more difficult; you will probably end up with page and table locking (which **will** affect application performance), and possibly with a massive transaction log.

Smaller batch processing (say 100, 500, 1,000, or some other number of customers at a time) puts you between these two extremes, but still with the same concerns.
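A sketch of that middle ground, again with assumed object names and a single target table for brevity: take roughly 1,000 newly staged customers at a time and commit after each chunk, so locks and log growth stay bounded. The right batch size is something to test, not a given.

-- Chunked processing: one transaction per batch of staged rows.
-- dbo.CustomerStaging (with a Processed flag) and dbo.Customers are hypothetical.
DECLARE @BatchSize   INT = 1000;
DECLARE @RowsInBatch INT = 1;
DECLARE @Batch TABLE (CustomerId INT PRIMARY KEY);

WHILE @RowsInBatch > 0
BEGIN
    BEGIN TRANSACTION;

    -- Pick the next chunk of unprocessed customer ids.
    INSERT INTO @Batch (CustomerId)
    SELECT TOP (@BatchSize) CustomerId
    FROM dbo.CustomerStaging
    WHERE Processed = 0
    ORDER BY CustomerId;

    SET @RowsInBatch = @@ROWCOUNT;

    -- Copy the chunk into the target table.
    INSERT INTO dbo.Customers (CustomerId, Name)
    SELECT s.CustomerId, s.Name
    FROM dbo.CustomerStaging AS s
    JOIN @Batch AS b ON b.CustomerId = s.CustomerId;

    -- Mark the chunk as done; the commit releases its locks before the next chunk.
    UPDATE s
    SET Processed = 1
    FROM dbo.CustomerStaging AS s
    JOIN @Batch AS b ON b.CustomerId = s.CustomerId;

    COMMIT TRANSACTION;

    DELETE FROM @Batch;
END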

So ... it depends. Test and find out what is best for your situation. But remember, maintain consistency, or the data becomes unreliable, as does your reputation!
