
Tuesday, March 20, 2012

Performance problem?

Hi all,

I have a table "Customer"

with fields ForeName varchar(30) and Surname varchar(30).

When I run

SELECT c.SURNAME + ', ' + c.FORENAME AS ID, c.SURNAME + ', ' + c.FORENAME AS label
FROM Customer c
WHERE c.SURNAME + ', ' + c.FORENAME = 'Perera, N'

it gives me the result within a second.

The problem is that our company moved to a different database table structure last week.

Now my "Customer" table has the same field names but different data types, as below.

ForeName nvarchar(max), Surname nvarchar(max)

So when I run the same SELECT statement as before, it takes more than 8 seconds.

How can this happen? We have the same amount of data in the old structure and the new one.

Any ideas?

Regards

sujithf

If you use an nvarchar(max) field, you can't index it. Maybe in the old structure you had indexes.

|||

Thanks for your response.

How do I know whether my table field is indexed or not?

I have no idea about indexing.

Can you please explain it a bit more?

I am a bit new to SQL Server 2005.

regards

sujithf

|||

You can review indexes using the following stored procedure:

sp_helpindex '<your table name>'

But since you now have nvarchar(max) fields, you can't have an index on them, because:

Columns that are of the large object (LOB) data types ntext, text, varchar(max), nvarchar(max), varbinary(max), xml, or image cannot be specified as key columns for an index. Also, a view definition cannot include ntext, text, or image columns, even if they are not referenced in the CREATE INDEX statement.

Also, I think it is a bad idea to use nvarchar(max) as the data type for Surname and ForeName. Is it really possible that a surname or forename could be longer than 4000 characters? The nvarchar(max) data type is designed for very large amounts of data, which is stored outside the table row.
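Just to illustrate the point, here is a rough sketch against your Customer table as you described it (the index name is made up):

-- With Surname declared as nvarchar(max), this statement fails, because LOB
-- types cannot be used as key columns of an index (the error is roughly:
-- "Column 'Surname' ... is of a type that is invalid for use as a key column in an index").
CREATE INDEX IX_Customer_Surname ON Customer (Surname);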

|||

Hi.

Many thanks for your valuable comments.

I checked using your command; it says the object does not have any indexes.

Yes, I also agree with you, because it wastes space to specify nvarchar(max) for Surname and ForeName.

But our company hired a third-party company to migrate from the old database structure to the new one,

so I think we have to contact that third-party company about this matter.

So do you think this SQL query is 7 seconds slower in the new structure because of the nvarchar(max) data type?

What are your suggestions?

regards

sujith

|||

I don't have a large table, but I made a test on the Person.Contact table from the AdventureWorks sample database. This table has ~20,000 rows. I ran the following query:

select FirstName+', '+MiddleName from Person.Contact where FirstName+', '+MiddleName = 'Frances, B.'

If both fields are nvarchar(50) without indexes, the query time is ~0.06s.

If both fields are nvarchar(max), the query time is ~0.2s.

So I think we are seeing the negative influence of the max data type.

Also, I think it is better to use the following query:

select FirstName+', '+MiddleName from Person.Contact where FirstName='Frances' and MiddleName = 'B.'

and create indexes.
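For example, something like this (a sketch against the AdventureWorks Person.Contact table; the index name is mine, and the table may already have other indexes of its own):

-- Index both name columns so the rewritten WHERE clause can seek on them
CREATE NONCLUSTERED INDEX IX_Contact_FirstName_MiddleName
ON Person.Contact (FirstName, MiddleName);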

|||

Thank you so much for your effort to sort out my problem.

And finally, can you provide me an example of how to index a table?

Or can you provide me some articles to learn how to index tables and fields?

Because I really like to learn new stuff in SQL Server 2005.

Thank you again

sujithf

|||

CREATE INDEX article: http://msdn2.microsoft.com/en-us/library/ms188783.aspx. At the end of this article, you can find good examples.

I suggest you download the latest version of Books Online (BOL): http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
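As a quick sketch for your own Customer table (assuming the columns are first narrowed back to a sensible length, since max columns cannot be index key columns; the 50-character size and the index name are just assumptions):

-- Narrow the columns so they can be used as index key columns
ALTER TABLE Customer ALTER COLUMN Surname nvarchar(50);
ALTER TABLE Customer ALTER COLUMN ForeName nvarchar(50);

-- Then index the name columns that the query filters on
CREATE NONCLUSTERED INDEX IX_Customer_Name ON Customer (Surname, ForeName);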

|||

I think the problem is that if you are using max, then by default the value is not stored on the data page; a 16-byte root pointer is stored instead. To compare against the value, the engine has to follow this pointer and load the text itself (more I/O, therefore slower).

It is possible to change this by using sp_tableoption to set the "large value types out of row" option to 0, but then all the values can only add up to a size of 8000 bytes (minus, I would suspect, the other data in the row). That rather destroys the reason for not just using the smaller 1 to 4000 length (4000 nvarchar characters is 8000 bytes).
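For reference, that option is set per table like this (just a sketch; whether it helps depends on the values actually fitting in the row):

-- 0 = store (n)varchar(max) values in-row when they fit,
-- instead of leaving only the 16-byte pointer in the row
EXEC sp_tableoption 'Customer', 'large value types out of row', 0;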

The problem with the max data types is that they are a new feature and everyone wants to use them without understanding the impact. They are really supposed to store large data that you do not generally expect to search directly (documents, free-form commentary, images). That is what Full Text Search and the new xml indexes are for (to build search indices on some of the data types stored in this format).

As mentioned earlier, there is no reason for expanding the fields in the way they have unless they can argue that people are going to have names of that length. I am particularly intrigued by the fact that you seem to be storing only the first initial in this enormous first name field.

I would probably also be tempted to compare the fields individually rather than concatenating them, unless you are sure where the break will always be (otherwise you have comma-insertion issues anyway).

|||

Thank you to both of you, Dhericean and Konstantin Kosinsky,

for your valuable time in considering my problem.

regards

sujithf

Wednesday, March 7, 2012

Performance of bulk insert

Hi,
I have a database that holds information about customers; each customer's data is held in many referenced tables. In one stored procedure I should insert/update many customers. I have two options: bulk insert of all customers' data in one transaction, or insert each customer in a separate transaction. My question is: what approach is considered correct and gives better performance, assuming that the number of inserted/updated customers is very large?

Thank you

|||

How many are you looking at to insert or update?

If it is 2-5 thousand at a time, our DBA lets us do that. If it gets larger than that, she usually likes us to do a cursor for the update or insert. Cursors will not lock the tables as much, so it wouldn't hurt performance as much for the other processes running on the DB during the larger updates. If you are running the process off hours and you do not have to worry about other processes as much, you can use bulk inserts and updates for a larger number of rows.

|||

My question is: what approach is considered to be correct, and gives better performance assuming that the number of inserted/updated customers is very large?

The answer is ... it depends.

More properly, since you have identified that the customer data exists in several tables (which implies relationships), you probably need to enforce ACID properties, which means transactional consistency for each customer insertion/update.

Now the question becomes ... how much locking can you, your applications, and your users live with? You stated that "... the number of inserted/updated customers is very large ...". How large is very large? How high is up?

If you use one transaction for each customer, you only retain the locks for the brief time needed to maintain data and row consistency across however many tables are involved. You can easily identify single-customer failures, and write out or flag the failure for follow-up. Your process will probably result in row-level locking (unless you start splitting data and/or index pages); however, if very large is on the order of millions of entries across several tables, your insert/update could take hours and lots of system resources to complete.

If you try to batch them all together, you will find that tracking a single failure becomes much more difficult, you will probably end up with page and table locking (which **will** affect application performance), and possibly a massive transaction log.

Smaller batch processing (say 100 or 500 or 1000 or some other number) puts you between these two extremes, but still with the same concerns.
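As a rough sketch of that middle ground (the staging table, its columns, and the batch size here are all assumptions, not your actual schema):

-- Process staged customer rows 1000 at a time, one transaction per batch,
-- so locks are held briefly and the transaction log stays manageable.
DECLARE @BatchSize int;
SET @BatchSize = 1000;

CREATE TABLE #CurrentBatch (CustomerId int PRIMARY KEY);

WHILE EXISTS (SELECT 1 FROM dbo.CustomerStaging WHERE Processed = 0)
BEGIN
    BEGIN TRANSACTION;

    -- Claim the next batch of unprocessed customers and remember their ids
    UPDATE TOP (@BatchSize) s
    SET    Processed = 1
    OUTPUT inserted.CustomerId INTO #CurrentBatch (CustomerId)
    FROM   dbo.CustomerStaging AS s
    WHERE  s.Processed = 0;

    -- ... insert/update the referenced customer tables from #CurrentBatch here,
    -- so each batch stays transactionally consistent across all of them ...

    COMMIT TRANSACTION;

    TRUNCATE TABLE #CurrentBatch;
END

DROP TABLE #CurrentBatch;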

So ... it depends. Test and find out what is best for your situation. But remember, maintain consistency, or the data becomes unreliable, as does your reputation!