Top 10 steps to optimize data access in SQL Server. Part II
Introduction Remember we ware in a mission? Our mission was to optimize the performance of an SQL Server database. We had an application that was built on top of that database. The application was working pretty fine while tested, but, soo
Introduction
Remember we ware in a mission? Our mission was to optimize the performance of an SQL Server database. We had an application that was built on top of that database. The application was working pretty fine while tested, but, soon after deployment at production, it started to perform slowly as the data volume was increased in the database. Within a very few months, the application started performing so slowly that, the poor developers (including me) had to start this mission to optimize the database and thus, optimize the application.
Please have a look at the previous article to know how it started and what did we do to start the optimization process.
Top 10 steps to optimize data access in SQL Server. part I (Use Indexing)
Well, in the first 3 steps (Discussed in the previous article), we had implemented indexing in our database. That was because; we had to do something that improves the database performance in a quick amount of time, with a least amount of effort. But, what if we wrote the data access codes in an inefficient way? What if we wrote the TSQLs poorly?
Applying indexing will obviously improve the data access performance, but, at the most basic level in any data access optimization process, you have to make sure that you have written your data access codes and TSQLs in the most efficient manner, applying the best practices.
So, in this article, we are going to focus on writing or refactoring the data access codes using the best practices. But, before we start playing the game, we need to prepare the ground first. So let’s do the groundwork at this very next step:
Step4: Move TSQL codes from application into the database server
I know you may not like this suggestion at all. You might have used an ORM that does generate all the SQLs for you on the fly. Or, you or your team might have a “principle” of keeping SQLs in your application codes (In the Data access layer methods). But, still, if you need to optimize the data access performance, or, if you need to troubleshoot a performance problem in your application, I would suggest you to move your SQL codes into your database server (Using Stored procedure, Views, Functions and Triggers) from your application. Why? Well, I do have some strong reasons for this recommendation:
- Moving the SQLs from application and implementing these using stored procedures/Views/Functions/Triggers will enable you to eliminate any duplicate SQLs in your application. This will also ensure re-usability of your TSQL codes.
- Implementing all TSQLs using the database objects will enable you to analyze the TSQLs more easily to find possible inefficient codes that are responsible for slow performance. Also, this will let you manage your TSQL codes from a central point. 3. Implementing all TSQLs using the database objects will also enable the database server to re-use the query execution plans most of the times, because, the database objects are pre-compiled while they are created.
- Doing this will also enable you to re-factor your TSQL codes to take advantage of some advanced indexing techniques (going to be discussed in later parts in this series of articles). Also, this will help you to write more “Set based” SQLs along with eliminating any “Procedural” SQLs that you might have already written in your application.
Despite the fact that indexing (In Step1 to Step3) will let you troubleshoot the performance problems in your application in a quick time (if properly done), following this step 4 might not give you a real performance boost instantly. But, this will mainly enable you to perform other subsequent optimization steps and apply different other techniques easily to further optimize your data access routines.
If you have used an ORM (Say, NHibernate) to implement the data access routines in your application, you might find your application performing quite well in your development and test environment. But, if you face performance problem in a production system where lots of transactions take place each second, and where too many concurrent database connections are there, in order to optimize your application’s performance you might have to re-think with your ORM based data access logics. It is possible to optimize an ORM based data access routines, but, it is always true that if you implement your data access routines using the TSQL objects in your database, you have the maximum opportunity to optimize your database.
If you have come this far while trying to optimize your application’s data access performance, come on, convince your management and purchase some time to implement a TSQL object based data operational logic. I can promise you, spending one or two man-month doing this might save you a man-year in the long run!
OK, let’s assume that you have implemented your data operational routines using the TSQL objects in your database. So, having done this step, you are done with the “ground work” and ready to start playing. So, let’s move towards the most important step in our optimization adventure. We are going to re-factor our data access codes and apply the best practices.
Step5: Identify inefficient TSQLs, re-factor and apply best practices
No matter how good indexing you apply in your database, if you use poorly written data retrieval/access logic, you are bound to get slow performance.
We all want to write good codes, don’t we? While we write data access routines for a particular requirement, we really have lots of options to follow for implementing particular data access routines (And application’s business logics). But, most of the cases, we have to work in a team with members of different calibers, experience and ideologies. So, while at development, there are strong chances that our team members may write codes in different ways and some of them miss following the best practices. While writing codes, we all want to “get the job done” first (Most of the cases). But, while our codes run in production, we start to see the problems.
Time to re-factor those codes now. Time to implement the best practices in your codes.
I do have some SQL best practices for you that you can follow. But, I am sure that you already know most of them. Problem is, in reality, you just don’t implement these good stuffs in your code (Of course, you always have some good reasons for not doing so). But what happens, at the end of the day, your code runs slowly and your client becomes unhappy.
So, knowing the best practices is not enough at all. The most important part is, you have to make sure that you follow the best practices while writing TSQLs. This is the most important thing.
Some TSQL Best practices
Don’t use “SELECT*" in SQL Query
- Unnecessary columns may get fetched that adds expense to the data retrieval time.
- The Database engine cannot utilize the benefit of “Covered Index” (Discussed in the previous article), hence, query performs slowly.
Avoid unnecessary columns in SELECT list and unnecessary tables in join conditions
- Selecting unnecessary columns in select query adds overhead to the actual query, specially if the unnecessary columns are of LOB types.
- Including unnecessary tables in the join conditions forces the database engine to retrieve and fetch unnecessary data that and increase the query execution time.
Use EXISTS and IN instead of NOT EXISTS and NOT IN (If possible)
- For EXISTS and IN, SQL Server can return TRUE as soon as a single row matches. For the negated expressions, it must examine all values to determine that there are not matches (That results in Full Table Scan)
- The NOT EXISTS based codes can be easily re-arranged as follows:
Collapse
IF NOT EXISTS (SELECT * FROM table WHERE...) BEGIN /* Statement Group 1 */ END ELSE BEGIN /* Statement Group 2 */ END
It can be rewritten as:
Collapse
IF EXISTS (SELECT * FROM table WHERE...) BEGIN /* Statement Group 2 */ END ELSE BEGIN /* Statement Group 1 */ END
- Do not use the COUNT() aggregate in a subquery to do an existence check:
Do not use
Collapse
SELECT column_list FROM table WHERE 0 SELECT