SQL and BI concepts; MS SQL SERVER: March 2016

Thursday, March 31, 2016

Performance Tuning of SQL SERVER

Performance Tuning of SQL SERVER

Best performance is the main concern to develop a successful application. Like a coin database is the tail side (back-end) of an application. A good database design provides best performance during data manipulation which results into the best performance of an application.

During database designing and data manipulation we should consider the following key points:

1. Choose Appropriate Data Type

Choose appropriate SQL Data Type to store your data since it also helps in to improve the query performance. Example: To store strings use varchar in place of text data type since varchar performs better than text. Use text data type, whenever you required storing of large text data (more than 8000 characters). Up to 8000 characters data you can store in varchar.

2. Avoid nchar and nvarchar

Practice to avoid nchar and nvarchar data type since both the data types takes just double memory as char and varchar. Use nchar and nvarchar when you required to store Unicode (16-bit characters) data like as Hindi, Chinese characters etc.

3. Avoid NULL in fixed-length field

Practice to avoid the insertion of NULL values in the fixed-length (char) field. Since, NULL takes the same space as desired input value for that field. In case of requirement of NULL, use variable-length (varchar) field that takes less space for NULL.

4. Avoid * in SELECT statement

Practice to avoid * in Select statement since SQL Server converts the * to columns name before query execution. One more thing, instead of querying all columns by using * in select statement, give the name of columns which you required.

1. -- Avoid

2. SELECT * FROM tblName

3. --Best practice

4. SELECT col1,col2,col3 FROM tblName

5. Use EXISTS instead of IN

Practice to use EXISTS to check existence instead of IN since EXISTS is faster than IN.

1. -- Avoid
2. SELECT Name, Price FROM tblProduct
3. where ProductID IN (Select distinct ProductID from tblOrder)
4. --Best practice
5. SELECT Name,Price FROM tblProduct
6. where ProductID EXISTS (Select distinct ProductID from tblOrder)

6. Avoid Having Clause

Practice to avoid Having Clause since it acts as filter over selected rows. Having clause is required if you further wish to filter the result of an aggregations. Don't use HAVING clause for any other purpose.

7. Create Clustered and Non-Clustered Indexes

Practice to create clustered and non-clustered index since indexes helps in to access data fastly. But be careful, more indexes on a tables will slow the INSERT, UPDATE, DELETE operations. Hence try to keep small no of indexes on a table.

8. Keep clustered index small

Practice to keep clustered index as much as possible since the fields used in clustered index may also used in nonclustered index and data in the database is also stored in the order of clustered index. Hence a large clustered index on a table with a large number of rows increase the size significantly.

9. Avoid Cursors

Practice to avoid cursor since cursor are very slow in performance. Always try to use SQL Server cursor alternative.

10. Use Table variable inplace of Temp table

Practice to use Table varible in place of Temp table since Temp table resides in the TempDb database. Hence use of Temp tables required interaction with TempDb database that is a little bit time taking task.

11. Use UNION ALL inplace of UNION

Practice to use UNION ALL in place of UNION since it is faster than UNION as it doesn't sort the result set for distinguished values.

12. Use Schema name before SQL objects name

Practice to use schema name before SQL object name followed by "." since it helps the SQL Server for finding that object in a specific schema. As a result performance is best.

1. --Here dbo is schema name

2. SELECT col1,col2 from dbo.tblName

3. -- Avoid

4. SELECT col1,col2 from tblName

13. Keep Transaction small

Practice to keep transaction as small as possible since transaction lock the processing tables data during its life. Sometimes long transaction may results into deadlocks.

14. SET NOCOUNT ON

Practice to set NOCOUNT ON since SQL Server returns number of rows effected by SELECT, INSERT, UPDATE and DELETE statement. We can stop this by setting NOCOUNT ON like as:

1. CREATE PROCEDURE dbo.MyTestProc

2. AS

3. SET NOCOUNT ON

4. BEGIN

5. .

6. .

7. END

15. Use TRY-Catch

Practice to use TRY-CATCH for handling errors in T-SQL statements. Sometimes an error in a running transaction may cause deadlock if you have no handle error by using TRY-CATCH.

16. Use Stored Procedure for frequently used data and more complex queries

Practice to create stored procedure for quaery that is required to access data frequently. We also created stored procedure for resolving more complex task.

17. Avoid prefix "sp_" with user defined stored procedure name

Practice to avoid prefix "sp_" with user defined stored procedure name since system defined stored procedure name starts with prefix "sp_". Hence SQL server first search the user defined procedure in the master database and after that in the current session database. This is time consuming and may give unexcepted result if system defined stored procedure have the same name as your defined procedure.

Tuesday, March 29, 2016

What is a subquery, Nested Subqueries&Correlated Subqueries?

What is a subquery?

A subquery is a SELECT statement with another SQL statement, like in the example below:

select *
from employee
where id in
(select employee_id
from employee
where id = 15);

Subqueries are further classified as either a correlated sub query or a nested sub query

Nested Subqueries
Nested sub-queries are sub queries that don't rely on an outer query. In other words, both queries in a nested sub query may be run as separate queries.
This type of sub query could be used almost everywhere

Select *
from employee
where city in (select city
from employer);

The example subquery returns all clients that are from the same city as the product providers.
The IN operator checks if the value is within the table and retrieves the matching rows.

Correlated Subqueries
Subqueries are correlated when the inner and outer queries are interdependent, that is, when the outer query is a query that contains a subquery and the subquery itself is an inner query. Users that know programming concepts may compare it to a nested loop structure.

Let's start with a simple example. The inner query calculates the average value and returns it. In the outer query’s where clause, we filter only those purchases which have a value greater than the inner query’s returned value.

select id
from employee E1
where joiningdate > '2016-01-15'
and salary > (select avg(salary)
from employee E2
where E1.date = E2.date)

Wednesday, March 16, 2016

What is the use of COALESCE in SQL Server

Use of COALESCE in SQL Server:

Let us understand this by taking an example

create table candidate
( id int identity ,primaryemail varchar(20),secondaryemail varchar(20))

insert into candidate values ('avi@gmail.com',null)
insert into candidate values ('siva@gmail.com',null)
insert into candidate values ('ravii@gmail.com',null)
insert into candidate values ('bavi@gmail.com',null)
insert into candidate values (null,'chiull@gmail.com')
insert into candidate values (null,'boht@gmai.com')
insert into candidate values ('pandu@gmail.com','kaya@gmail.com')

QUERY:
select coalesce(primaryemail,secondaryemail) from candidate

Here in this example, the Candidate table is shown to include three columns with information about a Candidate:

1. id
2. PrimaryEmail
3. SecondaryEmail

COALESCE in the SELECT statement , selects the PrimaryEmail if it is not null. If the PrimaryEmail is null then SecondaryEmail will be selected. If both PrimaryEmail and SecondaryEmail is present then only PrimaryEmail is selected. So, COALESCE returns the first nonnull column among the list of columns passed. If both PrimaryEmail and SecondaryEmail is NULL, COALESCE returns NULL.

Differenece between isnull and coalesce ?

Coalesce is use for more than two columns where as Isnull works only for two column. Also you can use coalesce function in joining to get first not null value from more than two different table.

Reference:
http://venkatsqlinterview.blogspot.in/

Monday, March 14, 2016

How to find the Nth highest salary

To find the Nth highest salary

Step 1: Create a schema of a table named "Employee" in your database as in the following:

create table Employee

(

ID int identity,

Name varchar(20),

Salary money,

Department varchar(20)

)

Step 2:Insert some values in the table according to the column behaviour like Name and Data Type.

Insert into Employee(Name,Salary,Department)values('Avi',20000,'Finance')

Insert into Employee(Name,Salary,Department)values('Bhar',10000,'Finance')

Insert into Employee(Name,Salary,Department)values('Chait',28000,'IT')

Insert into Employee(Name,Salary,Department)values('Dhar',15000,'Finance')

Insert into Employee(Name,Salary,Department)values('Einstein',39000,'Finance')

Insert into Employee(Name,Salary,Department)values('Fedarer',12000,'Finance')

Insert into Employee(Name,Salary,Department)values('Giri',40000,'IT')

Insert into Employee(Name,Salary,Department)values('Hari',32000,'Finance')

Insert into Employee(Name,Salary,Department)values('Irfa',56000,'IT')

Insert into Employee(Name,Salary,Department)values('Janaki',29000,'Finance')

Select * from Employee

To find the 3 rd highest salary

Query : By using Top

SELECT TOP 1 salary

FROM (

SELECT DISTINCT TOP 3 salary

FROM employee

ORDER BY salary DESC

) a

ORDER BY salary

method 2 : By Using MIN

SELECT MIN(Salary)

FROM employee

WHERE Salary IN(SELECT TOP 3 Salary FROM employee ORDER BY Salary DESC)

Method 3 BY using Dense rank

SELECT Salary,Name

FROM

(

SELECT Salary,Name,DENSE_RANK() OVER(ORDER BY Salary DESC) Rno from EMPLOYEE

) tbl

WHERE Rno=3

By using above examples you can find any nth highest salary .You can get any highest value by replace 3 with your value .

Thursday, March 10, 2016

WHAT IS VIEW? TYPES OF VIEWS?

WHAT IS VIEW?

View is nothing but parsed SQL statement which fetches record at the time of execution.

TYPES OF VIEWS?

There are mainly two type of views

a) Simple View

b) Complex View

apart from that we can also subdivided views as UpdatableViews and Read only Views.

Lastly there is an another view named as Materialized Views.

PURPOSE OF VIEWS

View is used for the purposes as stated below:

a) Security

b) Faster Response

c) Complex Query solve

Syntax is :

Create or replace view([olumn1],[column2]...)

Select column1,column2...

from table_name

[where condition]

[with read only],[with check option]

What is Materialized View ?

Materialized views are also logical view of our data driven by select query but the result of the query will get stored in the table or disk, also definition of the query will also store in the database .When we see the performance of Materialized view it is better than normal View because the data of materialized view will stored in table and table may be indexed so faster for joining also joining is done at the time of materialized views refresh time so no need to every time fire join statement as in case of view.

Difference between View vs Materialized View in database

Based upon on our understanding of View and Materialized View, Let’s see, some short difference between them :

1) First difference between View and materialized view is that, In Views query result is not stored in the disk or database but Materialized view allow to store query result in disk or table.

2) Another difference between View vs materialized view is that, when we create view using any table, row id of view is same as original table but in case of Materialized view row id is different.

3) One more difference between View and materialized view in database is that, In case of View we always get latest data but in case of Materialized view we need to refresh the view for getting latest data.

4) Performance of View is less than Materialized view.

5) This is continuation of first difference between View and Materialized View, In case of view its only the logical view of table no separate copy of table but in case of Materialized view we get physically separate copy of table

6) Last difference between View vs Materialized View is that, In case of Materialized view we need extra trigger or some automatic method so that we can keep MV refreshed, this is not required for views in database.

Friday, March 4, 2016

Import multiple Excel files data into SQL Server Destination (SSIS)

How Import multiple Excel files data into SQL Server Destination in SSIS?

solution:

Take a package scope variable

Name : Excelsourcepath
Datatype : string
Value :C:\Avi\msbiinput\emp1.xslx

Go to Control flow and place a for each loop container

Double click and set the properties like below

collection page :

Enumerator : For each file enumerator
Folder :C:\Avi\msbiinput
Files : *.xlsx

check the button Fully qualified

Go to Variable mapping Page

Variable

Set User : Excelsourcepath

Place dataflow task in the container :

Goto Dataflow tab

---> Place Excel source adapter and configure it

Go to column page

To check which adapter using which connection manager

Just right click on adapter
and select the show advanced editor

Then right click on the Excelconnectionmangers --> Properties

Property
Connection String

Expression

"Provider =Microsoft.OLEDB.12.0
data source ="+@[User:Excelpath]+";
Extended Properties =\"Excel12.0;HDR=yes\";"

Place a DATA Conversion

and configure it

We need data Conversion transformation to convert the data types when we are sending the data from excel to SQL.

place a SQL Server Destination

Configure it as same above dataflow task

Then Press F5 to run the package .

Thursday, March 3, 2016

Step by step Procedure to Create a directory on Local System (use of File System Task) -SSIS part 2

How to create a directory on Local System?(use of File System Task)

Go to control flow section
---> Place a file system task and double click on it to configure properties .
In General tab
you will see Destination directory options

Use directory if exists :True

True means if directory already exists with the same name task will use same directory without fail.
False means if directory already exists with the same name task will be failed .

-->Set name and description of the task.

In Operation Tab select

Operation : Create Directory

Source Connection:

Issourcepathvariable : false

false means we are not using variables in the source connection.

True means we are using the variables to represent source connection.

Source connection : <new connection >

usage type : create folder
folder : Path

and press ok

Now connection has been created in the connection managers section.

Wednesday, March 2, 2016

Introduction to SQL Server Integration Services Concepts

SQL Server Integration Services Concepts

Package :
Package Means collection of tasks.Generally Package contains tasks, variables,connection managers,log providers,precedence constraints,configurations etc.

Control Flow:
Order of tasks execution.
control flow contains containers which provide structures to the package ,tasks which defines functionality of the package.

Data Flow:
Data flow means Extracting data from source systems, performing transformation and loading the data in to the target systems.

Developing a SSIS Package :

Three ways to develop ssis packages

SSIS Wizard
SSIS Designer
Custom Applications

SSIS Wizard :

Wizard allows to develop simple packages to import or export data between one data source to one destination.

Wizard doesn't allows any transformation.

Limitations of SSIS Wizard:

Wizard only allows to develop a simple package with only one dataflow task.
That dataflow can have only one source , one transformation and no transformations.
Wizard doesn't allows to develop a package with pipeline of tasks .

Usage of Wizard:

We can import or export data between one data source to one destination without any transformation.

SSIS Designer:

SSIS designer allows to develop simple and complex packages with workflow of tasks.

To develop a package we have to use

SSIS Designer
Tool Box
properties window
Solution explorer

The primary sections of ssis designer are

Control Flow:
Order of tasks execution.
control flow contains containers which provide structures to the package ,tasks which defines functionality of the package.

Data Flow:
Data flow means Extracting data from source systems, performing transformation and loading the data in to the target systems.
dataflow pipeline designed with source adapters ,transformations,destination adapters

Event Handlers :
Used to implement error handling ,debugging and monitoring of Packages .

Package Explorer:
Used to browse the elements of the package .

Connection Mangers :
Used to Create and manage the Connections at Package level.

Temporary Objects in SQL Server

CTE

CTE stands for Common Table expressions. It was introduced with SQL Server 2005. It is a temporary result set and typically it may be a result of complex sub-query. Unlike temporary table its life is limited to the current query. It is defined by using WITH statement. CTE improves readability and ease in maintenance of complex queries and sub-queries. Always begin CTE with semicolon.

A sub query without CTE is given below :

Select * from (

Select a.addrs, e.emp,e.age from address a inner join employee e on e.eid=a.eid) temp

Where temp.age>30

Order by temp.name

By using CTE above query can be re-written as follows :

With cte (address,name,age)

As (

Select a.addrs, e.emp,e.age from address a inner join employee e on e.eid=a.eid )

Select * from cte

Where cte.age>30

Order by cte.name

When to use CTE

1. This is used to store result of a complex sub query for further use.

2. This is also used to create a recursive query.

Temporary Tables

In SQL Server, temporary tables are created at run-time and you can do all the operations which you can do on a normal table. These tables are created inside Tempdb database. Based on the scope and behavior temporary tables are of two types as given below-

1. Local Temp Table

Local temp tables are only available to the SQL Server session or connection (means single user) that created the tables. These are automatically deleted when the session that created the tables has been closed. Local temporary table name is stared with single hash ("#") sign.

CREATE TABLE #LocalTemp

(

UserID int,

Name varchar(50),

Address varchar(150)

)

insert into #LocalTemp values ( 101, 'avinash','bangalore');

Select * from #LocalTemp

The scope of Local temp table exist to the current session of current user means to the current query window. If you will close the current query window or open a new query window and will try to find above created temp table, it will give you the error.

2. Global Temp Table

Global temp tables are available to all SQL Server sessions or connections (means all the user). These can be created by any SQL Server connection user and these are automatically deleted when all the SQL Server connections have been closed. Global temporary table name is stared with double hash ("##") sign.

CREATE TABLE ##GlobalTemp

(

UserID int,

Name varchar(50),

Address varchar(150)

)

insert into ##GlobalTemp values ( 101, 'avinash','bangalore');

Select * from #GlobalTemp

Global temporary tables are visible to all SQL Server connections while Local temporary tables are visible to only current SQL Server connection.

Table Variable

This acts like a variable and exists for a particular batch of query execution. It gets dropped once it comes out of batch. This is also created in the Tempdb database but not the memory. This also allows you to create primary key, identity at the time of Table variable declaration but not non-clustered index.