SQL and BI concepts; MS SQL SERVER: 2016

Tuesday, July 19, 2016

SSIS Interview Question and Answers Part 2

1. Logging. Different types of Logging files
Logging is used to log the information during the execution of package.
A log provider can be a text file, the SQL Server Profiler, a SQL Server relational database, a Windows event log, or an XML file. If necessary, you can define a custom log provider (e.g., a proprietary file format).

2. How to debug a package

For debugging a package, we have 3 options:
-Setting breakpoints in a package, container or Task
-Using Data Viewer in the Data flow task
-Setting Error redirection in the Data Flow task

3. What is the LoggingMode property?
SSIS packages, tasks and containers have a property called LoggingMode. This property accepts 3 possible values:
Enabled - to enable logging of that component
Disabled - to disable logging of that component
UseParentSetting - to use parent's setting of that component to decide whether or not to log the data.

4. How to configure Error Output in SSIS
We have 3 options to configure error output in ssis.
1. Ignore Failure
2. Redirect Row
3. Fail Component

For Example:

I have comma-separated value in a flat file with two columns (code, Name). Code is an integer value and name is a varchar(20) data type configured in the flat file connection manager. Some of the codes in the flat files are characters. So, flat file reader component will fail reading the character value. But, I want to redirect the error data to separate table.

5. What is Ignore Failure option in SSIS?

In Ignore Failure option, the error will be ignored and the data row will be directed to continue on the next transformation. Let’s say you have some JUNK data(wrong type of data ) flowing from source, then using this option in SSIS we can REDIRECT the junk data records to another transformation instead of FAILING the package. This helps to MOVE only valid data to destination and JUNK can be captured into separate file.

6. Incremental Load in SSIS

Using Slowly Changing Dimension
Using Lookup and Cache Transformation

7. Difference between Synchronous and Asynchronous Transformation

Synchronous T/F process the input rows and passes them onto the data flow one row at a time.

When the output buffer of Transformation created a new buffer, then it is Asynchronous transformation. Output buffer or output rows are not sync with input buffer.

8. What are Row Transformations, Partially Blocking Transformation, Fully Blocking Transformation with examples.

In Row Transformation, each value is manipulated individually. In this transformation, the buffers can be re-used for other purposes like following:

OLEDB Datasource, OLEDB Data Destinations

Other Row transformation within the package, Other partially blocking transformations within the package.

examples of Row Transformations: Copy Column, Audit, Character Map

Partially Blocking Transformation:

These can re-use the buffer space allocated for available Row transformation and get new buffer space allocated exclusively for Transformation.

examples: Merge, Conditional Split, Multicast, Lookup, Import, Export Column

Fully Blocking Transformation:

It will make use of their own reserve buffer and will not share buffer space from other transformation or connection manager.

examples: Sort, Aggregate, Cache Transformation

9. Difference between FileSystem and Sql server Deployment

File System Deployment: We can save the package on a physical location on hard drive or any shared folder with this option, and we should provide a full qualified path to stored package in the FileSystem option.
Sql Server Deployment: SSIS packages will be stored in the sysssispackages table of MSDB Database.

10. Difference between Lookup and Fuzzy Lookup transformation

Lookup Transformation finds the exact match.
Fuzzy Lookup transformation matches input table with reference table. It finds the closest match and indicates the quality of the match.

Thursday, July 14, 2016

SSIS Interview Question and Answers Part 1

1.In BIDS, the SSIS project contain 10 packages. But How to deploy only 5 packages in Destination machine even though Manifest file contains all 10 packages after the Build?
-Open the Manifest file in any editor like BIDS or Notepad, keep the required 5 packages, remove remaining 5 packages.
-Save and Close the Manifest file.
-Double click on Manifest file to deploy the required 5 package.

2.What are the different types of Transformations you have worked

AGGEGATE -The Aggregate transformation applies aggregate functions to column values and copies the results to the transformation output. Besides aggregate functions, the transformation provides the GROUP BY clause, which you can use to specify groups to aggregate across.
The Aggregate Transformation supports following operations:
Group By, Sum, Average, Count, Count Distinct, Minimum, Maximum

AUDIT - Adds Package and Task level Metadata - such as Machine Name, Execution Instance, Package Name, Package ID, etc..

CHARACTER MAP - When it comes to string formatting in SSIS, Character Map transformation is very useful, used to convert data lower case, upper case.

CONDITIONAL SPLIT – used to split the input source data based on condition.

COPY COLUMN - Add a copy of column to the output, we can later transform the copy keeping the original for auditing.

DATA CONVERSION - Converts columns data types from one to another type. It stands for Explicit Column Conversion.

DATA MINING QUERY – Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.

DERIVED COLUMN - Create a new (computed) column from given expressions.

EXPORT COLUMN – Used to export a Image specific column from the database to a flat file.

FUZZY GROUPING – Groups the rows in the dataset that contain similar values.

FUZZY LOOKUP - Used for Pattern Matching and Ranking based on fuzzy logic.

IMPORT COLUMN - Reads image specific column from database onto a flat file.

LOOKUP - Performs the lookup (searching) of a given reference object set against a data source. It is used to find exact matches only.

MERGE - Merges two sorted data sets of same column structure into a single output.

MERGE JOIN - Merges two sorted data sets into a single dataset using a join.

MULTI CAST - is used to create/distribute exact copies of the source dataset to one or more destination datasets.

ROW COUNT - Stores the resulting row count from the data flow / transformation into a variable.

ROW SAMPLING - Captures sample data by using a row count of the total rows in dataflow specified by rows or percentage.

UNION ALL - Merge multiple data sets into a single dataset.

PIVOT – Used for Normalization of data sources to reduce analomolies by converting rows into columns

UNPIVOT – Used for demoralizing the data structure by converts columns into rows incase of building Data Warehouses.

3. What are the different types of Transaction Options

Required: If a transaction already exists at the upper level, the current executable will join the transaction. If No transaction at the upper level, a new transaction is created automatically.

Supported:In any executable, if there is a transaction at upper level, the executable join the transaction else do not create a new transaction.

Not Supported:The executable of the package do not honour any transaction ie do not join other transaction nor creates new transaction.

4. Explain about Checkpoints with properties

Checkpoint is used to restart the package execution from the point of failure rather than from initial start.

Set the following Properties:
CheckpointFileName: Specifies the name of the checkpoint file.
CheckpointUsage: Never, IfExists, Always
SaveCheckpoints: indicates whether the package needs to save checkpoints. This property must be set to True to restart a package from a point of failure.

FailPackageOnFailure: property needs to be set to True for enabling the task in the checkpoint.
Checkpoint mechanism uses a Text File to mark the point of package failure.

These checkpoint files are automatically created at a given location upon the package failure and automatically deleted once the package ends up with success.

5. Package configuration? Different types of Configuration Files

The package can be transferred across various environments like development and unit testing, system testing, UAT and production. Most packages will have environment specific variables like connection string to a database or path to a flat file, or user defined variables etc. that would be impacted while moving the package across environments as part of deployment process. Hence, it is mandatory to change these environment dependent variables when the package is transferred across environments. Package configurations help in managing such changes without actually opening and editing the SSIS package in Business Intelligence Development Studio (BIDS). After deploying the package to a different machine (using SQL Server or file system deployment mode) it is mandatory to copy the related package configuration files on to that machine. If the package is scheduled to run through a SQL Agent job, the configuration file should be added while creating the job so that package will read the information from the configuration file. While executing the job, SQL Agent will take the design time values for connection strings if the package configuration file is not supplied.

There are 5 configuration types available with package configurations.

Sunday, July 3, 2016

Important Interview Questions in Sql Server.

1.Ho to print 1-100 numbers without using loop?

SOL:
with cte
as
(
select 1 as count
union all
select count + 1 from cte where count <100
)
select * from cte

Monday, May 30, 2016

How to get unmatched rows from two tables

Let Create two tables

create table t1 (Id int, name varchar(50))
create table t2 (Id int, name varchar(50))

insert t1 values (1, 'avi')
insert t1 values (2, 'ravi')
insert t1 values (3, 'sun')
insert t1 values (5, 'naveen')
insert t1 values (7, 'rachin')

insert t2 values (1, 'avi')
insert t2 values (2, 'ravi')
insert t2 values (4, 'naveen')
insert t2 values (6, 'raja')

Query to find unmatched rows

Option 1:
Select Id, Name
from T1
where Id not in (select Id from T2)
UNION
Select Id, Name
from t2

where Id not in (select Id from t1)

Option 2:

select isnull(t1.id, t2.id), isnull(t1.name,t2.name) from t1
full outer join t2 on t1.id = t2.id
where t2.Id is null or t1.id is null

Monday, May 16, 2016

How to calculate running total

How to calculate running total ?

create table store(Id int primary key,productname varchar(20), sales int)

insert into store values (101,'grapes', 4000),
(102,'apples', 10000),
(103,'carrot', 14000),
(104,'watermelon', 9000),
(105,'mangoes', 12000)

select * from store

Below is the query to find the running total sales

select ID, productname,sales
,(select sum(sales) from store
where ID <= S.ID)
'Running Total'
from store S

How to find a Leap year by using User Defined functions

Leap year by using User Defined functions

Function:

CREATE FUNCTION [dbo].[LeapYear] ( @Date DATETIME )

RETURNS VARCHAR(100)

BEGIN

IF (YEAR( @Date ) % 4 = 0 AND YEAR( @Date ) % 100 != 0) OR

YEAR( @Date ) % 400 = 0

RETURN 'Leap Year'

RETURN 'Non-Leap Year'

END

To execute the above function

select [dbo].[LeapYear] ('2016')

Once you practice delete all these stuff

drop function [dbo].[LeapYear]

Monday, April 4, 2016

Stored Procedure in Sql Server

what is a stored procedure ?
A stored procedure is group of T-SQL statements. whenever we want write a same query over and over again,we can save the specific query as a stored procedure and call just by its name.

Lets take a simple example

we have table named employee .we want to get name and department of employee every time .

for that we need to write a query and run every time

query :select name, department from employee

Instead of writing the query every time, we can wrap up this query inside stored procedure and call that .
To create a stored procedure

create procedure getemployeedtls
as
begin

select name, department from employee
end

and execute the stored procedure by using its name getemployeedtls or else

execute getemployeedtls

we can execute the stored procedure whenever we want the details of employee table no need to run the query.

Actually stored procedure has many uses

In reality the stored procedure will be long ,there are queries 2000 to 3000 lines.it is very difficult to run query every time . by using the stored procedure we can speed up the process.
It will reduce the network traffic .
security also main aspect of stored procedure.

Stored procedure with parameters

create procedure employeeiddept
@department varchar(20)
as
begin
select name ,salary,department from employee where department=@department
end

to execute the above procedure we should pass parameter

employeeiddept 'finance'

we get all the names and salary of employee table belongs to finance department.

We can alter the stored procedure by using the alter command

For example we want the result order by salary

Alter procedure employeeiddept
@department varchar(20)
as
begin
select name ,salary,department from employee where department=@department order by salary
end

Thursday, March 31, 2016

Performance Tuning of SQL SERVER

Performance Tuning of SQL SERVER

Best performance is the main concern to develop a successful application. Like a coin database is the tail side (back-end) of an application. A good database design provides best performance during data manipulation which results into the best performance of an application.

During database designing and data manipulation we should consider the following key points:

1. Choose Appropriate Data Type

Choose appropriate SQL Data Type to store your data since it also helps in to improve the query performance. Example: To store strings use varchar in place of text data type since varchar performs better than text. Use text data type, whenever you required storing of large text data (more than 8000 characters). Up to 8000 characters data you can store in varchar.

2. Avoid nchar and nvarchar

Practice to avoid nchar and nvarchar data type since both the data types takes just double memory as char and varchar. Use nchar and nvarchar when you required to store Unicode (16-bit characters) data like as Hindi, Chinese characters etc.

3. Avoid NULL in fixed-length field

Practice to avoid the insertion of NULL values in the fixed-length (char) field. Since, NULL takes the same space as desired input value for that field. In case of requirement of NULL, use variable-length (varchar) field that takes less space for NULL.

4. Avoid * in SELECT statement

Practice to avoid * in Select statement since SQL Server converts the * to columns name before query execution. One more thing, instead of querying all columns by using * in select statement, give the name of columns which you required.

1. -- Avoid

2. SELECT * FROM tblName

3. --Best practice

4. SELECT col1,col2,col3 FROM tblName

5. Use EXISTS instead of IN

Practice to use EXISTS to check existence instead of IN since EXISTS is faster than IN.

1. -- Avoid
2. SELECT Name, Price FROM tblProduct
3. where ProductID IN (Select distinct ProductID from tblOrder)
4. --Best practice
5. SELECT Name,Price FROM tblProduct
6. where ProductID EXISTS (Select distinct ProductID from tblOrder)

6. Avoid Having Clause

Practice to avoid Having Clause since it acts as filter over selected rows. Having clause is required if you further wish to filter the result of an aggregations. Don't use HAVING clause for any other purpose.

7. Create Clustered and Non-Clustered Indexes

Practice to create clustered and non-clustered index since indexes helps in to access data fastly. But be careful, more indexes on a tables will slow the INSERT, UPDATE, DELETE operations. Hence try to keep small no of indexes on a table.

8. Keep clustered index small

Practice to keep clustered index as much as possible since the fields used in clustered index may also used in nonclustered index and data in the database is also stored in the order of clustered index. Hence a large clustered index on a table with a large number of rows increase the size significantly.

9. Avoid Cursors

Practice to avoid cursor since cursor are very slow in performance. Always try to use SQL Server cursor alternative.

10. Use Table variable inplace of Temp table

Practice to use Table varible in place of Temp table since Temp table resides in the TempDb database. Hence use of Temp tables required interaction with TempDb database that is a little bit time taking task.

11. Use UNION ALL inplace of UNION

Practice to use UNION ALL in place of UNION since it is faster than UNION as it doesn't sort the result set for distinguished values.

12. Use Schema name before SQL objects name

Practice to use schema name before SQL object name followed by "." since it helps the SQL Server for finding that object in a specific schema. As a result performance is best.

1. --Here dbo is schema name

2. SELECT col1,col2 from dbo.tblName

3. -- Avoid

4. SELECT col1,col2 from tblName

13. Keep Transaction small

Practice to keep transaction as small as possible since transaction lock the processing tables data during its life. Sometimes long transaction may results into deadlocks.

14. SET NOCOUNT ON

Practice to set NOCOUNT ON since SQL Server returns number of rows effected by SELECT, INSERT, UPDATE and DELETE statement. We can stop this by setting NOCOUNT ON like as:

1. CREATE PROCEDURE dbo.MyTestProc

2. AS

3. SET NOCOUNT ON

4. BEGIN

5. .

6. .

7. END

15. Use TRY-Catch

Practice to use TRY-CATCH for handling errors in T-SQL statements. Sometimes an error in a running transaction may cause deadlock if you have no handle error by using TRY-CATCH.

16. Use Stored Procedure for frequently used data and more complex queries

Practice to create stored procedure for quaery that is required to access data frequently. We also created stored procedure for resolving more complex task.

17. Avoid prefix "sp_" with user defined stored procedure name

Practice to avoid prefix "sp_" with user defined stored procedure name since system defined stored procedure name starts with prefix "sp_". Hence SQL server first search the user defined procedure in the master database and after that in the current session database. This is time consuming and may give unexcepted result if system defined stored procedure have the same name as your defined procedure.

Tuesday, March 29, 2016

What is a subquery, Nested Subqueries&Correlated Subqueries?

What is a subquery?

A subquery is a SELECT statement with another SQL statement, like in the example below:

select *
from employee
where id in
(select employee_id
from employee
where id = 15);

Subqueries are further classified as either a correlated sub query or a nested sub query

Nested Subqueries
Nested sub-queries are sub queries that don't rely on an outer query. In other words, both queries in a nested sub query may be run as separate queries.
This type of sub query could be used almost everywhere

Select *
from employee
where city in (select city
from employer);

The example subquery returns all clients that are from the same city as the product providers.
The IN operator checks if the value is within the table and retrieves the matching rows.

Correlated Subqueries
Subqueries are correlated when the inner and outer queries are interdependent, that is, when the outer query is a query that contains a subquery and the subquery itself is an inner query. Users that know programming concepts may compare it to a nested loop structure.

Let's start with a simple example. The inner query calculates the average value and returns it. In the outer query’s where clause, we filter only those purchases which have a value greater than the inner query’s returned value.

select id
from employee E1
where joiningdate > '2016-01-15'
and salary > (select avg(salary)
from employee E2
where E1.date = E2.date)

Wednesday, March 16, 2016

What is the use of COALESCE in SQL Server

Use of COALESCE in SQL Server:

Let us understand this by taking an example

create table candidate
( id int identity ,primaryemail varchar(20),secondaryemail varchar(20))

insert into candidate values ('avi@gmail.com',null)
insert into candidate values ('siva@gmail.com',null)
insert into candidate values ('ravii@gmail.com',null)
insert into candidate values ('bavi@gmail.com',null)
insert into candidate values (null,'chiull@gmail.com')
insert into candidate values (null,'boht@gmai.com')
insert into candidate values ('pandu@gmail.com','kaya@gmail.com')

QUERY:
select coalesce(primaryemail,secondaryemail) from candidate

Here in this example, the Candidate table is shown to include three columns with information about a Candidate:

1. id
2. PrimaryEmail
3. SecondaryEmail

COALESCE in the SELECT statement , selects the PrimaryEmail if it is not null. If the PrimaryEmail is null then SecondaryEmail will be selected. If both PrimaryEmail and SecondaryEmail is present then only PrimaryEmail is selected. So, COALESCE returns the first nonnull column among the list of columns passed. If both PrimaryEmail and SecondaryEmail is NULL, COALESCE returns NULL.

Differenece between isnull and coalesce ?

Coalesce is use for more than two columns where as Isnull works only for two column. Also you can use coalesce function in joining to get first not null value from more than two different table.

Reference:
http://venkatsqlinterview.blogspot.in/

Monday, March 14, 2016

How to find the Nth highest salary

To find the Nth highest salary

Step 1: Create a schema of a table named "Employee" in your database as in the following:

create table Employee

(

ID int identity,

Name varchar(20),

Salary money,

Department varchar(20)

)

Step 2:Insert some values in the table according to the column behaviour like Name and Data Type.

Insert into Employee(Name,Salary,Department)values('Avi',20000,'Finance')

Insert into Employee(Name,Salary,Department)values('Bhar',10000,'Finance')

Insert into Employee(Name,Salary,Department)values('Chait',28000,'IT')

Insert into Employee(Name,Salary,Department)values('Dhar',15000,'Finance')

Insert into Employee(Name,Salary,Department)values('Einstein',39000,'Finance')

Insert into Employee(Name,Salary,Department)values('Fedarer',12000,'Finance')

Insert into Employee(Name,Salary,Department)values('Giri',40000,'IT')

Insert into Employee(Name,Salary,Department)values('Hari',32000,'Finance')

Insert into Employee(Name,Salary,Department)values('Irfa',56000,'IT')

Insert into Employee(Name,Salary,Department)values('Janaki',29000,'Finance')

Select * from Employee

To find the 3 rd highest salary

Query : By using Top

SELECT TOP 1 salary

FROM (

SELECT DISTINCT TOP 3 salary

FROM employee

ORDER BY salary DESC

) a

ORDER BY salary

method 2 : By Using MIN

SELECT MIN(Salary)

FROM employee

WHERE Salary IN(SELECT TOP 3 Salary FROM employee ORDER BY Salary DESC)

Method 3 BY using Dense rank

SELECT Salary,Name

FROM

(

SELECT Salary,Name,DENSE_RANK() OVER(ORDER BY Salary DESC) Rno from EMPLOYEE

) tbl

WHERE Rno=3

By using above examples you can find any nth highest salary .You can get any highest value by replace 3 with your value .