Noise Words have been Silenced

April 20, 2008

SQL Server 2008 introduces STOPLISTS as a replacement to the infamous Noise Word files. So, how easy is it to have your very own STOPLIST? Well, as it turns out very easy indeed. Although there are no tools as such for this, the T-SQL commands are very easy to use.

So, first things first – I like to create my own company STOPLIST (the semi colon is required at the end):

CREATE FULLTEXT STOPLIST MYSTOPLIST FROM SYSTEM STOPLIST;

Lets just make sure our STOPLIST has been created. We can do this simply by:

SELECT * FROM sys.fulltext_stoplists 
What has that done for us then? Well, we created a new STOPLIST from the SYSTEM STOPLIST. So all the default STOPWORDS are included automatically. Er, Whats a STOPWORD? Well a STOPLIST contains one or more words that are called STOPWORDS. Each STOPWORD is associated with a LANGUAGE, so you can have a single STOPLIST which has all your STOPWORDS for each LANGUAGE you support. You might consider not basing your own STOPLIST on the system one as it will include ALL LANGUAGE’s STOPWORDS and on our test system here that means over 14K STOPWORDS.    

If you want to see a list of the STOPWORDS for your STOPLIST, you can run something like this (replace the STOPLIST_ID with your own and change the LANGUAGE if needed).

 
SELECT * FROM sys.fulltext_stopwords
WHERE
stoplist_id = 5
AND
language_id = 2057

So, how do I add or remove STOPWORDS from my own STOPLIST? Again, this is really simple.

This script adds the STOPWORD ‘job’ to the STOPLIST we created earlier. To confirm the STOPWORD was added, you can run the previous query. Note that the language is also specified.

ALTER FULLTEXT STOPLIST  MYSTOPLIST
ADD ‘job’ LANGUAGE 2057;

To remove a STOPWORD , you would do this:

ALTER

FULLTEXT STOPLIST MYSTOPLIST
DROP ‘b’ LANGUAGE 2057;

 

Now the only thing to say is that you need to specify your STOPLIST when you create your FULL TEXT INDEX. As its good practice to DROP and CREATE your Full Text Indexes in 2008 (if upgraded or restored from a 2005 database), this is not a problem either. I really like STOPLISTS, its another one of the simple enhancements in 2008 that although not a life changing feature it does make accessing another aspect of Full Text a lot easier.

 


Database Snapshots: Quick as a Flash

March 27, 2008

The advent of SQL 2005 gave us the power of Database Snaphots. What are Snapshots? They’re just great! Someone with a lot more time than me has already written a detailed blog post on the subject: http://www.simple-talk.com/sql/database-administration/sql-server-2005-snapshots/. Unfortunately Management Studio did not come with GUI support for them, which is a real shame considering how often I use them.

However, I have written a script which I keep as a SQL Server Template (easily accessible via Template Explorer). Maybe it’s useful to you also..?

/*
Description:           Script to create a Database Snapshot of the current database
Author:                                   Nick Allan
Date:                                       Mar 2008
*/

DECLARE @Reference nvarchar(10)

SET @Reference = ‘FBCXXXX’

DECLARE @SQL nvarchar(max), @SnapshotSuffix nvarchar(100) ,@SnapshotName sysname

SELECT @SnapshotSuffix = ‘_Snapshot_’
+ IsNull(@Reference + ‘_’, )
+ Substring(suser_name(), CharIndex(‘\’, suser_name()) + 1, Len(suser_name())),
@SnapshotName = DB_Name() + @SnapshotSuffix

IF DB_ID(@SnapshotName) IS NOT NULL

BEGIN
PRINT ‘–Database already exists…     DROP DATABASE ‘ + DB_Name() + @SnapshotSuffix
RETURN
END

 SELECT @SQL =
CREATE DATABASE ‘
+ @SnapshotName +
ON (
NAME = ‘
+ [name] + ‘,
FILENAME = ”’
+ LEFT(physical_name, Len(physical_name) - 4) + @SnapshotSuffix + ‘.ss”
)
AS SNAPSHOT OF ‘
+ DB_Name()

  FROM sys.master_files
WHERE database_id = DB_ID()   AND type = 0 –Data

 BEGIN TRY
EXEC
(@SQL)
PRINT
‘Created DB snapshot: ‘ + @SnapshotName
END
TRY

BEGIN CATCH
DECLARE @ErrorMessage nvarchar(max),  @ErrorSeverity int
SELECT @ErrorMessage = ERROR_MESSAGE(),  @ErrorSeverity = ERROR_SEVERITY()

  RAISERROR ( @ErrorMessage,@ErrorSeverity,1 –Error State
)

PRINT @SQL
END
CATCH


Properly Cleaning Up Dropped Subscriptions

March 25, 2008

If you are working in an environment utilising replication, you may find orphaned subscriptions show up on subscribers long after they have been dropped at the publisher. Well, why they aren’t cleared up when one drops a subscription or subsequently the publication itself.. you may ask… that’s a different question all together.

You can run a not so widely known command at the subscriber to clear out any orphaned subscriptions still listed on the subscribers. In fact, should I dare suggest it would actually be a great idea to implement this within your drop subscription process.

The command you can use is “sp_subscription_cleanup”. It has the following syntax:

sp_subscription_cleanup

[ @publisher = ] publisher ,

[ @publisher_db = ] publisher_db

[ , [ @publication = ] publication]
where,

@publisher => is the name of the Publisher

@publisher_db => is the name of the Publisher database

@publication => is the name of the publication, with a default of NULL. If NULL, subscriptions using a shared agent publication in the publishing database will be deleted

At the Subscriber on the subscription database, execute sp_subscription_cleanup to remove replication metadata in the subscription database.


SQL Server Table Partitioning without Enterprise Edition

March 16, 2008

I am going to show a way to have table partitioning in SQL Server databases without the need for the Enterprise Edition of SQL Server that can easily to used in 2000, 2005 or 2008.

As some of you know true Table Partioning was introduced in 2005 but this requires the Enterprise Edition, this is unfortunate as that particular version is rather expensive and out of reach for many small-medium sized businesses. However, the ability to partition/split data is a common request – especially as data grows and there are natural splits within your schema to take advantage of.

To achieve partitioning without EE it turns out is relatively simple and offers nearly all of the benefits of true partioning.

Lets say we have a ficticious company called NurseBank. NurseBank is operated in 3 Regions; England, Scotland and Wales. Every time a Nurse visits a Patient a Visit record is created and this is assigned to a Region who deals with the booking of Nurses for that Region.

In our original schema we would have had something like this for the Visit table:

CREATE TABLE Visit

(

VisitID INT IDENTITY(1,1) NOT NULL,
RegionId INT NOT NULL,
VisitDate DATETIME NOT NULL,
NurseId INT NOT NULL,
ClientId INT NOT NULL,
CONSTRAINT PK_Visit_VisitID PRIMARY KEY (VisitId)

)

Now, if you created that table; drop it. We are going to create 3 new tables to cater for the new partitions.

CREATE TABLE dbo.Visit_England

(

VisitId INT NOT NULL ,
RegionId INT NOT NULL
CONSTRAINT
CK_Visit_England CHECK (RegionId = 1),
VisitDate DATETIME NOT NULL,
NurseId INT NOT NULL,
ClientId INT NOT NULL,
CONSTRAINT PK_Visit_England PRIMARY KEY (VisitId,RegionId)

)

CREATE TABLE dbo.Visit_Scotland

(

VisitId INT NOT NULL ,
RegionId INT NOT NULL
CONSTRAINT CK_Visit_Scotland CHECK (RegionId = 2),
VisitDate DATETIME NOT NULL,
NurseId INT NOT NULL,
ClientId INT NOT NULL,
CONSTRAINT PK_Visit_Scotland PRIMARY KEY (VisitId,RegionId)

)

CREATE TABLE dbo.Visit_Wales

(

VisitId INT NOT NULL ,
RegionId INT NOT NULL
CONSTRAINT CK_Visit_Wales CHECK (RegionId = 3),
VisitDate DATETIME NOT NULL,
NurseId INT NOT NULL,
ClientId INT NOT NULL,
CONSTRAINT PK_Visit_Wales PRIMARY KEY (VisitId,RegionId)

)

With these three tables created. We can create the Partitioned View that will allow us to select, update and insert data into the partitions.

CREATE VIEW dbo.Visit
WITH SCHEMABINDING
AS
SELECT VisitId, RegionId, VisitDate,NurseId, ClientId FROM
dbo.Visit_Scotland
UNION ALL
SELECT VisitId, RegionId, VisitDate,NurseId, ClientId FROM
dbo.Visit_England
UNION ALL
SELECT VisitId, RegionId, VisitDate,NurseId, ClientId FROM
dbo.Visit_Wales

You can now insert data into the Visit view as we would have done previously with the original table (with one exception that I will cover shortly).

 

– INSERT INTO England Region Visit table
INSERT INTO Visit (VisitId, RegionId, VisitDate, NurseId, ClientId)
VALUES (1000,1,GETDATE(),4122,5211)

– INSERT INTO Scotland Region Visit table
INSERT INTO Visit (VisitId, RegionId, VisitDate, NurseId, ClientId)
VALUES (1001,2,GETDATE(),4123,5212)

The interesting and important aspect is what SQL Server makes of all of this and what query plan is used.

SELECT VisitID FROM Visit WHERE RegionId = 1

This will produce the following query plan:


As you can see it correctly uses the Visit_England partitioned table. If the Region is omitted from the WHERE clause, the query plan is vastly different and of course cannot determine the correct table to use so basically it UNION joins them all as shown below.

sqlplan3.jpg

I don’t think I need to say but this is bad. However, the whole point of partitioning the data is when you have a need to do so (large volumes of data) and you have a viable partitioning scheme that will avoid the above type of plan as much as possible.

Lastly, I wanted to mention the exception I hinted at before.

For the partitioned view to fully work (as in allow inserts), the partitioned table cant have an identity column.

What! I hear you say, whats the point of that? How I can use this?. Well, in most situations the partitioning of the data is done to improve READ performance of queries. If this is the case then the perfect scenario for this type of partitioning is to partition the data on a replicated separate database. The way this works is that you have your inserts performed on your main database or cluster and when the data is replicated to the other database (lets call this the readonly but it isn’t a physical readonly – its just we wont be doing writes on it) the replication procedure is modified to insert into the Partitioned View and the original table thats replicated. That way you don’t need an identity column on the partitioned tables as this would be known and passed into the table via replication.

I hope you find this example useful, if you have any comments or questions regarding this please feel free to contact me via this blog. This is by no means the only way to improve performance and you really must consider all available options but I thought this was a useful and relatively painless method to try and one we currently use in our business.


CTE’s…What are they and why would I ever want to use one?

March 16, 2008

CTE’s were a new introduction in SQL 2005 and even now we’re looking down the throat on SQL 2008 there are still many people who are unclear as to what CTE’s are, how they would use them and when.

So what are they?

CTE is short for Common Table Expression and is basically and expression that returns a temporary result set inside of another statement. In this way they are very similar to using a declared temporary table which you then reference. They are part of the ANSI SQL 99 standard and they are particularly useful for doing recursive queries.

A recursive query is one which can be said to call itself many times over. An example would be a query which allows you to find the Manager and staff of every employee in a company. In the Adventure Works database the Employee table holds details for each employee and the ManagerId of the staff member they report to. These managerid are self joined to the respective employeeid’s in the same table. A CTE to find the information could look something like:

USE AdventureWorks ;
GO

WITH DirectReports(LoginID, ManagerID, EmployeeID) AS
(

SELECT LoginID, ManagerID, EmployeeID
FROM HumanResources.Employee
WHERE ManagerID IS NULL

UNION ALL

SELECT e.LoginID, e.ManagerID, e.EmployeeID
FROM HumanResources.Employee e
INNER JOIN DirectReports d
ON e.ManagerID = d.EmployeeID
)

SELECT *
FROM
DirectReports ;
GO

This gives us the login ID, Manager Id and the EmployeeID for each user.

This statement can be broken down into three parts. The first is the ‘Root’ or ‘Anchor Member’ :

SELECT LoginID, ManagerID, EmployeeID
FROM HumanResources.Employee
WHERE ManagerID IS NULL

This tells us our top, starting level in the hierarchy.

The second part is what makes the CTE recursive as it references itself using the join : INNER JOIN DirectReports d
ON e.ManagerID = d.EmployeeID

The statements are linked together by a UNION

UNION ALL

SELECT e.LoginID, e.ManagerID, e.EmployeeID
FROM HumanResources.Employee e
INNER JOIN DirectReports d
ON e.ManagerID = d.EmployeeID

Finally we want to see our result set and this is done using the select statement:

SELECT *
FROM
DirectReports ;

 

It is important to remember that the select statement must follow the CTE syntax immediatley. Try running the following as an example of how this fails when not called immediately after the CTE:

 

USE AdventureWorks ;
GO

WITH DirectReports(LoginID, ManagerID, EmployeeID) AS
(
SELECT LoginID, ManagerID, EmployeeID
FROM HumanResources.Employee
WHERE ManagerID IS NULL

UNION ALL

SELECT e.LoginID, e.ManagerID, e.EmployeeID
FROM HumanResources.Employee e
INNER JOIN DirectReports d
ON e.ManagerID = d.EmployeeID
)

 

SELECT * FROM HumanResources.Employee

SELECT *
FROM
DirectReports ;
GO

The following error is thrown:

 

Msg 208, Level 16, State 1, Line 15
Invalid object name ‘DirectReports’.

 

CTE’s are just another weapon in the developers arsenal. There will be times when it is appropriate to use them and times when there is a better solution. I recently read a great article by Peter He about the performance implications of using CTE against general looping constructs. Its worth a read and can be found here.

 


SQL 2008 MERGE Functionality

March 11, 2008

One of the new SQL 2008 T-SQL features is MERGE.

The great thing about MERGE is it allows you to perform simultaneous UPDATE, INSERT and or DELETE operations on one table. Its supported by new physical operators that combine these operations so that they can be performed in a single scan rather than multiple scans.

This means that for the first time you can assign the contents of one table or query to another in a single operation

Here’s a small example which uses a stored proc:


/********Lets create the table object*******/

IF OBJECT_ID(‘TeamMember’) IS NOT NULL

BEGIN
DROP
TABLE TeamMember
END

GO

CREATE TABLE TeamMember

(

EmpId INT PRIMARY KEY IDENTITY (1,1),

FirstName VARCHAR(200),

LastName VARCHAR(200),

Department VARCHAR(200)

)

GO

/********Insert some Team Members*******/
/********here using the new insert multi row functionality available in 2008*******/

INSERT INTO TeamMember (FirstName,LastName,Department)
VALUES

(‘James’,‘Hill’,‘IT’),

(‘Sarah’,‘Parker’,‘Sales’),

(‘Nick’,‘Thompson’,‘Marketing’)

GO

/********Create a procedure that might do some Team Member admin in ‘the REAL world’ !*******/

IF OBJECT_ID(‘up_InsertTeamMember’) IS NOT NULL
BEGIN
DROP
PROCEDURE up_InsertTeamMember
END

GO

CREATE PROCEDURE up_InsertTeamMember
(

@FirstName VARCHAR(200),
@LastName VARCHAR(200),
@Department VARCHAR(200),
@EmpId INT OUTPUT

)

AS

BEGIN

/********Lets do the MERGE here*******/
MERGE
TeamMember
USING (SELECT @FirstName,@LastName)
tm (FirstName,LastName)
ON
TeamMember.FirstName = @FirstName AND
TeamMember.LastName = @LastName

/********If its not there lets create a new Team Member*******/

WHEN NOT MATCHED THEN
INSERT
(FirstName,LastName,Department)
VALUES
(@FirstName,@LastName,@Department)

/********If it is there lets just set the existing id to the ouput variable*******/

WHEN MATCHED THEN
UPDATE
SET @EmpId=EmpId,Department = @Department;

/********Grab the new ID*******/
SET
@EmpID = COALESCE(SCOPE_IDENTITY(),@EmpID);

END

RETURN;

go

/********Heres the procedure, try adding a new memberand then a team member with the same FirstName and LastName but different department*******/

DECLARE @EmpId INT,@Message VARCHAR(100)

EXEC up_InsertTeamMember @FirstName = ‘James’,@LastName = ‘Smith’,@Department = ‘Sales’,@EmpId = @EmpId OUTPUT

PRINT @EmpId

GO

SELECT * FROM TeamMember

GO

 


Insert multiple rows

March 10, 2008

How many times have we had to write multiple INSERT INTO statements? Well, SQL Server 2008 comes to the rescue. Its a small feature but still very useful.

You can now insert multiple rows into a table in a single statement.

INSERT INTO T1 (Col1,Col2,Col3)
VALUES

(‘Row1′, ‘Value’,‘Test’),
(‘Row2′, ‘Value’,‘Test’),
(‘Row3′, ‘Value’,‘Test’)

This inserts 3 rows into the T1 table. Certainly time-saving and makes those sometimes hefty insert scripts smaller.


SQL Bits II

March 9, 2008

Last Saturday (1st March 2008) was the second SQL Bits meet, this time in Birmingham.  Once again it turned out to be a great event with interesting speakers and all for the grand price of…zilch!!

So I would personally like to extend my thanks for everyone that helped put it together and I look forward to the next one.

I notice that Chris Webb has set up a SQL Bits Facebook group so for those on FB, check that out.


Activation Procedure Being Disabled

March 7, 2008

Today I had a situation where Service Broker appeared to be running successfully but the activation procedure was not being called and no receive processes were being seen in Profiler. It turned out that the activation procedure had ’switched off’ although there was nothing I could see that indicated this other than the lack of activity. The process was restarted by simpling calling the activation procedure from MS.

The problem in my scenario started when the broker attempted to run a CLR function on a box on which CLR was not enabled (doh!) and by the time this had been enabled and the queue had grown significantly, the activation procedure had shut down.


Service Broker Deployment Numero Uno!

March 7, 2008

We have just deployed my first Service Broker project to the live environment and touch wood everything appears to have gone ok. Sure it took a little massaging to get it through and we certainly learned a few things along the way but thats all part of it right.

Working for a well known job board we are using service broker in conjunction with SQL CLR to do asynnchronous string manipulations to job adverts.  Orignally this was something we did through replication but it became clear very quickly that fast as CLR is, replication latency would be effected.  So instead we had a re-think and moved the process offline.  We’re sure that there are many other applications for Service Broker within our business now.