Noise Words have been Silenced

SQL Server 2008 introduces STOPLISTS as a replacement to the infamous Noise Word files. So, how easy is it to have your very own STOPLIST? Well, as it turns out very easy indeed. Although there are no tools as such for this, the T-SQL commands are very easy to use.

So, first things first - I like to create my own company STOPLIST (the semi colon is required at the end):

CREATE FULLTEXT STOPLIST MYSTOPLIST FROM SYSTEM STOPLIST;

Lets just make sure our STOPLIST has been created. We can do this simply by:

SELECT * FROM sys.fulltext_stoplists 
What has that done for us then? Well, we created a new STOPLIST from the SYSTEM STOPLIST. So all the default STOPWORDS are included automatically. Er, Whats a STOPWORD? Well a STOPLIST contains one or more words that are called STOPWORDS. Each STOPWORD is associated with a LANGUAGE, so you can have a single STOPLIST which has all your STOPWORDS for each LANGUAGE you support. You might consider not basing your own STOPLIST on the system one as it will include ALL LANGUAGE’s STOPWORDS and on our test system here that means over 14K STOPWORDS.    

If you want to see a list of the STOPWORDS for your STOPLIST, you can run something like this (replace the STOPLIST_ID with your own and change the LANGUAGE if needed).

 
SELECT * FROM sys.fulltext_stopwords
WHERE
stoplist_id = 5
AND
language_id = 2057

So, how do I add or remove STOPWORDS from my own STOPLIST? Again, this is really simple.

This script adds the STOPWORD ‘job’ to the STOPLIST we created earlier. To confirm the STOPWORD was added, you can run the previous query. Note that the language is also specified.

ALTER FULLTEXT STOPLIST  MYSTOPLIST
ADD ‘job’ LANGUAGE 2057;

To remove a STOPWORD , you would do this:

ALTER

FULLTEXT STOPLIST MYSTOPLIST
DROP ‘b’ LANGUAGE 2057;

 

Now the only thing to say is that you need to specify your STOPLIST when you create your FULL TEXT INDEX. As its good practice to DROP and CREATE your Full Text Indexes in 2008 (if upgraded or restored from a 2005 database), this is not a problem either. I really like STOPLISTS, its another one of the simple enhancements in 2008 that although not a life changing feature it does make accessing another aspect of Full Text a lot easier.

 

One Response to “Noise Words have been Silenced”

  1. Full Text System Views « SQL Dev’s Playground Says:

    [...] For more information on STOPLISTS, check out my SQL Agent’s post at: http://sqlagents.wordpress.com/2008/04/20/noise-words-have-been-silenced/. [...]

Leave a Reply