Last night I was trying to cleanup the spammers from the database of
devpinoy.org and while I was evaluating the result sets i was able to
conclude that aside from using common spam text like 'cheap', 'buy',
'free', 'deal', 'viagra', 'prozac' that 30% of the false emails that
spam accounts are using multiple dots on their email address. A good
example is a subset below from the list of offenders that I found in the
devpinoy db.

Having
found that fact I immediately created a sql script that will delete
users from the database if they have more than 2 dots in their email
address.
Enough with the side note and here is some code.
DECLARE @string2check varchar(50)
DECLARE @character2find char
SET @string2check = 'this is a very long string'
SET @character2find = 'i'
PRINT LEN(@string2check) - LEN(REPLACE(@string2check, @character2find, ''))
What
the code above is doing is that it is removing the characters that
matched our search key and then subracts the length of that string to
the original string to find the total occurrence of the character we are
looking for.
Now, if you want to use this as a function you can
use this:
CREATE FUNCTION udf_CountCharOccurence
( @string2check varchar(500)
, @character2find char
)RETURNS INT
BEGIN
RETURN (LEN(@string2check) - LEN(REPLACE(
@string2check,
@character2find)
)
)
END
GO
The code above works great but there's a catch. If you
are concerned with case sensitivity then the code above wont work. The
way around it is to use COLLATION which is supported by the SQL function
below:
CREATE FUNCTION udf_CountCharOccurenceCaseSensitive
( @string2check varchar(500)
, @character2find char
)RETURNS INT
BEGIN
RETURN (LEN(@string2check) - LEN(REPLACE(
@string2check COLLATE SQL_Latin1_General_Cp1_CS_AS,
@character2find COLLATE SQL_Latin1_General_Cp1_CS_AS, '')
)
)
END
GO
In order to use this in your query all you need to do is
PRINT dbo.udf_CountCharOccurenceCaseSensitive('This is a long text','i')
Or
if you want to put it to use to meet the criteria that I mentioned
about dots on emails you can do it this way:
SELECT * FROM Users
WHERE dbo.udf_CountCharOccurenceCaseSensitive(EmailAddress,'.') > 2
HTH
Posted
06-09-2010 3:03 PM
by
keithrull