Strextract UDF - Retrieves String Between Two Delimiters

topic: 

The ufn_StrExtract() UDF for SQL Server is similar to the VFP StrExtract() function.

$SAMPLECODE$

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76

-- ufn_StrExtract ---
IF EXISTS (SELECT * FROM sysobjects WHERE  name = N'ufn_StrExtract')
	DROP FUNCTION ufn_StrExtract
GO

CREATE FUNCTION ufn_StrExtract (
	@StrIn varchar(8000),
	@StartDelim varchar(128),
	@EndDelim varchar(128),
	@Flag int=0)
									
RETURNS varchar(8000)
AS
/* Extracts part of a string between two delimiters
 Parameters:
	@StrIn  - source character expression 
	@StartDelim, @EndDelim - start and end delimiters
	@Flag - modifies function behavior:
			  2 - End delimiter not required. If it's not found returns the rest of the string
			  4 - Delimiters are included in result
*/
BEGIN
	DECLARE @StartPos int, @EndPos int, @LenStartDelim int, @LenEndDelim int
	DECLARE @StrOut varchar(8000)
	
	-- DATALENGTH() returns actual length including trailing spaces
	SET @LenStartDelim = DATALENGTH(@StartDelim)
	SET @LenEndDelim   = DATALENGTH(@EndDelim)

	IF @LenStartDelim = 0  
		-- If Start delimiter is empty, extract from the beginning of the string 
		SET @StartPos = 1
	ELSE
		-- Find Start delimiter position in the string
		SET @StartPos = CHARINDEX(@StartDelim, @StrIn)

	IF @StartPos > 0	BEGIN	
		-- Look for End delimiter position
		IF @LenEndDelim = 0  
			-- If End delimiter is empty, extract to the end of the string 
			SET @EndPos = DATALENGTH(@StrIn) 
		ELSE BEGIN
			-- Find End delimiter position in the string
			SET @EndPos = CHARINDEX(@EndDelim, @StrIn, @StartPos + 1) - 1
			IF (@EndPos <= 0) AND (@flag & 2)>0
				-- The End delimiter is not found but flag 2 is set
				--	extract to the end of the string	
				SET @EndPos = DATALENGTH(@StrIn) 
		END
	END	
	
	IF (@StartPos > 0) AND (@EndPos > 0) BEGIN
		-- We found both delimiters
		IF (@flag & 4)=0
			-- Flag 4 is not set, don't include Start delimiter into result
			SET @StartPos = @StartPos + @LenStartDelim
		IF @EndPos < DATALENGTH(@StrIn)  AND (@flag & 4)>0
			-- Flag 4 is set, include End delimiter into result
			SET @EndPos = @EndPos + @LenEndDelim
			
		-- Extract substring 
		SET @StrOut = SUBSTRING(@StrIn, @StartPos, @EndPos - @StartPos + 1)

	END
	ELSE
		-- One of the delimiters or both not found, return empty string
		SET @StrOut = ''

	RETURN @StrOut
END
GO

-- Sample of using ufn_StrExtract UDF
SELECT dbo.ufn_StrExtract('The quick brown fox jumps over the lazy dog', 'fox', 'dog', 4)
SELECT dbo.ufn_StrExtract('The quick brown fox jumps over the lazy dog', 'fox', 'dog', DEFAULT)

Comments

Is there a way to do this for multiple delimiters? I have data that looks like this: "0001; 0002; 0003; 0004" and I want to return a recordset of each number in the string.

Check <a href="/content/pages/sql/microsoft-sql-server/alines-udf-string-split">Alines UDF - String Split</a> and related comments.

Is there any way to do for more than one occurrence, for e.g string = '[A]345[/A]textstringvalue[A]356[/A]and having another value[A]565[/A]'

now i want a function to extract these number between delimiter [A] [/A]may be into a table

Check <a href="/content/pages/sql/microsoft-sql-server/alines-udf-string-split">Alines UDF - String Split</a> and related comments.

That's great!
That is exactly what I was looking for in order to extract data from html/xml string.
Thank you!

Thank you Sergey for great articles.
In some situations your code produces incorrect @EndPos and becomes smaller than @StartPos in final SUBSTRING.
It happens when End delimiter can be found somewhere in the string before Start delimiter.

Bug:
-- Find End delimiter position in the string
SET @EndPos = CHARINDEX(@EndDelim, @StrIn, @StartPos + 1) - 1

Correct:
-- Find End delimiter position in the string
SET @EndPos = CHARINDEX(@EndDelim, @StrIn, @StartPos <strong>+ @LenStartDelim</strong> + 1) - 1

I found that problem parsing string with more spaces between words in it and the End delimiter happen to be a space sign too.