Splitting a Table Full of Strings into Columns
Published: Mar 28, 2024
This was a fun afternoon I spent getting this working nicely. I don’t know if there are better ways of doing this, but if there are then the internet isn’t the place to look because if you type in string split you’ll get a whole bunch of pages about splitting a string… but just an individual string.

What I wanted was to be able to take a table full of delimited strings and make them into columns so that I could insert them into a table. So here’s what happened…

Firstly, it’ll be good to follow along with this one so if you have AdventureWorks (and SQL 2019, otherwise swap out the CONCAT_WS for something that works in your version) then you can create a set of delimited data as follows:

select top 1000 concat_ws('|', salesOrderID, SalesOrderNumber, AccountNumber, CustomerID, SalesPersonID, TotalDue)
from
AdventureWorks2012.Sales.SalesOrderHeader
where
SalesPersonID is not null
go



Now what we’re going to do is pretend that this output is data we have received in a txt file from a client (because that’s what happened in my case).

Now, here’s where a huge caveat comes in… I COULD use SSIS to import this mystery txt file into SQL Server BUT this wasn’t the only file. Files came in by the dozen, all with differing numbers of columns etc. Therefore, I COULD have used SSIS, but the overhead in making a package which catered for every eventuality (and of which more variations could arise in future), was too onerous.

So I thought I’d give SQL Server a shot at it.

Let’s drop and re-create a table to store our new data:

drop table if exists importSales
go

create table
importSales
(
    
salesRowID int identity(1, 1),
    
salesRaw varchar(1000)
)

go


Now we can get going.

Firstly, we know that to split a string we can now (in SQL Server 2016+) use string_split.

select value
from
string_split('43659|SO43659|10-4020-000676|29825|279|23153.23', '|')



This works great for a single string, but what about a whole table of strings?

select *
from
importSales i
cross apply string_split(salesRaw, '|') s



Not a problem.

So what we can do now is add a partitioned row_number to the equation in order to effectively create a ColumnID on our data:

select row_number() over(partition by salesRowID order by salesRowID) columnID, i.salesRowID, i.salesRaw, s.value
from
importSales i
cross apply string_split(salesRaw, '|') s



We’ll put this into a temporary table in order to make this a little easier to work with going forwards:

drop table if exists #temp

select row_number() over(partition by salesRowID order by salesRowID) columnID, i.salesRowID, i.salesRaw, s.value
into
#temp
from
importSales i
cross apply
string_split(salesRaw, '|') s


Now we’ve got a lot of rows, but we need this in a column format in order to place it into our table.

Conceptually, this is quite easy:

select *
from
#temp
where
columnID = 1

select *
from
#temp
where
columnID = 2



We can now easily get each column’s data for each row. In the above we can join these two tables together on salesRowID and be done:

select t1.salesRowID, t1.salesRaw,t1.columnID, t1.value,t2.columnID, t2.value
from
#temp t1
join
#temp t2
on
t1.salesRowID = t2.salesRowID
where
t1.columnID = 1
and
t2.columnID = 2



Likewise to add in another column:

select t1.salesRowID, t1.salesRaw,t1.columnID, t1.value,t2.columnID, t2.value,t3.columnID, t3.value
from
#temp t1
join
#temp t2
on
t1.salesRowID = t2.salesRowID
join
#temp t3
on
t1.salesRowID = t3.salesRowID
where
t1.columnID = 1
and
t2.columnID = 2
and
t3.columnID = 3



Hopefully you’re seeing a pattern now?

Therefore why don’t we use some dynamic SQL in order to build that pattern out?

We need a starting point, a basis for our query to be built:

declare @select varchar(max), @tables varchar(max), @where varchar(max)

select @select = 'select t1.value'
select
@tables = 'from #temp t1'
select
@where = 'where t1.columnID = 1'


Now that we have our starting point we know, based on the columnID, how many joins we need to make… it’s the same as the number of columns we have:

declare @numJoins int = (select max(columnID) from #temp)


We’re now going to write a WHILE loop in order to build our statement to the pattern we’ve already determined. You can see that here:

declare @joinCounter int = 2-- we know how many joins we need, so go add them...

while @joinCounter <= @numJoins
begin
     select
@select += (', t' + convert(varchar(10), @joinCounter) + '.value'),
        
        -- we're using the pattern ', tX.value'
    
                
@tables += ('join #temp t' + convert(varchar(10), @joinCounter) + 'on t1.salesRowID = t' + convert(varchar(10), @joinCounter) + '.salesRowID'),
                -- using the pattern:-- join #temp tX-- on t1.salesRowID = tX.salesRowID
@where += ('and t' + convert(varchar(10), @joinCounter) + '.columnID = ' + convert(varchar(10), @joinCounter))
                -- using the pattern:-- and tX.columnID = X
from (select distinct columnID

    from
#temp
    where
columnID = @joinCounter) x
                -- add the values for each column required


    select
@joinCounter += 1
end


And if we look at the output of our PRINT statement you can see we were successful:


So now let’s run the statement instead of just printing the results:

declare @numJoins int = (select max(columnID) from #temp)

declare
@select varchar(max), @tables varchar(max), @where varchar(max)

select
@select = 'select t1.value'
select
@tables = 'from #temp t1'
select
@where = 'where t1.columnID = 1'

declare
@joinCounter int = 2

while
@joinCounter <= @numJoins
begin
    select
@select += (', t' + convert(varchar(10), @joinCounter) + '.value'),
                @tables
+= ('join #temp t' + convert(varchar(10), @joinCounter) + 'on t1.salesRowID = t' + convert(varchar(10), @joinCounter) + '.salesRowID'),
                @where
+= ('and t' + convert(varchar(10), @joinCounter)) + '.columnID = ' + convert(varchar(10), @joinCounter)
    from
(select distinct columnID from #temp where columnID = @joinCounter) x
    
    select
@joinCounter += 1
end

declare
@sql varchar(max)

select
@sql = @select + '' + @tables + '' + @where

print
@sql
exec
(@sql)
go



We have now turned our delimited strings into a table. Exactly what we wanted.

So, how is this a solution? Well… imagine we’ve imported our data into some tables with standard names:

-- Make imports generic

drop table if exists testImport1, testImport2
go

create table
testImport1
(
    
importID int identity(1, 1),
    importData
varchar(1000)
)

create table testImport2
(

    importID
int identity(1, 1),
    importData
varchar(1000)
)
go

insert into
testImport1
select top 50 concat_ws('|', salesOrderID, SalesOrderNumber, TotalDue)
from
AdventureWorks2012.Sales.SalesOrderHeader

insert into testImport2
select top
50 concat_ws('|', salesOrderID, AccountNumber, CustomerID, TerritoryID, CreditCardApprovalCode, TotalDue)
from
AdventureWorks2012.Sales.SalesOrderHeader
go

select
*
from
testImport1

select
*
from
testImport2
go



You can see that we now have two tables with similar data (two columns with the same column names), yet completely different sized data and therefore different columnar outputs.

So what we’ll do now is to wrap our logic into a procedure:

-- Wrap into a proc

drop procedure if exists columnSplit
go

create procedure
columnSplit
     @tableName
varchar(250)
as
    create table
#temp
    (
        
columnID int,
         importID
int,
         importData
varchar(1000)
    )
    
    
declare @table varchar(max)
    
    
select @table = 'select row_number() over(partition by importID order by importID) columnID, i.importID, s.value
from '
+ @tableName + ' i
cross apply string_split(importData, ''|'') s'
    
    
insert into #temp
     exec
(@table)
    
     declare
@numJoins int = (select max(columnID) from #temp)
    
     declare
@select varchar(max), @tables varchar(max), @where varchar(max)
    
     select
@select = 'select t1.importData'
     select
@tables = 'from #temp t1'
     select
@where = 'where t1.columnID = 1'
    
     declare
@joinCounter int = 2
    
     while
@joinCounter <= @numJoins
     begin
         select
@select += (', t' + convert(varchar(10), @joinCounter) + '.importData'),
                     @tables
+= ('join #temp t' + convert(varchar(10), @joinCounter) + 'on t1.importID = t' + convert(varchar(10), @joinCounter) + '.importID'),
                     @where
+= ('and t' + convert(varchar(10), @joinCounter)) + '.columnID = ' + convert(varchar(10), @joinCounter)
         from
(select distinct columnID from #temp where columnID = @joinCounter) x
        
         select
@joinCounter += 1
     end
    
     declare
@sql varchar(max)
    
     select
@sql = @select + '' + @tables + '' + @where
    
     print
@sql
    
     exec
(@sql)
go


Once that’s done we can now simply call our procedure, passing in the name of each of our raw data tables, no matter what the data inside them or the resulting schema, and our procedure will turn them into tables for us. Awesome:

exec columnSplit 'testImport1'
go

exec
columnSplit 'testImport2'
go



As I said, I don’t know if there’s a better way to do this (there likely is, there are some clever folks out there), but this amused me for an afternoon so I thought I’d share in case it’s ever useful.

Leave a Comment
Your email address will not be published. All fields are mandatory.
NB: Comments will only appear once they have been moderated.

Kutech  CEO
Kevin  Urquhart

CEO

I am a SQL Server DBA, Architect, Developer, Trainer, and the owner and CEO of Kutech. This blog has been going for nearly 10 years now over several guises and this is its new home. I hope you find it useful whatever your endeavour.

Categories


© Copyright 2021 Kutech Solutions Ltd.

Our site uses cookies only to personalise content and analyse traffic. We do not use cookies for advertising. Policy