Tuesday, June 9, 2020

SQL Server: CONCAT, CONCAT_WS - Concatenating fields to hash row

Version: SQL Server 2019 Developer Edition

1. CONCAT ( string_value1, string_value2 [, string_valueN ] )


The CONCAT function is for concatenating values together. Because the CONCAT function implicitly converts null values to empty strings, sometimes we are faced with a situation where concatenated values get the same output and generate identical hashed outputs in the end. One possible solution is to place delimiter between fields like below:

WITH CTEConcatTest AS (
    SELECT * 
      FROM (
	VALUES('a', null, 'bcd'), ('ab', 'c', 'd')
      ) AS T(col1, col2, col3)
)
SELECT col1
       ,col2
       ,col3
       ,CONCAT(col1, '|', col2, '|', col3) AS Concated
       ,HASHBYTES('SHA2_256', CONCAT(col1, '|',
                                     col2, '|', 
                                     col3)) AS HashConcated
  FROM CTEConcatTest








2. CONCAT_WS ( separator, argument1, argument2 [, argumentN]... )


Starting from 2017, CONCAT_WS(Concatenate With Separator) function can be helpful for handling the above situation. 

WITH CTEConcatTest AS (
    SELECT * 
      FROM (
	VALUES('a', null, 'bcd'), ('ab', 'c', 'd')
      ) AS T(col1, col2, col3)
)
SELECT col1
       ,col2
       ,col3
       ,CONCAT_WS('|', col1, col2, col3) AS Concated
       ,HASHBYTES('SHA2_256', CONCAT_WS('|', 
				col1, col2, col3)) AS HashConcated
  FROM CTEConcatTest

CONCAT_WS

But we need to pay attention to hashed output from first record('a', null, 'bcd'). CONCAT_WS hashed value is different from CONCAT version. Let's compare concatenated value from "Concated" field. CONCAT version shows "a|(pipe)|(pipe)bcd", but CONCAT_WS version shows "a|(pipe)bcd". 😕 According to MS SQL Docs, CONCAT_WS ignores null values. This behavior is different from CONCAT which converts null values to empty strings implicitly. To make CONCAT_WS consider null field as an empty string like CONCAT, we can wrap the field with the ISNULL() function.

WITH CTEConcatTest AS (
    SELECT * 
      FROM (
	VALUES('a', null, 'bcd'), ('ab', 'c', 'd')
      ) AS T(col1, col2, col3)
)
SELECT col1
       ,col2
       ,col3
       ,CONCAT_WS('|', col1, col2, col3) AS Concated
       ,HASHBYTES('SHA2_256', CONCAT_WS('|', ISNULL(col1, ''), 
ISNULL(col2, ''),              ISNULL(col3, ''))) AS Concated FROM CTEConcatTest

CONCAT_WS with ISNULL

I hope CONCAT_WS function will have the feature for query developers to have a choice to consider or ignore null values in the future. 

If you have experience in other SQL dialects or recently upgraded SQL Server version, and consider concatenating fields, I strongly recommend reading the 
below MS SQL Docs at first.

No comments:

Post a Comment