Thursday, June 18, 2020

SQL Server: Generate table through Dynamic SQL / Stored Procedures

Version: SQL Server 2019 Developer Edition


Dynamic SQL is a technique that allows users to write SQL statements as strings and execute them at runtime. SQL Server supports Dynamic SQL, and SQL developers can use this feature to write flexible, programmatic queries. However, there is still a lot of controversy over SQL injection and performance in Dynamic SQL; we will discuss these issues in a later post. In this post, we will create a table dynamically using Dynamic SQL.
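
As a minimal sketch of the idea (the table name here is simply the one we are about to create):

-- Build a statement as a string, then execute it; sp_executesql also accepts parameters,
-- which keeps user input out of the concatenated SQL text
DECLARE @sql NVARCHAR(200) = N'SELECT name FROM sys.tables WHERE name = @t';
EXEC sp_executesql @sql, N'@t SYSNAME', @t = N'TableSchema';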

First, let's create a table to store schema and input data.
-- CREATE Table statement
CREATE TABLE [dbo].[TableSchema](
	[Id] [int] IDENTITY(1,1) PRIMARY KEY NOT NULL,
	[TableName] [varchar](128) NOT NULL,
	[ColumnName] [varchar](256) NOT NULL,
	[IsPrimaryKey] [bit] NULL,
	[DataType] [varchar](32) NOT NULL,
	[DataLength] [int] NOT NULL,
	[Nullable] [bit] NOT NULL,
	[InsertedDtTm] [datetime2](7) DEFAULT(GetDate())
)

-- INSERT data
INSERT INTO [dbo].[TableSchema] (TableName, ColumnName, IsPrimaryKey, DataType, DataLength, Nullable) 
VALUES
('PersonSlim','BusinessEntityID','1','bigint','0','0'),
('PersonSlim','PersonType','','varchar','64','1'),
('PersonSlim','Title','','varchar','32','1'),
('PersonSlim','FirstName','','varchar','64','1'),
('PersonSlim','LastName','','varchar','64','1'),
('PersonSlim','Email','','varchar','128','1')

The chart below briefly explains the [TableSchema] table above.

TableName       The name of the table
ColumnName      The name of the column
IsPrimaryKey    1 if the column is the primary key, else 0
DataType        The data type of the column
DataLength      The data length of the column
Nullable        1 if the column allows NULL values, else 0

After saving the data, we will use it to build a CREATE TABLE statement. Before moving to the next step, make sure you have permission to create tables (CREATE TABLE is a DDL (Data Definition Language) statement). Once you have confirmed that, we will build the CREATE TABLE statement as a string, put it in a VARCHAR variable, and run it as Dynamic SQL through the EXECUTE statement.
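
One quick way to check the permission is the built-in HAS_PERMS_BY_NAME function:

-- Returns 1 if the current user can create tables in the current database
SELECT HAS_PERMS_BY_NAME(DB_NAME(), 'DATABASE', 'CREATE TABLE');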

We will build the CREATE TABLE statement as a string using the following template.

CREATE TABLE [dbo].[PersonSlim]  (
    /* Part 1-Columns */
    [<Columns>],
    
    /* Part 2-Hash rows (concatenating all columns) */
    [_HashValue] AS HASHBYTES(''SHA2_256'', CONCAT_WS(''|''' + <HashValue> +')), 
PRIMARY KEY CLUSTERED (
    
    /* Part 3-Primary Key */
    [ <Primary Key> ] ASC 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF
    , ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]




PART1 - [<Columns>]: concatenating field definitions


DECLARE @tablename VARCHAR(50) = 'PersonSlim'
DECLARE @cols VARCHAR(MAX) = '';

SELECT @cols = @cols + '[' + [ColumnName] + ']' + ' ' + '[' + [DataType] + ']' 
		+ IIF([DataType] = 'varchar', '('+ CAST([DataLength] AS VARCHAR) +')',  '') + ' ' 
		+ IIF([Nullable] = 1, 'NULL', 'NOT NULL')
		+ ',' + CHAR(13)+CHAR(10)
FROM [dbo].[TableSchema]
WHERE TableName = @tablename
ORDER BY [id]

PRINT @cols -- Debug

Point1) @cols = @cols + [<string | fields>]
There are several ways to concatenate values from multiple rows into a single text. I will use "+=" (the string concatenation assignment operator) to concatenate values. For reference, SET @var += 'test_string' is equivalent to SET @var = @var + 'test_string'. (Note that this variable-assignment concatenation combined with ORDER BY is not guaranteed behavior; on SQL Server 2017 and later, STRING_AGG is a safer alternative.)
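
For example:

DECLARE @var VARCHAR(32) = 'test_';
SET @var += 'string';   -- equivalent to SET @var = @var + 'string'
PRINT @var;             -- test_string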

Point2) CHAR(13)+CHAR(10)
CHAR(13) is a carriage return and CHAR(10) is a line feed. Here, they are used to improve the readability of the PRINT @cols output.

Point3) IIF is a T-SQL function that can be thought of as a shorthand conditional, similar to the ternary operator in general programming languages.

IIF ( boolean_expression, true_value, false_value ) 
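
For example:

SELECT IIF(10 > 5, 'bigger', 'smaller');  -- returns 'bigger'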

Execute) The above query will output the string below:

[BusinessEntityID] [bigint] NOT NULL,
[PersonType] [varchar](64) NULL,
[Title] [varchar](32) NULL,
[FirstName] [varchar](64) NULL,
[LastName] [varchar](64) NULL,
[Email] [varchar](128) NULL,

PART2 - [_HashValue] AS HASHBYTES(''SHA2_256'', CONCAT_WS(''|''' + <HashValue> +')): a computed column that hashes each row by concatenating all columns

DECLARE @tablename VARCHAR(50) = 'PersonSlim'
DECLARE @hash VARCHAR(MAX) = '';

-- VARCHAR(MAX): CAST without a length defaults to 30 characters and could silently truncate values before hashing
SELECT @hash = @hash + ',ISNULL(CAST(' + [ColumnName] + ' AS VARCHAR(MAX)), '''')' + CHAR(13)+CHAR(10)
  FROM [dbo].[TableSchema]
 WHERE TableName = @tablename
 ORDER BY [id]
 PRINT @hash -- Debug

Point1) @hash = @hash + [<string | fields>]
I used the same concept as Point1 in Part1 - <Columns>.

Point2) I used the CONCAT_WS function to concatenate fields with a pipe ("|") delimiter. (Reference Post: SQL Server: CONCAT, CONCAT_WS - Concatenating fields to hash row)

Execute) The above query will output the string below:
,ISNULL(CAST(BusinessEntityID AS VARCHAR(MAX)), '')
,ISNULL(CAST(PersonType AS VARCHAR(MAX)), '')
,ISNULL(CAST(Title AS VARCHAR(MAX)), '')
,ISNULL(CAST(FirstName AS VARCHAR(MAX)), '')
,ISNULL(CAST(LastName AS VARCHAR(MAX)), '')
,ISNULL(CAST(Email AS VARCHAR(MAX)), '')

PART3 - [ <Primary Key> ] ASC: Set Primary Key

DECLARE @tablename VARCHAR(50) = 'PersonSlim'
DECLARE @pk   VARCHAR(32);

SELECT @pk = [ColumnName] 
  FROM [dbo].[TableSchema]
 WHERE [TableName] = @tablename
   AND [IsPrimaryKey] = 1

PRINT @pk -- Debug

There are no specific points here.
Execute) The above query will output the string below:

BusinessEntityID

I combined the 3 separate parts above and turned them into one stored procedure (see below).

CREATE OR ALTER PROCEDURE [dbo].[usp_CreateTable]
	@tablename VARCHAR(50)
AS
BEGIN
	DECLARE @cols VARCHAR(MAX) = '';
	DECLARE @hash VARCHAR(MAX) = '';
	DECLARE @pk   VARCHAR(32);

	SELECT	@hash = @hash + ',ISNULL(CAST(' + [ColumnName] + ' AS VARCHAR(MAX)), '''')' + CHAR(13)+CHAR(10)
	FROM	[dbo].[TableSchema]
	WHERE	TableName = @tablename
	ORDER BY [id]

	SELECT	@cols = @cols + '[' + [ColumnName] + ']' + ' ' + '[' + [DataType] + ']' 
				+ IIF([DataType] = 'varchar', '('+ CAST([DataLength] AS VARCHAR) +')',  '') + ' ' 
				+ IIF([Nullable] = 1, 'NULL', 'NOT NULL')
				+ ',' + CHAR(13)+CHAR(10)
	FROM	[dbo].[TableSchema]
	WHERE	TableName = @tablename
	ORDER BY [id]

	SELECT	@pk = [ColumnName] 
	FROM	[dbo].[TableSchema]
	WHERE	[TableName] = @tablename
	AND		[IsPrimaryKey] = 1

    DECLARE @Sql VARCHAR(MAX) = '
    IF OBJECT_ID(''[dbo].['+@tablename+']'') IS NULL
    BEGIN
        CREATE TABLE [dbo].['+@tablename+']
        (
			'+@cols+'
			[_HashValue] AS HASHBYTES(''SHA2_256'', CONCAT_WS(''|''' + @hash +')),
		PRIMARY KEY CLUSTERED 
		(
			['+@pk+'] ASC
		)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
		) ON [PRIMARY]
    END
    '
    EXEC(@Sql)
END

EXEC dbo.usp_CreateTable @tablename = 'PersonSlim'
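
If you prefer to confirm the result without opening Object Explorer, one way is to query the catalog views:

SELECT c.name AS ColumnName
       ,t.name AS TypeName
       ,c.max_length
       ,c.is_nullable
       ,c.is_computed
  FROM sys.columns AS c
  JOIN sys.types   AS t ON t.user_type_id = c.user_type_id
 WHERE c.object_id = OBJECT_ID('dbo.PersonSlim');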

The stored procedure will generate the “PersonSlim” table like the screenshot below:
(Screenshot: the generated [dbo].[PersonSlim] table in Object Explorer, showing the six columns defined in [TableSchema], the computed [_HashValue] column, and a clustered primary key on [BusinessEntityID].)

I hope this content helped you understand how to use Dynamic SQL. There is still much debate about Dynamic SQL, but its flexibility makes it a very advantageous and powerful tool for SQL developers. As mentioned before, this post highlighted what Dynamic SQL is and demonstrated how to use it in a simple case. (* If you are interested in Dynamic SQL, I recommend researching "Static SQL vs. Dynamic SQL".)



Monday, June 15, 2020

SQL Server: NULL NULL NULL

Version: SQL Server 2019 Developer Edition
Sample Database: AdventureWorks2017


Sometimes NULL values trigger issues not only in SQL Server, but also in most database engines. When I train my colleagues to sharpen their SQL skills, I often present examples of NULL behaviors, because understanding them can prevent a lot of bugs. Even after demonstrating these behaviors and understanding what causes them, I see many developers, including myself, continue to run into them. 😓 I am sharing some cases that I came across:

#1. NULL + 10 / N'STR' + NULL / NULL + NULL

In SQL, "NULL" means the data does not exist in the field. It does not mean zero (0) or an empty string ('').
In most SQL dialects, when NULL is used in an arithmetic operation or a string concatenation, the output is NULL. SQL Server behaves the same way (i.e., a NULL + [string] query returns NULL). Although these rules are very simple, it is important to understand these behaviors to reduce bugs and create optimal data sets.

*** If you are concatenating strings on SQL Server 2012 or later, I recommend researching the CONCAT() function instead of using "+".
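
A quick sketch of these behaviors:

SELECT NULL + 10            AS NullPlusInt,   -- NULL
       N'STR' + NULL        AS StrPlusNull,   -- NULL
       NULL + NULL          AS NullPlusNull,  -- NULL
       CONCAT(N'STR', NULL) AS ConcatResult;  -- 'STR' (CONCAT treats NULL as '')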

#2. JOIN with NULL values

This is related to case #1 above. In SQL Server, both NULL = NULL and NULL <> NULL evaluate to UNKNOWN. When we use a nullable column in an INNER JOIN clause, the number of rows returned can differ from what we expect.
Because of this, we need to validate the output more carefully when a query uses an INNER JOIN on multiple columns that include nullable ones.

When I need to join nullable columns to generate reports, I like to create sample cases by converting NULL values into unique values, using the COALESCE or ISNULL functions. After I confirm that the output is what my clients want, I run the query on the actual data.

...
  FROM [Table1] AS T01
  JOIN [Table2] AS T02
    ON ISNULL(T01.[Nullable Column], '9090909') = ISNULL(T02.[Nullable Column], '9090909')
...
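
A tiny self-contained repro (the inline tables here are hypothetical):

-- NULL = NULL evaluates to UNKNOWN, so the two NULL rows never match each other
SELECT T01.NullableCol, T02.NullableCol
  FROM (VALUES ('a'), (NULL)) AS T01(NullableCol)
  JOIN (VALUES ('a'), (NULL)) AS T02(NullableCol)
    ON T01.NullableCol = T02.NullableCol;   -- returns 1 row, not 2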

#3. COUNT from Nullable field

Although heading #3 only mentions COUNT on a nullable field, this topic applies to all aggregate functions.

Many users use COUNT(*) or COUNT(1) to get the number of rows in a table.

However, when a nullable field name is passed into the function instead, it is possible to get an unexpected or smaller count due to NULL values.
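
For example, against AdventureWorks2017, where Person.Person.MiddleName is nullable:

SELECT COUNT(*)          FROM AdventureWorks2017.Person.Person;  -- counts every row
SELECT COUNT(MiddleName) FROM AdventureWorks2017.Person.Person;  -- counts only rows where MiddleName IS NOT NULL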

If you look at the two queries above, they are identical except that the second one counts the nullable field "MiddleName." As you can see, the second query returns a smaller count.

In SQL Server, aggregate functions ignore NULL values, except COUNT(*) and COUNT(1). I usually check for this type of NULL behavior when I review my coworkers' queries. I have covered the negative side of this NULL behavior on aggregate functions; however, it can also be used as a powerful feature in calculated fields. (I will discuss this on a different page.)
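
As a minimal sketch of that behavior, AVG skips NULLs instead of treating them as zero:

SELECT COUNT(*)    AS AllRows      -- 3
       ,COUNT(val) AS NonNullRows  -- 2
       ,AVG(val)   AS AvgVal       -- 15 = (10 + 20) / 2, not (10 + 20 + 0) / 3
  FROM (VALUES (10), (20), (NULL)) AS T(val);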

#4. ORDER BY with NULL

If we use the ORDER BY clause with DESC, NULL values are placed last; with ASC (the default), NULL values are placed first. Some SQL dialects have a syntax (NULLS FIRST/LAST) to control the order of NULL values. Unfortunately, SQL Server does not have this type of syntax. If we need to control the position of NULL values in the ORDER BY clause, the query below can be helpful.

SELECT TOP 10 MiddleName
  FROM AdventureWorks2017.Person.Person
 ORDER BY IIF(MiddleName IS NULL, 1, 0), MiddleName


(Screenshot: the query results, with non-NULL MiddleName values sorted ahead of NULLs.)

Friday, June 12, 2020

SQL Server: OPENJSON to explode array in JSON format

Version: SQL Server 2019 Developer Edition


Sometimes, I come across cases where lists of values are stored in JSON (JavaScript Object Notation) format. Examples include: 
  • Values from multiple selected/checked items from forms.
  • Survey related data. (ie. questions and unstructured answers)
Since SQL Server 2016, SQL Server has had capabilities for handling the JSON data format. Today, I will post a memo about how to explode an array in JSON values. I created a sample table to hold JSON-formatted data. It includes a field ("Skills") that allows only JSON; I put a CHECK constraint on it to make sure the stored string is valid JSON.

CREATE TABLE json_test (
	Id	         INTEGER IDENTITY(1,1) PRIMARY KEY,
	FirstName	 VARCHAR(64),
	LastName	 VARCHAR(64),
	Email		 VARCHAR(64),
	Skills		 VARCHAR(MAX) CHECK (ISJSON(Skills)>0)
)

Some sample rows inserted:
INSERT INTO json_test VALUES
('Mike', 'Lee', 'ml@email.com', N'{"skills": ["Excel", "TSQL", "MS Word"]}'),
('Robert', 'Sean', 'rs@email.com', N'{"skills": ["TSQL", "Python", "Spark"]}'),
('Ken', 'Macy', 'km@email.com', N'{"skills": ["PL/SQL", "TSQL", "Access"]}'),
('Lisa', 'Pang', 'lp@email.com', N'{"skills": ["Excel", "PowerShell"]}')

(Screenshot - JSON Sample: the json_test rows, with each Skills value holding a JSON document.)

Let's assume that our goal is to report the frequency distribution of skills. The "Skills" field has a list of skills as a JSON array. To accomplish our goal, we first need to explode this list of skills. 

The OPENJSON() function helps with parsing JSON text. When we run the query below, the output displays the objects and properties in the "key", "value", and "type" fields.

SELECT *
  FROM json_test AS jt
 CROSS APPLY OPENJSON(jt.skills) 

Id  FirstName  ...  key     value                          type
1   Mike       ...  skills  ["Excel", "TSQL", "MS Word"]   4
2   Robert     ...  skills  ["TSQL", "Python", "Spark"]    4
3   Ken        ...  skills  ["PL/SQL", "TSQL", "Access"]   4
4   Lisa       ...  skills  ["Excel", "PowerShell"]        4

According to the MS Docs (link), "type = 4" means the array data type. This means the OPENJSON() function can expand the array of values, and its second argument (path) handles this task.

SELECT *
  FROM json_test AS jt
 CROSS APPLY OPENJSON(jt.skills, '$.skills') 

After we explode the array, we can add an aggregate query to reach our goal, the frequency distribution of skills, like below.

SELECT value, COUNT(*)
  FROM json_test AS jt
 CROSS APPLY OPENJSON(jt.skills, '$.skills') 
 GROUP BY value
 ORDER BY COUNT(*) DESC

value        (count)
TSQL         3
Excel        2
Access       1
MS Word      1
PL/SQL       1
PowerShell   1
Python       1
Spark        1

JSON is a popular data format used in web applications. Some NoSQL databases use the JSON format to store unstructured data. It is common these days to see companies operating both relational and NoSQL databases while managing data in both structured and semi-structured ways. The JSON functions in SQL Server give developers more flexible options for handling such data.

Wednesday, June 10, 2020

SQL Server: STRING_AGG append columns to display sample values

Version: SQL Server 2019 Developer Edition
Sample Database: AdventureWorks2017

When validating data from aggregate queries, we often want to display some sample related primary keys, Ids, or values. For example, an IT manager requests a report of the internal computers with the most suspicious outbound traffic, together with some sample destination addresses or host names, in a format like below. 

PC Name                Count of suspicious traffic   Sample Destination
(field for GROUP BY)   (aggregate field)             (sample records)
PC_1                   300                           IP1, IP2...
PC_2                   287                           IP1, IP3...
PC_3                   266                           IP1, IP4...
...                    ...                           ...

In the BI area, there may be nice tools and ways to display samples at the visualization level. In this post, I share a simple ad-hoc query built on a few concepts:

SELECT StateProvinceName
       ,City
       ,COUNT(BusinessEntityID) AS CountId
       ,REPLICATE('<3', COUNT(BusinessEntityID)/3)
       ,STRING_AGG(IIF(RN < 3, FullName, NULL), N',') AS Samples
  FROM (SELECT StateProvinceName
               ,City
               ,BusinessEntityID
               ,CONCAT(FirstName, ' ', LastName) AS FullName
               ,ROW_NUMBER() OVER(PARTITION BY StateProvinceName, City ORDER BY NEWID()) AS RN
          FROM AdventureWorks2017.HumanResources.vEmployee) AS M
 GROUP BY StateProvinceName, City
 ORDER BY CountId DESC

  • In the subquery, the ROW_NUMBER() window function assigns a sequential integer to each row. 
  • In T-SQL, the ORDER BY clause is a required argument of the ROW_NUMBER() function. In this query, the NEWID() function helps pick random samples on each execution.
  • STRING_AGG() is an aggregate function that concatenates strings from rows into a single string (not a list). In this example, the query picks two samples (RN < 3) per group; a standalone sketch of STRING_AGG() follows after this list.
  • The REPLICATE() function is just for fun. 😁
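
As a standalone sketch of STRING_AGG() with inline sample values:

SELECT STRING_AGG(val, ', ') AS Joined   -- 'a, b' (NULL inputs are ignored)
  FROM (VALUES ('a'), ('b'), (NULL)) AS T(val);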

For more information, the MS SQL Docs link is always helpful.



Tuesday, June 9, 2020

SQL Server: CONCAT, CONCAT_WS - Concatenating fields to hash row

Version: SQL Server 2019 Developer Edition

1. CONCAT ( string_value1, string_value2 [, string_valueN ] )


The CONCAT function concatenates values together. Because the CONCAT function implicitly converts null values to empty strings, sometimes different source values produce the same concatenated string and, in the end, identical hashed outputs. One possible solution is to place a delimiter between fields, like below:

WITH CTEConcatTest AS (
    SELECT * 
      FROM (
	VALUES('a', null, 'bcd'), ('ab', 'c', 'd')
      ) AS T(col1, col2, col3)
)
SELECT col1
       ,col2
       ,col3
       ,CONCAT(col1, '|', col2, '|', col3) AS Concated
       ,HASHBYTES('SHA2_256', CONCAT(col1, '|',
                                     col2, '|', 
                                     col3)) AS HashConcated
  FROM CTEConcatTest

col1  col2  col3  Concated  HashConcated
a     NULL  bcd   a||bcd    0x... (SHA2_256 bytes)
ab    c     d     ab|c|d    0x... (SHA2_256 bytes)

2. CONCAT_WS ( separator, argument1, argument2 [, argumentN]... )


Starting with SQL Server 2017, the CONCAT_WS (Concatenate With Separator) function can be helpful for handling the above situation. 

WITH CTEConcatTest AS (
    SELECT * 
      FROM (
	VALUES('a', null, 'bcd'), ('ab', 'c', 'd')
      ) AS T(col1, col2, col3)
)
SELECT col1
       ,col2
       ,col3
       ,CONCAT_WS('|', col1, col2, col3) AS Concated
       ,HASHBYTES('SHA2_256', CONCAT_WS('|', 
				col1, col2, col3)) AS HashConcated
  FROM CTEConcatTest

CONCAT_WS output:
col1  col2  col3  Concated  HashConcated
a     NULL  bcd   a|bcd     0x... (SHA2_256 bytes)
ab    c     d     ab|c|d    0x... (SHA2_256 bytes)

But we need to pay attention to the hashed output of the first record ('a', null, 'bcd'): the CONCAT_WS hashed value is different from the CONCAT version. Let's compare the concatenated values in the "Concated" field. The CONCAT version shows "a||bcd" (two pipes), but the CONCAT_WS version shows "a|bcd" (one pipe). 😕 According to the MS SQL Docs, CONCAT_WS ignores null values. This behavior is different from CONCAT, which implicitly converts null values to empty strings. To make CONCAT_WS treat a null field as an empty string like CONCAT does, we can wrap each field with the ISNULL() function.

WITH CTEConcatTest AS (
    SELECT * 
      FROM (
	VALUES('a', null, 'bcd'), ('ab', 'c', 'd')
      ) AS T(col1, col2, col3)
)
SELECT col1
       ,col2
       ,col3
       ,CONCAT_WS('|', col1, col2, col3) AS Concated
       ,HASHBYTES('SHA2_256', CONCAT_WS('|', ISNULL(col1, ''),
                                             ISNULL(col2, ''),
                                             ISNULL(col3, ''))) AS HashConcated
  FROM CTEConcatTest

CONCAT_WS with ISNULL output:
col1  col2  col3  Concated  HashConcated
a     NULL  bcd   a|bcd     0x... (now matches the CONCAT version's hash)
ab    c     d     ab|c|d    0x... (SHA2_256 bytes)

I hope the CONCAT_WS function will, in the future, give query developers the choice to either treat null values as empty strings or ignore them. 

If you have experience in other SQL dialects, or have recently upgraded your SQL Server version, and are considering concatenating fields, I strongly recommend reading the MS SQL Docs below first.