Using SQL Server Data Compression
Every so often a feature comes along that makes you think "that would have been so useful" in years gone by.
Data Compression is one of those features within SQL Server.
Data Compression has been around for a few years – it was introduced in SQL Server 2008. I remember a conversation with Tony Rogerson (UK SQL Server User Group) at the time. I was asking for reasons that I could put forward to justify an upgrade to SQL Server 2008 – Data Compression was right up there. Unfortunately, it was an Enterprise-only feature, making my case for upgrading even more difficult – no-one was prepared to invest in the Upgrade, let alone the additional cost of going to Enterprise Edition.
Eight years on, SQL Server 2016 SP1 changed all of that, when Microsoft included Data Compression in Standard Edition, bringing it to the masses (so long as you upgrade SQL Server, that is).
Data Compression isn’t for everyone, but I’ve had great success using it in a Data Warehouse environment. It’s extremely effective in reducing storage size on this type of Data, with typical compression rates of around 80 – 90 percent.
Compressed Data means lower storage requirements – on disk and in memory. This makes it particularly useful in situations where you are low on storage or memory, provided that you have sufficient processor power for the magic to happen. It often produces performance gains too, as more data is held in cache (fast) and fewer physical reads are required from disk (slow!).
There are trade-offs with Data Compression however:
- Not all types of Data will compress well.
- CPU utilisation will increase, as more work is required to compress and de-compress the Data.
- Additional space is required in your Data File while a Table is being Compressed.
- Care needs to be taken to implement Compression consistently across all environments (more on this later).
It’s worth bearing in mind that the use of Data Compression could be masking another problem.
I've spoken of this before as "Databloat" – the use of inappropriate DataTypes for storage within SQL Server. For example, defining a column as an Integer when the Data contained within only ranges from 1 to 100. That column would take 4 bytes for every row, whereas a more appropriate DataType, such as TinyInt, would only require 1 byte per row.
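As a quick illustration (the table and column names here are purely hypothetical), DATALENGTH shows the per-row storage cost of each choice:

-- Hypothetical example: the same 1-100 value stored two ways
CREATE TABLE dbo.BloatDemo
(
    StatusCodeInt  INT     NOT NULL,  -- 4 bytes per row
    StatusCodeTiny TINYINT NOT NULL   -- 1 byte per row
);

INSERT INTO dbo.BloatDemo (StatusCodeInt, StatusCodeTiny) VALUES (42, 42);

SELECT DATALENGTH(StatusCodeInt)  AS IntBytes,     -- returns 4
       DATALENGTH(StatusCodeTiny) AS TinyIntBytes  -- returns 1
FROM dbo.BloatDemo;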
Using Data Compression would reduce the size of this Table considerably, and many of the benefits of the smaller table would be realised – fewer pages on disk / in memory etc. The issue comes when SQL Server creates a Query Plan. Query Plans are based on the Data Types stored, and the Optimizer uses the "logical size" of the Table structure to cost the plan, ignoring the fact that it's compressed. This can skew the memory grant: if the original Table structure is huge then the memory request is also going to be huge, which may not be desirable – an over-sized grant wastes memory and can leave other queries waiting for theirs. Still, compared with re-developing the code and the subsequent testing, Data Compression can certainly be a cost-effective stop-gap.
Data Compression Types
There are two "general" levels of compression, Row and Page. Row compression removes unused storage space in fixed-length data, whereas Page compression also removes duplicate values within a Data Page. In my experience, Row Compression can be good, but Page compression is better – providing better storage gains, although a bit more CPU power is required to compress / de-compress the Data. In fact, Page Compression applies Row Compression first, then does its work on top.
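As a sketch of the syntax (dbo.FactSales and the index name are stand-ins for objects of your own), compression is applied with a rebuild:

-- Row compression on the table itself (heap or clustered index)
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = ROW);

-- Or Page compression (which includes Row compression as its first step)
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Nonclustered indexes are compressed separately
ALTER INDEX IX_FactSales_OrderDate ON dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = PAGE);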
Data Compression is implemented on a Table-by-Table basis (strictly speaking, per index and per partition). I often wondered why you couldn't just set a whole Database to be Compressed and forget about it, but Compression isn't effective in every scenario, so the granularity makes sense.
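If you want to see what's already Compressed (and how), the metadata lives in sys.partitions – a query along these lines shows the current setting for every table, index and partition:

-- Current compression setting per table / index / partition
SELECT sch.[name] AS SchemaName
     , tbl.[name] AS TableName
     , ind.[name] AS IndexName
     , par.partition_number
     , par.data_compression_desc  -- NONE, ROW or PAGE
FROM sys.partitions par
JOIN sys.indexes ind ON par.object_id = ind.object_id AND par.index_id = ind.index_id
JOIN sys.tables tbl ON par.object_id = tbl.object_id
JOIN sys.schemas sch ON tbl.schema_id = sch.schema_id
ORDER BY sch.[name], tbl.[name], ind.index_id, par.partition_number;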
You'll notice that not everything compresses well – variable-width character strings see little benefit, and "special" Data Types such as BLOBs aren't great (off-row LOB data isn't compressed at all). Over time, you'll get to know where you're likely to get the best savings.
Estimating gains through Data Compression
You can use the sp_estimate_data_compression_savings Stored Procedure to get estimates of the final size of compressed Tables, using either Row or Page compression. I use the script below to generate sp_estimate_data_compression_savings calls for every Table and Index in the current Database. The results help me choose the best candidates for compression, together with the type of compression.
As ever, run the scripts and then test, test, test! Once you've Compressed a Table, you may find yourself with a fair bit of free space in the Database. If you've got natural Data growth going on, or you don't need the space back, then it's best to leave the Database alone. Otherwise, consider whether it's a good candidate for Shrinking – there's an example after the script below.
-- Build Dynamic SQL Scripts to obtain estimated Compression savings (Row and Page)
SELECT sch.[name] AS 'SchemaName'
     , tbl.[name] AS 'TableName'
     , 'EXEC sp_estimate_data_compression_savings [' + sch.[name] + '], ['
           + tbl.[name] + '], ' + CONVERT(VARCHAR(5), ind.index_id)
           + ', NULL, ROW;' AS 'ScriptToEstimateRowCompression'
     , 'EXEC sp_estimate_data_compression_savings [' + sch.[name] + '], ['
           + tbl.[name] + '], ' + CONVERT(VARCHAR(5), ind.index_id)
           + ', NULL, PAGE;' AS 'ScriptToEstimatePageCompression'
FROM sys.tables tbl
JOIN sys.schemas sch ON tbl.schema_id = sch.schema_id
JOIN sys.indexes ind ON tbl.object_id = ind.object_id
ORDER BY sch.schema_id, tbl.object_id, ind.index_id;
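To give a flavour of what comes back, here's one of the generated statements run by hand (dbo.FactSales is a stand-in for one of your own Tables) – compare the size_with_current_compression_setting(KB) and size_with_requested_compression_setting(KB) columns in the output to gauge the likely saving:

-- Estimate Page compression savings for a single table's clustered index
EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo'
   , @object_name = 'FactSales'
   , @index_id = 1
   , @partition_number = NULL
   , @data_compression = 'PAGE';

And if, having Compressed, you do decide to reclaim the space, a minimal shrink sketch (the logical file name and target size are assumptions – check sys.database_files for yours):

-- Reclaim free space from the Data File; shrinking fragments indexes,
-- so check fragmentation (and rebuild if needed) afterwards
DBCC SHRINKFILE (N'MyDatabase_Data', 10240);  -- target size in MB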
Environments and Deployments
It's worth mentioning consistency across environments at this point. There are some excellent deployment tools that will compare environments and then create scripts to automatically "make it so". Depending on the tool used, and the settings selected, this can include storage details too. I've experienced a situation where the deployment team had a mismatch between UAT (Uncompressed) and Production (Compressed) that resulted in the deployment attempting to make the environments consistent – by de-compressing the "mismatched" Tables… filling the Data File… filling the Drive! Disk space alerting picked this one up and I was able to address it before any major damage was done, but it could have been nasty.