WHAT’S A VLF AND WHY DO I CARE?

Some Background

SQL Server ensures that all changes to the database, whether to data or to structures such as indexes, are preserved by first writing them to the transaction log file, and then writing them to the physical database.

SQL Server only uses one transaction log per database at a single point in time. It starts writing copies of the changes made at the “front” of the log file, and as changes are committed (written to the physical database) it releases space in the transaction log for reuse. This is called a “round robin” architecture.

SQL Server manages space in the transaction log via VLFs – virtual log files. Inside the one tlog file are a bunch of VLFs – smaller chunks of space that are easier for SQL Server to manage. Regardless of the size of the database, there’s no fixed size for a VLF and no fixed number of VLFs per physical tlog file, although it’s not uncommon for a tlog file to start life with 4 or 8 VLFs.

SQL Server tlog with VLFs

The space in a VLF can be reused once all of the transactions recorded in that VLF have been committed and their changes written to the physical database.
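If you want a quick read on how full each database’s transaction log is right now, one simple check is the standard DBCC log space report:

-- report the log size (MB) and the percentage of log space in use for every database
DBCC SQLPERF(LOGSPACE);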

What causes the number of VLFs to grow?

Lack of tlog backups and open transactions, mostly.

IF the SQL Recovery Model is set to FULL or BULK-LOGGED and transaction log backups are never taken or are taken too infrequently, then SQL Server never has a chance to clear out the VLFs which have committed changes, so it must continue adding new VLFs to the stack.

IF there is an open transaction, i.e. no COMMIT statement has been issued to “close out” the transaction, then SQL Server has no option but to leave those VLFs which contain portions of the open transaction (“active” VLFs) untouched. And, since it must continue writing copies of other changes that are taking place in the database, it must add new VLFs to the stack.

The only way that SQL Server can add new VLFs to the stack is if auto-grow is enabled (or someone grows the log file manually). Otherwise, SQL Server will report an “out of space in the log” error and stop processing transactions.

Trivia Fact: when the SQL Server tlog file (auto-)grows by 64MB or less, it adds 4 VLFs to the stack. When it grows by more than 64MB and up to 1GB, it adds 8 VLFs to the stack. When it grows by more than 1GB, it adds 16 VLFs to the stack. The default SQL Server auto-grow size is 10%, which means that, as the tlog continues to grow, each set of VLFs created will be a slightly larger size than the previous set.

Trivia Question: if a SQL Server database is created with the default values (tlog file = 10MB), by the time the tlog file has auto-grown to 500MB (1/2 GB), how many VLFs will there be?

Trivia Answer: 200 VLFs
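Curious what your own auto-grow settings look like? A quick sketch that reports the size and growth increment for each file in the current database (growth is stored as a percentage when is_percent_growth = 1, otherwise as a count of 8KB pages):

-- current size and auto-grow increment for each file in this database
SELECT name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + ' percent'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS auto_grow_increment
FROM sys.database_files;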

Why is this such a big deal?

Today’s databases are usually large, measured in GBs (1,024 MB = 1 GB) rather than MBs. It’s not unusual to have a 50GB tlog file, and if it’s auto-growing at 10MB per growth increment, the VLF count will be on the order of 20,000.

Too many VLFs impact database operations:

  • Inserts, updates and deletes take longer;
  • Database startup takes longer;
  • Database restores take longer;
  • Database backups are larger because all “active” VLFs (from open transactions) must be backed up;
  • Tlog file fragmentation; SQL Server must search for the appropriate VLF whenever writing to the log file or clearing a VLF for reuse.

 

Is there a problem if VLFs get too big?

Yes. Overly large VLFs (4GB, for instance) can also lead to performance problems. Clearing a very large VLF takes a long time and takes resources away from transaction processing. Very large VLFs also take more time to be released from “active” status, since they store so many more changes than a regular-sized VLF, and this may result in SQL Server having to create even more very large VLFs.

What’s the Solution?

There is no hard and fast rule as to how many VLFs any database should have.

  • Too many VLFs result in tlog fragmentation and the impacts listed above;
  • Too few and very large VLFs cause slow performance.

The goal: have the tlog be as large as necessary, with a reasonable autogrow increment (example: 128MB, or some similar number divisible by 4 or 8 or 16).
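For example, to move a log file from percentage growth to a fixed 128MB increment, something along these lines will do (the database name and logical log file name here are placeholders; sp_helpfile will show you the real logical name):

-- give the log file a fixed 128MB auto-grow increment
-- 'YourDatabase' and 'YourDatabase_log' are placeholder names
ALTER DATABASE YourDatabase
MODIFY FILE (NAME = N'YourDatabase_log', FILEGROWTH = 128MB);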

How to “Fix” a Tlog File With Lots of VLFs…

  1. Figure out how large the log size should be for the database in question:
    1. How much data is being added to the database in one year’s time?
    2. How often are index rebuilds being done?
    3. Are there excessively-long transactions that are running against the database? Can they be shortened?
    4. Is data being bulk-loaded into the database on a regular basis? How much and when?
    5. What’s the disk usage history? (applies only to mature databases…)
  2. Shrink the tlog to the smallest size possible.
    1. Use the dbcc shrinkfile command to shrink the log file, and do it in increments, within a maintenance window.
  3. Re-grow the tlog file to an appropriate size, again in increments, within the same maintenance window.
    1. Growing too fast can result in too-large VLFs
    2. Use the alter database modify file command to incrementally grow the existing log file
  4. Check the auto-grow settings and adjust to the new database tlog sizing.

 

CODE:

-- get the file layout, the amount of space used AND the number of VLFs for each database --------

EXECUTE master.dbo.sp_MSforeachdb 'USE [?];
exec sp_helpfile; exec sp_spaceused; dbcc loginfo;'

 

 

-- get a full report on file conditions for a database --------

Select DB_NAME() AS [DatabaseName], Name, physical_name,
Cast(Cast(Round(cast(size as decimal) * 8.0/1024.0,2) as decimal(18,2)) as nvarchar) Size,
Cast(Cast(Round(cast(size as decimal) * 8.0/1024.0,2) as decimal(18,2)) - Cast(FILEPROPERTY(name, 'SpaceUsed') * 8.0/1024.0 as decimal(18,2)) as nvarchar) As FreeSpace
From sys.database_files

 

-- check the number of VLFs and force a checkpoint ----------

dbcc loginfo

 

checkpoint

 

 

/*

NOTE: if the database is small enough to be backed up twice during the maintenance window, then switching the recovery model from FULL to SIMPLE back to FULL will work. Otherwise, don’t switch, leave at FULL or BULK-LOGGED, and incrementally shrink the log file, running the dbcc shrinkfile command multiple times.

*/

 

-- take full db backup ----------------------------

BACKUP DATABASE model
TO DISK = N'Z:\SSRSbkups\model\model_full_bkup_20151116_before.BAK'
WITH NOFORMAT, INIT, NAME = N'model Full Database Backup',
SKIP, NOREWIND, NOUNLOAD, STATS = 25

 

-- if the recovery model is not SIMPLE then back up the tlog ----

BACKUP LOG model
TO DISK = N'Z:\SSRSbkups\model\model\model_log_bkup.TRN'
WITH NOFORMAT, INIT, NAME = N'model Tlog Backup',
SKIP, NOREWIND, NOUNLOAD, STATS = 25

 

 

-- determine the recovery model ---------------

select databasepropertyex('model', 'recovery')

 

 

-- set db to simple recovery model ----------------------------

ALTER DATABASE model

SET RECOVERY SIMPLE

 

-- shrink the log file ----------------------------
-- VLFs are created in sets of 4, 8 or 16 ---------
DBCC SHRINKFILE (N'modellog' , 4)

GO

 

 

-- check the number of VLFs
dbcc loginfo                -- 8

 

 

-- set db back to its normal recovery model ----------------------------

ALTER DATABASE model

SET RECOVERY FULL

 

 

-- check the recovery model ---------------
select databasepropertyex('model', 'recovery')

 

 

Increase the size of the transaction log file:

USE [master]

GO

ALTER DATABASE [AdventureWorks2014] MODIFY FILE (NAME = N'AdventureWorks2014_Log', SIZE = 20480KB )

GO

 

-- take full db backup ----------------------------
-- since the recovery model was switched from FULL to SIMPLE back to FULL,
-- it's highly probable that any transaction log backup taken from this point forward
-- would not work with the full backup taken prior to the start of this process...
--------------------------------------------------------------------------------------

BACKUP DATABASE model
TO DISK = N'Z:\SSRSbkups\model\model_full_bkup_20151116_after.BAK'
WITH NOFORMAT, INIT, NAME = N'model Full Database Backup',
SKIP, NOREWIND, NOUNLOAD, STATS = 25

 

-- if the recovery model is not SIMPLE then back up the tlog ----

BACKUP LOG model
TO DISK = N'Z:\SSRSbkups\model\model\model_log_bkup.TRN'
WITH NOFORMAT, INIT, NAME = N'model Tlog Backup',
SKIP, NOREWIND, NOUNLOAD, STATS = 25

 

/*  an extra bit of code */

---------------------------------------------------------
-- if you suspect an open transaction, identify it
---------------------------------------------------------

USE master

GO

SELECT * FROM sysprocesses WHERE open_tran = 1 order by last_batch desc

 

USE Model

GO

DBCC OPENTRAN('Model')

GO

To Learn More About SQL Server Transaction Log Files:

Brent Ozar – Jes Borland video (20min): http://brentozar.com/go/vlf

Thomas LaRock: http://thomaslarock.com/2012/08/the-trouble-with-transaction-logs/

Sebastian Meine: http://sqlity.net/en/1805/eight-reasons-transaction-log-files-keep-growing/

Michelle Gutzeit: http://searchsqlserver.techtarget.com/tip/Testing-transaction-log-autogrowth-behavior-in-SQL-Server

Greg Larsen: https://www.simple-talk.com/sql/database-administration/sql-server-transaction-log-fragmentation-a-primer/

Paul S. Randal: https://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx

Ivan Stankovic: http://www.sqlshack.com/top-10-articles-sql-server-transaction-log/

 

Teaching is Learning, or How to Survive the SQL Server Learning Curve

Dan Crisan, @dandancrisan, a full-time student and tech blogger, has stumbled upon a great way to learn new material — and that’s to write about it. He did just that, taking the material from his Intro to Database Systems course and crafting it into a series of postings called “A Tiny Intro to Database Systems”, http://blog.dancrisan.com/a-tiny-intro-to-database-systems.

Learning by teaching is not a new concept. Dan explained how he stumbled upon Daniel Lemire’s blog posting “How to Learn Efficiently” (http://lemire.me/blog/archives/2014/12/30/how-to-learn-efficiently/), in which Lemire states: “[W]riting … would be a very effective way to learn the material. Teaching is a great way to learn, because it challenges you.” Lemire’s blog is worth reading in its own right, and if you’re like the majority of SQL Server DBAs, always on a continuous learning curve (will Microsoft never give us a moment to relax? I just got a copy of the SQL Server 2016 data sheet from the Ignite conference in Chicago! And thinking about the continuous learning curve, that gives SQL Server’s “Always On” a whole new meaning… argh!), adopting this learning style may be a big help when trying to assimilate the newest, brightest and best features of SQL Server.


A little history about Ted Codd’s relational model, circa 1970, http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/reldb/

Where was I?  Oh yes — the purpose of this posting is to point you to a sweet space where you can quickly review what you learned in database class all those years ago. The dashboard site to “A Tiny Intro to Database Systems” is not the place to learn about databases for the first time. Rather, it’s the place to go for a refresher, a review of concepts and terminology that we all knew at one time but have forgotten in the crush of everyday business.

And why bother to review? Because every now and then it’s good to go back to the beginning, to understand why SQL Server and its relational counterparts are so valuable. The material presented in these tiny overviews is not specific to SQL Server, but in most cases it’s the way SQL Server works underneath the hood. So grab your sandwich and spend a lunchtime with this walk down memory lane.

Thanks for reading. I publish links to everything I write (my blog, my SQL DBA Knowledge Base, and my LinkedIn Publishing) on my Facebook page, visit it for more goodies. Please like, comment, and share.

https://www.facebook.com/MountVernonDataSystems?ref=hl

Michelle Poolet, Zachman-Certified™ Enterprise Architect
Michelle@MountVernonDataSystems.com
@SQLinTheCloud

SQL Server Data Compression – Right for You?

Data compression is a technology that’s been around since SQL Server 2008. The idea of data compression is that you can selectively choose tables, indexes, or partitions within a database and shrink the amount of space used to store the data.

Now, before you get too excited about this scalability and performance feature, at the time of this writing data compression is limited to the Enterprise Edition of all versions of SQL Server which are currently on the market. The Developer Edition is a single-user SQL Server Enterprise, and it, too, supports data compression.

If you’re still reading, then you’re probably using SQL Server Enterprise, and if you need to squeeze a bit more space out of your disk farm, then you might want to consider data compression.

With the advent of SQL Server 2014, we now speak of data in storage as either “rowstore” or “columnstore”. If you’re managing columnstore tables and indexes, they already use columnstore compression, which you can’t configure. The tips and reasoning in this posting reference only rowstore (or regular, as you’ve always known it) data storage.

There are plenty of “how to” posts on SQL Server data compression, but teasing out the “should I or shouldn’t I” is a little trickier. Some of the better blog posts and articles have the “should I or shouldn’t I” points interspersed with descriptions of how the technology works. Other posts just show you how to compress data and don’t really take into consideration the pluses and minuses of data compression.

First, data compression takes resources, lots of resources – CPU and memory, specifically – so if your SQL Server is already CPU-bound or memory-constrained, forget this idea. Go get more memory, faster CPUs, or tune the queries, but don’t add more load to the SQL Server, or your DBA career will be history.

Assuming that you’ve got unused CPU (in the 10-20% range) and available memory capacity, then you’ll want to consider the type of data compression to use – row or page.

Row compression compresses data records, effectively squeezing more records into a single page, so that with each physical I/O you’re reading more records into memory than if the data were uncompressed. That’s highly efficient, since the single most expensive operation in a conventional computer is physical I/O (excluding SSDs and flash drives).

So how does it work?

Glad you asked. Every data row in storage has record headers which contain metadata – information about each field in the record, its length and offset, etc. – this is called row overhead. Sometimes the row or metadata overhead takes more space than the data. Row compression reduces the amount of space required to store the metadata overhead (Microsoft isn’t telling us how it does this), plus it changes the data types in storage. Say, for instance, you have a table with several fixed-length columns, both character and numeric. Row compression will change those data types from fixed to variable-length. Char, integer, decimal, float, datetime, and money data types are all converted to their variable-length equivalents, sometimes resulting in massive space savings. The details for each data type and how they’re converted are covered at https://msdn.microsoft.com/en-us/library/cc280576.aspx.
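If you want to see the fixed-to-variable-length conversion for yourself, here’s a minimal sketch to run in a scratch database; the table, the column sizes, and the row count are all made up for illustration, and your space numbers will vary with the data:

-- a throw-away table with fixed-length columns that waste space
CREATE TABLE dbo.RowCompressDemo
(
    EmployeeID  int           NOT NULL,
    StateCode   char(50)      NOT NULL,  -- deliberately oversized fixed-length column
    Salary      decimal(18,2) NOT NULL
);

-- load some rows that use only a fraction of the declared widths
INSERT INTO dbo.RowCompressDemo (EmployeeID, StateCode, Salary)
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'CO', 50000.00
FROM sys.all_columns a CROSS JOIN sys.all_columns b;

-- space used before compression
EXEC sp_spaceused N'dbo.RowCompressDemo';

-- apply row compression
ALTER TABLE dbo.RowCompressDemo REBUILD WITH (DATA_COMPRESSION = ROW);

-- space used after compression
EXEC sp_spaceused N'dbo.RowCompressDemo';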

Page compression enhances row compression by adding prefix compression and then dictionary compression. So, in order, a page compression operation first compresses the row (row compression, converting fixed-length data types to variable-length and minimizing the metadata overhead); then it determines a valid prefix for the data on each page and compresses using that algorithm (prefix compression); then it looks for repeating values on each page and compresses using that algorithm (dictionary compression).

To get a better idea of how page compression works, a picture is literally worth a thousand words. I’ve borrowed these illustrations from the MSDN library, https://msdn.microsoft.com/en-us/library/cc280464.aspx. Thank you, MSDN.


An uncompressed page of data.

This first figure represents a page of data in storage. Prefix and dictionary compression work page-by-page, determining the best compression “operator” for each page, storing these operators in something called a “CI” structure that is the last entry in each page header.

Notice that the data in this example page contains commonalities – many of the data values start with the same characters (prefix). This might be a common condition with employee ID values, for instance, especially if the table were clustered by these ID values.


A page of data with prefix (step 1 of page) compression applied.

The next figure shows what the data values look like in storage after SQL Server has gone through and done prefix compression. The prefix operators are stored at the end of the page header. The first column, first row, can be decompressed by following the pointer (4) and adding the first four characters of the associated prefix operator to the remaining character in the column, thereby rebuilding the original data value (aaabb). Of course, the pointers are going to be much more complex than this illustration – but you get the idea. The storage savings are pretty phenomenal, close to 25% by my simplistic calculations for this random data set. If you allow a single byte for each character in the data row, and another character for each prefix value in the CI section of the page header, the uncompressed data = 45 bytes, the compressed data = 34 bytes, a 25% reduction.


A page of data after dictionary (step 2 of page) compression has been applied.

Following prefix compression, SQL Server then applies dictionary compression, as it looks for repeating values anywhere on the page and replaces them with a dictionary compression operator and pointers, storing these in the CI structure of the page header.

If your eyes haven’t crossed by now, you’ll notice that the compressed byte count has dropped to 31, or maybe 29, depending on how those zeros in the top row are handled. Nevertheless, the storage reduction is somewhere between 30% and 35%, which could be reason enough to try implementing page data compression.

 

Bottom line: the less randomized the data set, the more consistent the prefixes, and the more common strings in each data value, the greater the storage savings you can expect.

When does data get uncompressed?

While the answer to this question doesn’t strictly fit with the reason I’m writing this post, I’m glad you asked, because it shows you’re curious, and that’s good.

Data does not get uncompressed in memory – there is no “clear text” version of the data that some creepy hacker trolling memory could get hold of. Data is compressed on disk and it stays compressed while in memory until one of two things happens: the data is read as part of a filter, sort, join or query response, or the data is updated – and then it’s not the entire page that’s uncompressed, just the data fields which are being read or updated.

We’ve talked about how uncompressing data consumes CPU, but those extra CPU cycles may be compensated for by the reduction in both physical and logical I/O. Because more rows of compressed data fit on a single page, more data is read into memory with each physical I/O, resulting in fewer I/Os overall. More data can remain cached in the buffer pool, and smaller volumes of data are written back to the disk. If the data is already in memory, fewer logical I/Os result in less work for the CPUs – possibly compensating for the cost of the CPUs compressing and uncompressing the data. The best savings in physical and logical I/O is during table and index scans. Using execution plans to determine the predominant query resolution operation (scans v. lookups) will help you determine best candidates for compression.

What’s a good candidate?

Now that you understand how data compression works – what’s happening internally, even at a very high level, you can better appreciate the CPU and memory overhead that’s required to decompress the data so that it can be delivered back to the calling applications in some understandable form. So, which situations are best suited for data compression, and which are not?

The ideal data set for data compression – other than having the characteristics I’ve just cited above – is data that is write once, read many. The best candidates are large tables and indexes where the scan-to-update operation is roughly a 3:1 or 4:1 ratio, or higher. If operations on a large table are 75% (or more) scans and 20% (or fewer) updates, then this may be a very good candidate for data compression. Data warehouses, and fact tables, especially, may be excellent candidates for data compression.
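One way to put actual numbers to that scan-to-update ratio is the sys.dm_db_index_operational_stats DMV; here’s a sketch (the counters are cumulative since the last instance restart, so treat the ratio as a rough guide):

-- rough scan-vs-write activity per index in the current database since the last restart
SELECT OBJECT_NAME(ios.object_id)                        AS table_name,
       i.name                                            AS index_name,
       ios.range_scan_count                              AS scans,
       ios.leaf_insert_count + ios.leaf_update_count
         + ios.leaf_delete_count                         AS writes
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
  ON i.object_id = ios.object_id AND i.index_id = ios.index_id
ORDER BY ios.range_scan_count DESC;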

This is data compression, so patterns in the data matter. If you have a table with numeric/decimal or fixed-length character data types, where most values don’t need all the space allocated by the data type definition – these columns will row-compress very nicely. Nullable columns where most of the rows are NULL, and columns where the data values repeat (state codes, for instance), are also going to compress well.

What doesn’t compress so well? Filestream data; text and image data types (which, incidentally, have been on the deprecation list since SQL Server 2005) – anything that is stored out of row (data in overflow areas is not compressed); repeating strings within the data value, but without repeating prefixes; data with high variability and little repeatability; variable-length data types.

What can you compress? Anything in the following list that meets the above criteria…

  • A table with no organization (“heap” structure);
  • A table organized as a clustered index;
  • A nonclustered index;
  • An indexed view (remember, these do persist on disk);
  • Partitioned tables and indexes (each partition can be configured differently);
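Before compressing anything on that list, you can ask SQL Server to estimate the savings and then apply the compression. Here’s a hedged sketch using the Person.Contact table that appears later in this post; substitute your own schema, table, and index names:

-- estimate how much space page compression would save on one table
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'Person',
     @object_name      = N'Contact',
     @index_id         = NULL,   -- NULL = all indexes on the table
     @partition_number = NULL,   -- NULL = all partitions
     @data_compression = N'PAGE';

-- compress the table (and, with it, the clustered index) with page compression
ALTER TABLE Person.Contact REBUILD WITH (DATA_COMPRESSION = PAGE);

-- nonclustered indexes do not inherit the table's compression setting; rebuild them as well
-- (ALL rebuilds the clustered index again too, which is redundant but harmless in a sketch)
ALTER INDEX ALL ON Person.Contact REBUILD WITH (DATA_COMPRESSION = PAGE);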

Where are the gotchas?

Data compression is done one time, to whichever data sets (tables, indexes, partitions) on whichever SQL Server instances (production, development, or test) you choose. So what happens if you create a new index on a compressed table – do you think that the new index “inherits” the data compression?

Nope.

That’s right, now you’ve got a table that’s compressed but an index on that table that’s not. It might take you a little while to figure out why the index you just created isn’t giving you the performance boost you were expecting. Well, you say, I can just check with SQL Server Management Studio to see what’s going on.


Table person.contact has been page compressed but its non-clustered indexes have not – can you tell?

Not easily you can’t.

In the figure to the right, the table Person.Contact is page-compressed; the clustered index PK_Contact_ContactID is also (well, yes, since the data is stored at the leaf level of the clustered index), but the other indexes are not.

It would be great if compressed objects would turn blue or something like the compressed files in Windows Explorer, but that doesn’t happen.

So here’s how you find out what’s compressed, and what’s not. When you execute this code…

SELECT t.name AS [Table], i.name AS [Index], i.type_desc AS IndexType, p.partition_number AS [Partition], p.data_compression_desc AS [Compression]

FROM sys.partitions p

INNER JOIN sys.tables t ON t.object_id = p.object_id

INNER JOIN sys.indexes i ON i.object_id = p.object_id AND i.index_id = p.index_id

ORDER BY t.name, i.type_desc, i.name

…then you get a list of tables, indexes and partitions, and their compression status. It’s much, much easier than clicking through SQL Server Management Studio, checking Storage>Manage Compression for each object.

Back to the issue at hand, and that’s keeping everyone and everything synched. Having your dev and test environments compressed, but your production environment uncompressed is going to produce some interesting and unexpected results in production. The worst case is to have a scattering of objects in some environments compressed, but the same objects in other environments uncompressed. Better to have them all the same.

A recommendation – when embracing data compression, adopt a process something like:

  1. Determine which objects (tables, indexes, partitions) need compression;
  2. Compress in all environments (production, development, test);
  3. Stage the compressions (don’t do them all at once) and compress during low-activity time periods;
  4. Institute a change management process so that objects which are added after initial compression can also be compressed, if needed;
  5. Automate as much of this process as you can so that you’re generating reports periodically, then review those reports to ensure that your systems are in sync.

The next “gotcha” is the case where compressing the data actually causes storage to increase. As you saw at the beginning of this posting, data compression adds extra space to each page header – mostly this is the metadata that decodes the prefix and dictionary portions of page compression. If the data values are too varied, if there’s not a consistency in the first or “prefix” portion of the strings, if there are minimal repeating value strings within the data sets, then it’s possible that this metadata overhead, as it’s called, can be greater than the space saved. In this case, you’ll want to back out of page compression and apply just row compression. If that doesn’t alleviate the problem, uncompress the data completely, and start campaigning for SSD storage devices.
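Backing out is the same REBUILD syntax with a different compression option; for instance (the table name is a placeholder):

-- fall back from page compression to row compression on a table that didn't benefit
ALTER TABLE dbo.YourTable REBUILD WITH (DATA_COMPRESSION = ROW);

-- or remove compression entirely
ALTER TABLE dbo.YourTable REBUILD WITH (DATA_COMPRESSION = NONE);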

Combining compression

I’ll bet you’re asking yourself whether you should still bother with backup compression now that you’ve got data compression turned on. The short answer is yes.

Backup compression is quite different from data compression; its algorithm is a lot like zipping a file. Data compression and backup compression co-exist nicely, and can result in additional storage space savings, so go ahead and indulge.
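A hedged example of stacking the two, with a placeholder database name and backup path:

-- back up a data-compressed database with backup compression on top
BACKUP DATABASE YourDatabase
TO DISK = N'Z:\Backups\YourDatabase_full_compressed.BAK'
WITH COMPRESSION, INIT, NAME = N'YourDatabase full backup (compressed)', STATS = 25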

There are most certainly some of you who are using SQL Server data compression and you’re considering trading in your spindle-based hard disks for SSDs or flash drive arrays, now that the price of SSDs is dropping while the reliability is skyrocketing. The best advice that I can offer you, based on colleagues who are already doing this, is to keep the data compression in place. Not only are they stuffing each I/O with more records than if the data were uncompressed, they’re now circumventing the I/O bottleneck. And they’re doing the DBA happy dance.

This all sounds pretty cool, but as with all things SQL Server, there are tradeoffs. The greater the compression, the greater the load on CPU (and memory) as data is exchanged with the calling application. If the apps generate a lot of backing and forthing between the client and the SQL Server, with continuous data updating, this could be a very negative feature. If you’re already CPU-bound, if you can’t spare 10-15% CPU capacity, then don’t try this. If you’re already maxed out with memory, address that issue first, and then try data compression. But, if you have lots of unused CPU capacity, memory to spare, and you know that you’re I/O bound, data compression is something that may give your SQL Servers (and you) the relief needed.

Thanks for reading this blog, I hope this has helped answer questions that you might have had about when and when not to use SQL Server data compression. If you’d like to learn more about our products and services, give me a call today at +1 303.526.1839 or click over to www.MountVernonDataSystems.com. If you’re new at the database administration game, check out Coach-in-a-Can® at https://mountvernondatasystems.com/coach-in-a-can.html.

Michelle Poolet, The Coach

Friday, 13 March, 2015

 

 

 

WHAT’S EATING UP MY HARD DISK?

I have a bone to pick with Microsoft. I’m not a Windows Systems Admin, and I don’t think I should have to be in order to use a Windows operating system. I’ll tell my story, and you decide if sysadmin certification is a pre-req to using – in this case – Windows 7.

Lately I’ve had runaway disk consumption on my Windows 7 laptop.  As a SQL Server DBA, I use this little laptop for all my client apps — SQL Server Management Studio, Microsoft Office, personal file storage, etc. It’s not a big system; the local hard disk is only 278GB.

Within the last month I’ve noticed that the free space on this hard disk has been shrinking at an alarming rate, and when the space available “red light” came on, I got really concerned. Granted, I had been saving files to this disk left and right, without considering space requirements — but hey, 278GB can hold a lot of files, right? Apparently not?

Subsequently, I spent many hours moving my files to network storage, and still I was only able to recover less than 30GB of local storage!

WHAT WAS EATING UP MY HARD DISK?

I searched that hard disk for missing space, but I could find nothing. No user folder contained more than a few hundred MBs worth of files.  ARGH!

This is the point in this story when I say that I’m glad I’ve been around computers for as long as I have. Back in the dirty dark days of DOS I had a little utility program that would list folders and their sizes, largest to smallest. Hard disks in those days were measured in a few tens of MB, so file control was critical.

Was there such a utility in today’s market?  OH YES! I found TreeSize (www.jam-software.com/treesize_free/), downloaded, installed, and almost immediately found the culprit — which, incidentally, I could never have done using just Microsoft tools.

Have you heard of this thing called System Volume Information?

If you’re a sysadmin, of course you have! But as a DBA, why would I care? My sysadmin will take care of sysvol stuff, right?

Sure.


What happens when you try to look into the System Volume Information folder…

I had noticed this sysvol entry in the Windows Explorer tree, and had attempted to check it out, thinking maybe there was a bunch of system stuff that was sucking up the space on disk. Take a look at what you see when you try to check the folder and see what it contains — not a very friendly greeting.

OK, I really didn’t care what it contained at this point, I just wanted to know how large the folder contents were…so, check Properties…

Zero bytes. This must be one of those folders that Windows uses for temporary files or something.  Except, TreeSize told me that this folder contained about 233 GB worth of files…HUH?

WHY?

Research time!

It didn’t take long to determine what was going on…lots and lots of entries regarding sysvol info, stretching back to the days of Windows XP:

Clean out system volume information folder – Microsoft  https://social.technet.microsoft.com/Forums/en-US/92775152-7330-4da4-8af9-10a04e0bc783/clean-out-system-volume-information-folder

How to reduce the size of the System Volume information (my favorite for concise instructions), http://www.tomshardware.com/answers/id-1813658/reduce-size-system-volume-information-folder.html

How to reduce the size of the System Volume information, http://www.sevenforums.com/performance-maintenance/222588-how-delete-system-volume-information-folder-win-7-a.html

…and even Wikipedia gets into the act…  https://en.wikipedia.org/wiki/System_Restore

BUT — I take exception with this wiki entry. Under Disk Space Consumption, it states:

“The amount of disk space System Restore consumes can be configured. Starting with Windows XP, the disk space allotted is configurable per volume and the data stores are also stored per volume. File[s] are stored using NTFS compression and a Disk Cleanup handler allows deleting all but the most recent Restore Point to free up disk space. System Restore can be disabled completely to regain disk space. It automatically disables itself if the disk free space is too low for it to operate.”

Three points:

1. The contention that files are stored using NTFS compression — I believe this is not true. You can attempt to apply compression to this folder, but it does nothing — there’s no indication that compression is being done, and it doesn’t even throw an error message. In addition, the display in Windows Explorer is not that of a compressed file.

2. Restore Points are not being managed, and disk space is not being freed up. It’s my understanding that the default config is to allot a maximum of 50% of disk space to sysvol, but in my experience sysvol operations have taken well beyond 50%.  For example, a 278GB hard disk containing 35GB of user data files; when these data files were removed to an external drive, the disk still showed 233GB space used.

3. The contention that “It automatically disables itself if the disk free space is too low for it to operate”  is not true, IMNSHO. My hard disk space was in the red zone, yet GBs were still being consumed — even while user files were being moved off the local hard drive onto the external device.

I’m not sure why this problem surfaced now — perhaps it’s a combination of enough time having passed and my doing more work, so that the space consumed by sysvol info has become an issue, who knows? The point is, this is a situation that could trip up a lot of DBAs and ordinary users.  Back in the old days, we could take ownership of the sysvol info folder and administer it directly — but starting with Windows 7 that’s no longer possible. Now you have to jump through appropriate hoops in order to manage the size (see links above for instructions), and how to do this is not obvious. Actually, without TreeSize to tip me off to the sysvol info problem, I would have missed it entirely!

One word of caution: I did reset the sysvol max disk space allocation according to the directions, but when I went back in to check (trust but verify?), the settings had returned to the 50% default. Great.

I can turn off sysvol info entirely — that is an option — but then my system would lose its auto-recovery features, and since there have been a couple of auto-recoveries in this laptop’s lifetime, I really don’t want to do that. I just want to use this laptop as it was created…as a shared resource.

So, you tell me, should I or any other non-system admin user need to know about this stuff?  Really?

Thank you for listening to this story. I hope you don’t have this problem with your hard disk, but if you do, I hope this has helped. If you’d like to learn more about our products and services, give me a call today at +1 303.526.1839 or click over to www.MountVernonDataSystems.com. If you’re new at the database administration game, check out Coach-in-a-Can® at https://mountvernondatasystems.com/coach-in-a-can.html

The Coach

 

 

 

SQL Server 2014 – Enterprise Features to Get Excited About

SQL Server 2014 is a new breed of workhorse in the Microsoft stables. Last week I attended a webcast given by Mike Otey, Senior Consultant for SQL Server Pro magazine and President of TECA Inc., in which he highlighted the amazing new and improved features offered by the Enterprise Edition of 2014.

If you haven’t had a chance to start investigating SQL Server 2014 yet, this is a quick (and painless) way to get started. Mike will give you plenty of reason to research further, and if you’re considering a move to 2014 Enterprise, this will help orient you to the 2014 feature set.

Mike’s webcast, along with presentations from his colleagues at Cisco, is available on demand for the next year at
http://bit.ly/1FlBRqo

Thank you for your interest in SQL Server 2014 Enterprise and its features. I hope you enjoy Mike’s webcast. If you’d like to learn more about our products and services, give me a call today at +1 303.526.1839 or click over to www.MountVernonDataSystems.com. If you’re new at the database administration game, check out Coach-in-a-Can® at https://mountvernondatasystems.com/coach-in-a-can.html


 

Making the Jump from SQL Server 2000 to 2014 – IT AIN’T EASY!

An Unexpected Journey: SQL Server 2000 to 2014

 

Making the jump from SQL Server 2000 to 2014 — this was an unexpected journey, not for me, but for my client. For nearly 15 years they’d been using SQL Server 2000 as their main production system, and they were happy to do so. Not much changed in their business model, and the database management system was sufficient.

And then their world was rocked. With the obsolescence of the Windows operating system that hosted SQL Server 2000, and knowing that SQL Server 2000 would not function properly on any of the modern versions of the Windows operating system,

THEY HAD TO DO SOMETHING.

The attached handouts are slides and notes from a presentation that I gave on February 11th, 2015, for the Denver Women in Technology group. I hope you enjoy the preso, and more importantly, I hope you get value from it.

An Unexpected Journey – SQL 2000 to 2014

Data Normalization – Is It Still Relevant In A World of NOSQL?

Some years ago, I wrote an article for this magazine entitled “SQL by Design: Why You Need Database Normalization”. At that time, NoSQL databases were a tiny percentage of the installations, and relational ruled the land.

Today the landscape has changed, and we have relational database management systems like SQL Server sharing the spotlight with variations of the NoSQL technology. So, I thought we could revisit the original reasons for normalizing data and see if they’re still relevant.

Assumptions: when you normalize a database, you’re targeting the following goals:
– arranging data into logical groupings such that each group describes a small part of the whole;
– minimizing the amount of duplicate data stored in a database;
– organizing the data such that, when you modify it, you make the change in only one place;
– building a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data in storage.

The original article contended that “Data normalization is primarily important in the transactional, or online transactional processing (OLTP), database world, in which data modifications (e.g., inserts, updates, deletes) occur rapidly and randomly throughout the stored data.” It then went on to point out that data warehouses, which often contain volumes of denormalized and precalculated/summarized data, often ignore the rules of data normalization.

To the data warehouse model we can add many of the current installations of NoSQL databases — databases that are engineered to capture semi-structured data from sources whose quality we have little if any control over, and whose meaning can actually shift over time.  It seems to me that if we want to be able to integrate NoSQL data into our own highly transactional, well-defined, conventionally-normalized financial data, for instance, then yes — we need to understand the NoSQL data: its meaning, where it came from, its quality, its integrity. Well, if we want accurate answers, that is, we need to know.

Are you thinking that the NoSQL installation at your shop is a stand-alone system which is being mined as the data is gathered, and that nothing else will ever be done with it? Good luck with that idea! If IT history has shown us nothing else, it’s clear that data is being used and reused in ways that were never imagined when the collection systems were put in place.

If you’re working in a hybrid (SQL plus NoSQL) environment, I believe that the highly-structured SQL data is your stake in the ground; it’ll give you the basis on which to build additional knowledge and marketing expertise when integrated correctly with the NoSQL data that’s being collected in concert with it. And that’s why I think you still need to normalize your (SQL) data.

Got other ideas? Let’s hear them!


Read the original article “SQL by Design: Why You Need Database Normalization” at http://sqlmag.com/database-performance-tuning/sql-design-why-you-need-database-normalization

Tempdb: Checksum or Torn Page Detection or None?

Tempdb performance tuning can be an art unto itself. We probably all know the standard stuff:


SQL Server has only one tempdb database, which is used for a variety of internal operations.

  • put tempdb on separate disks if you’re still working with direct attached storage — on its own spindle if at all possible, and the fastest one(s) you have. If you can, put the tempdb data file on one disk and the tempdb log file on a second disk.
  • if you have the capability to determine RAID level for tempdb, RAID 1+0 is best (there’s only one tempdb per SQL Server, so if you lose the tempdb you’re SOL — hardware mirroring is one way of preventing unnecessary downtime). Otherwise, opt for RAID 0 (striping).
  • if your database employs a lot of the objects which are managed in tempdb (temporary tables – local and global, table variables, cursors, sorts inside queries, index rebuilding operations, online indexing, table-valued functions), it’s a good idea to create one tempdb file for every 4 CPUs, up to 8 tempdb data files. Then observe performance and use of the tempdb files, and increase the number of data files if necessary. And, of course, restart the SQL Server so that the new tempdb data file will be recognized and put to use.
  • never configure tempdb with the default initial size and auto-growth settings inherited from the model database, especially on older versions of SQL Server as they came out of the box. Pre-size the tempdb files when you’re installing SQL Server — all the same size for optimal load distribution. A T-SQL sketch follows this list.
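Here’s that sketch; every size, path, and file count below is a placeholder to tailor to your own server (and, as noted above, plan on a restart so the new files are put to use):

-- pre-size the existing tempdb files and add data files (one per 4 CPUs, up to 8)
-- all sizes and paths below are placeholders
USE master;
GO
ALTER DATABASE tempdb MODIFY FILE (NAME = N'tempdev', SIZE = 2048MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = N'templog', SIZE = 1024MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb ADD FILE (NAME = N'tempdev2', FILENAME = N'T:\tempdb\tempdev2.ndf', SIZE = 2048MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = N'tempdev3', FILENAME = N'T:\tempdb\tempdev3.ndf', SIZE = 2048MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = N'tempdev4', FILENAME = N'T:\tempdb\tempdev4.ndf', SIZE = 2048MB, FILEGROWTH = 512MB);
GO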

Recently I read a posting about the Page Verify option of database properties, and tempdb not allowing any option other than CHECKSUM, which is not true for SQL Server 2008 R2 and 2012.  This got me to thinking…why would you want to use an option other than CHECKSUM?

TORN PAGE DETECTION is reputedly not as robust as CHECKSUM in terms of data integrity, but might be quicker…is it really?

NONE would be the quickest option if you were focused entirely on fast performance, and really didn’t care about the odd data inaccuracy.

I honestly don’t know — I’ve always gone with safety first and opted for CHECKSUM on all databases, user and system.  Am I missing something by being too conservative?
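If you want to check where your databases currently stand, and set one explicitly, a quick sketch (the database name is a placeholder):

-- current page verify setting for every database
SELECT name, page_verify_option_desc
FROM sys.databases;

-- explicitly set a database to CHECKSUM (the conservative choice)
ALTER DATABASE YourDatabase SET PAGE_VERIFY CHECKSUM;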

Have you ever thought about this before? What do you figure would be the conditions under which you’d want to use one property v. another? And obviously, this is NOT something you want to try on ANY production system, not without volumes of testing! Let me know what you think…

To link directly to this post in future:

https://www.mountvernondatasystems.com/blog/tempdb-checksum-or-torn-page-detection-or-none

 

BIG DATA, TO MODEL OR NOT TO MODEL?


 

 

Today’s posting is for those BIG DATA dudes and dudettes, and those folks who are starting to assemble big data stores, who are faced with issues of how to wrangle database design and data modeling in their big data world.

If you’ve been a follower of the SQLmag for any length of time, you know that my forte is data modeling and database design. Lately, I’ve been encountering discussions about big data, NoSQL (or Not Only SQL, as some call it), and how, in a big data world, data modeling is not required.

In my dim and distant past, I was heavily involved in big data modeling and processing with oil and gas exploration. By today’s standards, the volume and velocity of data gathered would be considered small, but for the computing power of the day, it qualified as “big data”. We definitely modeled the data, otherwise there would have been no way to analyze and use the data being gathered, and we would never have been able to intelligently spot the next drill site.

Which is why I don’t understand the NOSQL concept of the “no data modeling” paradigm…what exactly does that mean? Does that mean no data modeling at all and all of the data is just dumped into a big pile (heap structure) so that when you retrieve data you have to start at the beginning and search through, reading all the data to the end until you find what you’re looking for (table scan)? Probably not. Obviously there’s got to be data modeling or data organization going on somewhere.

In the business world, transactional data is modeled via a set of rules and methods so that updates, inserts, and deletes don’t invalidate the integrity of the data, and select operations are optimized (we hope!). But what do you do for big data? How do you handle data modeling in a business Big Data World? How are you addressing organization within the massive scale that big data presents?

What is your approach? I’d really like to hear from you and get your input on this subject. I’m very curious!

To view this post in my company blog go to https://www.mountvernondatasystems.com/blog/big-data-to-model-or-not-to-model

Why Are We All Not Great SQL Server DBAs?

We know we’re good; we know we can be better. But too many times, stuff gets in our way. So what is it that’s preventing us from doing not just a good job, but a great job; what is holding us back from becoming a great DBA?

– Is there not enough time in the day to address all the issues that you have on your to-do list?

– Are you not feeling comfortable with the challenges presented to you? I know, this is a hard one to admit, but it hits us — each and every one of us — at some point in our careers!

– Do you feel that there’s a lack of communication with your peer group? (Who is your peer group?)

– Do you feel that you are experiencing a lack of communication and understanding with your management?

– Do you feel like you have a lack of control over your working environment? How would you like to change that?

– Do you have a lack of tools which might – could – would help you get the work done right – better – best?

– Do you feel like you lack certain skills, or that the technology is moving faster than your ability to assimilate the information? I know I feel that way, every time Microsoft releases a new version of SQL Server!

– Do you feel like you’re lacking skilled staffers, and in which quarters?

– Do you feel like you’re lacking adequate support from your peers and/or your management? What would you like to happen that might fix that?

– And lastly, fill in the blank: Do you feel that ____________________________.

No, this is not a discussion about techniques and tips, but it will help expose some of the common issues which are prevalent among SQL Server DBAs. In doing so, maybe we can get to some of the root causes of why we’re not all great DBAs!