Tempdb: Checksum or Torn Page Detection or None?

Tempdb performance tuning can be an art unto itself. We probably all know the standard stuff:

(Image: SQL Server's tempdb database. There is only one tempdb per SQL Server instance, and it is used for a variety of internal operations.)

  • put tempdb on separate disks if you’re still working with direct-attached storage, on its own spindle if at all possible, and the fastest one(s) you have. If you can, put the tempdb data file on one disk and the tempdb log file on a second disk.
  • if you have the capability to determine the RAID level for tempdb, RAID 1+0 is best (there’s only one tempdb per SQL Server, so if you lose tempdb you’re SOL; hardware mirroring is one way of preventing unnecessary downtime). Otherwise, opt for RAID 0 (striping).
  • if your database employs a lot of the objects that are managed in tempdb (local and global temporary tables, table variables, cursors, sorts inside queries, index rebuild operations, online indexing, table-valued functions), it’s a good idea to create one tempdb data file for every 4 CPUs, up to 8 tempdb data files. Then observe performance and usage of the tempdb files, and increase the number of data files if necessary. And, of course, restart SQL Server so that the new tempdb data files are recognized and put to use.
  • never configure tempdb with the default initial size and auto-growth inherited from the model database, especially on older versions of SQL Server as they came out of the box. Pre-size the tempdb files when you’re installing SQL Server, all the same size for optimal load distribution (a T-SQL sketch follows this list).
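
As a rough illustration of those last two bullets, here is a minimal T-SQL sketch for pre-sizing the existing tempdb files and adding extra data files. The logical names tempdev and templog are the usual defaults, but the sizes, growth increments, extra file names, and the E:\tempdb path are placeholder assumptions; size them for your own hardware and workload.

-- Pre-size the primary tempdb data file and the log file (sizes below are placeholders)
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 4096MB, FILEGROWTH = 512MB);

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 1024MB, FILEGROWTH = 256MB);

-- Add more data files, all the same size as the first
-- (one per 4 CPUs, up to 8 files, per the guideline above)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'E:\tempdb\tempdev2.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3, FILENAME = 'E:\tempdb\tempdev3.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);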

Recently I read a posting about the Page Verify option in the database properties, claiming that tempdb doesn’t allow any option other than CHECKSUM, which is not true for SQL Server 2008 R2 and 2012. This got me thinking: why would you want to use an option other than CHECKSUM?
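
For reference, here is how you can check the current Page Verify setting and flip a database from one option to another. This is just a sketch: AdventureWorks is a placeholder database name, and whether tempdb will accept a change depends on your version of SQL Server, so try it on a sandbox instance first.

-- List the current Page Verify setting for every database on the instance
SELECT name, page_verify_option_desc
FROM sys.databases;

-- Switch a database between the three options
-- (CHECKSUM has been the default for new databases since SQL Server 2005)
ALTER DATABASE AdventureWorks SET PAGE_VERIFY CHECKSUM;
ALTER DATABASE AdventureWorks SET PAGE_VERIFY TORN_PAGE_DETECTION;
ALTER DATABASE AdventureWorks SET PAGE_VERIFY NONE;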

TORN PAGE DETECTION is reputedly not as robust as CHECKSUM in terms of data integrity, but might be quicker…is it really?

NONE would be the quickest option if you were focused entirely on raw performance, and really didn’t care about the occasional undetected page corruption.

I honestly don’t know — I’ve always gone with safety first and opted for CHECKSUM on all databases, user and system.  Am I missing something by being too conservative?

Have you ever thought about this before? What do you figure would be the conditions under which you’d want to use one property v. another? And obviously, this is NOT something you want to try on ANY production system, not without volumes of testing! Let me know what you think…

To link directly to this post in future:

https://www.mountvernondatasystems.com/blog/tempdb-checksum-or-torn-page-detection-or-none

 

BIG DATA, TO MODEL OR NOT TO MODEL?

Today’s posting is for those BIG DATA dudes and dudettes, and for those folks who are starting to assemble big data stores and are faced with the issue of how to wrangle database design and data modeling in their big data world.

If you’ve been a follower of SQLmag for any length of time, you know that my forte is data modeling and database design. Lately, I’ve been encountering discussions about big data, NoSQL (or “Not Only SQL,” as some call it), and how, in a big data world, data modeling is not required.

In my dim and distant past, I was heavily involved in big data modeling and processing for oil and gas exploration. By today’s standards, the volume and velocity of the data gathered would be considered small, but for the computing power of the day it qualified as “big data”. We definitely modeled the data; otherwise there would have been no way to analyze and use what was being gathered, and we would never have been able to intelligently spot the next drill site.

Which is why I don’t understand the NoSQL concept of the “no data modeling” paradigm…what exactly does that mean? Does it mean no data modeling at all, with all of the data just dumped into a big pile (a heap structure), so that when you retrieve data you have to start at the beginning and read through to the end until you find what you’re looking for (a table scan)? Probably not. Obviously there has to be some data modeling or data organization going on somewhere.

In the business world, transactional data is modeled via a set of rules and methods so that updates, inserts, and deletes don’t invalidate the integrity of the data, and select operations are optimized (we hope!). But what do you do for big data? How do you handle data modeling in a business big data world? How are you addressing organization at the massive scale that big data presents?

What is your approach? I’d really like to hear from you and get your input on this subject. I’m very curious!

To view this post in my company blog go to https://www.mountvernondatasystems.com/blog/big-data-to-model-or-not-to-model