Teaching is Learning, or How to Survive the SQL Server Learning Curve

Dan Crisan, @dandancrisan, a full-time student and tech blogger, has stumbled upon a great way to learn new material — and that’s to write about it. He did just that, taking the material from his Intro to Database Systems course and crafting it into a series of postings called “A Tiny Intro to Database Systems”, http://blog.dancrisan.com/a-tiny-intro-to-database-systems.

Learning by teaching is not a new concept. Dan explains how he stumbled upon Daniel Lemire’s blog posting “How to Learn Efficiently” (http://lemire.me/blog/archives/2014/12/30/how-to-learn-efficiently/), in which Lemire states: “[W]riting … would be a very effective way to learn the material. Teaching is a great way to learn, because it challenges you.” This, in itself, is a good blog. And if you’re like the majority of SQL Server DBAs, you’re always on a continuous learning curve (will Microsoft never give us a moment to relax? I just got a copy of the SQL Server 2016 data sheet from the Ignite conference in Chicago! Thinking about that continuous learning curve gives SQL Server’s “Always On” a whole new meaning… Argh!). Adopting this learning style may be a big help when trying to assimilate the newest, brightest, and best features of SQL Server.

A little history about Ted Codd’s relational model, circa 1970, http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/reldb/

Where was I? Oh yes — the purpose of this posting is to point you to a sweet space where you can quickly review what you learned in database class all those years ago. The dashboard site for “A Tiny Intro to Database Systems” is not the place to learn about databases for the first time. Rather, it’s the place to go for a refresher, a review of concepts and terminology that we all knew at one time but have forgotten in the crush of everyday business.

And why bother to review? Because every now and then it’s good to go back to the beginning, to understand why SQL Server and its relational counterparts are so valuable. The material presented in these tiny overviews is not specific to SQL Server, but in most cases it’s the way SQL Server works under the hood. So grab your sandwich and spend a lunchtime with this walk down memory lane.

Thanks for reading. I publish links to everything I write (my blog, my SQL DBA Knowledge Base, and my LinkedIn Publishing) on my Facebook page; visit it for more goodies. Please like, comment, and share.

https://www.facebook.com/MountVernonDataSystems?ref=hl

Michelle Poolet, Zachman-Certified™ Enterprise Architect
Michelle@MountVernonDataSystems.com
@SQLinTheCloud

Data Normalization – Is It Still Relevant In A World of NOSQL?

Some years ago, I wrote an article for this magazine entitled “SQL by Design: Why You Need Database Normalization”. At that time, NoSQL databases were a tiny percentage of the installations, and relational ruled the land.

Today the landscape has changed, and we have relational database management systems like SQL Server sharing the spotlight with variations of the NoSQL technology. So, I thought we could revisit the original reasons for normalizing data and see if they’re still relevant.

Assumptions: when you normalize a database, you’re targeting the following goals (a schema sketch illustrating them follows the list):
– arranging data into logical groupings such that each group describes a small part of the whole;
– minimizing the amount of duplicate data stored in a database;
– organizing the data such that, when you modify it, you make the change in only one place;
– building a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data in storage.
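
To make those goals concrete, here’s a minimal T-SQL sketch; the table and column names are mine, invented purely for illustration. The flat version repeats the customer’s name and city on every order row; the normalized version stores each fact once and lets a foreign key protect the relationship.

-- Denormalized: customer facts repeat on every order row, so a change of
-- address must be applied in many places (a classic update anomaly).
CREATE TABLE dbo.OrdersFlat (
    OrderID      int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    CustomerCity nvarchar(100) NOT NULL,
    ProductName  nvarchar(100) NOT NULL,
    Quantity     int           NOT NULL
);

-- Normalized: each logical grouping describes a small part of the whole,
-- duplicate data is minimized, and a modification happens in only one place.
CREATE TABLE dbo.Customer (
    CustomerID   int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    CustomerCity nvarchar(100) NOT NULL
);

CREATE TABLE dbo.CustomerOrder (
    OrderID     int           NOT NULL PRIMARY KEY,
    CustomerID  int           NOT NULL
        REFERENCES dbo.Customer (CustomerID),  -- integrity preserved in storage
    ProductName nvarchar(100) NOT NULL,
    Quantity    int           NOT NULL
);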

The original article contended that “Data normalization is primarily important in the transactional, or online transactional processing (OLTP), database world, in which data modifications (e.g., inserts, updates, deletes) occur rapidly and randomly throughout the stored data.” It then went on to point out that data warehouses, which often contain volumes of denormalized and precalculated/summarized data, often ignore the rules of data normalization.

To the data warehouse model we can add many of the current installations of NoSQL databases — databases that are engineered to capture semi-structured data from sources whose quality we have little if any control over, and whose meaning can actually shift over time. It seems to me that if we want to be able to integrate NoSQL data into our own highly transactional, well-defined, conventionally normalized financial data, for instance, then yes — we need to understand the NoSQL data: its meaning, where it came from, its quality, its integrity. If we want accurate answers, that is, we need to know all of that.

Are you thinking that the NoSQL installation at your shop is a stand-alone system which is being mined as the data is gathered, and that nothing else will ever be done with it? Good luck with that idea! If IT history has shown us nothing else, it’s clear that data is being used and reused in ways that were never imagined when the collection systems were put in place.

If you’re working in a hybrid (SQL plus NoSQL) environment, I believe that the highly structured SQL data is your stake in the ground; it’ll give you the basis on which to build additional knowledge and marketing expertise when it’s integrated correctly with the NoSQL data being collected in concert with it. And that’s why I think you still need to normalize your (SQL) data.

Got other ideas? Let’s hear them!


Read the original article “SQL by Design: Why You Need Database Normalization” at http://sqlmag.com/database-performance-tuning/sql-design-why-you-need-database-normalization

Big Data: To Model or Not to Model?

Today’s posting is for those BIG DATA dudes and dudettes, and those folks who are starting to assemble big data stores, who are faced with issues of how to wrangle database design and data modeling in their big data world.

If you’ve been a follower of SQLmag for any length of time, you know that my forte is data modeling and database design. Lately, I’ve been encountering discussions about big data and NoSQL (or Not Only SQL, as some call it), and how, in a big data world, data modeling is not required.

In my dim and distant past, I was heavily involved in big data modeling and processing with oil and gas exploration. By today’s standards, the volume and velocity of data gathered would be considered small, but for the computing power of the day, it qualified as “big data”. We definitely modeled the data, otherwise there would have been no way to analyze and use the data being gathered, and we would never have been able to intelligently spot the next drill site.

Which is why I don’t understand the NoSQL notion of a “no data modeling” paradigm… what exactly does that mean? Does it mean no data modeling at all, with all of the data just dumped into a big pile (a heap structure), so that when you retrieve data you have to start at the beginning and read everything until you find what you’re looking for (a table scan)? Probably not. Obviously there’s got to be data modeling, or at least data organization, going on somewhere.
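
To put the “big pile” image in concrete terms, here’s a tiny T-SQL sketch; the table and index names are mine, invented for illustration. With no index at all, SQL Server has no choice but to read every row; give it some organization, and the same query becomes a seek.

-- A heap: no clustered index, no nonclustered indexes. Just a pile of rows.
CREATE TABLE dbo.EventPile (
    EventID   int          NOT NULL,
    EventData varchar(200) NOT NULL
);

-- Against the bare heap, this predicate forces a full table scan:
-- every row gets read until the matches are found.
SELECT EventData FROM dbo.EventPile WHERE EventID = 42;

-- Add some organization (here, a clustered index on the lookup key)
-- and the very same query resolves with an index seek instead.
CREATE CLUSTERED INDEX IX_EventPile_EventID ON dbo.EventPile (EventID);
SELECT EventData FROM dbo.EventPile WHERE EventID = 42;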

In the business world, transactional data is modeled via a set of rules and methods so that updates, inserts, and deletes don’t invalidate the integrity of the data, and select operations are optimized (we hope!). But what do you do for big data? How do you handle data modeling in a business Big Data World? How are you addressing organization within the massive scale that big data presents?

What is your approach? I’d really like to hear from you and get your input on this subject. I’m very curious!

To view this post in my company blog go to https://www.mountvernondatasystems.com/blog/big-data-to-model-or-not-to-model

The 833, What Is It?

Here’s a story that I’d like to share, about the infamous SQL Server 833. A while ago I was on a performance-tuning call at a client site; they’d been experiencing event 833 from some of their production SQL Servers, and they wanted to know if this was a SQL Server problem or a problem with the disk farm.

To back up just a little, an event 833 (or error 833, as it’s sometimes called) is SQL Server’s way of warning that I/O requests it has issued are taking an unusually long time to come back from the storage subsystem. An 833 message reads something like “SQL Server has encountered 49 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file […file full path…] in database […database name…]…”.
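
When you’re trying to decide whether an 833 really points at the I/O subsystem, a reasonable first stop is SQL Server’s own per-file I/O statistics. Here’s a minimal diagnostic sketch using the sys.dm_io_virtual_file_stats DMV; note that its counters are cumulative since instance startup, so for an interval view you’d sample twice and diff the numbers.

-- Average I/O stall per read and per write, for every file on the instance.
SELECT  DB_NAME(vfs.database_id)                             AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
        vfs.num_of_writes,
        vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id
ORDER BY avg_read_stall_ms DESC;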

This message is often interpreted as a disk I/O issue, and in many cases the 833 can be resolved by optimizing disk operations or simply replacing older, slower equipment with newer, faster disks. But this case didn’t resolve quite so easily. You see, the disk farm attached to the SQL Servers in question was composed of flash drives — a flash SAN.

Let’s complicate this situation:

The production SQL Servers that were getting the 833s are on the far right of the system diagram. The 833s were coming from different database data files and log files, sometimes during heavy transaction activity and other times during very light transaction activity… so, in other words, the 833s were sporadic and unpredictable.

Given all the “upstream” data stores, processes, and source data implications (no one on staff seemed to really understand how the production system worked), it’s not surprising that there would be some sort of issue in the transaction stream. The stream starts at point-of-sale terminals and data aggregators on the far left and proceeds through load balancers and “soft routers” (this is a distributed environment) and on to error-handling routines before finally arriving at a place where the SQL Servers can begin to do their job.  The good news is that this system is able to handle many millions of transactions a day with only the occasional 833 hiccup. The 833 “wait” always resolves itself within a minute or two, but by then, the upstream congestion is significant.

The client set up a test environment with the intent of replicating the 833 situation to determine root cause. There was only one problem with the test setup…it bore little if any resemblance to production. And, while I was on-site, we were never allowed to use advanced analysis tools on the production systems.

Spoiler alert! The exact root cause of the 833s was never determined…

So now the challenge to you, dear reader: given what you know so far, what would you surmise might be the cause of the 833s that were being thrown by the SQL Servers? Remember, that SAN is full of flash drives…

Database Data Modeling: Can You Afford Pre-built Data Models?

I just read a blog post about “Data Models: Build or Buy”, by Jason Haley …

He has some interesting points.  If I could recap for brevity and clarity …

  • Good data models that accurately reflect business requirements are critical to the success of a database project, whether transactional or data warehouse, web-based retail or BI;
  • Pre-built data models are available, often from software vendors;
  • Pre-built data models are like off-the-rack suits; they may not fit very well, and will need some alterations to better fit the buyer;
  • Altering a pre-built data model is the realm of a small group of highly talented and (usually) expensive people called (surprise!) data modelers and data integrators;
  • Since budgets are tight, it’s hard to justify hiring one of these highly skilled professionals as a full-time employee, and once the project is complete, what do you do with this skill set? And the person?
  • The risks associated with poorly-modeled databases and bad integration are substantial, and include but are not limited to incorrect information and difficulty of use.

He is absolutely right on all counts.  If you’re getting started on a project and you think you’ll need modifications to an existing database or even an entirely new database, and you don’t want to deal with the problems inherent in an “off-the-rack” pre-built model, investigate adding a database data modeling consultant to your team.  These people are usually happy to work on contract for a set period of time, so when the project is over you don’t have the excess headcount.  But you will have had the benefit of this person’s skill set.  It’s a win-win for everyone.

Come talk to us … the Folks at Mount Vernon Data Systems, where we do it all … database design, support & protection … at www.MountVernonDataSystems.com. We’re affordable, and we’re good.

Do You Design Databases as Part of Your Job?

If you design databases as part of your job, you’re not alone. Recently, SQL Server Magazine ran an “insta-poll” asking, “Do you consider database design to be part of your job function?” The results from the 102 responses:

– Yes, I’m primarily a database designer: 1%
– Yes, I consider myself to be a database designer/developer: 52%
– Yes, I consider myself to be a database designer/BI pro: 8%
– Yes, I consider myself to be a database designer/DBA: 28%
– No, I don’t consider database design to be part of my job: 11%

Now, any opt-in poll carries a high probability of skew due to differential interest in the topic at hand, and the count of persons polled was really very low (some would say statistically insignificant). Even so, 89% of respondents claim that database design is part of their job function. That’s impressive.

So, if you’re a DBA or a database developer and you’re expected to “do” design as part of your job, do you know what the downside of inadequate design could be? Let me count the ways… Lost performance for the database group and lost revenue for the company are just a starting point.

If you check the stats in the list, the designer/developer category has the highest level of design overlap (52%), but not necessarily because developers are more creative. Developers in the pre-database world built their own file structures to hold the input data needed by the programs they wrote, and to store the output data. That mentality has (gawd help us) been carried forward into today.

There are a lot of managers who just don’t get (or who don’t want to get, because it’ll cost more) that developers may not have the skill set to create efficient, effective, and long-lasting (extensible) database schemas. It’s easier (and cheaper in the short run) to just whip the Dev donkey to get the job done. Left to their own devices, developers want to do it right, and they will do a good job. But if you have a supervisor or manager who’s not invested in anything except making the numbers on the bottom line, you, Mr. or Ms. Developer, are not even going to get a chance to do it right.

In addition to the Developer, any SQL Server DBA who’s been around for any length of time will be responsible for some database design… thus the 28% in their category.  And any DBA who values performance will monitor and sign off on schema changes to the databases under his/her control.

Bad or inadequate design is arguably the #1 reason for poor performance, not only with SQL Server, but with any DBMS platform. But once the design is put into production, rectifying it may not be possible, so the DBA now has to tinker with index management, I/O throughput, memory allocation and balancing, query optimization, and so on.
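
As one hedged example of that after-the-fact tinkering, SQL Server’s missing-index DMVs will tell you which indexes the optimizer wished it had. Treat the output as hints, not commandments; no index suggestion can repair a flawed schema.

-- Index suggestions accumulated by the optimizer since instance startup,
-- ranked by the estimated impact on the queries that wanted them.
SELECT   mid.statement        AS table_name,
         mid.equality_columns,
         mid.inequality_columns,
         mid.included_columns,
         migs.user_seeks,
         migs.avg_user_impact
FROM     sys.dm_db_missing_index_details AS mid
JOIN     sys.dm_db_missing_index_groups AS mig
         ON mig.index_handle = mid.index_handle
JOIN     sys.dm_db_missing_index_group_stats AS migs
         ON migs.group_handle = mig.index_group_handle
ORDER BY migs.avg_user_impact DESC;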

That’s my opinion.  What’s yours?