Data Normalization – Is It Still Relevant In A World of NOSQL?

Some years ago, I wrote an article for this magazine entitled “SQL by Design: Why You Need Database Normalization”. At that time, NoSQL databases were a tiny percentage of installations, and relational ruled the land.

Today the landscape has changed, and we have relational database management systems like SQL Server sharing the spotlight with variations of the NoSQL technology. So, I thought we could revisit the original reasons for normalizing data and see if they’re still relevant.

Assumptions: when you normalize a database, you’re targeting the following goals:
– arranging data into logical groupings such that each group describes a small part of the whole;
– minimizing the amount of duplicate data stored in a database;
– organizing the data such that, when you modify it, you make the change in only one place;
– building a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data in storage.
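These goals can be made concrete with a small sketch. The schema below is a hypothetical example (the table and column names are mine, not from any particular system), using SQLite through Python’s standard library; it illustrates the third goal, storing each fact once so a change is made in only one place:

```python
import sqlite3

# In-memory database with a hypothetical two-table normalized schema.
# The customer's city is stored exactly once, in the customer table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp', 'Denver')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 'widget'), (11, 1, 'gadget')])

# The customer moves: one UPDATE to one row, not one per order.
conn.execute("UPDATE customer SET city = 'Boulder' WHERE customer_id = 1")

# Every order sees the corrected city through the join.
rows = conn.execute("""
    SELECT o.order_id, c.city
    FROM orders o JOIN customer c ON o.customer_id = c.customer_id
""").fetchall()
print(rows)  # -> [(10, 'Boulder'), (11, 'Boulder')]
```

In a denormalized design where the city were repeated on every order row, that same correction would require touching every row, and missing one would leave the data contradicting itself.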

The original article contended that “Data normalization is primarily important in the transactional, or online transactional processing (OLTP), database world, in which data modifications (e.g., inserts, updates, deletes) occur rapidly and randomly throughout the stored data.” It then went on to point out that data warehouses, which often contain volumes of denormalized and precalculated/summarized data, often ignore the rules of data normalization.

To the data warehouse model we can add many of the current NoSQL installations: databases engineered to capture semi-structured data from sources whose quality we have little if any control over, and whose meaning can actually shift over time. It seems to me that if we want to integrate NoSQL data into our own highly transactional, well-defined, conventionally normalized financial data, for instance, then yes, we need to understand the NoSQL data: its meaning, where it came from, its quality, its integrity. If we want accurate answers, that is, we need to know all of this.
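As a rough sketch of what that integration step can look like (the field names here are hypothetical, invented for illustration), semi-structured records whose field names and quality drift over time usually have to be validated and mapped onto one well-defined shape before they can join normalized tables:

```python
import json

# Hypothetical semi-structured records, as a NoSQL store might deliver
# them: field names drift between records, and values go missing.
raw = [
    '{"cust": "Acme Corp", "amount": 120.0}',
    '{"customer_name": "Acme Corp", "amt": "95.5"}',
    '{"cust": "Unknown"}',
]

def normalize(record: dict):
    """Map shifting field names onto one shape, or reject the record."""
    name = record.get("cust") or record.get("customer_name")
    amount = record.get("amount") or record.get("amt")
    if name is None or amount is None:
        return None  # fails the integrity check; route to a review queue
    return {"customer_name": name, "amount": float(amount)}

clean = [r for r in (normalize(json.loads(line)) for line in raw) if r]
print(clean)
# -> [{'customer_name': 'Acme Corp', 'amount': 120.0},
#     {'customer_name': 'Acme Corp', 'amount': 95.5}]
```

The point of the sketch is the rejection branch: without understanding what the incoming data means and which records are trustworthy, there is no principled way to decide what lands in the well-defined tables and what gets set aside.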

Are you thinking that the NoSQL installation at your shop is a stand-alone system, mined as the data is gathered, and that nothing else will ever be done with it? Good luck with that idea! If IT history has taught us nothing else, it’s that data gets used and reused in ways that were never imagined when the collection systems were put in place.

If you’re working in a hybrid (SQL plus NoSQL) environment, I believe that the highly-structured SQL data is your stake in the ground; it’ll give you the basis on which to build additional knowledge and marketing expertise when integrated correctly with the NoSQL data that’s being collected in concert with it. And that’s why I think you still need to normalize your (SQL) data.

Got other ideas? Let’s hear them!


Read the original article “SQL by Design: Why You Need Database Normalization” at http://sqlmag.com/database-performance-tuning/sql-design-why-you-need-database-normalization

BIG DATA, TO MODEL OR NOT TO MODEL?

Today’s posting is for those BIG DATA dudes and dudettes, and those folks who are starting to assemble big data stores, who are faced with issues of how to wrangle database design and data modeling in their big data world.

If you’ve been a follower of SQL Mag for any length of time, you know that my forte is data modeling and database design. Lately, I’ve been encountering discussions about big data and NoSQL (or “Not Only SQL,” as some call it), and claims that, in a big data world, data modeling is not required.

In my dim and distant past, I was heavily involved in big data modeling and processing for oil and gas exploration. By today’s standards, the volume and velocity of data gathered would be considered small, but for the computing power of the day, it qualified as “big data”. We definitely modeled the data; otherwise there would have been no way to analyze and use what was being gathered, and we would never have been able to intelligently spot the next drill site.

Which is why I don’t understand the NoSQL notion of a “no data modeling” paradigm. What exactly does that mean? Does it mean no data modeling at all, with all of the data just dumped into a big pile (a heap structure), so that to retrieve anything you have to start at the beginning and read through all the data until you find what you’re looking for (a table scan)? Probably not. Obviously there’s got to be data modeling, or at least data organization, going on somewhere.
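The contrast in that paragraph can be sketched in a few lines. This is a deliberately minimal illustration, not how any particular NoSQL product works internally: a pure heap that must be scanned, versus the same records organized by key for direct lookup. Even a “schemaless” store does the latter, and deciding what the key is, is itself a modeling decision.

```python
# Toy records keyed by a made-up "cust:<id>" convention.
records = [("cust:1", "Acme"), ("cust:2", "Globex"), ("cust:3", "Initech")]

# Heap structure: a table scan, touching records one by one until a match.
def heap_lookup(key):
    for k, v in records:
        if k == key:
            return v
    return None  # read everything and still found nothing

# Key-organized structure: a hash index, the kind of organization a
# key-value store maintains so it never has to scan.
index = dict(records)

assert heap_lookup("cust:3") == index["cust:3"] == "Initech"
```

Both structures hold identical data; the difference is purely organizational, which is the point: organization by key is a data model, however informal.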

In the business world, transactional data is modeled via a set of rules and methods so that updates, inserts, and deletes don’t invalidate the integrity of the data, and so that select operations are optimized (we hope!). But what do you do for big data? How do you handle data modeling in a business big data world? How are you addressing organization at the massive scale that big data presents?

What is your approach? I’d really like to hear from you and get your input on this subject. I’m very curious!

To view this post in my company blog go to https://www.mountvernondatasystems.com/blog/big-data-to-model-or-not-to-model