BIG DATA, TO MODEL OR NOT TO MODEL?

Big Data

 

 

Today’s post is for those BIG DATA dudes and dudettes, and for those of you who are starting to assemble big data stores and are facing the question of how to wrangle database design and data modeling in a big data world.

If you’ve been a follower of SQLmag for any length of time, you know that my forte is data modeling and database design. Lately, I’ve been encountering discussions about big data and NoSQL (or “Not Only SQL,” as some call it), and claims that in a big data world, data modeling is not required.

In my dim and distant past, I was heavily involved in big data modeling and processing with oil and gas exploration. By today’s standards, the volume and velocity of data gathered would be considered small, but for the computing power of the day, it qualified as “big data”. We definitely modeled the data, otherwise there would have been no way to analyze and use the data being gathered, and we would never have been able to intelligently spot the next drill site.

Which is why I don’t understand the NoSQL concept of a “no data modeling” paradigm… what exactly does that mean? Does it mean no data modeling at all, with all of the data just dumped into a big pile (a heap structure), so that to retrieve anything you have to start at the beginning and read through all the data until you find what you’re looking for (a table scan)? Probably not. Obviously there’s got to be data modeling, or at least data organization, going on somewhere.

In the business world, transactional data is modeled via a set of rules and methods so that updates, inserts, and deletes don’t invalidate the integrity of the data, and select operations are optimized (we hope!). But what do you do for big data? How do you handle data modeling in a business Big Data World? How are you addressing organization within the massive scale that big data presents?
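Those transactional rules can be made concrete with a small sketch (again using Python’s built-in sqlite3; the schema is invented for the example). Declared constraints let the engine itself reject an insert, update, or delete that would corrupt the data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
con.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        amount      REAL CHECK (amount > 0)
    );
""")
con.execute("INSERT INTO customers VALUES (1, 'Acme')")
con.execute("INSERT INTO orders VALUES (10, 1, 99.50)")

# An order pointing at a nonexistent customer is rejected outright
try:
    con.execute("INSERT INTO orders VALUES (11, 999, 10.0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # FOREIGN KEY constraint failed
```

That rejection is the data model doing its job: the bad row never gets in, so no later query has to guess which orders are real.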

What is your approach? I’d really like to hear from you and get your input on this subject. I’m very curious!

To view this post in my company blog go to https://www.mountvernondatasystems.com/blog/big-data-to-model-or-not-to-model

Database Data Modeling: Can You Afford Pre-built Data Models?

I just read a blog post about “Data Models: Build or Buy”, by Jason Haley …

He has some interesting points.  If I could recap for brevity and clarity …

  • Good data models that accurately reflect business requirements are critical to the success of a database project, whether transactional or data warehouse, web-based retail or BI;
  • Pre-built data models are available, often from software vendors;
  • Pre-built data models are like off-the-rack suits; they may not fit very well, and will need some alterations to better fit the buyer;
  • Altering a pre-built data model is the realm of a small group of highly talented and (usually) expensive people called (surprise!) data modelers and data integrators;
  • Since budgets are tight, it’s hard to justify hiring one of these highly skilled professionals as a full-time employee, and once the project is complete, what do you do with this skill set? And the person?
  • The risks associated with poorly modeled databases and bad integration are substantial, and include but are not limited to incorrect information and difficulty of use.

He is absolutely right on all counts.  If you’re getting started on a project and you think you’ll need modifications to an existing database or even an entirely new database, and you don’t want to deal with the problems inherent in an “off-the-rack” pre-built model, investigate adding a database data modeling consultant to your team.  These people are usually happy to work on contract for a set period of time, so when the project is over you don’t have the excess headcount.  But you will have had the benefit of this person’s skill set.  It’s a win-win for everyone.

Come talk to us … the Folks at Mount Vernon Data Systems, where we do it all … database design, support & protection … at www.MountVernonDataSystems.com. We’re affordable, and we’re good.

 

Cloud Computing A-Z

Had to share this finding — SearchCloudComputing.com’s “Cloud computing from A to Z”.  I’m not saying that this is the definitive dictionary of cloud computing terms, and it’s a little dated, but it’s a good resource.  Check it out at http://bit.ly/cPRoTA

Over time I’ll be adding references to cloud computing resources to this blog category. Even though most of us use some sort of cloud service, adopting a new way of doing things can be trying. It’s my goal to make this process a little less nerve-wracking by providing you with resources that you can use to better evaluate your options.

Wishing you sunny skies!

 

50 Reasons to Virtualize Your SQL Server

The whole SQL Server world seems like it’s moving towards virtualization, and you’re wondering “why would anyone want to do that?”

Change is hard; change is uncomfortable.  It’s a whole lot easier just doing what you’ve always done, but, dear reader, I promise you, if you hesitate for too long, you’re going to get trampled by the stampeding herd. The early adopters1 jumped aboard the virtualization bandwagon years ago; the mainstream majority is now in the process of embracing the technology. Wait much longer, and you’ll be accused of being a laggard, and we wouldn’t want that, would we?  Especially if hesitating were to compromise database performance, increase exposure to disastrous downtime, and/or cost more.

There are a lot of reasons to move SQL Server – and all your servers – to a virtualized environment; here are 50 reasons that I can think of…

  1. It’s free if you pick the right virtualization product.
  2. You can get quicker access to your servers, because the virtualization software provides a central management console.
  3. Deploying servers is faster when they’re virtual servers.
  4. Quicker CD/DVD mount.
  5. You can quickly allocate additional RAM or disk storage to a VM (virtual machine).
  6. You can move virtual servers from one physical host to another.
  7. You can restore from image backups that are routinely taken as part of the VM environment.
  8. You can deploy applications to production more quickly, and with more flexibility, on a VM.
  9. You can increase your hardware utilization by 50% or more if you virtualize.
  10. Your hardware and software capital costs can be decreased by 40% or more if you virtualize.
  11. You can reduce operating costs by 50% or more with virtualization.
  12. You can be green: virtualizing can save energy, which maps to a smaller carbon footprint.
  13. You can reduce your TCO (total cost of ownership) by virtualizing.
  14. You can re-direct the capital that you saved when you virtualized the servers to other projects.
  15. Virtualization lets you defer hardware refresh cycles, freeing up budget for other priorities.
  16. When you virtualize you get to play with a SAN (hopefully).
  17. Instead of “100 servers, 100 databases” you can say “1 server, 100 databases” — of course, you may still have 100 VMs…
  18. You can move database “guests” from hardware to hardware, quickly, without crashing the system, and with minimum disruption to online customers.
  19. You’re justified in kicking other “guest” systems off SQL Server’s host — SQL Server needs resources for good performance — and moving them over to a host of their own.
  20. You get to use cool new acronyms (like LUN or HBA) when you use a SAN.
  21. You get to help design multi-pathing networks for your SAN, unless you’re the SAN administrator, in which case you get to design it yourself.
  22. You get to whiteboard the network (or forever be lost in the nests of wiring!)
  23. When you virtualize you don’t have to be apologetic about asking others for their use patterns, scheduled jobs, anti-virus applications, patch deployment schedules, etc.
  24. You get to compare SAN-specs at the next geek-fest (”My SAN is bigger than your SAN…”)
  25. You can use cool tools like SQLIO to test the SAN speed.
  26. You may get SAN through-put bragging rights.
  27. You get to play peek and poke trying to figure out why the page file looks normal but SQL Server performance is in the toilet. (Don’t worry, there’s an app for that!)
  28. You get to monitor how long SQL Server backup jobs take as a measure of SAN performance (”the canary in the coal mine”).
  29. You get to be best friends with Windows PerfMon and the three best metrics for monitoring a virtualized SQL Server:
    • Physical Disk: Avg. Disk sec/Read and Avg. Disk sec/Write, measures SAN performance
    • System: Processor Queue Length, measures CPU contention
    • SQLServer:Buffer Manager: Page Life Expectancy, measures memory pressure
  30. You may be able to change jobs, from database administrator to SAN administrator.
  31. You may get to replace ordinary database backups with SAN storage snapshot backups — they run SO much faster than regular backups!
  32. You get to brag about how expensive your SAN was and how cool you were when arguing to get it in the first place.
  33. You get to practice for the eventual shift to Cloud computing!
  34. You can give developers their own sandbox(es) to play in by simply spinning up one or more VMs.
  35. You can build a world-class test environment in less than a day.
  36. You can run out of SQL Server licenses faster than you ever thought possible!
  37. You can take an occasional day off from work — the whole weekend, maybe?
  38. You can train and mentor your “grasshoppers” by giving them their own sandbox, so they won’t bring production crashing down.
  39. You can experiment with and compare different server configurations without having to buy more hardware.
  40. You can feel cool and ahead of the crowd — well, ahead of the laggards, anyway.
  41. You can reduce face time with difficult end-users — just send over a new image and they’re up, up and away!
  42. You can reduce face time with geeks — when they have a problem you can spin up an image and they can test to their heart’s content!
  43. You have bragging rights for exploiting the latest cost containment solution — aren’t you clever?
  44. You can provide red meat for your geeks; happy talent = retained talent.
  45. The business continuity program gets a lift from new knowledge applied to minimize your company’s IT risk exposure — that makes your boss happy.
  46. You can manage your servers and the VM cluster from your Smartphone.
  47. Clients who require compliance can actually “inspect” your system without risk of harm to your production servers.
  48. You can be the first on the block to adopt server virtualization (even now).
  49. You can show your peers and colleagues how to scale back on data center space and do more with less.
  50. You can quickly clone a very complex configuration that would have taken weeks to set up if you weren’t using virtualization.
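The three PerfMon counters in reason 29 lend themselves to a quick rule-of-thumb health check. Here’s a minimal Python sketch; the thresholds used (20 ms disk latency, a processor queue of twice the core count, 300 seconds of page life expectancy) are common community rules of thumb, not official limits, so tune them for your own environment:

```python
def vm_health(avg_disk_sec_read, processor_queue_length, cpu_cores,
              page_life_expectancy_sec):
    """Flag the classic virtualized-SQL-Server trouble spots.

    Thresholds are widely quoted rules of thumb, not vendor guarantees.
    """
    warnings = []
    if avg_disk_sec_read > 0.020:               # > 20 ms per read: SAN is struggling
        warnings.append("SAN latency high")
    if processor_queue_length > 2 * cpu_cores:  # sustained run queue: CPU contention
        warnings.append("CPU contention")
    if page_life_expectancy_sec < 300:          # pages evicted quickly: memory pressure
        warnings.append("memory pressure")
    return warnings

# A healthy sample...
print(vm_health(0.008, 2, 8, 4500))   # []
# ...and one that needs attention on all three fronts
print(vm_health(0.045, 40, 8, 120))
# ['SAN latency high', 'CPU contention', 'memory pressure']
```

Feed it values sampled from PerfMon (or `typeperf`) on a schedule and you have the beginnings of reason 28’s canary in the coal mine.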

1 Crossing the Chasm (1991, revised 1999), by Geoffrey A. Moore

 

NIST Gives the [Public Sector] Cloud Thumbs Up

NIST (the National Institute of Standards and Technology) http://csrc.nist.gov/ has unveiled draft guidelines for managing security and privacy issues in the Cloud. These proposals contain guidelines that I really like. While my focus is on the private sector, not the public, you can bet that I’ll be encouraging my clients and the vendors that I work with to adopt these NIST guidelines.

This is still a draft proposal, “NIST Guidelines on Security and Privacy in Public Cloud Computing” (SP 800-144), and while NIST is focusing its efforts on public-sector organizations (government agencies, etc.), it won’t take long for the private sector to catch on and begin adopting these NIST Cloud computing standards.

Service Level Agreement (SLA): today’s Cloud provider typically dictates the terms of service to the subscriber. NIST strongly recommends that this model be inverted, with the subscriber negotiating the SLA to better fit their organization’s concerns about security and privacy. What do you need to look for?

  • Self-service: how much of the work will the subscriber have to do, how much will the Cloud provider do, and what’s the cost breakdown?
  • Quota management: how are resources allocated, and what happens when you need more or less?
  • Resource metering: data flows can vary according to time of day, day of the week, week of the year. How will these variations be met?
  • Hypervisor: is it secure? Is it efficient? Is it a mainstream product?
  • Guest virtual machines: how many VMs, who are the subscribers, what is the continuity strategy for each VM?
  • Supporting middleware: is it secure?
  • Deployed applications: where will they reside? (hopefully not on a VM that’s supported by the same physical server as the database that serves up the data for these apps…)
  • Data storage: how much, where (physically) is it located, and are the storage disks being shared between subscriber organizations?

What to Negotiate? What questions do you need to ask when considering moving operations to the cloud?

  • Are the Cloud provider’s employees vetted?
  • Who owns the data, and what are the exit rights & procedures?
  • Tenant applications: as a Cloud subscriber you will most likely be sharing a server and disk storage with other subscribers; how will the various subscribers’ applications and data be isolated from one another?
  • Will your data be encrypted, and how? Will it be encrypted just at rest, or also while in transit? Will it be segregated from other companies’ data, and how?
  • What kind of tracking and reporting services can you expect?
  • Is the Cloud provider in full compliance with all laws and regulations that you, as an organization, are bound by?
  • Does the Cloud provider use products (software, hardware, etc.) that meet federal & national standards? How do you know this?

Hold the Cloud provider accountable; make sure that audit mechanisms and tools are in place to:

  • Determine how data is stored, used, and protected;
  • Validate services (you don’t want to be under- or over-charged, after all);
  • Verify policy enforcement (just because they say they do something…).

Where will your data be located? Cloud providers generally have a network of disk farms on which subscribers’ data is stored. Usually, detailed information about where the data is physically located is unavailable or not disclosed to the subscriber. Note that when data crosses national borders, legal, privacy and regulatory rules – and possibly even security rules – can be ambiguous or non-enforceable.

Earth to Enterprise, don’t forget the client side. Mobile devices connected to Cloud-based applications & databases can make maintaining physical and logical security very troublesome.

NIST gives the Cloud a thumbs up. Despite the obvious room for improvement, NIST asserts that “…cloud computing is a compelling computing paradigm that agencies need to incorporate as part [of] their information technology set.”

For the original breaking news report go to CIO Insight, Security Slideshow, NIST Cloud Security Guidelines, www.cioinsight.com/c/a/Security/NIST-Cloud-Security-Guidelines-591748/

Best of luck with your Cloud projects, wishing you sunny days ahead!

 

