The Unending Challenge of Managing “The Database”

If you’re a large organization, somewhere in the bowels of the IT department is “The Database”. It doesn’t matter what kind of database server it runs on; maybe it’s SQL Server, maybe it’s MySQL. Either way, The Database has something big in it – the output of that exhaustive customer survey you did, or the company’s complete sales history back to 1986. Whatever it is, it has gobs of data, and you likely have no idea what’s in it.

The IT department probably even bugs you about it from time to time. “Hey, do you still need this database? It hasn’t been read from in three years, and it’s taking up half a terabyte of storage!” You can’t delete it, of course – that’s irreplaceable company history, or it’s something the company spent a lot of time or effort to assemble.

There could be extremely valuable insights to be gleaned by analyzing the information in The Database. But who’s going to do it? Do you have somebody who knows SQL and can query a database? That’s good, but is that person also trained well enough in statistics to spot meaningful trends and patterns in the data that comes out of the query? (Don’t look at me, stats class in college was some of the most expensive naps I ever took.)

Do you have a programmer on staff? Somebody who can build graphical interfaces? Excellent – that’s a good start. But will he or she be able to take the data coming from the database professional and correctly forecast the next quarter or two from it? How are their data visualization skills?

Do you have any employees skilled in the fast-growing art of machine learning? These people are gifted in the ability to build programs that get progressively more sophisticated the more they analyze your company’s data. If you do, you’re in the minority today. And do your machine learning pros really understand relational databases?

You see the problem, perhaps. Increasingly, companies need someone who brings all these skills to the table: delivering an understanding of database design, savvy querying skills, statistical acumen, data visualization artistry, and machine learning wizardry, all in one package. Do you have that consummate data analysis professional on your staff? Maybe not – and that’s a problem that Microsoft wants to fix.

Solution: The Microsoft Professional Program

If no current employee with all these skills comes to mind, you aren’t alone. As early as 2012, Gartner, the IT industry analysis organization, had identified this critical shortage of experts who can mold and shape a company’s raw data into actionable business intelligence. “There is not enough talent in the industry,” said Gartner’s Peter Sondergaard, adding, “Data experts will be a scarce, valuable commodity.”

It’s not hard to see why. The IT industry is currently generating a quantity of new data each year that is measurable in zettabytes of storage. A zettabyte represents 1,024 exabytes of storage. Each unit in that chain is 1,024 of the next one down: an exabyte is 1,024 petabytes, and a petabyte is 1,024 terabytes. Each of those terabytes is a bit more than the storage capacity of 200 DVDs. So a zettabyte is somewhere upward of 200 billion DVDs’ worth of data. Whoa.
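For readers who want to check that arithmetic, here is a minimal sketch in Python. It follows the 1,024-based unit chain described above. The DVD capacity of 4.7 GB (a single-layer disc, counted as 4.7 billion bytes) is an assumption not stated in the article, and the function name `dvds_needed` is purely illustrative.

```python
# Unit sizes in bytes, using the 1,024-based convention from the text.
STEP = 1024
TERABYTE = STEP ** 4
PETABYTE = STEP ** 5
EXABYTE = STEP ** 6
ZETTABYTE = STEP ** 7

# Assumption: a single-layer DVD holds 4.7 GB, taken as 4.7 billion bytes.
DVD_BYTES = 4_700_000_000


def dvds_needed(num_bytes: int) -> int:
    """Return the number of whole DVDs needed to hold num_bytes."""
    # Ceiling division using integers only, to avoid float rounding.
    return -(-num_bytes // DVD_BYTES)


dvds_per_terabyte = dvds_needed(TERABYTE)
dvds_per_zettabyte = dvds_needed(ZETTABYTE)

print(f"DVDs per terabyte:  {dvds_per_terabyte:,}")
print(f"DVDs per zettabyte: {dvds_per_zettabyte:,}")
```

Under that assumption a terabyte works out to a little over 200 DVDs and a zettabyte to a few hundred billion, which agrees with the rough figures quoted above. A different disc capacity (a dual-layer DVD, say) would change the totals but not the order of magnitude.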

And that’s just the new data being added each year to the zettabytes that have come before. The skills needed to manage that quantity of information are not widely spread. Amazing new businesses, and growth in existing ones, are in danger of being throttled by a lack of professionals with the complete set of skills to capitalize on the opportunities big data provides.

What would it take to create a new generation of data analysts? Microsoft doesn’t claim it will be easy. Their curriculum of self-paced learning materials delivered through edX.com could consume 200 hours of effort to complete – or more! On top of all that is the final exam, a real-world project requiring another six to 10 hours to bring all the lessons of the program to bear. When you consider that a bachelor’s program at a university only requires 120 credit hours to complete, it becomes clear why this program is named the Microsoft Professional Program – Data Science track.

What’s in the program?

The coursework is grouped into three units, followed by a capstone project. The first unit contains four courses focused on the basics of data science: querying data and applying statistical analysis techniques to the results. After an initial orientation course, the meat of Unit 1 starts with a course on mastering the Microsoft querying language Transact-SQL (T-SQL) for the purposes of retrieving and modifying data. Students are then free to choose Excel or Power BI as their tool for data visualization and analysis and will complete a course in that chosen tool. The first unit completes with an extensive section on statistics – a vitally important tool for determining when a blip in the data is the beginning of an important trend, or just a funny quirk of fate.

Unit 2 uses three courses to focus on some of the broader software development skills needed to begin building software solutions to data science problems. Students start with a course introducing them to their choice of Python, a popular general-purpose programming language, or R, a language created explicitly for statistical analysis situations.

This is followed by a course on the skills of data science proper, such as learning to explore and visualize patterns in data, dealing with corrupt or incomplete data sets, and transforming that data into other forms that better support analysis. Unit 2 then concludes with a deep dive into the principles of machine learning, as students learn to write software that spots patterns in data and gets better at understanding the data as time goes by.

Students who have made it this far are now prepared for the rigorous content of Unit 3. Only two courses are needed to complete Unit 3, but students get a number of choices along that path. First, students revisit the programming language they chose in Unit 2, R or Python, for a deeper treatment. This time, basic syntax gives way to thorough explorations of how to use the language to organize data for analysis.

Students then have a second big decision to make: choose one of three available courses to conclude their formal studies. Option one is a course in advanced methods in machine learning. Option two is a course on intelligent app development, building IoT and big data bots on the Azure Machine Learning system. The third option is a deep dive into the Azure HDInsight platform, which provides cloud-based data analysis services on demand.

All this prepares budding data scientists for their final challenge in Unit 4: a 12-hour-long capstone project demonstrating real-world skills as a data science professional. Students are challenged to create a project that utilizes the Cortana Intelligence Platform. This project is then scored by administrators of the Microsoft Professional Program to determine whether the individual has indeed earned the program’s data science credential.

You’re probably wondering, how much is this going to set back my budget?

One of the most remarkable aspects of this program is how little is being charged for it. Attending the classes is, as of this writing, entirely free of charge. Students who wish to prove their completion of the various courses can purchase a verified completion certificate costing $49 for each course, but the information itself is being provided completely free.

Microsoft is placing some big bets on big data. From improvements in SQL Server to programming language development, visualization tools and Azure-based compute clusters, Microsoft is demonstrating that it sees huge potential for organizations to do business in bigger and better ways than ever before by leveraging what’s in The Database.

But that can’t happen if efforts to access an organization’s data are hampered by shoddy database design, inadequate data scrubbing capabilities, substandard data transformation techniques, uninspiring data visualization efforts or limitations imposed by the analysis capacity of on-premises hardware.

The market-crushing businesses of the future need full-spectrum analysts who understand a company’s data intimately, and who can efficiently produce innovative solutions that surface latent data insights to company stakeholders. They need people who have had a collegiate level of immersion in every aspect of data analysis. Tomorrow’s market leaders need data scientists, and Microsoft has inexpensively delivered an amazing wealth of knowledge to nurture those data experts in your organization to fill exactly that role.
