Big Data

The definition of Big Data is hard to pin down. Many definitions explicitly define Big Data as data that cannot be handled by database management systems. Those definitions preclude databases from ever being part of a Big Data solution, no matter how far the technology advances. BI Voyage has helped a customer implement a 40 TB Data Warehouse, and the customer certainly thought 40 TB was Big Data.

The best definition of Big Data we have found was published by Gartner in their glossary:

Big data in general is defined as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

This definition of Big Data does not preclude an RDBMS from being the solution or part of the solution. BI Voyage feels that every customer data requirement needs to be reviewed and the right solution evaluated, be that NoSQL, SQL or a combination of the two.

Big Data Technologies

The core technologies that BI Voyage leverages for Big Data are SQL Server Parallel Data Warehouse (PDW) and HDInsight (Hadoop on Windows). We think of these technologies as having complementary capabilities. Hadoop is easier to scale out, does not require complex ETL (you can just drop files in) and does not require up-front schema design. SQL Server allows for tuning to optimize response times for frequently asked business questions.
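The contrast between the two approaches can be illustrated with a small sketch. It is not HDInsight or PDW code: it uses only the Python standard library, with a raw CSV string standing in for a file dropped into Hadoop and an in-memory SQLite table standing in for a tuned warehouse table. The sample data, table name, index name and function names are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw file contents, "dropped in" as-is with no schema designed up front.
RAW_TEXT = (
    "order_id,region,amount\n"
    "1,EMEA,120.50\n"
    "2,APAC,80.00\n"
    "3,EMEA,42.25\n"
)


def totals_schema_on_read(raw_text):
    """Interpret the raw file only at query time (the 'just drop files in' style)."""
    totals = {}
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Types are applied here, while reading, not when the file was stored.
        totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    return totals


def totals_schema_on_write(raw_text):
    """Load into a typed, indexed table first, then answer the question with SQL."""
    conn = sqlite3.connect(":memory:")  # SQLite is only a stand-in for a warehouse
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    # The index is the tuning step for a frequently asked question.
    conn.execute("CREATE INDEX idx_orders_region ON orders (region)")
    rows = [
        (int(r["order_id"]), r["region"], float(r["amount"]))
        for r in csv.DictReader(io.StringIO(raw_text))
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
    conn.close()
    return result


if __name__ == "__main__":
    print(totals_schema_on_read(RAW_TEXT))
    print(totals_schema_on_write(RAW_TEXT))
```

Both functions answer the same question. The first needs no preparation but parses the raw data on every query; the second pays a design and load cost once so that repeated questions can be served from a structure built for them.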

BI Voyage is currently involved in a customer implementation running a 30-node Hadoop cluster as the enterprise repository of data. The customer also purchased a PDW appliance, which hosts the data marts. Hadoop contains all the data and is the repository where analysts can ask one-off questions or explore the data looking for unknown correlations. These queries often take a long time to run, but because they are asked once or infrequently, that is not an issue. PDW hosts data marts optimized for reporting on the key reporting data attributes. This allows for fast queries for known questions and for ad hoc queries against data already known to be important. Discoveries in Hadoop can lead to new data being added to the data marts going forward if that data is determined to be of repeated business value.

With PDW and HDInsight making up the complementary data repositories, the surrounding technology from Microsoft can do the rest. The Microsoft BI platform can be leveraged against both data sources. Microsoft offers Data Quality Services, Master Data Services and data mining capabilities on top of its industry-leading self-service BI.

Big Data is relative to the customer. If your data volumes are more than your infrastructure can currently support, then Microsoft has the tools and BI Voyage has the experience to help you turn your Big Data problem into a business asset.