By John L. Myers
Many organizations were part of the initial wave of big data implementations. They built Hadoop and NoSQL clusters to support multi-structured data sources. They integrated existing enterprise data environments into a combined, or hybrid, data infrastructure that shares information and processing workloads across multiple platforms as most appropriate for its operational or analytical use case requirements.
However, as with any practice or discipline, big data initiatives mature and evolve. Supporting large clusters of data management platforms in the data center can be a barrier to entry for new players. The maintenance associated with those clusters can be a deterrent to organizations that make the initial leap into big data. In addition, organizations are deciding whether to implement their big data projects using configurable templates and applications from software vendors, or to continue hand-coding those projects using tool sets supplied by data management or analytical platform vendors. With the next wave of implementers and the maturation of existing environments, we see a change in where and how big data is implemented.
Not Just a Single Platform
Since 2012, IT industry analyst firm Enterprise Management Associates (EMA) has performed an annual end-user survey of the trends and practices associated with big data implementations. In the most recent edition, EMA continues to develop the concept of the Hybrid Data Ecosystem (HDE). The EMA HDE is a representation of the data management platforms and layers that are associated with big data implementations. Confirmed by three sets of end-user data, the HDE focuses on business drivers as the main force behind big data implementations for organizations. Rather than allowing a single data management platform, or a pair of them, to place constraints on a big data initiative, the HDE allows end users to consider eight different data management platforms to meet business requirements of implementation economics, complex workloads, speed of response, overall information loads, and multiple data format structures.
These data management platforms are connected by an information management layer that focuses not just on physical data integration or transfer, but also on access to data and metadata across the platforms. Finally, the platforms work in concert to provide end data consumers with the best tool for the job in terms of workload processing. Data management platforms such as NoSQL and analytical appliances/databases provide near real-time processing of operational and operational analytical workloads. Enterprise data warehouses and data mart systems provide the intraday processing to meet relatively low latency processing requirements for analytics. Hadoop and external data sources enable the batch processing for exploratory workloads.
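As a rough illustration only, the workload-to-platform pairing described above can be written down as a simple lookup. The sketch below is not part of the EMA HDE itself: the `Latency` enum, the `PLATFORMS_BY_LATENCY` table, and the `candidate_platforms` function are hypothetical names chosen for this example, and the table covers only the six platform types named in the preceding paragraph.

```python
from enum import Enum


class Latency(Enum):
    """Processing tiers named in the article (enum itself is illustrative)."""
    NEAR_REAL_TIME = "near real-time"
    INTRADAY = "intraday"
    BATCH = "batch"


# Platform groupings as described in the text above; this table is a
# hypothetical simplification, not an official EMA artifact.
PLATFORMS_BY_LATENCY = {
    Latency.NEAR_REAL_TIME: ["NoSQL", "analytical appliance/database"],
    Latency.INTRADAY: ["enterprise data warehouse", "data mart"],
    Latency.BATCH: ["Hadoop", "external data source"],
}


def candidate_platforms(latency: Latency) -> list[str]:
    """Return the platform types the article associates with a latency tier."""
    # Copy the list so callers cannot mutate the shared table.
    return list(PLATFORMS_BY_LATENCY[latency])


if __name__ == "__main__":
    for tier in Latency:
        print(f"{tier.value}: {', '.join(candidate_platforms(tier))}")
```

In practice the choice would also weigh the other business drivers the HDE lists, such as implementation economics and data format structure, so a single latency key is a deliberate simplification.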
Where to Manage Data Management Platforms
During the initial stage of big data implementations, the only true option was to implement big data initiatives in the form of an HDE inside the firewall in the local data center. This was initially a design choice, but over time the accumulated density, or gravity, of big data storage requirements was perceived less as a choice and more as a lack of options.
In the 2014 EMA/9sight survey, end users were asked about implementation choices for their HDE platforms in terms of bare metal data center installations, private cloud environments within the firewall, hybrid cloud environments that included public and private cloud resources, private cloud architectures, and managed service implementations. Not surprisingly, enterprise data warehouses (EDWs) were the most likely to be implemented as part of a native, bare metal installation within the data center. Yet surprisingly, EDWs were not locked into the data center. End users were making installation decisions for their data warehouses outside of the traditional implementation techniques.
Other platforms were more amenable to implementation outside of the data center. Not surprisingly, newer multi-structured data stores such as NoSQL and Hadoop were just as likely to be implemented outside of the data center firewall as within bare metal or as a private cloud.
Overall, this shows that big data initiatives are growing beyond the data center. Each of the data management platforms of the EMA HDE has significant support to move beyond the constraints of internal installation, administration, and maintenance to models where external service providers absorb much of the overhead in total cost of ownership associated with these platforms.
Buy or Build?
Beyond looking for ways to limit the overhead costs associated with big data platforms, organizations also look for ways to speed implementation, or the time to value, of big data projects. Over the years, the EMA/9sight research showed that organizations were not simply using big data initiatives to maintain a single big data repository. Rather, these organizations were implementing multiple projects upon these big data repositories to meet various business challenges. In 2013, the average respondent had slightly fewer than three big data projects. In 2014, the average number of projects rose to just over three per respondent.
For these projects, the EMA/9sight survey asked how organizations were implementing them. Were they using templates and configurable applications from external providers such as software vendors or third-party consultants? Or were they hand-rolling their projects using tool sets and manual resources?
In 2014, over 20 percent of EMA/9sight panel respondents mentioned using configurable applications from external providers for big data project implementation. This shows significant interest in a faster time to implementation. It also shows that the maturing field of big data initiatives is moving away from the classic data scientist and the manual effort of hand coding, and toward reusable components that improve the productivity of implementation teams.
This is not to say that the age of data scientists diligently preening data sets and compiling the best analytical uses for that data is gone. A significant number of big data projects are still built rather than bought. Nearly 18 percent of respondents mentioned hand-rolled development as a big data project implementation strategy.
Big Data Initiatives Mature
Big data initiatives are maturing and the face of where and how organizations implement big data is changing. The EMA HDE provides organizations with a guide on how to let business requirements drive big data implementations as opposed to the technical limitations in their data management platforms. These platforms are implemented across a range of options, from inside to outside the data center, providing organizations with the ability to select the appropriate level of capital and administrative/maintenance costs for their IT infrastructures.
Highlights from the 2014 EMA/9sight research are available in a video located online at research.enterprisemanagement.com/big-data-2014-on-demand-webinar-softwaremag.html.
John L. Myers is managing research director, business intelligence (BI) data warehousing, EMA. He joined the firm in 2011 as senior analyst of BI. In this role, Myers delivers comprehensive coverage of the BI and data warehouse industry with a focus on database management, data integration, data visualization, and process management solutions.