By Olivia Cahoon
To take advantage of advanced technology and data functions, businesses utilize modernized data warehouses that hold data extracted from transaction systems, operational data stores, and external sources.
According to Gartner, data warehouses combine data in an aggregate, summary form suitable for enterprise-wide analysis and reporting for predefined business needs. Data warehouses contain data arranged into abstracted subject areas with time-variant versions of the same records, with an appropriate level of data grain or detail to make it useful across two or more different types of analyses.
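As a rough illustration of what an aggregate, summary form at a chosen grain can look like, the minimal Python sketch below rolls transaction-level rows up to a month-and-region grain, keeping one total per period so the same region appears in time-variant versions. The sample rows, the field layout, and the function name summarize_by_month are hypothetical and are not drawn from Gartner or any vendor.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: (sale date, region, amount).
transactions = [
    (date(2018, 1, 5), "East", 120.0),
    (date(2018, 1, 19), "East", 80.0),
    (date(2018, 1, 7), "West", 200.0),
    (date(2018, 2, 2), "East", 50.0),
]


def summarize_by_month(rows):
    """Roll transaction-level rows up to a (year, month, region) grain."""
    totals = defaultdict(float)
    for sale_date, region, amount in rows:
        # The period is part of the key, so each month keeps its own
        # version of the region's total instead of overwriting the last one.
        totals[(sale_date.year, sale_date.month, region)] += amount
    return dict(totals)


monthly_sales = summarize_by_month(transactions)
```

A coarser grain, such as quarter and region, would shrink the summary further but could no longer answer month-level questions, which is the trade-off behind choosing an appropriate level of detail.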
Traditional data warehouses determine how data is organized to address identified questions while modernized data warehouses use analytics to impact data as it happens. Businesses should consider modernizing data warehouses when seeking new data types.
Data Warehouse Modernization
Modern data warehousing deploys technology that documents and models the existing and future data landscape, including metadata, processes, and the management of metadata and data movement processes. According to Neil McGovern, senior director of product marketing, SAP, deploying technology to acquire data via replication, real-time access, or ingestion from any data source and any data type allows access to the organization’s data.
By introducing new analytic engines, organizations can perform advanced analytics like sentiment analysis or spatial analysis, says McGovern. Sentiment analysis, also referred to as opinion mining, uses statistics to determine consumer attitudes toward products. Spatial analysis examines data in terms of the location of the objects being analyzed. McGovern suggests businesses create a data structure that handles the growing volume of data as well as the velocity of incoming data streams.
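To make the idea of sentiment analysis concrete, here is a minimal sketch of one simple statistical approach: counting positive and negative words in customer reviews. The word lists and the function names sentiment_score and average_sentiment are illustrative assumptions only. Production analytic engines, including those McGovern describes, typically rely on far richer models.

```python
import re

# Hypothetical word lists; a real lexicon would be much larger.
POSITIVE = {"great", "love", "reliable", "fast"}
NEGATIVE = {"slow", "broken", "poor", "hate"}


def sentiment_score(review):
    """Return a score from -1.0 (all negative) to 1.0 (all positive)."""
    words = re.findall(r"[a-z']+", review.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    matched = positive_hits + negative_hits
    if matched == 0:
        return 0.0  # no opinion words found, so treat the review as neutral
    return (positive_hits - negative_hits) / matched


def average_sentiment(reviews):
    """Average the per-review scores to estimate attitude toward a product."""
    scores = [sentiment_score(review) for review in reviews]
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging the scores across all reviews of a product gives a single, crude indicator of consumer attitude that can sit alongside traditional sales figures.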
Once the organization deploys technology capable of handling big data and advanced analytics, it is important that the organization understands the new capabilities its modern data warehouse provides. According to McGovern, these include more data-driven decision making and the automation of existing manual processes.
As a core technology, data warehousing delivers advanced business insight. “This is not changing, but rather going through a significant evolution that incorporates advanced analytic technologies such as machine learning, graph, pattern, and path analysis, time series analytics, and artificial intelligence (AI),” says Imad Birouty, director, technical product marketing, Teradata.
Modern data warehouses feature technology that easily classifies data, understands where it is and where it needs to be, and orchestrates data movement, data aging, and data cleansing. “Multi-tiered data management technology [should] store data in-memory for very high performance needs; have disk and/or cloud-based storage for larger, less performance dependent data sets; and be flexible enough to adjust where and how data is stored while keeping the system active,” says McGovern.
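The tiering McGovern describes can be pictured as a placement rule that is re-evaluated as access patterns change. The Python sketch below is one hypothetical way to express such a rule. The DataSet fields, the thresholds, and the tier names are assumptions made for illustration and do not describe any vendor's actual policy.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    size_gb: float
    reads_per_day: int


def choose_tier(data_set, hot_reads=1000, warm_reads=10, memory_limit_gb=64):
    """Pick a storage tier from access frequency and size.

    All thresholds are illustrative defaults, not vendor recommendations.
    """
    # Frequently read data that fits in memory goes to the fastest tier.
    if (data_set.reads_per_day >= hot_reads
            and data_set.size_gb <= memory_limit_gb):
        return "memory"
    # Larger or moderately used data sets stay on disk.
    if data_set.reads_per_day >= warm_reads:
        return "disk"
    # Rarely read data ages out to cloud storage.
    return "cloud"


catalog = [
    DataSet("orders_today", size_gb=8, reads_per_day=5000),
    DataSet("orders_2017", size_gb=500, reads_per_day=40),
    DataSet("orders_archive", size_gb=4000, reads_per_day=1),
]
placement = {data_set.name: choose_tier(data_set) for data_set in catalog}
```

Rerunning the rule on a schedule, and moving only the data sets whose tier has changed, is one way to adjust where data lives while keeping the system active.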
According to Gartner, organizations require solutions capable of managing and processing external data in combination with their traditional internal sources, including data from the Internet of Things (IoT). The firm adds that these solutions support data for analytics under a coordinated approach that demands different types of integrated solutions and an interoperable services tier for managing and delivering data.
Traditional and Modern Data
A traditional data warehouse dictates how the data is organized to address identified questions and analysis through SQL. Modern data warehouses were once data stores, often called data lakes, with greater volume and data variety, which could be analyzed with flexible methods that didn’t require rigid schemas. “These data lakes have proven to be disappointing, with Gartner pointing out a small percentage are actually in full production,” says Jack Norris, senior VP, data and applications, MapR Technologies.
The core of modern data warehouses is the underlying data fabric that provides an enterprise-grade persistence layer for broad data sources including files, tables, streams, videos, and sensor data, explains Norris. Data fabric supports a converged processing layer for file operations, database functions, machine learning, data exploration, and stream processing. “This supports real-time, complex data flows to drive automated processing while supporting traditional SQL for existing analysis and reporting needs,” he says.
According to Norris, the biggest step in modernizing a data warehouse is to take analytics out of batch, historical constraints and focus on injecting analytics into business functions that can impact the business as it occurs. “The focus is on mission critical, real-time applications and data flows,” he says.
The most significant difference between traditional and modern data warehouses is that traditional data warehouses provide detailed analysis to understand what happened, while modernizing this infrastructure thrusts analytics into impacting data as it happens. According to Norris, risks are minimized as threats occur, revenue is optimized as customers engage, and quality is improved as items are manufactured. “This is the result of an underlying data fabric and converged data platform.”
In the past, businesses only queried data coming out of business systems. Now, McGovern points out the opportunity to incorporate new sources that benefit businesses and customers. Organizations can integrate a variety of new data types, including sentiment and spatial data, enabling businesses to know more about products, customers, and services.
“There is no longer a singular source of data. Acquisitions, mergers, regional, and departmental solutions have strapped IT with solving the problem of gaps in analysis, preventing a 360-degree view of their business,” explains McGovern. “Many of the new data types provide a large volume of data to organize and prioritize before it offers benefits.”
Modern data warehouses incorporate machine learning and pattern analysis to bring deeper insights to traditional reports. Birouty believes the combination of analytic techniques yields richer insights upon which better business decisions can be made. “Modern data warehouses also support new types of users, like scientists, who prefer tools such as machine learning and AI, along with new languages such as R and Python, so they can code analytics exactly the way they want them.”
Businesses Consider Modernization
A variety of industries, including financial services, media, retail, and healthcare, seek modern data warehouses. According to Birouty, the key differentiator is the drive for winning. Companies that strive to beat the competition, provide better customer service, and drive out operational inefficiencies look for modern data warehousing.
Businesses explore modern data warehousing solutions when seeking new methods to integrate data types or address more complex queries for business users or customers, says McGovern. This includes industries like automotive, healthcare, industrial machinery, life sciences, oil and gas, public sector, retail, telecommunications, and utilities.
Competitive pressures arise when organizations need to analyze or understand more or different data. Demands stem from business stakeholders focused on lowering risks and increasing opportunities.
“If an organization finds themselves in a situation where they need new data types to make more informed decisions or questions can’t be answered, then the existing data warehouse isn’t providing what the business needs,” says McGovern. He believes deeper statistical analysis goes beyond the capabilities of a traditional data warehouse and requires an infrastructure that integrates all data types and environments, provides advanced analytics, and is easy for IT to manage and modify for business requirements.
Businesses modernize data warehouses to better understand the greater context of customers, competitors, and ecosystem partners. “Organizations that make the most appropriate adjustments the fastest have the greatest competitive advantages,” says Norris. He believes traditional data warehouses suffer from inherent delays due to data loading, movement, and processing. “Modernizing these infrastructures provides a lower cost method to perform existing tasks while establishing a strategic platform to pursue innovative applications.”
Market changes occur quickly, and Birouty believes the time for big data warehouse modernization is now. “Companies need deeper business insights to differentiate themselves from the competition and deliver better products, run streamlined operations, and increase revenue and profits,” he says.
Big Data Warehouse Offerings
Several providers offer modern data warehousing solutions.
Amazon Web Services (AWS) offers Amazon Redshift, a fully managed data warehouse that analyzes data using standard SQL and existing business intelligence tools. It allows users to run complex analytic queries against petabytes of structured data using query optimization, columnar storage on high-performance local disks, and parallel query execution. Amazon Redshift includes Redshift Spectrum for users to directly run SQL queries against exabytes of unstructured data in Amazon S3.
Cloudera Enterprise Data Hub includes applications like Analytic DB, Operational DB, Data Science & Engineering, and Essentials. Powered by Apache Impala, Cloudera Analytic DB brings high-performance SQL analytics to big data. It allows users to share data and converge multiple applications, frameworks, and users with open standard tools for SQL. As an open platform, it’s designed to meet new and changing business needs.
Hortonworks Data Platform (HDP) is an open Hadoop data warehouse architecture with capabilities for data governance and integration, data management, data access, and security and operations. It’s designed for deep integration with existing data center technology and enables enterprises to deploy, integrate, and work with structured and unstructured data. HDP delivers data insights in the data center and in public clouds including AWS, Google Cloud Platform, and Microsoft Azure.
IBM Db2 Warehouse on Cloud is a fully managed cloud data warehouse service powered by IBM BLU Acceleration technology for increased performance and optimization of analytics. It features elastic scaling to scale cloud data warehouses to meet business performance requirements and optimize costs. Db2 Warehouse on Cloud is managed, monitored, encrypted, and backed by IBM. It’s also compatible with Netezza and Oracle and offers free tooling for edge cases.
MapR Converged Data Platform version 6.0 runs on premises and in the cloud. It integrates Hadoop, Spark, and Apache Drill with real-time database capabilities, global event streaming, and scalable enterprise storage to power big data applications. It offers enterprise-grade security, reliability, and real-time performance while lowering hardware and operational costs of applications and data. According to Norris, it includes underlying data fabric and high-scale, real-time processing with consistent low latency. MapR supports open source projects and uses industry-standard APIs to provide a frictionless method of developing and deploying new applications to meet stringent production runtime requirements. MapR Converged Data Platform targets Global 2000 markets.
Microsoft’s SQL Server 2017 allows users to build modern applications on premises and in the cloud on Windows, Linux, and Docker containers. It offers BI and analytic features like PolyBase for T-SQL queries across Hadoop, a tabular BI semantic model, master data services, data quality services, in-database advanced analytics, and end-to-end mobile BI on any device. SQL Server 2017 also features in-memory columnstore, real-time operational analytics, buffer pool extension to SSD, and adaptive query processing.
Oracle Autonomous Data Warehouse Cloud is built on the self-driving Oracle Autonomous Database Cloud technology and uses AI to deliver reliability, performance, and elastic management to enable data warehouse deployment in seconds. It is a fully autonomous database that uses adaptive machine learning that automatically optimizes indexing and caching—helping to reduce CPU consumption to deliver more value. Unlimited concurrent access combined with advanced clustering technology enables businesses to grow data stores without downtime.
SAP HANA Data Management Suite is a general-purpose database designed around in-memory computing. It features end-to-end data orchestration and multi-tiered data management across memory, disk, cloud, and Hadoop. It also includes analytics engines and multi-model capabilities. It can be deployed on premises, in the cloud, or in a hybrid configuration. SAP HANA is intended for general-purpose transactional and analytic workloads scaling from data marts to large data lakes. “SAP HANA Data Management Suite was designed as the core of a modern data warehouse and as transactional and analytics workloads combine into a single data set it is uniquely positioned to handle the blended workloads of the future,” says McGovern.
Teradata Analytics Platform is designed to meet customers’ needs with integrated analytics. It runs in the Teradata cloud and the public cloud as well as in on-premises installations on Teradata hardware or commodity hardware. “Companies can freely move their licenses any time with full credit. Analytics are bundled to allow companies to immediately start realizing business value,” says Birouty. The Teradata Analytics Platform integrates Teradata and Aster technology to offer a variety of advanced techniques to prepare and analyze data within a single workflow. The Aster engine executes more than 100 prebuilt advanced analytics techniques including graph, pattern, path, text, machine learning, and statistics. The platform is intended for Global 3000 markets.
Get Data
Businesses implement modern data warehousing solutions to classify data and understand where it is and where it needs to be. While traditional data warehouses provide detailed analysis to understand what happened, modernized data warehouses use analytics to impact data as it happens.
Mar2018, Software Magazine