By Zak Pines
An investor once told me, “if you don’t have a daily view of your business KPIs, you don’t have a business.” This view of key metrics is no longer optional; it’s mandatory. Boards of directors expect executive teams and department heads to have this view at all times to manage a successful business.
But a dashboard with incomplete or inaccurate data is just an empty box. Most organizations struggle to get their data into a format that can light up a powerful visual insight.
Overcoming BI Challenges
The heart of the challenge lies in unifying customer data spread across many systems. The typical small- to mid-sized business now uses more than a dozen software-as-a-service (SaaS) applications. Data is spread across customer relationship management (CRM), marketing, customer support, and finance, not to mention a bevy of specialist apps for chat, ecommerce, and subscription management. Each system has a unique data schema, so the process of bringing that data together and preparing it for analysis accounts for approximately 80 percent of the time spent on an analytics project.
Business intelligence (BI) software such as Amazon QuickSight, Looker, Microsoft PowerBI, Tableau, and YellowFin gives companies business views that weren’t possible before.
But for these tools to deliver on their value to business users, IT teams or data analysts must step in between the day-to-day business applications and the dashboards to manually extract data, merge and normalize disparate datasets, and wrangle the data into a common format.
To get the most out of data visualization and dashboards, businesses need to automate intermediary processes and eliminate the gap between apps and analytics. Here are four key steps that businesses now have the potential to automate in this data pipeline, with tips on how technology can make it happen.
Connect Data Sources
The first step is connecting data sources. Modern solutions can now connect to apps automatically via their APIs. When choosing a tool, be sure it can preserve the objects—e.g., lead, opportunity, ticket, campaign, contact, account, activity, company, product, order, and user—and their relationships to one another. Because the number of SaaS applications has exploded, one system may call an object one thing while another system calls it something else. Mapping these objects and their relationships will be mission-critical for analysis.
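To make that concrete, here is a minimal Python sketch of such an object map. The system names ("crm", "marketing", "support") and object labels are hypothetical, not tied to any particular product:

```python
# Hypothetical map from canonical object names to what each system calls them.
OBJECT_MAP = {
    "contact": {"crm": "Contact", "marketing": "Lead", "support": "Requester"},
    "company": {"crm": "Account", "marketing": "Company", "support": "Organization"},
}

def canonical_object(system, source_object):
    """Translate a system-specific object name into the canonical one."""
    for canonical, aliases in OBJECT_MAP.items():
        if aliases.get(system) == source_object:
            return canonical
    return None

print(canonical_object("support", "Organization"))  # company
```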
Consolidate and Unify Data
After you connect discrete data sources, you need to integrate and unify the various data sets. Here again, technology can speed up the process. Normalized data makes queries simpler and faster by organizing records consistently across all tables, but it doesn’t have to be done manually anymore.
As you consolidate data, sometimes there are so many duplicates in your applications that bringing data into a central system can seem daunting. In some cases, you may have duplicates within a single source application, leading you to think your data is too messy. To avoid bringing duplicates into the consolidated data set, either de-duplicate data within each silo first, or load the non-duplicate data and flag duplicates for cleanup later.
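Here is a rough sketch of the “load and flag” option, assuming records arrive as dictionaries and that email serves as the duplicate key:

```python
def load_and_flag(records, key="email"):
    """Keep the first occurrence of each key; flag later occurrences
    as duplicates for cleanup. Records missing the key load as-is."""
    seen = set()
    loaded, flagged = [], []
    for rec in records:
        k = (rec.get(key) or "").strip().lower()
        if k and k in seen:
            flagged.append(rec)   # set aside for later cleanup
        else:
            seen.add(k)
            loaded.append(rec)
    return loaded, flagged

records = [
    {"email": "ana@example.com", "name": "Ana"},
    {"email": "ANA@example.com", "name": "Ana B."},  # duplicate, different case
]
loaded, flagged = load_and_flag(records)
print(len(loaded), len(flagged))  # 1 1
```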
You will also want to account for inconsistent formats. Dates and picklist values for fields such as state or country often denote the same thing yet arrive in different formats. To create consistency, look at how data is input and create rules for standardizing the formats across systems. This reduces the amount of normalization needed later.
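A standardization rule might look like the following sketch; the accepted date formats and state aliases here are assumptions you would replace with your own systems’ conventions:

```python
from datetime import datetime

# Illustrative rules; extend with the formats your systems actually emit.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]
STATE_ALIASES = {"mass.": "MA", "massachusetts": "MA", "ma": "MA"}

def standardize_date(value):
    """Normalize any recognized date format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")

def standardize_state(value):
    """Map known aliases to a two-letter code; otherwise just uppercase."""
    return STATE_ALIASES.get(value.strip().lower(), value.strip().upper())

print(standardize_date("03/14/2018"))      # 2018-03-14
print(standardize_state("Massachusetts"))  # MA
```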
But what if a customer recorded in two different systems has a different address in each? Because such conflicts are so common, resolving them by hand is impractical, so there are two automated approaches to consider: System of Record and Most Recently Updated. With System of Record, you automate which system overrides the others, ranking systems based on your business priorities so one record gets used for any given piece of information. In contrast, Most Recently Updated uses, for a given field, whichever system’s value was modified last. So if a customer’s phone number differs between your CRM and your support system, the most recently updated value would be used.
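Both policies are straightforward to express in code. In this sketch, the record shape (a value plus its source system and last-modified timestamp) and the priority ranking are illustrative assumptions:

```python
SYSTEM_PRIORITY = ["crm", "support", "marketing"]  # assumed business ranking

def resolve_system_of_record(candidates):
    """Pick the value from the highest-priority system that has one."""
    for system in SYSTEM_PRIORITY:
        for c in candidates:
            if c["system"] == system and c["value"] is not None:
                return c["value"]
    return None

def resolve_most_recently_updated(candidates):
    """Pick the value with the latest modification timestamp."""
    live = [c for c in candidates if c["value"] is not None]
    return max(live, key=lambda c: c["updated_at"])["value"] if live else None

phones = [
    {"system": "crm", "value": "555-0100", "updated_at": "2018-01-05"},
    {"system": "support", "value": "555-0199", "updated_at": "2018-03-20"},
]
print(resolve_system_of_record(phones))       # 555-0100 (CRM wins by rank)
print(resolve_most_recently_updated(phones))  # 555-0199 (newer edit wins)
```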
In addition to consistent formats, matching like records across systems with common identifiers is also important for modeling and crafting a standard schema. When matching contact records, for instance, an email address is a common identifier that offers the highest probability of a unique match across systems. As you automate your data pipeline, you might also incorporate multi-level de-duplication keys such as name, company, and address.
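Here is a minimal sketch of that matching logic, assuming email as the primary identifier with a hypothetical name-company-address composite as the fallback:

```python
def match_key(record):
    """Prefer email as the identifier; fall back to a multi-level
    key of name + company + address when email is missing."""
    email = (record.get("email") or "").strip().lower()
    if email:
        return ("email", email)
    composite = tuple(
        (record.get(f) or "").strip().lower()
        for f in ("name", "company", "address")
    )
    return ("composite", composite)

a = {"email": "Ana@Example.com", "name": "Ana"}
b = {"email": "ana@example.com", "name": "Ana B."}
print(match_key(a) == match_key(b))  # True: same email, same customer
```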
Consolidating data is the most time-intensive step, as you move from multiple datasets to a single unified data set.
Warehouse the Data
A third piece of your data pipeline worth automating is the warehousing of data. Traditionally, each system’s data would enter its own warehouse. Data from your CRM would enter its own CRM database; data from a marketing automation system, a marketing automation database; and so on for support, enterprise resource planning, and events.
This siloed approach constrains analysts, however. Rather than running SQL queries against all of these systems at once, analysts must be familiar with how each separate database is organized. Parsing which idiosyncrasies belong to which system makes joining tables time-consuming and analyzing data laborious. Even if your data are immaculate, the BI dashboard you use for visualization can only access one of your many SaaS applications at a time.
A more scalable approach is to warehouse the fused data set from step two, giving you access to your records and all of their relationships to other objects simultaneously. With this approach, when data inputs invariably change, connected data sources won’t get out of sync across silos.
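To make the contrast concrete, here is a sketch of the kind of single query a fused warehouse allows, using an in-memory SQLite database with invented table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT, company_id INTEGER);
    CREATE TABLE deals (id INTEGER PRIMARY KEY, contact_id INTEGER, amount REAL);
    INSERT INTO contacts VALUES (1, 'ana@example.com', 10);
    INSERT INTO deals VALUES (100, 1, 2500.0);
""")

# One query spanning objects that once lived in separate silos.
row = conn.execute("""
    SELECT c.email, SUM(d.amount)
    FROM contacts c JOIN deals d ON d.contact_id = c.id
    GROUP BY c.email
""").fetchone()
print(row)  # ('ana@example.com', 2500.0)
```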
Another advantage of this approach is creating a universal schema of all objects and their relationships to one another. Mapping relationships between these matched records into a single schema is best done using a graph database, allowing users to query and pull connected data without having to spend time organizing it into distinct relationships themselves.
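You don’t need a full graph database to see the idea. This toy sketch models a universal schema as nodes and named edges (all names illustrative) and walks relationships without hand-joining tables:

```python
# Nodes are (object_type, id) pairs; edges are named relationships.
edges = [
    (("company", "acme"), "employs", ("contact", "ana")),
    (("contact", "ana"), "owns", ("deal", "d-100")),
    (("contact", "ana"), "raised", ("ticket", "t-42")),
]

def neighbors(node, relation=None):
    """All nodes connected to `node`, optionally filtered by relation name."""
    return [dst for src, rel, dst in edges
            if src == node and (relation is None or rel == relation)]

# Walk relationships: company -> contacts -> deals
for contact in neighbors(("company", "acme"), "employs"):
    print(contact, "->", neighbors(contact, "owns"))
# ('contact', 'ana') -> [('deal', 'd-100')]
```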
This automated master dataset approach modernizes the traditional waterfall approach, where data prep, ETL, warehousing, and modeling occur as discrete steps that add distance between the source data and the resulting dashboards. Tying these steps together in a continuous pipeline means data is instantly ready for analysis.
Feed Analytics, Reports and Dashboards
In the fourth step, the centralized master data is fed into your visualization tools and dashboards, and updates to the data show in near real time.
Data, Clean and Visualized
Consumers of dashboards—executives, department heads and managers—now have a view into their business, using a continuously updated, trusted data set. Analysts can spend time on analytics and insights, not the painful step of data prep. And IT teams can support a scalable automated process vs. having to put out data fires.
Automation creates a tipping point for the visualization and dashboard tools. How will we know we’ve hit this tipping point? Every business will have a view of their KPIs and customer data—not once a week or month, but through dashboards that update continuously and of their own accord.
It’s not just about giving leadership teams the visibility that boards expect them to have. It’s also about creating sanity for analysts and IT.
By automating your data pipeline, you remove that 80 percent data wrangling step. You can put an end to the consolidated-report fire drill, which sounds something like “just get me this one report I really need, now!” Next time you can say, “the data is already consolidated and clean. Go take a look.”
Zak Pines is the Moneyball Marketer, a data-driven marketing leader who aligns departments around driving revenue growth. Pines is VP of marketing at Bedrock Data and has worked in marketing technology and business analytics for over two decades.