Five Steps to Building a Data Management System that Makes Truth Sustainable

“The truth is rarely pure, and never simple.”

– Oscar Wilde


I was meeting with the Chief Information Officer for a mid-sized community bank a couple months ago. The bank was completing a recent merger and we were discussing how they could be using their data to move forward and support various banking activities. Halfway through the conversation the CIO looked at me and said, “I just want the whole truth in one place. Why does that have to be so hard?”

The aspiration for a Single Source of Truth, and a simultaneous frustration with the process for creating it, is a common sentiment among information technology officers in banks these days. Banks are using an increasing number of systems that generate data, and access to all this data can be transformative. Data provides the fuel for faster, more accurate decision-making and automation that can transform an organization’s potential. However, not having a complete and accurate view of customer information and activity can lead to significant opportunity cost and even large financial losses.

Creating a sustainable Single Source of Truth can be difficult. So difficult, in fact, that some organizations we’ve worked with had nearly given up hope of ever accomplishing this when we first met them. But it’s not impossible, and with the right tools may even be easier than you thought.

Building a Sustainable Source of Truth

Creating an accurate, sustainable, and accessible data management system requires a strong but flexible process foundation that can evolve as sources and needs change.


Step 1: Connection

Data is everywhere in banking. Every system, from the bank's core processor to one-off vendor systems, generates data of one kind or another. Most of this data has limited application as long as it remains locked within its source system.

Hub-and-spoke integration vs. point-to-point connection

For a long time, banks have attempted to integrate their data through a point-to-point model, where one system is connected to another, and then another system is connected to that. This may work when only a couple data sources are being connected, but when more sources are added, the system quickly starts becoming a daisy chain of linked fields, unique identifiers, and conflicting data.

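Over time, the cost and maintenance burden of a point-to-point integration architecture grow roughly with the square of the number of connected systems, because every new system may need its own integration with each system already in place.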

The better solution is to connect systems through a hub-and-spoke model, where each system is connected to a central data hub and then data is passed through the hub and out to the component systems as needed. This approach enables greater flexibility for adding new sources and reduces the number of integrations that need to be built and maintained. (See figure 1)


Figure 1. Simplifying System Integration with a Hub + Spoke Model

Connecting data systems through a central hub sharply reduces the cost and complexity of the system. In the example below, six different sources require 30 integrations when point-to-point integration is used (one in each direction between every pair of sources), while a hub + spoke model reduces the number of integrations to 6.
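The counts in Figure 1 come from simple arithmetic, which can be sketched as follows. The function names are ours, and the sketch assumes each direction of data flow between two systems counts as its own integration, which is the assumption that produces 30 for six sources.

```python
def point_to_point_integrations(n_systems: int) -> int:
    """Integrations needed when every system feeds every other system directly.

    Assumes one integration per direction per pair: n * (n - 1).
    """
    return n_systems * (n_systems - 1)


def hub_and_spoke_integrations(n_systems: int) -> int:
    """Integrations needed when each system connects only to the central hub."""
    return n_systems


# Compare the two models as the number of systems grows.
for n in (3, 6, 12):
    print(n, point_to_point_integrations(n), hub_and_spoke_integrations(n))
```

Doubling the number of systems roughly quadruples the point-to-point count, while the hub + spoke count only doubles.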


Not all hubs are created equal

The solution for many banks seeking to integrate their various data systems has been to tie their data together through their core processor. However, while core processors may excel at supporting key bank processes and functionality, they're not necessarily great at centralizing data outside of the data that they're already collecting.

Instead, consider a separate database system that has the capacity to house a lot of data, can be connected quickly to a range of technologies, and provides the flexibility to add new data sources and variables over time.

Avoid rigid database formats

Be wary of traditional data warehouse systems, which tend to be rigid and can take a long time to update any time there's a request to add a new data source or even a single data field.

Instead, consider some of the latest advances in data technology. Newer offerings, such as managed NoSQL systems, are more flexible and can store a wider range of data types and formats than the traditional data warehouse.

These new systems may actually be less costly to set up, and they can also adapt much more efficiently over time to new data sources and use cases.

Cadence of Implementation

I recently had a conversation with a bank CEO who was trying to resolve an issue integrating a service with the bank's core processor. They were using the service's API to extract data in real time, and the cost and effort were prohibitive. However, it became clear that they didn't need a real-time data pull. Instead, they would get just as much value from a daily batch export from the core. The whole process could be implemented in a couple weeks and would cost a fraction of what it would take to connect to the core through an API. The consuming system (a NoSQL data manager) was set up to work with real-time data, so if the bank ever moves to the core API, it will slip right in.

It is important to create a roadmap of the features you will need, such as real-time data access, and organize them into an architecture. This allows you to save time and money by implementing only what you need today, while still applying this investment to the future.
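One way to picture this kind of roadmap is a hub loader that doesn't care whether records arrive from a daily batch file or from a live feed. The sketch below is illustrative only: the field names (account_id, balance), the CSV layout, and the function names are assumptions, not a description of any particular core processor or product.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


def read_daily_batch(csv_text: str) -> Iterator[Record]:
    """Yield one record per row of a (hypothetical) daily CSV export."""
    yield from csv.DictReader(io.StringIO(csv_text))


def load_into_hub(records: Iterable[Record], store: Dict[str, Record]) -> int:
    """Upsert records into the hub, keyed by account_id.

    The loader only needs an iterable of records, so it works the same
    for a batch file today and for a real-time feed later.
    """
    count = 0
    for record in records:
        store[record["account_id"]] = record
        count += 1
    return count


hub_store: Dict[str, Record] = {}

# Today: a daily batch export.
export = "account_id,balance\nA-1,100.00\nA-2,250.50\n"
loaded = load_into_hub(read_daily_batch(export), hub_store)

# Later: a real-time source can pass events to the same loader one at a time.
load_into_hub([{"account_id": "A-1", "balance": "90.00"}], hub_store)
```

Because only the reader changes, the batch work done today carries forward if a real-time connection is added later.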

Step 2: Unification

By connecting your data sources, you've put all your silos in one place. However, they are still silos. To get any utility from this data, you need to unify these silos to show the big picture.


Getting the big picture on your big data

Every system stores data differently. Each data source will have a different format, model, units, normalization, structure, definitions, etc. In an effective Unification process, the quirks of each individual system are negotiated into a single, uniform set of operational conventions.

When Datanova sets up data monetization for a bank, one of our most important meetings is our data mapping session. There we work with stakeholders from throughout an organization to buy into an ontology (a centralized and uniform set of names, formats, and structures for the bank’s data) while also establishing the principles for bringing the data from each component system into alignment.

The mapping process should generate a meaningful picture of the data that your bank has available and what it means about the contacts, accounts, and activities that the data is describing.

Some of this mapping is simple. For example, some systems may have “Full Name” for a contact, while other systems may use a “First Name | Last Name” structure.
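A simple mapping like this one can be sketched in a few lines. The unified field names (first_name, last_name) are hypothetical, and the split rule is deliberately naive: it treats the last word of a full name as the last name, which a real mapping session would need to refine for suffixes, compound surnames, and so on.

```python
from typing import Dict


def unify_contact_name(record: Dict[str, str]) -> Dict[str, str]:
    """Map either source convention onto unified first_name / last_name fields."""
    if "Full Name" in record:
        # Naive rule: the last whitespace-separated word is the last name.
        parts = record["Full Name"].strip().split()
        if len(parts) > 1:
            first, last = " ".join(parts[:-1]), parts[-1]
        else:
            first, last = (parts[0] if parts else ""), ""
    else:
        first = record.get("First Name", "").strip()
        last = record.get("Last Name", "").strip()
    return {"first_name": first, "last_name": last}


from_crm = unify_contact_name({"Full Name": "Joe Daniels"})
from_core = unify_contact_name({"First Name": "Joe", "Last Name": "Daniels"})
```

Both source conventions end up in the same shape, which is what lets later steps compare them.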

Other ontology concepts are more complex. For example, how does an organization define a “sale”? Is it when a new account is opened? Is it when a first deposit is made? Is it when a confirming email is received? The specifics need to be precise and aligned across data systems, so they can perform with speed on millions of records and still generate accurate results.
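To make the point concrete, here is a minimal sketch of a “sale” definition pinned down in one place so that every downstream system applies the same rule. The class, the field names, and the choice of “first deposit” as the definition are all hypothetical; the right answer is whatever the bank's stakeholders agree on.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AccountTimeline:
    """Key dates for one account (hypothetical unified fields)."""
    opened_on: Optional[date] = None
    first_deposit_on: Optional[date] = None
    confirmation_email_on: Optional[date] = None


# The agreed definition lives in exactly one place.
# Here, hypothetically, a sale is counted at the first deposit.
SALE_DEFINITION = "first_deposit_on"


def sale_date(timeline: AccountTimeline) -> Optional[date]:
    """Return the date the account counts as a sale, or None if it doesn't yet."""
    return getattr(timeline, SALE_DEFINITION)


opened_only = AccountTimeline(opened_on=date(2024, 1, 5))
funded = AccountTimeline(opened_on=date(2024, 1, 5), first_deposit_on=date(2024, 1, 9))
```

Under this definition, an account that has been opened but not funded is not yet a sale, and every report that calls sale_date will agree on that.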

Consider the following example of how Data Unification brings together multiple data sources and unifies this data into a system-wide ontology. (Figure 2)


Step 3: Fusion

Once data has been unified, the data needs to be fused, a process by which the data from one system is joined and conflated with data from another. Without Fusion, the value that can be extracted from unified data is still limited.


Joining data with and without natural unique identifiers

It’s important that records can be effectively joined across systems. Within a traditional relational database, this is done using unique keys. However, we are creating a system of systems. There are often no natural keys that connect data across a full system of systems. As a result, things can get tricky. The same entity (e.g. Joe Daniels) will have multiple contributing records from different databases.

In some cases, natural keys such as SSNs can be used to join records across unified systems. However, this option is not always available. And, unfortunately, given the number of systems that banks use today, it’s fairly common to encounter systems and sources where no identifier can be found or easily created.

Fortunately, some advanced Fusion systems provide specialized algorithms and machine learning technologies that can combine data through compound variables or even through a complex combination of variables using mathematical calculations and logic.
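As a rough illustration of joining with and without a natural key, the sketch below links records by SSN when one is present and falls back to a compound key (normalized name, date of birth, and five-digit ZIP) when it is not. This is a simple deterministic rule, not the machine learning approach described above, and every field name and sample value is made up for the example. The grouping is greedy, so it does not merge two groups that a later record shows to be the same entity.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def candidate_keys(record: Record) -> List[str]:
    """Return every key this record can be matched on."""
    keys = []
    ssn = record.get("ssn")
    if ssn:
        # Natural key: SSN with formatting removed.
        keys.append("ssn:" + ssn.replace("-", ""))
    name, dob, zip_code = record.get("name"), record.get("dob"), record.get("zip")
    if name and dob and zip_code:
        # Compound key: lowercased, whitespace-normalized name + DOB + 5-digit ZIP.
        normalized_name = " ".join(name.lower().split())
        keys.append(f"compound:{normalized_name}|{dob}|{zip_code[:5]}")
    return keys


def fuse(records: List[Record]) -> List[List[Record]]:
    """Group records that share any candidate key (greedy, order-dependent)."""
    key_to_entity: Dict[str, int] = {}
    entities: List[List[Record]] = []
    for record in records:
        keys = candidate_keys(record)
        entity_id = next((key_to_entity[k] for k in keys if k in key_to_entity), None)
        if entity_id is None:
            entity_id = len(entities)
            entities.append([])
        entities[entity_id].append(record)
        for k in keys:
            key_to_entity.setdefault(k, entity_id)
    return entities


records = [
    {"source": "core", "name": "Joe Daniels", "ssn": "000-00-0001", "dob": "1980-02-14", "zip": "97201"},
    {"source": "crm", "name": "joe  daniels", "ssn": None, "dob": "1980-02-14", "zip": "97201-1234"},
    {"source": "card_export", "name": "J. Daniels", "ssn": "000000001", "dob": None, "zip": None},
    {"source": "crm", "name": "Ann Lee", "ssn": None, "dob": "1975-07-01", "zip": "97201"},
]
entities = fuse(records)
```

Here the CRM record has no SSN and the card export has no date of birth, yet both attach to the same entity as the core record because each shares one key with it.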

Ultimately, the ideal situation is to produce a Golden dataset where multiple records for an entity are brought together and interlinked to form a single, complete picture.


In the above example, data has been pulled from multiple sources, including an exported report from a third-party credit card service provider. In a real-world situation, a fused record could include a full picture of all the accounts and account history across both sides of the balance sheet for a single customer.

Imagine doing cross-sell or fraud prevention with this kind of cohesive picture.

Step 4: Trust

Once data has been fused, there's an enormous amount that can be done with the data. But the application of this data could still be limited by one more factor: trust.


Providing transparency into the Single Source of Truth

While aggregating and fusing data can make it more institutionally valuable, it also separates it from the sources and contextual factors that help users understand its meaning and accuracy.

Imagine an analyst sending you a “cleaned” data set in an Excel document. In the sheet, a number of data sources have been brought together into a set of unique customer records. Everything looks great.

But then, as you scan down the list, you examine one of the records and it doesn’t look quite right. Maybe you just helped sort out a technical issue with the customer, and you recognize that their email is different than the one you were using to communicate with them. You pull up the bank's CRM, and the email is correct there. So where did the error come from? And how pervasive is the issue? Suddenly, your faith in the entire dataset is shot. Do you kick back the work and set the project back by weeks? Or do you hold your nose, let it go, and hope? You may never know the damage, but it could mean anything from a missed opportunity to shattered relationships. 

This is where a Data Trust system comes in. A good data management system enables users to identify the source of every datum, the full chain of custody for the datum, and how the datum is supported by other component data systems.

These are the three factors to establishing data trust:


Pedigree refers to where data comes from. For example, the pedigree of a contact's phone number could come from the bank's CRM, while the pedigree of a contact's mailing address change may come from their online banking account. Identifying pedigree for data is important because it enables users to evaluate the source, and also enables them to return to the source to provide any fixes.


Provenance refers to the chain of custody on this data: for example, who made the most recent update to the email address in the source data system, and how that data point was established historically. This enables users to manually confirm a data point, as necessary, and it also enables system administrators and those responsible for data governance to identify any sources of regular data corruption.


Corroboration refers to how any data is supported by data from other systems. Consider our email example. Imagine that, instead of just seeing an email that seemed wrong, there was also a record of five other systems that were using that same email, some of which you know would have generated an error if the email were inaccurate. In that case, the strange email now can be understood as an alternative email to the one that you're using, rather than a wrong email. Alternatively, if no other system had that email on record, the likelihood of that email being wrong seems much greater.
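The three factors can be pictured as metadata carried alongside each value. The sketch below is one possible shape, not a description of any particular product: the class, the field layout, the system names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Datum:
    """A single value plus the metadata needed to trust it."""
    value: str
    pedigree: str  # the system this value came from
    # Provenance: chain of custody as (timestamp, actor, value) entries.
    provenance: List[Tuple[str, str, str]] = field(default_factory=list)


def corroboration(datum: Datum, other_systems: Dict[str, str]) -> List[str]:
    """Names of other systems holding the same value (case-insensitive)."""
    target = datum.value.strip().lower()
    return sorted(
        name
        for name, value in other_systems.items()
        if name != datum.pedigree and value.strip().lower() == target
    )


email = Datum(
    value="jd@example.org",
    pedigree="online_banking",
    provenance=[("2023-04-02T10:15:00", "customer_self_service", "jd@example.org")],
)
emails_elsewhere = {
    "crm": "joe.daniels@example.com",
    "statements": "jd@example.org",
    "card_portal": "JD@example.org",
}
supporting = corroboration(email, emails_elsewhere)
```

In this made-up case the unfamiliar email is backed by two other systems, which suggests an alternative address rather than an error. An empty list would point the other way.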

Institutionalizing advanced data management across an entire bank requires that users believe that the data is as accurate as it can possibly be. The minute users start losing trust in the proposed institutional source of truth, they start to create alternative methods to ensure the accuracy of their respective systems—that rogue customer spreadsheet, perhaps, that marketing has been using to send mailers—and the effectiveness of the entire system begins to suffer. It's a negative spiral that's hard to pull out of.

Data trust isn't a luxury. It's one of the keys to long-term sustainability.

Step 5: Utilization

You’re not building a museum.

Sure, a Source of Truth can be a profoundly satisfying thing to look at. Stand back. Enjoy it ... Done? Ok, now it's time to do something with it.


Applying Analytics, Strategic Use Cases, and Automation

Moving forward, the key to a sustainable data system comes down to how that system is activated for people. The good thing is that if you've attended to the previous steps, the issue won't be about convincing people to trust and use the data, but about ensuring that it's being used effectively and responsibly.

The right people need to be empowered to oversee the quality and use of the data, and any system that uses the data to provide insights, analytics, or automation needs to be integrated and set up to use the data in ways that reflect the values of the bank and don't create any compliance issues. For example, you don't want any customer information that indicates ethnic characteristics to enter lending decisioning systems.
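As a minimal sketch of that kind of guardrail, the filter below strips a configured set of fields from a customer record before it is handed to a lending model. The field names and the contents of the protected list are assumptions for illustration. Dropping named fields is only a first step, since other fields can act as proxies for the same characteristics.

```python
from typing import Any, Dict

# Hypothetical list of fields that must never reach lending decisioning.
PROTECTED_FIELDS = frozenset({"ethnicity", "race", "national_origin"})


def lending_features(customer_record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without any protected fields."""
    return {
        key: value
        for key, value in customer_record.items()
        if key.lower() not in PROTECTED_FIELDS
    }


customer = {
    "customer_id": "C-100",
    "annual_income": 72000,
    "Ethnicity": "example value",
    "months_as_customer": 48,
}
features = lending_features(customer)
```

Keeping the list in one place gives the people responsible for oversight a single point of control over what the decisioning system can see.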

There are a number of ways to proceed from here, but one thing to settle ahead of time is that your data system is built on open standards, so that as your bank identifies strategic applications for data, accessing this resource is simple and flexible.


What happens when the future is now?

If, like most bank technology leaders, you've spent a vast amount of time worrying about how to get data under control, then you might be a little hesitant to even start thinking about what you could enable your bank to do once you've effectively unified data.

Here's just a short list of some of the things that are possible for banks once their data has been transformed into a Single Source of Truth, just to get the thinking started:

  • Predictive customer targeting and marketing

  • Advanced ROI measurement

  • Improved customer retention

  • Proactive service issue resolution

  • Improved compliance modeling

  • Operational error tracking

  • Enhanced loan decisioning

  • Real-time ALM reporting

We're just at the beginning of an evolution in bank data utilization.

Datanova Scientific