Anatomy of an External Data Set

Understanding the Strategic Potential and Structure of a Third-Party Data Set


Data Challenge Status

Datanova is tracking the process of creating a new business application using the EVLVE Enterprise Data Fusion system, starting with the planning and ending with activation. Follow the progress here.



External data, whether pulled from a public data source like the U.S. Census or acquired through a data broker, can be a vital resource for organizations. When properly gathered, cleaned, and vetted, it can identify new opportunities for a bank. It can also enrich existing data sets to enable deeper analysis of which products and services fit which customers.

Much has been written about what to consider when acquiring external data, so we won’t cover that here. We’ll just note that data purveyors vary in quality and reliability, and finding a quality data broker is important.

Instead, we’re looking at what to do after an external data source has been identified so that the data can be folded into an organization’s processes, maximizing benefits while limiting risk, confusion, or redundancy.

As we mentioned in our prior post, we identified Leadbird as an interesting source for external data. We met them at the 2018 ABA Marketing Conference and thought that their offering was unique and interesting for banks. Leadbird collects data on newly opened businesses, which are an important segment for banks, since the opening of a business is when its owners make many of the decisions about their financial infrastructure.

Care must be taken not to overwhelm new business owners with too many commercial service options, as this may actually reduce the number of services that a business ends up selecting. A seminal study of jam marketing showed that more options can inhibit a buy decision. Conversely, fewer, more targeted options may drive significant improvement in uptake when applied at key times in a business’s life cycle.


Analyzing the Flat Data Model

After identifying Leadbird for its strategic value to banks, we needed to look at the data itself to understand how the data can be activated and incorporated into an application. Specifically, we examined the data model from Leadbird, which is a fancy way of saying that we requested a sample data file from Leadbird and looked at the following things:

  • The data fields that are available;

  • The format of the data in each of the fields; and

  • The meaning of the data within each of these fields.

At the same time, we also conducted a basic review of the quality and completeness of the data that are within each field, since this determines the way that the data might be used.
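A completeness review like this can be sketched in a few lines of Python. The field names and sample rows below are hypothetical, since the actual Leadbird fields are not listed in this post; the point is only to show one way to measure how often each field is populated.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real Leadbird field names are an assumption here.
SAMPLE_CSV = """business_name,address,zip,sic_code,phone
Acme Bakery,12 Main St,30301,5461,404-555-0101
Blue Door Yoga,,30302,7991,
Corner Hardware,88 Oak Ave,30303,5251,404-555-0199
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def profile_completeness(rows):
    """Return {field: share of records with a non-blank value}."""
    rows = list(rows)
    filled = Counter()
    for row in rows:
        for field, value in row.items():
            if value is not None and value.strip():
                filled[field] += 1
    fields = rows[0].keys() if rows else []
    return {field: filled[field] / len(rows) for field in fields}


if __name__ == "__main__":
    for field, share in profile_completeness(load_rows(SAMPLE_CSV)).items():
        print(f"{field:15s} {share:.0%}")
```

Fields that come back well below 100% are candidates for enrichment or for exclusion from any matching or segmentation logic that depends on them.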

Leadbird provided us with a 1,000-record sample file, which we used to create a data model for this source. We plan to use the following information from the Leadbird data model to help solve the data challenge.

[Figure: Leadbird flat data model]

Understanding the Strategic Implications

The review of the flat data model provides us with some key information about how the data can be incorporated and how it might be used. Combining this model review with some core hypotheses that we have about the potential implications of this data, we identified three key areas for consideration:


1. Integration with core bank data may require multi-variable algorithms

One of the key things to consider when examining an external data set is whether and how that data will be fused with data that the bank already has, to reduce redundancy or duplication.

Fusing this external data with bank data will help make sure that the bank does not contact the same business repeatedly without knowing about the previous contacts. It will also link Leadbird records to the bank’s existing and new business customers, so that the impact of marketing efforts and sales engagements can be tracked over time.

Because the external data and the bank’s internal data don’t share any simple join fields (e.g., a unique identification number), more complex relationships will be required to fuse the Leadbird data with internal bank customer data.
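To illustrate what a multi-variable match might look like, here is a minimal sketch that scores a lead against bank customers on name, address, and ZIP code together. It is not the algorithm used in the EVLVE system; the field names, weights, and threshold are all assumptions chosen for illustration and would need tuning against real data.

```python
import re
from difflib import SequenceMatcher

# Illustrative weights and threshold (assumptions, not tuned values).
WEIGHTS = {"name": 0.5, "address": 0.3, "zip": 0.2}
MATCH_THRESHOLD = 0.85


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(lead, customer):
    """Weighted score across name, address, and five-digit ZIP."""
    name = similarity(lead["name"], customer["name"])
    address = similarity(lead["address"], customer["address"])
    zip_match = 1.0 if lead["zip"][:5] == customer["zip"][:5] else 0.0
    return (WEIGHTS["name"] * name
            + WEIGHTS["address"] * address
            + WEIGHTS["zip"] * zip_match)


def best_match(lead, customers):
    """Return (customer, score) for the top candidate, or None if below threshold."""
    scored = [(match_score(lead, c), c) for c in customers]
    if not scored:
        return None
    score, customer = max(scored, key=lambda pair: pair[0])
    return (customer, score) if score >= MATCH_THRESHOLD else None


if __name__ == "__main__":
    customers = [
        {"name": "ACME Bakery LLC", "address": "12 Main St", "zip": "30301-1234"},
        {"name": "Corner Hardware", "address": "88 Oak Ave", "zip": "30303"},
    ]
    lead = {"name": "Acme Bakery, LLC", "address": "12 Main St.", "zip": "30301"}
    print(best_match(lead, customers))
```

No single field decides the match: a misspelled name can be offset by an identical address and ZIP code, while a matching name in a different ZIP code falls below the threshold.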


2. Industry (SIC) will be a key variable for analysis

Creating a new application often begins with making assumptions about what the data might prove. These assumptions are then tested and either proved true or disproved. A common-sense assumption that we’re making based upon past conversations with bank commercial service managers is that business type can have a significant impact on the kinds of services that the business will need.

Since SIC code is provided in the Leadbird data for 100% of the records, we immediately identified this variable as a strategically valuable data point that we can test to see whether it can be used to segment the records and help identify best product fit.
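As a first pass at this kind of segmentation, records can be grouped by SIC division, which is determined by the first two digits of the code. The sketch below uses the standard SIC division ranges; the `sic_code` field name and the sample records are hypothetical.

```python
from collections import Counter

# Standard SIC divisions, keyed by the range of the two-digit major group.
SIC_DIVISIONS = [
    (1, 9, "Agriculture, Forestry, and Fishing"),
    (10, 14, "Mining"),
    (15, 17, "Construction"),
    (20, 39, "Manufacturing"),
    (40, 49, "Transportation, Communications, Electric, Gas, and Sanitary Services"),
    (50, 51, "Wholesale Trade"),
    (52, 59, "Retail Trade"),
    (60, 67, "Finance, Insurance, and Real Estate"),
    (70, 89, "Services"),
    (91, 97, "Public Administration"),
    (99, 99, "Nonclassifiable Establishments"),
]


def sic_division(sic_code):
    """Map a SIC code (string or int) to its division name, or 'Unknown'."""
    digits = str(sic_code).strip().zfill(4)  # restore a leading zero lost to int parsing
    if not digits[:2].isdigit():
        return "Unknown"
    major_group = int(digits[:2])
    for low, high, name in SIC_DIVISIONS:
        if low <= major_group <= high:
            return name
    return "Unknown"


def segment_counts(records):
    """Count records per SIC division; 'sic_code' is an assumed field name."""
    return Counter(sic_division(r["sic_code"]) for r in records)


if __name__ == "__main__":
    sample = [{"sic_code": "5461"}, {"sic_code": "7991"}, {"sic_code": "5251"}]
    for division, count in segment_counts(sample).most_common():
        print(f"{count:3d}  {division}")
```

Division-level groups are coarse, but they give enough records per segment to start testing whether business type predicts product fit before drilling into finer industry codes.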


3. Geographic location will be a key variable for both analysis and activation

Obviously, a business’s address is an important variable for delivering a message. Despite the move towards digital advertising, direct mail remains one of the best methods for marketing to customers. Additionally, refined digital location targeting means that address data can increasingly be used to deliver digital messages as well as offline messages.

However, address is not just for delivering messages. Location provides important environmental context for businesses. Businesses operating in different areas may be facing different conditions, which could significantly alter their banking needs and service requirements. We’ll be testing a number of geographic factors to identify possible implications for segmenting and identifying ideal product/service fit.

Technical reviews of data sources, like the one above, are the foundation for a successful data practice. Many of the problems that end up costing organizations considerable time and money in data systems start with mistakes made in understanding the underlying data structure. A little time spent here can save a lot of time later.

Betsy Bates