Establishing the Analytical Relationship Between Internal and External Data

Data Challenge Status

We’re en route to creating a new business application using our EVLVE Enterprise Data Fusion system. Follow the progress here.


So far, we’ve spent most of our time on strategic planning around organizational priorities and on reviewing the potential of the data. In successful data activation projects, this is generally where most of the time should be spent. The planning stages are where ideas are explored, objectives are clarified, and the foundation is set for an application as it enters development.

In an agile data and development environment, there shouldn’t be an expectation that the planning stage resolves all questions, but this period of planning and discovery does provide the starting point for the rest of the process. As we move from Planning to Analysis, we will continue to return to the original findings and process artifacts to assess whether we remain in touch with the original objectives identified at the start of the process.

Understanding analytical relationships in data

In an earlier post we looked at the data coming from the external data provider to make sure that it could support the insights we were looking for, and that the external data could be connected to the internal data. That was a structural review of the data. Without shared variables that relate the new data to existing data, it wouldn’t be possible to align the different data sets.

At this point in the process, we begin analyzing the data itself to uncover what a bank’s internal data can tell us about the new data that we’ve acquired from the data provider.

For privacy reasons, we can’t share the specific analysis of internal bank data, but we can provide a summary of what the analysis of data from a bank can tell us about the new data set.

In the earlier review of the data structure, we identified that the two most complete variables in the new data provided by Leadbird were the business address and industry type. This might not seem like much, but powerful data insights don’t always have to rely on a huge number of variables. Sometimes great insights can come from just one or two key data points. In this case, the internal and external data can be related on the industry codes for each of the businesses, as well as the geographic location.
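To make the idea of relating the two data sets concrete, here is a minimal sketch in Python. The field names (`industry_code`, `zip`), the identifiers, and the records are all hypothetical placeholders, not the actual Leadbird or bank schema. It simply groups existing customers by industry code so that each new lead can be matched to its industry peers.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
internal_customers = [
    {"customer_id": 1, "industry_code": "financial_advisory", "zip": "00001"},
    {"customer_id": 2, "industry_code": "financial_advisory", "zip": "00002"},
    {"customer_id": 3, "industry_code": "restaurant", "zip": "00001"},
]
external_leads = [
    {"lead_id": "A", "industry_code": "financial_advisory", "zip": "00001"},
    {"lead_id": "B", "industry_code": "auto_repair", "zip": "00005"},
]


def match_leads_to_peers(leads, customers):
    """Map each lead to the existing customers that share its industry code."""
    by_industry = defaultdict(list)
    for customer in customers:
        by_industry[customer["industry_code"]].append(customer["customer_id"])
    # .get() avoids creating empty entries for industries with no customers.
    return {
        lead["lead_id"]: by_industry.get(lead["industry_code"], [])
        for lead in leads
    }


peers = match_leads_to_peers(external_leads, internal_customers)
```

A lead whose industry has no existing customers comes back with an empty peer list, which is a useful signal that the model has nothing to say about that lead yet. The same grouping could be keyed on location as well.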

Using a few segmentation techniques, we can identify which commercial services are used by existing bank customers in each industry type. Below is a sample of what this analysis might look like for Financial Advisories.


For businesses in this category, the use of each of the commercial services can be indexed according to the level of use. This indexing enables the popularity of each commercial service to be weighted within the industry category, while also relating the popularity of services across industry categories. In the above analysis, ACH Origination is the most popular commercial service among Financial Advisory businesses.
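One common way to build this kind of index is to compare the usage rate within an industry to the usage rate across all customers, scaled so that 100 means "average." The sketch below assumes that definition; it is not necessarily the exact formula used in our analysis, and the customer records are invented for illustration.

```python
# Hypothetical customer records; industries and service sets are illustrative.
customers = [
    {"industry": "financial_advisory", "services": {"ach_origination", "payroll"}},
    {"industry": "financial_advisory", "services": {"ach_origination"}},
    {"industry": "restaurant", "services": {"payroll"}},
    {"industry": "restaurant", "services": {"online_wire"}},
]


def service_usage_index(customers, industry, service):
    """Return 100 * (usage rate in the industry) / (usage rate overall).

    Returns None when the index is undefined (no customers in the
    industry, or nobody uses the service at all).
    """
    in_industry = [c for c in customers if c["industry"] == industry]
    if not in_industry:
        return None
    industry_rate = sum(service in c["services"] for c in in_industry) / len(in_industry)
    overall_rate = sum(service in c["services"] for c in customers) / len(customers)
    if overall_rate == 0:
        return None
    return 100 * industry_rate / overall_rate


ach_index = service_usage_index(customers, "financial_advisory", "ach_origination")
payroll_index = service_usage_index(customers, "financial_advisory", "payroll")
```

An index above 100 means the service is used more heavily in that industry than across the customer base as a whole, which is what allows services to be compared both within and across industry categories.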

A separate analysis can identify whether there’s a specific cadence to when businesses of various types add different commercial services. In the above example of Financial Advisories, ACH Origination appears to be the most popular commercial service overall, but when a secondary analysis is conducted to evaluate when these services were added, the results may look a little different.


When business maturity is considered, the prioritization of commercial services changes a little. While ACH Origination is still popular, it’s less important for new financial advisory firms than payroll and online wire services.

Since our source of new lead data consists of newly opened businesses, the service recommendation model should reflect the ranking of commercial services for businesses less than 12 months old.
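As a rough sketch of that maturity cut, the snippet below ranks services by how often they were added while a business was under 12 months old. The adoption records, and the idea of storing business age at the time a service was added, are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical adoption records:
# (industry, service, business age in months when the service was added)
adoptions = [
    ("financial_advisory", "payroll", 2),
    ("financial_advisory", "payroll", 4),
    ("financial_advisory", "payroll", 8),
    ("financial_advisory", "online_wire", 5),
    ("financial_advisory", "online_wire", 9),
    ("financial_advisory", "ach_origination", 10),
    ("financial_advisory", "ach_origination", 18),
    ("financial_advisory", "ach_origination", 30),
    ("financial_advisory", "ach_origination", 40),
]


def rank_services_for_new_businesses(adoptions, industry, max_age_months=12):
    """Rank services by adoption count among businesses younger than the cutoff."""
    counts = Counter(
        service
        for record_industry, service, age_months in adoptions
        if record_industry == industry and age_months < max_age_months
    )
    return [service for service, _ in counts.most_common()]


new_business_ranking = rank_services_for_new_businesses(adoptions, "financial_advisory")
```

In this made-up sample, ACH Origination has the most adoptions overall, yet it drops to last place once only businesses under 12 months old are counted.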

Finally, geographic location can be added to the mix to refine the analytical model, in case there’s some variation by area of operation. For banks that operate within a constrained footprint, the impact of geographic data may be statistically insignificant, but for banks that operate in markedly different market areas (urban vs. rural, for example), location may have a measurable impact on the services that some businesses require.
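One simple way to check whether location matters for a given service is a two-proportion z-test on adoption rates in two market areas. This is a sketch of one possible test, not necessarily the one used in our analysis, and the counts in the example call are invented.

```python
import math


def two_proportion_z_test(users_a, total_a, users_b, total_b):
    """Return (z, two-sided p-value) for the difference between two adoption rates."""
    rate_a = users_a / total_a
    rate_b = users_b / total_b
    # Pooled rate under the null hypothesis that both areas adopt at the same rate.
    pooled = (users_a + users_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        # Everyone or no one uses the service in both areas: no detectable difference.
        return 0.0, 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 of 100 urban customers vs. 40 of 100 rural customers.
z_score, p_value = two_proportion_z_test(60, 100, 40, 100)
```

A small p-value (below 0.05, say) suggests the urban and rural adoption rates differ by more than chance would explain, in which case location is worth keeping in the model. With small samples, an exact test would be a safer choice.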

Activating the Analysis and Making Future Adjustments

With the statistical analysis complete, the findings can be applied to the new data as it arrives from Leadbird. The analytical model will be used to identify product fit for each new lead, and that fit will be incorporated into a product recommendation score.
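To illustrate how findings like these might be turned into a recommendation score, here is a minimal sketch. The index values, the lead fields, and the choice to scale scores relative to the top service are all assumptions for the example rather than the production scoring logic.

```python
# Hypothetical index values, standing in for the output of the statistical analysis.
service_index = {
    "financial_advisory": {
        "payroll": 140.0,
        "online_wire": 120.0,
        "ach_origination": 90.0,
    },
}


def recommend(lead, service_index, top_n=2):
    """Return the top services for a lead as (service, score) pairs.

    Scores are scaled to the strongest service in the lead's industry,
    so the best fit always scores 1.0.
    """
    scores = service_index.get(lead["industry"], {})
    if not scores:
        return []
    best = max(scores.values())
    if best <= 0:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(service, round(value / best, 2)) for service, value in ranked[:top_n]]


lead = {"lead_id": "A", "industry": "financial_advisory"}
recommendations = recommend(lead, service_index)
```

Leads from industries the model has not seen return an empty list, which keeps the application from recommending products without supporting evidence.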

The model should be recalibrated annually as customer behavior changes. While the specific product fit is unlikely to change drastically in the short term, over the course of a year, customer preferences may shift as business practices change or as direct sales efforts increase the use of certain products.

In the next post, we’ll look at how the analytical model is pushed back into the end output and activated in a business application.

Datanova Scientific