Big data. Digital transformation. Agile development. There are lots of words to describe how our personal and professional lives are awash in data and how we manage it. We need to know both what this means, and what it means for us in our role in finance.
This article aims to help you familiarize yourself with key terms of the information age, and the implications for FP&A and treasury.
As the name suggests, structured data follows a top-down approach and is part of an overall enterprise architecture: “a well-defined practice for conducting enterprise analysis, design, planning, and implementation, using a holistic approach at all times, for the successful development and execution of strategy.”
The data model is the organized structure of the data: what the data is, how it enters a database, and how it is accessed by users, including potential changes. The data sits in a database and has a set of rules (the schema) about how to access it.
At the root, there are tables of data, much like a single Excel spreadsheet. Just like you can have multiple spreadsheet tabs in a workbook, there can be multiple tables in a database.
Tables can be searched through a query, which is an instruction to search the data tables, and the output of that query can itself become a data object. Standard queries can become reports or views on the data.
This describes the typical relational database, where the data can be queried based on the relationship between the objects described.
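To make tables, queries, and views concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and amounts are hypothetical and chosen only for illustration; a real ERP schema would be far larger.

```python
import sqlite3

# Hypothetical in-memory database with two related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE gl_entries (
        entry_id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES accounts(account_id),
        amount REAL
    );
    INSERT INTO accounts VALUES (1, 'Revenue'), (2, 'Travel');
    INSERT INTO gl_entries VALUES (1, 1, 1000.0), (2, 1, 250.0), (3, 2, -75.0);
    -- A saved query becomes a view, which can be queried like a table.
    CREATE VIEW account_totals AS
        SELECT a.name AS account, SUM(g.amount) AS total
        FROM gl_entries AS g
        JOIN accounts AS a ON a.account_id = g.account_id
        GROUP BY a.name;
""")

# Querying the view joins the two tables through their shared account_id.
totals = dict(conn.execute("SELECT account, total FROM account_totals"))
print(totals)
conn.close()
```

The join on account_id is the "relationship" in a relational database: the ledger entries only make sense once they are linked to the account they belong to.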
Data can be thought of as having “dimensions,” or key characteristics. Think of a typical graph with variables plotted on the X and Y axes, each of which is a dimension of the data. A third dimension would be a Z axis, which gives you a data cube.
Many relational databases today will have seven to 10 dimensions of the data that allow you to “spin the cube” of data.
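One way to picture “spinning the cube” is to sum the same facts along different dimensions. The sketch below uses plain Python and a handful of made-up sales records; the dimension names (region, product, quarter) and the roll_up helper are assumptions for illustration, not any particular vendor's tool.

```python
from collections import defaultdict

# Hypothetical sales facts: each record has three dimensions and one measure.
facts = [
    {"region": "East", "product": "A", "quarter": "Q1", "sales": 100},
    {"region": "East", "product": "B", "quarter": "Q1", "sales": 40},
    {"region": "West", "product": "A", "quarter": "Q1", "sales": 70},
    {"region": "West", "product": "A", "quarter": "Q2", "sales": 90},
]

def roll_up(records, dimensions, measure="sales"):
    """Sum the measure along the chosen dimensions (one view of the cube)."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record[measure]
    return dict(totals)

# The same facts, viewed from two different angles.
by_region = roll_up(facts, ["region"])
by_product_quarter = roll_up(facts, ["product", "quarter"])
print(by_region)
print(by_product_quarter)
```

Each call to roll_up is one turn of the cube: the underlying records never change, only the dimensions used to group them.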
The data warehouse is data gathered from multiple systems (think of transactional data here), then accessed through a data mart (think of a market where you request and query what you need).
Why It Matters
The amount of data available to analyze is growing exponentially. More data has been created in the past two years than in the entire previous history of the human race.
While structured data is estimated to be only about 20 percent of current data, it is the main source of information that we in finance use and create. Our enterprise resource planning tools are based on structured data—GLs, point of sale, inventory.
Additionally, call center logs, Internet of Things data from equipment sensors, and website data points are all structured data. However, for all the data (structured and unstructured) we create, only 0.5 percent will be analyzed.
The data is in front of us. We need the skills to analyze it to stay relevant, and the company that makes smart investments in business analytics will have a key resource. This implies good hiring and training allocations for people, and constantly upgrading the systems that can harness this increasing bounty of information.
As finance professionals, we need to become partial data scientists who can dig through and find the right data. At the same time, separating out the signal from the noise will become the key human skill that will separate us from machine-learning algorithms.
However, there is a more important question: How do we weigh human judgment against infinite data? Do we build models to predict the future in order to minimize human biases, or do we take the automated outputs as guidance and then layer on judgment? And where is the line of demarcation between model and judgment?
I don’t have a firm recommendation, except for this: It will become an issue when there is a large forecast variance and someone says, “That is what the model told us!” or when the entire team sweats out the forecast only to have it overruled by senior management, saying, “This can’t possibly be true… change it by X percent!”
I recommend we have this discussion, and revisit it, before compiling our forecasts and budgets.
Unstructured data is easy for people to understand, but often difficult for machines because it does not lend itself to the codified rules of a data model.
It may be textual—think of emails where individuals write their own subject line, and the body contains unique actions, summaries or content. It may be visual—think YouTube videos or the photos on your phone.
For data analysis, the breakthroughs have been in developing tools that can read this data and add structure in a way that allows for analysis. Examples include geo-tagging, facial recognition, text-recognition, and audio-to-text.
All these tools rely on intensive processing power, and have led to a wave of IT innovation. For example, Hadoop refers to both the hardware and software that take huge data sets, distribute them across hundreds or thousands of servers through the Hadoop Distributed File System (HDFS), and then respond to a data request by using MapReduce to locate the relevant data and process the query.
The innovation is in using many servers, each with multiple processors, to solve the query in parallel. A relational database, by contrast, brings fewer processors to the task, but needs fewer because its data is already defined by dimensions and other tags.
It is analogous to Tom Sawyer painting a fence by himself versus having hundreds of friends brushing simultaneously. Hadoop, with its ubiquitous elephant logo, is an open-source project of the Apache Software Foundation.
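The MapReduce idea can be sketched in a few lines of plain Python. This is a single-machine illustration of the pattern only, not Hadoop's actual API: the chunks stand in for data blocks stored on separate servers, and the store names, amounts, and function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical transaction log, split into chunks as if stored on separate servers.
chunks = [
    [("store_1", 20.0), ("store_2", 5.0)],
    [("store_1", 7.5), ("store_3", 12.0)],
    [("store_2", 1.0), ("store_3", 3.0)],
]

def map_phase(chunk):
    """Each 'server' emits (key, value) pairs from its own chunk."""
    return [(store, amount) for store, amount in chunk]

def shuffle(mapped_chunks):
    """Group all emitted values by key across every chunk."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine each key's values into a single result."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}

sales_by_store = reduce_phase(shuffle(map(map_phase, chunks)))
print(sales_by_store)
```

In a real cluster the map step runs on each server at the same time, which is where the "hundreds of friends painting the fence" speedup comes from. Here the steps simply run one after another.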
Another advent supporting big data is a so-called “data lake,” which is an architecture that seeks to store all data available, and perform the jobs of sorting, classifying and organizing at the time of analysis.
As a result, data collection and preparation time are greatly reduced compared to a data warehouse. The data sets are large, and are sometimes mistaken for being comprehensive. The analysis and manipulation of unstructured data require specialized tools, including NoSQL databases (variously read as “Not SQL” or “Not Only SQL”).
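The data lake's habit of storing first and organizing later can be illustrated with a small sketch. The records, field names, and the read_calls helper below are hypothetical; the point is only that the raw records share no common structure until the analyst imposes one at read time.

```python
import json

# Hypothetical raw records, stored as-is with no shared schema.
raw_lake = [
    '{"type": "call", "customer": "C1", "minutes": 12}',
    '{"type": "email", "from": "C2", "subject": "Invoice question"}',
    '{"type": "call", "customer": "C2", "minutes": 3, "agent": "A7"}',
]

def read_calls(lines):
    """Apply structure at read time: keep call records and only the fields needed."""
    for line in lines:
        record = json.loads(line)
        if record.get("type") == "call":
            yield {"customer": record["customer"], "minutes": record["minutes"]}

calls = list(read_calls(raw_lake))
total_minutes = sum(call["minutes"] for call in calls)
print(calls, total_minutes)
```

A data warehouse would have rejected or reshaped the email record on the way in. Here the sorting and classifying happens only when someone asks a question, which saves preparation time but pushes the work onto each analysis.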
As you can tell, the underlying requirements for these tools are massive server capabilities and communication—that is, cloud computing. A query can be sent to the cloud, processed at the server farms, and a response sent back.
The boom in both structured and unstructured data is the defining characteristic of our time. Finance needs to watch for two potential pitfalls: effectively challenging the outcomes of the analysis presented by our partners, and how we invest to build our big data capability.
Spurious correlations. There is a common conception that if all the data is captured, then there is no longer a need for sampling or even for developing theories or hypotheses about the data, because we can simply test everything. This is called the “N=All” view.
However, huge data sets can lead analysts to find spurious correlations that hold no real meaning. The website Spurious Correlations lists examples of correlations found in data that are coincidental rather than causal. For example, per capita consumption of mozzarella cheese is correlated with the number of doctorates awarded in civil engineering.
Also, the assumption of complete data is not the same as actual complete data. It may be possible to analyze the transcripts of every customer service call to take the pulse of your customers, but that should not be confused with the attitudes of all your customers!
The caution here is that sampling error and sampling bias can exist in big data. For some historical background, check out this story involving the start of the Gallup organization. We FP&A professionals need to challenge the assumption that big data has all the answers.
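The spurious-correlation risk is easy to reproduce. The sketch below generates one random “target” series and 2,000 other random series that are unrelated to it by construction, then reports the strongest correlation found. The series length, the number of candidates, and the seed are arbitrary assumptions for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
YEARS, CANDIDATES = 10, 2000

# Every series is independent random noise: no real relationship exists.
target = [rng.gauss(0, 1) for _ in range(YEARS)]
candidates = [[rng.gauss(0, 1) for _ in range(YEARS)] for _ in range(CANDIDATES)]

strongest = max(abs(pearson(target, series)) for series in candidates)
print(f"strongest |r| among {CANDIDATES} unrelated series: {strongest:.2f}")
```

With only ten data points per series and thousands of candidates to search, a strong-looking correlation is almost certain to turn up by chance. Testing everything without a hypothesis makes this kind of false discovery more likely.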
Investing in capabilities. To look at an investment in big data means more than just signing a contract with a vendor. It means building a corporate capability. That implies sub-investments in people, process, and company assets.
- People: As with most new ventures, there is no substitute for expertise. Analyze your current team to see if they have the right skills and, if not, bring in experts. Organizationally, does it make sense to create new positions in the org chart such as a chief data officer? What hardware, software, or telecommunications bandwidth do you need?
- Process: Start with small projects and develop a learning methodology as you go. Develop a list of questions and hypotheses to test the capabilities and utilization of your data. Measure the amount of time spent in pre-processing (extraction, transformation, and loading) versus analysis.
- Assets: These projects may be technology-light if the servers and analytics are outsourced, or technology-heavy if you host them in-house. The answer depends on the interaction between the business leaders and technology leaders.
All three elements can be supplemented through third-party partners to act as a bridge or as a fixed solution. Data projects gather lots of investment dollars and interest because they can pay off handsomely for an organization. Our job in finance is to ensure that scarce capital is allocated well.
About the Author
Bryan Lapidus, FP&A, is a contributing consultant and author to the Association for Financial Professionals (AFP). This story amalgamates two articles that first appeared on the AFP’s website, Structured Data Vs. Unstructured Data for FP&A and Treasury and Structured Data Vs. Unstructured Data for FP&A and Treasury, Part 2.
Copyright © 2017 Association for Financial Professionals. All rights reserved.