
As financial scandals, driven by Enron and WorldCom, unfolded in the early 2000s, I decided to explore the applicability of my area, data quality, to financial reporting.

From my work as a consultant, I was well aware that the financial services industry, like many others, was bedeviled by poor operational data. I wondered if data quality applied to larger issues as well. My first step was to make sure I understood an income statement, a balance sheet, and a cash flow statement. I purchased some books and spent an hour every morning reading. But after a month, I was more confused than ever.

I was sorely embarrassed. After all, I am a Ph.D. statistician and have spent half my life working with financial services firms. How could I not understand something as simple as a balance sheet? The first person to whom I admitted my lack of understanding was a Wall Street veteran who responded, "Don't worry, Tom. Eighty percent of Main Street investors don't understand them either." I talked to many others, but matters only grew bleaker. A pitiful few, at best, understood financial statements. Worse, many statements contained such serious errors that they required restatement.

I find the whole situation paradoxical: Despite the critical needs of investors, business leaders, regulators, and the markets themselves for trusted data, the financial community is stunningly tolerant of poor data quality.


In publishing my book, Data Driven, I hope to expose the paradox. As Data Driven points out, when they are of high quality and "put to work," data are assets on par with capital and people. Bad data, in contrast, are liabilities.

To be sure, bad data come in many forms. Sometimes the data are simply opaque, as with financial statements. Sometimes the "facts" just aren't so. A recent example is the news report on Sept. 8 of United Airlines' (UAUA) bankruptcy. It sent United's stock reeling. Trouble is, the report was six years old. A third category of data quality problem is that the data one really needs are simply unavailable. Who really knew, for example, what was in those soon-to-be-toxic CDOs (collateralized debt obligations)?

One possible explanation for the paradox is that some people make money when data are bad. True enough. A billion dollars changed hands in the United incident. More generally, the best way to make money in the market is to create and exploit an "information asymmetry." It's quite simple really. You discover something that no one else knows about the true value of a product and trade based on that knowledge. Having data the other guy doesn't, having correct data when the other guy's are incorrect and having deeper insights into what the data mean all qualify.

The problem with this explanation is that just because bad data are in some people's interest doesn't mean they are in everyone's interest. When you're going to play a game you didn't invent, you simply must have good enough data to protect yourself from being caught on the wrong side of the asymmetry. And clearly enough, lots of good players got caught on the wrong side. So this explanation does not explain the tolerance of bad data.

Another possibility is that some data, like price evaluations, aren't "facts" but estimates. Also true. But other professions have figured out how to quantify the "goodness of estimates," using confidence intervals, error bars, and so forth. I refuse to believe that the financial services industry, with all its brainpower, can't sort this out.

That leaves me with my final possible explanation of the paradox: investors, regulators, and company leaders neither fully understood the risk nor recognized that something could be done about the problem, at least not well enough to "demand better data."

Data Driven obviates this explanation. It features the "10 habits of those with the best data" and case studies that describe how leading financial services companies have applied those habits. Not through extraordinary investment, stunning intellectual insight, or technological magic, but through good old-fashioned management focus, no-nonsense measurement of quality levels, and a drive to find and eliminate root causes of error. As Data Driven explains, practically any company can enjoy the benefits of high-quality data.

Today, as Fannie Mae (FNM), Freddie Mac (FRE), Merrill Lynch (MER), Washington Mutual, Bear Stearns, Lehman Brothers, and maybe others fail and markets roil, one can't help but ask "what if?" What if incomes hadn't been falsified on mortgages? What if borrowers really understood the terms of their mortgages? What if ratings agencies and credit bureaus had scored securities and home buyers correctly? What if people really knew what was in those toxic products they were buying? What if balance sheets could be trusted?

When the history of the credit crisis is finally written, "greed" and "denial" will be the leading villains. But high-quality data can be a powerful check on greed and denial. I don't have a definitive answer to the "what if?" questions, but I can't help thinking that the crisis could have been shorter, shallower, or averted altogether.

Some will argue that this is not the right time to focus on data quality. As the old saying goes, "When you're up to your ass in alligators, it's no time to think about draining the swamp." But that logic is wrong for both alligators and data. For those who ignore the swamp are back up to their asses in alligators soon enough. And those who ignore data quality will once again find themselves trying to figure out what to do next with no way of knowing what "facts" they can trust.

It is time for all to demand better data. Data that form a more complete picture, data that are more relevant, data that are more clearly defined, data that are far more accurate, and data that are much better constructed so they are more readable and digestible. Data that come with "created by" and "created on" labels and data that make the reliability of the source known.

Finally, some will argue that this will take a lot of money and time but add little value. These people are also wrong. For just as a newly drained swamp may become prime real estate, so too can high-quality data drive new growth opportunities. I teach you how to do it in Data Driven. For more information, please visit