NEW YORK (TheStreet) -- Jesse Vollmar is man enough to admit that it ain't how big a fella's data is, it's how smart he is about using it.
"It's not about the size and sweep of the information you collect," explained the co-founder and CEO of FarmLogs, the Ann Arbor, Mich., agriculture data analysis firm. "It's about knowing enough to ask the right question from that data that a customer wants to know."
This modest Michigan farm boy has been kindly describing to me, over several eye-opening phone calls, the groundbreaking data analysis of his nine-person so-called "smart agriculture" startup. Vollmar and his team got the idea to dig around inside the digital guts of industrial-grade agricultural equipment from mega-makers such as Caterpillar, Deere and Cummins. It turned out that, just as in the car industry, oodles of data-rich features such as global positioning, computer control and climate analysis were being quietly built into the Information Age versions of farm tools.
"What we realized was it would be super easy to get lost in the gobs of information no farmer really wanted from a combine," Vollmar said. Instead, FarmLogs figured out what really mattered was the relatively modest-scale question of exactly how much rain fell in each bit of each field a farmer manages. "Literally, farmers drive over and look at their inventory every day," he said. "I grew up on a farm, and it's crazy."
With that specific question in mind, Vollmar skipped complex large-dataset analysis and modeling and instead focused his inner numbers nerd on comparing the small set of numbers that described the location of farm machines with government precipitation data.
Miracle of miracles, FarmLogs could tell farmers exactly where rain was falling in their fields. "I don't pretend to understand the deep mathematics behind this," he said. "But all I know is as a business person, it's not the numbers we could touch -- it's answering a specific question for a farmer."
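To make the idea concrete, here is a minimal Python sketch of the kind of matching Vollmar describes. It is an illustration under stated assumptions, not FarmLogs' actual method: it assumes government rainfall totals arrive as a simple grid of cells, and the grid origin, the 0.01-degree cell size, the data format and the function names `cell_for` and `rainfall_by_field` are all hypothetical. It just looks up which cells a farm machine's GPS positions fall into and reports the rainfall for each of those cells, field by field.

```python
from collections import defaultdict

# Hypothetical grid geometry, chosen only for illustration. Real government
# precipitation products define their own projections and cell sizes.
GRID_ORIGIN = (42.0, -84.0)  # assumed (lat, lon) of the grid's south-west corner
CELL_SIZE_DEG = 0.01         # assumed cell size in degrees


def cell_for(lat, lon, origin=GRID_ORIGIN, size=CELL_SIZE_DEG):
    """Return the (row, col) index of the grid cell containing a GPS fix."""
    row = int((lat - origin[0]) // size)
    col = int((lon - origin[1]) // size)
    return row, col


def rainfall_by_field(fixes, grid):
    """Map each field to the rainfall recorded in every grid cell its machines visited.

    fixes: iterable of (field_name, lat, lon) GPS positions logged by farm machines.
    grid:  dict of (row, col) -> rainfall in millimetres (hypothetical format).
    Cells the grid has no value for are skipped.
    """
    cells = defaultdict(set)  # field name -> set of grid cells its machines touched
    for field, lat, lon in fixes:
        cells[field].add(cell_for(lat, lon))
    return {
        field: {cell: grid[cell] for cell in sorted(field_cells) if cell in grid}
        for field, field_cells in cells.items()
    }


if __name__ == "__main__":
    grid = {(0, 0): 10.0, (1, 0): 20.0}
    fixes = [
        ("north", 42.015, -83.995),
        ("south", 42.005, -83.995),
        ("south", 42.006, -83.994),
    ]
    print(rainfall_by_field(fixes, grid))
```

Because the lookup only touches the handful of cells a field's machines actually visit, the data involved stays small, which is the point Vollmar makes about asking one narrow question.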
The Big Data mulligan
Wouldn't you know it, when I actually asked professional numbers geeks -- who make their bones understanding just the kind of deep mathematics behind Vollmar's less-data-is-more experience -- it turned out that this young Midwesterner was on the bleeding edge of the Information Age.
"Anybody with half a brain knows that Big Data is important. And academia did get excited about the potential for large data sets for about a half a second," said David Putrino, assistant professor at the Brain and Mind Research Institute at the Weill Cornell Medical College in New York City.
Putrino, who has participated in large dataset neuroscience experiments at Harvard, MIT and NYU, says researchers realized quickly that when sets of data get large, they get stupid.
"What happened was, there were these poor lost souls who felt that with large enough sets of numbers, all the hard work was done for them," he said. "That if you were hip enough to find the 'signal from the noise,' the truth was right there in front of you." But what actually happens, Putrino said, is that in large data sets, variables can appear to influence one another when in fact they are unrelated. "Unless you're dealing with researchers who have the integrity to deeply question every last assumption made," Putrino said, "most of the eerie conclusions made by these large number set models turn out to be wrong."
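The trap Putrino describes is what statisticians often call the multiple-comparisons problem: compare enough pairs of variables and some will look strongly related by pure chance. The short Python simulation below is an illustration, not drawn from Putrino's research. Every variable is random noise generated independently of the others, yet some pairs still clear a correlation threshold. The 200 variables, 20 observations per variable, 0.5 cutoff and random seed are arbitrary assumptions chosen only for demonstration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spurious_pairs(n_vars=200, n_obs=20, threshold=0.5, seed=2014):
    """Count pairs of independent random variables whose sample correlation
    still exceeds `threshold` in absolute value.

    Returns (hits, total_pairs), where hits is a list of (i, j, r) tuples.
    """
    rng = random.Random(seed)
    # Every variable is pure noise, drawn independently of all the others,
    # so any correlation found below is a coincidence by construction.
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)]
    hits = []
    for i, j in itertools.combinations(range(n_vars), 2):
        r = pearson(data[i], data[j])
        if abs(r) > threshold:
            hits.append((i, j, r))
    total_pairs = n_vars * (n_vars - 1) // 2
    return hits, total_pairs


if __name__ == "__main__":
    hits, total = spurious_pairs()
    print(f"{len(hits)} of {total} pairs of unrelated variables have |r| > 0.5")
```

The expected number of these chance "discoveries" grows with the number of pairs compared, and the number of pairs grows roughly with the square of the number of variables, which is why bigger data sets can produce more false leads, not fewer.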
Investors would be deeply foolish to dismiss Putrino as an arcane academic unaware of real-world numbers. Even those deep in the business of collecting the basic statistics that fuel the Information Age are feeling the pressure of Big Data delusions.
"It's absolutely true," said Seth Harden, founder of Statistic Brain, the six-person online statistics service in Los Angeles. "We constantly see how the exact same numbers we curate and post are being used for the opposite side of the exact same argument."
Harden says that of the more than 1,000 subjects Statistic Brain collects data on -- mostly broad topics such as levels of child abuse, top movie grosses and Madonna's career -- there's white-hot debate about how each number is collected and compared. "I get 100 emails a day about where our numbers come from," he said. "We are always looking, touching and updating that data to keep it as accurate as we can."
Trust the numbers at your peril
Here is the numbers story investors need to carry into the new year: Vollmar, Putrino and Harden all agree passionately that without active, constant and very human qualification of the assumptions behind the data, the models built from these numbers cannot be trusted.
"Unless there is a significant disclosure of how a data model works, followed by a deep and rich discussion of the limits of the assumptions of that model," Putrino said, "the chances of that model anticipating reality are very small indeed."
Considering that Google, Facebook, Twitter, LinkedIn, Amazon and the rest of the information economy are in the business of never, ever disclosing exactly how they model the world numerically, the chances that investors will be able to test the accuracy of these companies' models, and therefore the companies' value, are probably zero.
This is the grim bottom line for Big Data in 2014: Trust these numbers blindly at your peril.