The job market for mathematicians and statisticians has become hot as ever faster, cheaper computing resources generate an exploding volume of data.
Data storage has become so inexpensive that a 2011 McKinsey and Co. report estimated that a disk drive capable of storing all the world's music would cost about $600. Walmart stores 10 times more data on customer transactions and other parts of its operation than is contained in the entire Library of Congress, according to the same report.
Analyzing the so-called "big data" deluge has become a key task for businesses in an effort to divine everything from which ads online customers will click to how much inventory they need to maintain. Political candidates analyze data to predict voting patterns. Dating websites try to predict ideal mates.
Kaggle competitions focus on creating and testing formulas that can be used to make predictions based on the contents of giant datasets. The more accurate the formula, the better its chances of correctly answering complex questions, for example by finding that orange used cars are the least likely to break down.
Goldbloom argues that no matter how many data scientists companies hire, relying on in-house data talent means companies can't know if they're getting the best solution.
In a Kaggle contest, competitors find out as soon as they submit their solutions how they stack up against fellow contestants. They can keep trying for the duration of the contests, which typically run three months and are highlighted on the company's website.
As the first entries come in, the accuracy of competing models improves by leaps, Goldbloom said. As the contests progress, the improvement curve flattens out. Goldbloom and Howard believe that shows the competitive approach pushes data scientists toward the best solutions within human reach.
"Crowdsourcing allows you to squeeze data dry," Goldbloom said.