How Traders Are Using Text and Data Mining to Beat the Market

NEW YORK (TheStreet) -- Sometimes it can seem like the tsunami of digital blurbs, tweets, "likes" and quantitative data that surrounds us in the digital age is supplanting the need for traditional journalism and publishing, making any medium that presents information in a form longer than 140 characters seem obsolete.

But another trend contradicts the dire outlook for old-fashioned text: a market for text mining of actual reporting as published in newspapers and magazines, in other words, of "traditional journalism." And while many members of the popular press whose work is being used by traders do not know it, Wall Street and select media companies are seizing the opportunity.

As far back as 2008, before "big data" was big news, a report by the Boston-based Aite Group found that the percentage of financial players mining unstructured data, including content from companies like Dow Jones and Thomson Reuters, rose to 35% from 2%, and spending was projected to almost double over the next two years. Writes Adam Honoré, author of the report, "Firms will be looking for any competitive advantage they can find, and unstructured data offers an untapped reservoir of new ideas waiting to be discovered."

What Is Text Mining?

Text mining is the data analysis of natural language works (articles, books, etc.), using text as a form of data. It is often joined with data mining, the numeric analysis of data works (like filings and reports), and referred to as "text and data mining" or, simply, "TDM."

TDM involves using advanced software that allows computers to read and digest digital information far more quickly than a human being can. TDM software breaks down digital information into raw data and text, analyzes it, and comes up with new connections, from unexpected patterns in protein interactions that eventually lead to the development of a new drug, to subtle shifts in weather patterns that might predict a downturn in the price of wheat.
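
To make "using text as a form of data" concrete, here is a minimal sketch in Python of the first step such software might take: breaking text into words and counting them. The headlines are invented for illustration, and the function names are hypothetical; real TDM products are far more sophisticated than this.

```python
import re
from collections import Counter

# Hypothetical headlines, invented purely for illustration.
HEADLINES = [
    "Drought fears grow as wheat belt stays dry",
    "Wheat futures slip after rain forecast",
    "Retailer investigates possible card breach",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_counts(documents):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts

counts = term_counts(HEADLINES)
# Show the most frequent terms; repeated words hint at a recurring topic.
print(counts.most_common(3))
```

Once words have become counts, they can be compared, tracked over time, and combined with numeric data like any other data set.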

The latter example is of interest to Wall Street, specifically hedge fund managers and algorithmic traders, who are buying licenses from traditional sources like the Associated Press to gain access to breaking news. Traders then use TDM software to mine those feeds to predict movements of markets for everything from government bonds to commodities.

How the AP Helps Investors Make Money

"The advent of TDM promises new revenue streams for traditional publishing outlets and new sources of insight and efficiency for our customers," says Bruce Glover, deputy director of digital for the Associated Press. According to Glover, the AP licenses "machine-readable news products" (MRN) to financial clients, allowing information to move more quickly. Given the importance of algorithmic trading on Wall Street, speed is paramount, and machines can process information found in news articles far more quickly and with better recall than humans.

Moreover, according to Glover, there is a new trading strategy that he's heard referred to as hyper-contextual trading (HCT) which recognizes the benefits of assimilating all reliable information to support decisions. Glover says, "It's all very encouraging and the marketplace is indicating that the AP has a valuable and growing asset."

The AP has licensed its content to trading firms that use software to mine not just hard data but more qualitative information, from stories about political upheaval to the number of hashtags mentioning a company name on Twitter. These firms hope that TDM will give them a meaningful edge over competitors.

Often, it does. On September 2, 2014, a journalist tweeted about a possible credit card breach at Home Depot (HD). That tweet, one of about 500 million on any given day, was targeted as a "notable signal" by New York-based TDM company Dataminr. The Wall Street Journal reported that Dataminr subscribers, which at the time included 60 banks and hedge funds, got the news a full 15 minutes ahead of anyone else and, crucially, before Home Depot shares dipped by 2%.

TDM: The Possibilities

Think of what could be gained through immediate and automated access to not only imprecise tweets and Facebook postings, but to the comprehensive resources of a worldwide news organization like the AP or to the full archives of a leading scientific journal. Access like this would not just boost investor portfolios based on developing news stories, but could also provide life-saving information for use in a clinical trial for a patient with a rare disease or in helping to slow the pace of global warming. The information, in turn, would go back into portfolios as pharmaceutical index funds are purchased and carbon-intensive industries are shorted.

Big data and data mining are big news. In 2012, Nate Silver's uncannily accurate predictions of the U.S. national elections using data mining techniques quickly made traditional exit polls seem quaintly old-fashioned. Yet few people realize that adding text to the equation, whether in the form of financial reporting or scientific and academic journals, can add a whole new level of clarity to an often murky and overwhelming information ocean.

Publishers and users of text-based materials alike are seeing new opportunities for earnings and for knowledge in new uses of those materials made possible by big data techniques, and an increasing number are jumping on board.

"There is growing recognition that text content is being consumed in a variety of ways and publishers need to be flexible in terms of our licensing," says Glover. "And editors, I think, recognize that there is no reason not to capitalize on the intelligence in their reporting."

Wall Street isn't the only market for licensing content for TDM. "The AP has also recently done a deal in the cognitive computing space, with media monitoring companies and with PR [public relations] firms. The latter use TDM to measure the sentiment around a particular topic," says Glover.

Sentiment is not something you necessarily find in raw, quantitative data alone. So, however sexy the idea of big data is at the moment, it's necessary and natural to marry that data to text.
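
As a rough illustration of how sentiment might be measured from text, the Python sketch below scores a sentence against two small word lists. The word lists and the scoring rule are assumptions made up for this example; they are not the method used by the AP or any firm mentioned here, and commercial tools rely on much richer models.

```python
import re

# Tiny hypothetical word lists; real systems use far larger resources.
POSITIVE = {"gain", "gains", "rally", "beat", "strong", "growth"}
NEGATIVE = {"breach", "loss", "losses", "dip", "weak", "downturn"}

def sentiment_score(text):
    """Return a score from -1.0 (all negative words) to 1.0 (all positive).

    Text containing no words from either list scores 0.0.
    """
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("Shares dip after possible card breach"))
print(sentiment_score("Strong growth lifts rally"))
```

A score like this is the kind of signal that raw numbers alone do not carry, which is why it has to be derived from the text itself.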

This is where text publishers come in. The good news is that there is plenty of room for growth in the financial market and others. 

"This is just the beginning," says Glover.

This article is commentary by an independent contributor. At the time of publication, the author held no positions in the stocks mentioned.
