Enabling researchers to data-mine Big Data at the speed of Big Data

“The one-way nature of connections in Wikipedia, the lack of links, and the uneven distribution of Infoboxes, all point to the limitations of metadata-based data mining of collections like Wikipedia,” said Mr. Leetaru. “With SGI UV 2, the large shared memory available allowed me to ask questions of the entire dataset in near-real time. With a huge amount of cache-coherent shared memory at my fingertips, I could simply write a few lines of code and run it across the entire dataset, asking whatever questions came to mind. This isn’t possible with a scale-out computing approach. It’s very similar to using a word processor instead of using a typewriter – I can conduct my research in a completely different way, focusing on the outcomes, not the algorithms.”

The analytical approach

Loaded into SGI® UV™ 2000, the Big Brain computer, this massive dataset underwent full-text geocoding and complete date-coding, using algorithms that identified every mention of every location and every date across the text of every entry on Wikipedia. More than 80 million locations and 42 million dates between 1000 AD and 2012 were extracted, averaging 19 locations and 11 dates per article (roughly one location every 44 words and one date every 75 words). The connections between every date and every location were captured into a massive network representing Wikipedia’s view of history. With this instrumentation, Mr. Leetaru was able to perform near-real time analysis over the entire dataset on the SGI UV 2, creating visual maps across space and time that show not only how history unfolded but also the overall tone of the world throughout the last thousand years, and to interactively test a wide array of theories and research questions, all in less than a day’s work.
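To make the date-location network concrete, the sketch below shows one simple way such a network could be built. It is illustrative only and is not Mr. Leetaru’s actual pipeline. It rests on several assumptions: a hypothetical three-entry gazetteer stands in for a real geocoder, only bare four-digit years from 1000 to 2012 count as dates, and a “connection” means that a year and a place are mentioned in the same article.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy gazetteer (assumption): a real geocoder would resolve
# ambiguous place names against a full place-name database.
GAZETTEER = {"Paris", "London", "Rome"}

# Matches bare four-digit years from 1000 to 2012 only. This is a simplification:
# real date-coding would also handle full dates, month names, "AD"/"BC" forms, etc.
YEAR_RE = re.compile(r"\b(1\d{3}|200\d|201[0-2])\b")


def extract(text):
    """Return (places, years) mentioned in one article, in order of appearance."""
    places = [w for w in re.findall(r"[A-Z][a-z]+", text) if w in GAZETTEER]
    years = [int(y) for y in YEAR_RE.findall(text)]
    return places, years


def build_network(articles):
    """Count, for each (year, place) pair, how many articles mention both."""
    edges = Counter()
    for text in articles:
        places, years = extract(text)
        # Assumption: link every distinct year to every distinct place in the article.
        for year, place in product(set(years), set(places)):
            edges[(year, place)] += 1
    return edges


if __name__ == "__main__":
    articles = [
        "In 1789 revolution broke out in Paris.",
        "Rome and Paris signed a treaty in 1805 and again in 1815.",
    ]
    for (year, place), count in sorted(build_network(articles).items()):
        print(year, place, count)
```

On the full corpus, the resulting weighted edges could then be mapped by place and binned by year. Keeping the whole edge table in one shared memory space, rather than partitioning it across a cluster, is the property the quote above emphasizes.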
The New SGI UV: The Big Brain Computer

The SGI UV 2 product family enables users to find answers to the world’s most difficult problems on a system as easy to administer as a workstation. Built with the Intel® Xeon® processor E5 family, running standard Linux, and supporting a wide range of storage options, SGI UV 2 offers a complete, industry-standard solution for no-limit computing. Starting with as few as 16 cores and 32 gigabytes of memory, SGI UV 2 can start small and expand seamlessly. Compared with the previous generation, this next-generation platform doubles the maximum number of cores (up to 4,096) and quadruples the maximum coherent main memory (up to 64 terabytes) available for in-memory computing in a single system image. SGI UV 2 can scale to eight petabytes of shared memory, and at a peak I/O rate of four terabytes per second (14 PB/hour), it could ingest the entire contents of the U.S. Library of Congress print collection in less than three seconds.
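The throughput figures above can be checked with simple arithmetic. The sketch assumes decimal units (1 PB = 1,000 TB). It shows that 4 TB/s sustained for one hour gives 14.4 PB, which the release rounds to 14 PB/hour. It also shows that three seconds at that rate moves at most 12 TB, so the Library of Congress comparison implicitly treats the print collection as smaller than about 12 TB. The release itself does not state that size.

```python
# Figures taken from the release; decimal units (1 PB = 1,000 TB) are an assumption.
PEAK_IO_TB_PER_S = 4
SECONDS_PER_HOUR = 3600
INGEST_WINDOW_S = 3

tb_per_hour = PEAK_IO_TB_PER_S * SECONDS_PER_HOUR   # terabytes moved in one hour
pb_per_hour = tb_per_hour / 1000                    # same quantity in petabytes
max_tb_in_window = PEAK_IO_TB_PER_S * INGEST_WINDOW_S  # upper bound for a 3-second ingest

print(f"{pb_per_hour} PB/hour")
print(f"{max_tb_in_window} TB in {INGEST_WINDOW_S} seconds")
```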
SGI UV 2000 is available immediately. SGI UV 20 can be ordered today and will start shipping in August 2012. Pricing starts at $30,000 USD.

About SGI

SGI, the trusted leader in technical computing, is focused on helping customers solve their most demanding business and technology challenges. Visit sgi.com for more information. Connect with SGI on Twitter (@sgi_corp), Facebook (facebook.com/sgiglobal), YouTube (youtube.com/sgicorp), and LinkedIn.

For photos and videos go to: http://www.sgi.com/go/wikipedia

© 2012 Silicon Graphics International Corporation. SGI and the SGI logo are trademarks or registered trademarks of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries. Intel and Xeon are registered trademarks of Intel Corporation. All other trade names and marks are the property of their respective owners.

Images provided courtesy of Kalev Leetaru.

Photos/Multimedia Gallery Available: http://www.businesswire.com/cgi-bin/mmg.cgi?eid=50313303&lang=en