SGI (NASDAQ:SGI), the trusted leader in technical computing, has partnered with Kalev H. Leetaru of the University of Illinois to create the first-ever historical mapping and exploration of the full-text contents of the English-language edition of Wikipedia, in time and space. The results include visualizations of modern history produced in under a day using in-memory data-mining techniques. By loading the entire English-language edition of Wikipedia into an SGI® UV™ 2000, Mr. Leetaru was able to show how Wikipedia's view of the world unfolded over the past two centuries. Each reference in the text has been tied to a location, a year, and the positive or negative sentiment of the surrounding text.
While several previous projects have mapped Wikipedia entries using location metadata manually assigned by editors, those attempts captured only a tiny fraction of Wikipedia's location information. This project unlocked the contents of the articles themselves, identifying every location and date across all four million pages, along with the connections among them, to create a massive network.
"Seeing Wikipedia in a brand new way allows the world to take a step back from the individual articles and text to gain a forest view of the tremendous knowledge captured in Wikipedia, not just a page-by-page tree view. We can watch how one of the largest collections of human knowledge has evolved and see what we could never see before, such as global sentiment at a certain time and place, or where there might be blind spots in the knowledge coverage," said Franz Aman, chief marketing officer and head of strategy, SGI. "We love to use Google Earth because we can zoom out and get the big-picture view. With SGI UV 2, we can apply the same concept to Big Data to get the big picture on our Big Data."
The analysis shows four periods of growth in Wikipedia's historical coverage: 1001-1500 (Middle Ages), 1501-1729 (Early Modern Period), 1730-2003 (Age of Enlightenment), and 2004-2011 (Wikipedia Era). Its continued growth appears to focus on enhancing coverage of historical events rather than on documenting more of the present. The average tone of Wikipedia's coverage of each year closely tracks major global events: the most negative period in the last 1,000 years is the American Civil War, followed by World War II. The analysis also shows that the "copyright gap," which blanks out most of the twentieth century in digitized print collections, does not affect Wikipedia, whose coverage grows steadily and exponentially from 1924 to today.