Visualization
Document Atlas
Document Atlas is a utility for visualizing large corpora of text documents. First, it identifies the relevant semantic concepts in the input corpus using Latent Semantic Indexing. Then the whole corpus is projected onto the discovered concepts and positioned on a 2D plane using multidimensional scaling. The user can explore the 2D plane using an intuitive interface. The density of documents is used to generate the background relief, which makes the visualization of documents resemble a map. Keywords describing specific areas are also written on the map. Together, these features give the user an easier path towards understanding the corpus.
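The pipeline described above (Latent Semantic Indexing, then multidimensional scaling, then a density-based relief) can be illustrated with a minimal NumPy sketch. This is not the Document Atlas implementation. The TF-IDF weighting, the use of classical (Torgerson) MDS instead of whatever MDS variant the tool uses, the Gaussian kernel for the density, the toy documents, and all function names (`term_document_matrix`, `lsi_project`, `classical_mds`, `density_grid`) are assumptions made for illustration only.

```python
import numpy as np

def term_document_matrix(docs):
    """Build a TF-IDF weighted term-by-document matrix (weighting is an assumption)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for t in doc:
            counts[index[t], j] += 1.0
    df = np.count_nonzero(counts, axis=1)      # document frequency per term
    idf = np.log(len(docs) / df) + 1.0
    return counts * idf[:, None], vocab

def lsi_project(X, k):
    """LSI step: project documents onto the top-k singular directions of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (s[:k, None] * Vt[:k]).T            # shape: (n_docs, k)

def classical_mds(points, dim=2):
    """Classical MDS: low-dimensional layout that approximates pairwise distances."""
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T  # squared distances
    n = len(points)
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ d2 @ J                      # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dim]       # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

def density_grid(xy, resolution=50, bandwidth=0.1):
    """Gaussian kernel density on a regular grid; high values act as 'hills'."""
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)     # avoid division by zero
    unit = (xy - lo) / span                    # rescale layout to the unit square
    axis = np.linspace(0.0, 1.0, resolution)
    gx, gy = np.meshgrid(axis, axis)
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
    d2 = ((grid[:, None, :] - unit[None, :, :]) ** 2).sum(axis=2)
    dens = np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)
    return dens.reshape(resolution, resolution)

# Toy corpus (hypothetical data, not the Reuters collection).
docs = [
    "slovenia football match goal",
    "football league match win",
    "slovenia economy bank growth",
    "bank inflation economy market",
    "slovenia nato european union",
    "european union nato summit",
]
X, vocab = term_document_matrix(docs)
latent = lsi_project(X, k=3)                   # documents in the latent space
xy = classical_mds(latent, dim=2)              # 2D map coordinates
relief = density_grid(xy, resolution=20)       # background relief values
```

Here `xy` holds one 2D position per document and `relief` is a grid that could be rendered as the map background. Placing keywords on the map, which the tool also does, is not covered by this sketch.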
The upper screenshot shows a visualization of Reuters articles from 1997 containing the keyword “Slovenia”. The visualization shows that Slovenia appeared in articles related to sports (upper left, where an extended list of keywords for this area is also shown), the economy (upper right), international politics with a focus on the European Union and NATO (bottom left), and internal politics (bottom right). For more details on how to read the visualization, see the publications.
Please check the Document Atlas homepage for more details!
Publications
B. Fortuna, D. Mladenic, M. Grobelnik. Visualization of Temporal Semantic Spaces. Semantic Knowledge Management, edited by J. Davies, M. Grobelnik, D. Mladenic, Springer.
B. Fortuna, D. Mladenic, M. Grobelnik. Visualization of Text Document Corpus. Informatica 29 (2006), 497-502.