The project «Wordnews» explores systematic ways of interpreting and visually displaying textual information. As a first example, the current news headlines of several leading international news sources are analysed and displayed. The output attempts to visualise meaning by calculating how often words occur.

Preliminary thoughts

For some time now I have been interested in systems that can handle large amounts of textual information. Such systems analyse the meaning of a given piece of information and represent it visually. Interesting research and implementations exist, particularly in library science, where semantic maps play a major role. With the help of these maps, widely differing content can be grouped into clusters on the basis of semantic relationships. Automated semantic analysis of this kind places very high demands on programming, thesauri and linguistic databases.

For my project «Wordnews» I chose a form of textual analysis in which words are given greater or lesser prominence according to how often they occur.


Fundamentals of the project

Yahoo provides a service at http://news.yahoo.com that aggregates the news of important newspapers and agencies. The «Top Stories» contain the ten most recent news stories from both AP and Reuters, as well as news from the Los Angeles Times and the Washington Post. These top stories are offered in the RSS format. RSS stands for «Rich Site Summary» or «Really Simple Syndication». Unlike regular web pages, RSS files can be read by other programs and processed further, which makes it possible to include their contents in your own interfaces.

See:  http://www.typedown.com/external-01/news/yahoo-worldnews.php
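To illustrate how a program can read such a file, here is a minimal sketch in Python. It is not the software used in the project. The sample feed, its headlines and the function name extract_headlines are made up for illustration, and the sketch assumes a plain RSS 2.0 layout in which each story is an item element with a title.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed (not real Yahoo data), in plain RSS 2.0 layout.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Top Stories</title>
    <item>
      <title>Leaders meet for climate summit</title>
      <link>http://example.com/story-1</link>
    </item>
    <item>
      <title>Markets rise after summit</title>
      <link>http://example.com/story-2</link>
    </item>
  </channel>
</rss>"""


def extract_headlines(rss_xml):
    """Return the title text of every <item> element in an RSS document."""
    root = ET.fromstring(rss_xml)
    headlines = []
    for item in root.iter("item"):
        title = item.findtext("title")
        if title:
            headlines.append(title.strip())
    return headlines


if __name__ == "__main__":
    for headline in extract_headlines(SAMPLE_RSS):
        print(headline)
```

In a real setting the XML string would be downloaded from the feed address instead of being written into the program.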


Text analysis

The current news stories that I receive from Yahoo (or any other suitable source) in this way are evaluated by software developed specifically for this project. Words that are not relevant to the meaning, such as about, above, always, but, even, the, etc. (about 300 so-called stopwords), are filtered out so that they do not distort the analysis. The occurrences of the remaining words are then counted, and each word is rated by its percentage share of the total number of words. A word that occurs more often receives a higher rank than a word that occurs only once. This happens when several news sources report on the same subject. Accordingly, such a subject is more relevant than one that is mentioned by only one source.
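The counting step described above can be sketched roughly as follows in Python. This is an illustration under assumptions, not the project's own software: the stopword set is a tiny stand-in for the roughly 300 words mentioned, the function name word_shares is hypothetical, and the sketch assumes that the percentage is taken relative to the words that remain after filtering.

```python
import re
from collections import Counter

# Tiny illustrative stand-in for the full stopword list (assumption).
STOPWORDS = {"about", "above", "always", "but", "even", "the",
             "a", "and", "for", "in", "of", "on", "to"}


def word_shares(headlines, stopwords=STOPWORDS):
    """Map each non-stopword to its percentage share of all remaining words."""
    words = []
    for line in headlines:
        # Lower-case the line and split it into simple alphabetic tokens.
        for word in re.findall(r"[a-z']+", line.lower()):
            if word not in stopwords:
                words.append(word)
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: 100.0 * n / total for word, n in counts.items()}


if __name__ == "__main__":
    sample = ["Summit on climate opens", "Climate talks stall", "The markets rise"]
    shares = word_shares(sample)
    # Print the words from the highest share to the lowest.
    for word, share in sorted(shares.items(), key=lambda pair: -pair[1]):
        print(f"{word}: {share:.1f}%")
```

In this made-up sample the word "climate" appears in two headlines and therefore receives a larger share than words that appear only once.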


The result is presented as one connected body of words. Words that occur more frequently are shown larger than the others, in proportion to how often they occur. This produces a kind of «word-carpet» in which some words are more accentuated than others. Frequently mentioned subjects stand out visually, whereas subjects of less general interest appear much smaller. It is always possible to switch directly from the visual view to the original view of the actual news headlines.
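One simple way to turn word counts into display sizes is a linear scale between a smallest and a largest font size. The following Python sketch shows that idea only. The function name font_sizes and the pixel values are assumptions, and the project itself may scale the words differently.

```python
def font_sizes(counts, min_px=12, max_px=48):
    """Scale word counts linearly onto font sizes between min_px and max_px."""
    if not counts:
        return {}
    lowest = min(counts.values())
    highest = max(counts.values())
    span = highest - lowest
    sizes = {}
    for word, n in counts.items():
        # If all words occur equally often, every word gets the smallest size.
        scale = (n - lowest) / span if span else 0.0
        sizes[word] = round(min_px + scale * (max_px - min_px))
    return sizes


if __name__ == "__main__":
    # Hypothetical counts for three words.
    for word, size in font_sizes({"climate": 5, "summit": 3, "markets": 1}).items():
        print(f"{word}: {size}px")
```

The most frequent word receives the largest size and the rarest the smallest, with all other words graded in between.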

Further functions

I have implemented further functions, such as searching the entire pool of available news sources by means of the Yahoo News search option. Clicking on a term in the «top-stories-analysis» starts a search for that subject, i.e. all news stories that contain the word are shown. Because Yahoo News covers so many sources (approx. 7,000 in 35 different languages), the search provides a comprehensive survey of the term. When the results of a search are presented in the analysis, the searched term is shown proportionally large, since it must be part of every news story. The graded representation of the other words creates a new «word-carpet», this time for a more clearly defined topic. For even more flexibility I have also added a form in which one can search for any word or subject.

Project-related links

Current top stories:


Analysis of top stories: http://www.typedown.com/external-01/news/yahoo-wordnews.php

Yahoo News: http://news.yahoo.com

Yahoo News RSS-Feeds: http://news.yahoo.com/rss



Image material (Screenshots)




Image material (Print view)

Image material (Prints)









