At the Plurilinguism 4.0 event on Friday, I ran into the conference moderator Christophe Büchi - who worked for many years as correspondent for western Switzerland at the newspaper NZZ, and wrote a book on the subject of Röstigraben, a meme for cultural-geographic divisions in the country.
Later on I worked on a hackathon project with journalist Celine Zund and politician Karlen Kathrin to understand the effects of Röstigraben. Our project uses the open web and machine learning tools to explore how geography impacts the news. You can read more about it on the project page:
The Python code extracts entities using cloud APIs, and calculates a single number we called the Rösti score. A positive score means that the article talks mostly about the other side of the country; a negative score suggests the article speaks about its own. This is based on detecting the language of the article, extracting locations of the main topics, and making a whole bunch of assumptions. Again, please visit the project page for details on how this works.
After the event, I ran our algorithm on front page articles from a dozen or so top newspapers (there’s a list here) and, where it was possible to get results, collected the Rösti scores in the table below. The number of entities detected in each document, how many of those were geolocated, and the corresponding match percentage are also shown.
Let’s take a typical political news piece which our algorithm works well with, one of the main topics of national politics being discussed this weekend:
The TextRazor API manages to correctly detect French as the language, and extract 7 unique entities - actually there are many more, but we filter them out based on a threshold of minimum relevance. We find the location of all of them except the political party (whose location is irrelevant in this case). The entities we are interested in are listed here with their Wikidata ID, latitude, longitude and score:
Toni Brunner Q115614 47.299501 9.0856081 0.4266
Christoph Blocher Q123857 47.69653 8.63386 0.5627
Ueli Maurer Q123979 47.320833333333 8.7930555555556 0.7158
Suisse Q39 46.798562 8.231973 0.4221
Guy Parmelin Q121160 46.45 6.2833333333333 0.3612
Canton de Vaud Q12771 46.616666666667 6.55 0.3365
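To make the filtering step concrete, here is a minimal sketch that parses rows like the ones above and applies a minimum relevance threshold. It assumes the columns are name, Wikidata ID, latitude, longitude and relevance score; the `Entity` type, the helper names and the threshold values are hypothetical, and the threshold actually used in the project is not stated here.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    name: str
    wikidata_id: str
    lat: float
    lon: float
    relevance: float  # assumed to be the relevance score from entity extraction


ROWS = """\
Toni Brunner Q115614 47.299501 9.0856081 0.4266
Christoph Blocher Q123857 47.69653 8.63386 0.5627
Ueli Maurer Q123979 47.320833333333 8.7930555555556 0.7158
Suisse Q39 46.798562 8.231973 0.4221
Guy Parmelin Q121160 46.45 6.2833333333333 0.3612
Canton de Vaud Q12771 46.616666666667 6.55 0.3365
"""


def parse_row(line: str) -> Entity:
    # Names may contain spaces, so split the four trailing fields from the right.
    name, qid, lat, lon, relevance = line.rsplit(maxsplit=4)
    return Entity(name, qid, float(lat), float(lon), float(relevance))


def filter_relevant(entities: List[Entity], min_relevance: float) -> List[Entity]:
    # Keep only entities at or above the (hypothetical) relevance threshold.
    return [e for e in entities if e.relevance >= min_relevance]


entities = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# A low threshold keeps every row shown above; a stricter one keeps fewer.
kept_all = filter_relevant(entities, 0.3)
kept_strict = filter_relevant(entities, 0.5)
print([e.name for e in kept_strict])
```

Raising the threshold trades coverage for precision: fewer entities survive, but the ones that do are more central to the article.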
Since the middle point of Switzerland/Suisse is our baseline, it is ignored in the calculation. The birthplaces of politicians and centrepoint of the Canton of Vaud are all taken into account for the final score (59): the French-speaking article is clearly concerned with politics on the German side of the country. Correspondingly, the German-speaking NZZ article on the same topic receives a strong negative score (-270):
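The sign convention can be illustrated with a toy calculation. This is not the project's formula (see the project page for that) and it does not reproduce the scores quoted above: it is a sketch that assumes a single dividing meridian as a crude stand-in for the language border, considers only the French and German sides, and weights each entity by its score. `DIVIDING_LON`, `side_of` and `toy_rosti_score` are hypothetical names and values.

```python
# Toy illustration only: the dividing meridian and the weighting are assumptions,
# not the project's actual formula.
BASELINE_ID = "Q39"   # Switzerland itself, ignored as the baseline
DIVIDING_LON = 7.2    # hypothetical stand-in for the language border (degrees east)

# (name, Wikidata ID, latitude, longitude, score) as listed in the article
ENTITIES = [
    ("Toni Brunner", "Q115614", 47.299501, 9.0856081, 0.4266),
    ("Christoph Blocher", "Q123857", 47.69653, 8.63386, 0.5627),
    ("Ueli Maurer", "Q123979", 47.320833333333, 8.7930555555556, 0.7158),
    ("Suisse", "Q39", 46.798562, 8.231973, 0.4221),
    ("Guy Parmelin", "Q121160", 46.45, 6.2833333333333, 0.3612),
    ("Canton de Vaud", "Q12771", 46.616666666667, 6.55, 0.3365),
]


def side_of(lon: float) -> str:
    # Crude simplification: west of the meridian counts as the French-speaking side.
    return "fr" if lon < DIVIDING_LON else "de"


def toy_rosti_score(entities, article_lang: str) -> float:
    score = 0.0
    for _name, qid, _lat, lon, relevance in entities:
        if qid == BASELINE_ID:
            continue  # the country's middle point carries no information
        # Entities on the other side push the score up, own-side entities pull it down.
        sign = 1 if side_of(lon) != article_lang else -1
        score += sign * relevance
    return score


print(toy_rosti_score(ENTITIES, "fr"))  # positive: mostly about the other side
print(toy_rosti_score(ENTITIES, "de"))  # negative: mostly about its own side
```

Even in this simplified form, the three politicians located in the east outweigh the two western entities, so a French-language article lands on the positive side and a German-language one on the negative side.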
This process does not work well with articles from regional newspapers - e.g. Künstler macht Schaufensterpuppe zur Allegorie der Armut - which do not mention place names, or well-known organizations and people that can be matched to Wikidata. Additional data sources, such as business indexes and telephone directories, would need to be pulled in to overcome this problem.
Articles which focus on geography and have lots of place names tend to dominate the scores, as well as - surprise surprise - celebrity and sports articles, because Wikidata is very good with player names and their birthplaces.
Thanks to my team-mates and to Mr. Büchi and Forum Helveticum for the enlightening conversations and conference, which led me to think quite deeply about the potential of technology in supporting journalists and politicians - and through researching the topic, learning new things about my adoptive country.
We had lots of ideas about applications for such a tool, which of course would need lots more work to be made into a reasonably useful product. In the meantime, you’re welcome to grab the open source Python code, see if you can come up with any alternative approaches, or turn it into an API or data visualization. Just don’t forget to cross over to the R-östigraben and see what’s what.
Image via Forum Helveticum