A new research paper by Rodion Skovoroda and Tomila Lankina, entitled "Fabricating Votes for Putin: New Tests of Fraud and Electoral Manipulations from Russia", has been published in Post-Soviet Affairs.
Tomila Lankina was recently an invited speaker at the Transatlantic Academy in Washington, DC, where she gave a talk on popular mobilization in Russia and on Russia's media manipulation in the Russia-Ukraine conflict.
To construct the Russian protest framing dictionary, we employed a technique called supervised Latent Semantic Scaling (LSS). This supervised machine-learning technique requires manual coding of a training set, used for dictionary construction, and a test set, used for dictionary validation. In the manual content analysis stage, the lead project researcher, acting as the primary coder, coded each sentence of thirty randomly selected news stories on a five-point scale for the framing of protests in Russian state-controlled media. Sentence scores were then aggregated into document scores by taking their average. Sentence-level coding is usually necessary in document scaling because human coders cannot reliably make nuanced judgements about whole documents (cf. Benoit et al. 2015).
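The averaging step described above can be sketched in Python as follows. This is a minimal illustration, not the project's own code: the story identifiers and codes are made up, and encoding the five-point scale as -2 to +2 is an assumption.

```python
from statistics import mean

# Hypothetical manual codes: each story maps to the scores its sentences
# received. The -2..+2 range is an assumed encoding of the five-point scale.
sentence_codes = {
    "story_01": [-2, -1, 0, -1],
    "story_02": [1, 2, 0],
    "story_03": [0, 0, -1, 1, 0],
}


def document_scores(codes):
    """Aggregate sentence-level codes into document scores by averaging."""
    return {doc: mean(scores) for doc, scores in codes.items() if scores}


doc_scores = document_scores(sentence_codes)
for doc, score in sorted(doc_scores.items()):
    print(f"{doc}: {score:.2f}")
```

Averaging, rather than summing, keeps long and short stories on the same five-point range, so document scores remain comparable regardless of how many sentences a story contains.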
The Russian protest framing dictionary was created to analyse how Russian state-controlled media cover street protests. The dictionary's keywords, together with the continuous score attached to each word, allow computer programs to place Russian-language news stories on a scale running from "social disorder" to "freedom to protest".
The dictionary was constructed using a technique called Latent Semantic Scaling. It is based on a 27-million-word corpus of Russian newspaper articles and TV transcripts published by state-controlled media sources between 2011 and 2014 (NTV, Russia 1, Channel 1, Izvestia, Russian Gazette and Komsomolskaya Pravda). The dictionary captures the framing of protest on a par with human coders. Nevertheless, caution is needed when applying it to news stories from other time periods or to other types of media content.
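As a rough illustration of how LSS assigns scores to words, the Python sketch below uses a common formulation of the technique: each word's score is the average, over a small set of seed words with known polarity, of its cosine similarity to each seed multiplied by that seed's polarity. In a real LSS pipeline the word vectors would be estimated from the corpus via a truncated singular value decomposition; here they are hand-written two-dimensional toy vectors, and the English seed words, their polarities (-1 for disorder, +1 for freedom) and the helper names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def polarity_scores(word_vectors, seeds):
    """Score every word as the mean of cos(word, seed) * polarity(seed) over all seeds.

    `seeds` maps each seed word to +1 or -1; every seed must have a vector.
    """
    return {
        word: sum(cosine(vec, word_vectors[s]) * p for s, p in seeds.items()) / len(seeds)
        for word, vec in word_vectors.items()
    }


# Hypothetical 2-D toy vectors; real LSS vectors would be SVD-derived,
# with many more dimensions, and estimated from the news corpus.
word_vectors = {
    "disorder": [1.0, 0.0],
    "freedom": [0.0, 1.0],
    "riot": [0.9, 0.1],
    "rights": [0.2, 0.8],
}
seeds = {"disorder": -1, "freedom": +1}  # assumed polarities

word_scores = polarity_scores(word_vectors, seeds)
for word, score in sorted(word_scores.items(), key=lambda item: item[1]):
    print(f"{word:>8}: {score:+.3f}")
```

With these toy vectors, "riot" receives a negative (disorder-leaning) score and "rights" a positive (freedom-leaning) one. The two seed words score -0.5 and +0.5 because their vectors are orthogonal to each other.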
Using the dictionary is simple and works like other forms of dictionary-based content analysis. Document scores should be calculated ignoring words that are not in the dictionary: a document's score is the sum of the scores of its dictionary words divided by the number of dictionary words in the document, not by the total number of words in the document. Please see the sample code for more detail.
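The scoring rule above can be sketched in Python as follows. This is an illustration only, not the project's R sample code: the three dictionary entries and their scores are invented, the sign convention (negative for social disorder, positive for freedom to protest) is assumed, whitespace splitting stands in for proper tokenization, and `score_document` is a hypothetical helper name. Each occurrence of a dictionary word counts once in the average.

```python
def score_document(tokens, dictionary):
    """Return the mean dictionary score of a document's tokens.

    Tokens absent from the dictionary are skipped, so the denominator is the
    number of dictionary-word occurrences, not the document length.
    Returns None if the document contains no dictionary words.
    """
    hits = [dictionary[token] for token in tokens if token in dictionary]
    if not hits:
        return None
    return sum(hits) / len(hits)


# Hypothetical entries with invented scores (negative = social disorder,
# positive = freedom to protest); these are not values from the real dictionary.
toy_dictionary = {"беспорядки": -0.8, "митинг": 0.1, "свобода": 0.6}

tokens = "митинг на площади перерос в беспорядки".split()
score = score_document(tokens, toy_dictionary)
naive = sum(toy_dictionary.get(t, 0.0) for t in tokens) / len(tokens)

print(f"dictionary-word average: {score:.3f}")  # divides by the 2 matched words
print(f"average over all words:  {naive:.3f}")  # divides by all 6 words
```

Dividing by all six words would pull the score toward zero simply because the story contains words that carry no framing information, which is why non-dictionary words are left out of the denominator.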
This is sample R code for dictionary-based content analysis. You need to install the quanteda package before running it.
Below we reproduce part of the large dataset on Russian state-controlled media coverage of protests that we built using our content analysis dictionary. This subset contains the content analysis results and metadata for 2,519 news stories about protests in Russia between 2011 and 2013.