Dictionary Construction Procedure

To construct the Russian protest framing dictionary we employed a technique called supervised Latent Semantic Scaling (LSS). This supervised machine-learning technique requires manual coding of a training set for dictionary construction and a test set for dictionary validation. In the manual content analysis stage, the lead project researcher (the primary coder) coded each sentence of thirty randomly selected news stories on a five-point scale capturing the framing of protests in Russian state-controlled media; sentence scores were then aggregated into document scores by taking the average. Sentence-level coding is usually necessary in document scaling because human coders cannot reliably make nuanced judgements about whole documents (cf. Benoit et al. 2015).
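The aggregation of sentence scores into document scores can be sketched as follows. This is a minimal illustration rather than the project's actual code: the document identifiers, the sentence scores, and the assumption that the five-point scale is coded as integers from 1 to 5 are all hypothetical.

```python
from statistics import mean

# Hypothetical sentence-level codes on a five-point scale (assumed 1 to 5),
# keyed by an invented document identifier. Not the study's data.
sentence_scores = {
    "doc_01": [1, 2, 2, 1],
    "doc_02": [4, 5, 3],
    "doc_03": [3, 3, 2, 4, 3],
}

def document_scores(scores_by_doc):
    """Average each document's sentence scores into one document score."""
    return {doc: mean(scores) for doc, scores in scores_by_doc.items()}

doc_scores = document_scores(sentence_scores)
for doc, score in doc_scores.items():
    print(f"{doc}: {score:.2f}")
```

A simple unweighted mean treats every sentence as equally informative, which matches the averaging described above; weighting by sentence length would be a different design choice.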

The scale for protest framing analysis ran between two poles: protest as freedom to protest (the “freedom to protest” frame) and protest as social disorder (the “disorder” frame). The “freedom to protest” frame was assigned to text that highlights citizens’ democratic right to participate in protests; that portrays protests as acts of civic activism or as acts promoting democracy; and that presents protest as something people are entitled to in a democratic state. The “disorder” frame was assigned to stories that portray protests as leading to chaos and violence; as events potentially leading to a violent revolution; as destabilizing; as “paid for” (proplachennye), for instance by the West, and therefore ostensibly not genuine; as acts organised by groups stigmatised by the regime in Russia and portrayed in a negative light, for instance the gay community; or as events featuring “fascists” or other “right-wing extremists.”

The documents in the test set were also coded by two native Russian-speaking research assistants (secondary coders) who hold postgraduate-level qualifications in the social sciences. The purpose of this secondary manual coding stage was to confirm that our analysis of framing is accurate and that the primary coding is replicable, while also providing performance benchmarks for machine coding. The scores assigned by the two secondary coders were averaged, and document scores were then calculated. Agreement in document scores, measured by Pearson’s correlation coefficient, was r=0.88 between the primary coder and secondary coder 1 and r=0.76 between the primary coder and secondary coder 2. This high level of agreement between the primary and secondary coders indicates that our framing analysis is replicable.
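For readers who wish to reproduce this kind of agreement check, Pearson’s correlation coefficient between two coders’ document scores can be computed as in the sketch below. The function name `pearson_r` and all score values are hypothetical and chosen only for illustration; they are not the study’s data and will not reproduce the coefficients reported above.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical document scores for five documents; not the study's data.
primary = [1.5, 4.0, 3.0, 2.2, 4.6]
secondary_1 = [1.8, 3.7, 3.2, 2.0, 4.4]
secondary_2 = [2.4, 3.1, 3.6, 1.9, 4.0]

print(f"primary vs secondary 1: r = {pearson_r(primary, secondary_1):.2f}")
print(f"primary vs secondary 2: r = {pearson_r(primary, secondary_2):.2f}")
```

Because Pearson’s r measures linear association rather than exact agreement, two coders who use the scale with a consistent offset can still correlate highly.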

Coding of the test set by the LSS dictionary was also significantly correlated with coding by the primary coder: the agreement in document scores assigned by the machine and the primary coder was r=0.75. Figure 11 contains framing scores of documents in both the training (black) and the test (red) sets. The scores of the test-set documents (red) are consistent with the pattern of the other documents. The correlation between the primary coder and the machine (r=0.75) is about as strong as the correlation between the primary coder and one of the secondary coders (r=0.76); this supports the validity of dictionary coding of our large text data for the purposes of protest framing analysis.
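To make “coding by the dictionary” concrete, the sketch below scores a document as the frequency-weighted mean of the polarity scores of its words, which is how LSS-style dictionaries are commonly applied. It is an assumption-laden illustration, not the project’s implementation: the function `score_document`, the English placeholder tokens (standing in for Russian terms), and the word weights are all invented.

```python
from collections import Counter

# Hypothetical word polarity scores: negative values lean toward the
# "disorder" frame, positive values toward "freedom to protest".
# These words and weights are invented, not taken from the study's dictionary.
word_scores = {"chaos": -0.8, "violence": -0.6, "rights": 0.7, "civic": 0.5}

def score_document(tokens, scores):
    """Frequency-weighted mean polarity of the tokens found in the dictionary.

    Returns None when no token in the document appears in the dictionary.
    """
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * n for w, n in counts.items()) / total

tokens = "the rally defended civic rights but ended in chaos".split()
print(score_document(tokens, word_scores))
```

Machine scores produced this way for the test set can then be compared with the primary coder’s document scores using Pearson’s correlation coefficient.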

[Figure 11: framing scores of documents in the training (black) and test (red) sets]
