Working paper on latent semantic scaling

The details of our automated content analysis technique, latent semantic scaling (LSS), are now available in Kohei Watanabe’s working paper titled Big Media Analysis: Application of Vector Space Models to Document Scaling.

Computerized analysis of media content is often challenging because the diverse topics in news stories cause high data sparseness. Although supervised machine learning techniques usually require large training sets to analyse diverse content accurately, this paper proposes the use of vector space models for the purpose. Vector space models, such as LSA, NMF, LDA or word2vec, extract semantic information from large corpora fully automatically, making it possible to reliably estimate parameters even for words that rarely appear in a small training set. The new technique is explained with two examples from actual large-scale content analysis projects: international news agencies’ coverage of the Ukraine crisis in 2013-2014, and Russian news media’s coverage of street protests in 2011-2014. These examples show the advantages of the new technique in document scaling over ‘off-the-shelf’ dictionaries and Bayesian supervised machine-learning techniques.
