Abstract
In the era of the Internet and big data, billions of news items are generated every day, and traditional manual methods are insufficient for public opinion orientation analysis. The problem is especially hard for Chinese, which has a more complicated syntactic and semantic structure and uses no spaces to separate words; both properties greatly increase the difficulty of analyzing opinion orientation. In this paper, we propose a novel approach to public opinion orientation analysis of Chinese news. The approach combines word2vec, sentiment dictionaries, and syntax rules. Word2vec maps each word to a vector of fixed, finite dimension, so that we can calculate the cosine similarity between a target word and the entries of the sentiment dictionaries to obtain the orientation value of that word, which in turn supports calculating the orientation values of key sentences and of the full text. Specifically, the process consists of three steps. First, word2vec is used to train word embeddings, and every word in the corpus is mapped into a given vector space. Then, key sentences are extracted from the news content. Finally, predefined syntax rules, together with word vector similarity, are used to analyze document orientation based on the key sentences. Several experiments are conducted on both closed and open datasets, and the results validate the effectiveness of the proposed approach.
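The word-level step summarized above, comparing a target word with sentiment dictionary entries by cosine similarity, can be illustrated with a minimal sketch. It is not the paper's exact formula. It assumes that the orientation value of a word is its mean similarity to positive dictionary entries minus its mean similarity to negative ones. The function names, the three-dimensional toy vectors, and the English placeholder tokens are all hypothetical; in practice the vectors would come from a word2vec model trained on a segmented Chinese corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for a zero vector: no similarity
    return dot / (norm_u * norm_v)


def word_orientation(word, embeddings, positive_seeds, negative_seeds):
    """Assumed scoring rule (not taken from the paper):
    mean similarity to positive seeds minus mean similarity to negative seeds."""
    vec = embeddings[word]
    pos = [cosine_similarity(vec, embeddings[s])
           for s in positive_seeds if s in embeddings]
    neg = [cosine_similarity(vec, embeddings[s])
           for s in negative_seeds if s in embeddings]
    pos_mean = sum(pos) / len(pos) if pos else 0.0
    neg_mean = sum(neg) / len(neg) if neg else 0.0
    return pos_mean - neg_mean


# Hypothetical toy embeddings standing in for trained word2vec vectors.
toy_embeddings = {
    "good": [0.9, 0.1, 0.0],
    "bad": [-0.8, 0.2, 0.1],
    "growth": [0.7, 0.3, 0.1],
    "loss": [-0.6, 0.4, 0.2],
}

growth_score = word_orientation("growth", toy_embeddings, ["good"], ["bad"])
loss_score = word_orientation("loss", toy_embeddings, ["good"], ["bad"])
print(growth_score > 0, loss_score < 0)
```

Under this assumed rule a positive score suggests a positive orientation and a negative score a negative one. Sentence-level and document-level values would then be built from these word scores using the syntax rules.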
Introduction
Online media such as microblogs, portals, forums, and mainstream news organizations try to publish news as early as possible, and the volume of news grows dramatically every day. A piece of news often conveys its author’s opinions and acquires an orientation toward the events it reports as it is generated and disseminated. These positive or negative news orientations may affect the tendency of public opinion and people’s views. Orientation analysis is therefore of high value for monitoring Internet public opinion. The application background of this paper is the online monitoring of enterprise operations in the Shanghai Pilot Free-Trade Zone (FTZ). Previously, we conducted related studies on relationship extraction [1] and hot event clustering [2] for the FTZ. Although these studies achieved good results, deficiencies remain in public opinion orientation analysis. For regulators, such analysis helps them learn the public’s feedback on events and supervise the development trends of enterprises under their jurisdiction. The era of big data poses a great challenge to conventional public opinion analysis. Currently, mainstream search engines support only keyword search of news, and the search results carry no further information such as classifications, orientations, or opinions. Therefore, extracting more valuable and accurate information from a large volume of news in an effective way is a major challenge in the big data environment. The essence of public opinion orientation analysis is document-level sentiment analysis. As is well known, Chinese and English are the two most widely used languages in the world.