Currently, the masses are interested in sharing opinions, feedback, and suggestions on various topics on websites, e-forums, and blogs. Consequently, consumers tend to rely heavily on product reviews before buying products or availing services. However, not all reviews available on the internet are authentic. Spammers manipulate reviews in their favor to either devalue or promote products. Customers are thus influenced to make wrong decisions because of these spurious reviews, i.e., spammy content. To address this problem, a hybrid approach combining improved binary particle swarm optimization and the shuffled frog leaping algorithm is proposed to reduce the high dimensionality of the feature set and to select optimized feature subsets. Our approach helps customers ignore fake reviews and enhances classification performance by providing trustworthy reviews. Naive Bayes (NB), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM) classifiers were used for classification. The results indicate that the proposed hybrid feature selection method provides an optimized feature subset and achieves higher classification accuracy.
In current times, the amount of content available to users on the internet is rapidly increasing. When purchasing products or availing services, customers generally tend to make decisions relying solely on the information available on review sites. However, there is limited quality control for these available data. This limitation invites people to post spurious reviews on websites in order to either promote or demote products. Such individuals are known as opinion spammers. Positive spam reviews about a product may lead to financial gains and help increase the popularity of the product. Similarly, negative spam reviews are posted with the intention of defaming a product or service. Recently, the problem of spam or fake reviews has been on the rise, and many such cases have been reported in the news. Hence, there arises a need to verify the authenticity of these reviews. Feature selection (FS) is a technique in which a subset of features is selected from the original dataset. It is mainly used to build more robust learning models and to reduce processing cost. The main purpose of feature selection is to reduce the number of features in order to increase both the performance of the model and the accuracy of classification. FS can be viewed as a search through a state space. Thus, a full search could, in principle, traverse the entire search space; however, this approach is not feasible when the number of features is very large. A heuristic search instead considers, at each iteration, those features that have not yet been selected for evaluation. A random search creates random subsets within the search space that can be evaluated for their contribution to classification performance. Due to their randomized nature, meta-heuristics such as particle swarm optimization (PSO), evolutionary algorithms (EA), the bat algorithm (BA), ant colony optimization (ACO), and genetic algorithms [8,9] are widely used for feature selection.
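To make the idea of a randomized search over feature subsets concrete, the following is a minimal sketch of standard binary PSO over feature bit-masks. It is not the paper's improved BPSO/SFLA hybrid; the swarm parameters and the `toy_fitness` function are illustrative assumptions standing in for a real classifier-based fitness.

```python
import math
import random

def binary_pso(num_features, fitness, swarm_size=10, iterations=30):
    """Minimal binary PSO: each particle is a bit-mask over features.
    Velocities are passed through a sigmoid to give bit-set probabilities."""
    pos = [[random.randint(0, 1) for _ in range(num_features)]
           for _ in range(swarm_size)]
    vel = [[0.0] * num_features for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(swarm_size), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # global best
    w, c1, c2 = 0.7, 1.5, 1.5                   # inertia, cognitive, social weights
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(num_features):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # sigmoid transfer: probability that this bit becomes 1
                pos[i][d] = 1 if random.random() < 1.0 / (1.0 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy fitness (an assumption for illustration): reward selecting the first
# three "informative" features, penalize subset size — mimicking
# classification accuracy minus a sparsity penalty.
def toy_fitness(mask):
    return sum(mask[:3]) - 0.1 * sum(mask)

random.seed(0)
best, score = binary_pso(10, toy_fitness)
```

In a wrapper setting, `toy_fitness` would be replaced by the held-out accuracy of a classifier trained on the features whose bits are set.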
When the feature space is high dimensional, selecting the optimal feature subset using traditional optimization methods has not proven to be effective. Therefore, meta-heuristic algorithms are used extensively for the appropriate selection of features. Two types of feature selection methods, namely the filter method and the wrapper method, can be employed for selecting a subset of features. The filter model analyzes the intrinsic properties of the data without involving any learning algorithm and can perform both subset selection and ranking. Although ranking identifies the importance of every feature, the filter method is more commonly used as a pre-processing step, since it may select redundant features. The wrapper model, unlike filter approaches, considers the relationships between features. This method first uses an optimization algorithm to generate various subsets of features and then uses a classification algorithm to evaluate the subsets generated.
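The wrapper model's subset evaluation can be sketched as follows: a hypothetical `knn_accuracy` fitness function scores a candidate bit-mask by the held-out accuracy of a simple k-NN classifier restricted to the selected features. The toy data and all parameters here are assumptions for illustration, not the paper's experimental setup.

```python
import random

def knn_accuracy(mask, X_train, y_train, X_test, y_test, k=3):
    """Wrapper-style fitness: held-out accuracy of a k-NN classifier
    using only the features whose bit is set in `mask`."""
    idx = [d for d, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # empty subset cannot classify
    def dist(a, b):
        return sum((a[d] - b[d]) ** 2 for d in idx)
    correct = 0
    for x, y in zip(X_test, y_test):
        # majority vote among the k nearest training points
        neighbours = sorted(range(len(X_train)), key=lambda i: dist(X_train[i], x))[:k]
        votes = [y_train[i] for i in neighbours]
        if max(set(votes), key=votes.count) == y:
            correct += 1
    return correct / len(y_test)

# Toy data (assumption): the class is determined by feature 0;
# features 1 and 2 are pure noise.
random.seed(1)
def make_point(label):
    return [label * 2.0 + random.gauss(0, 0.3),
            random.gauss(0, 1), random.gauss(0, 1)]

labels_train = [l for l in (0, 1) * 20]
X_train = [make_point(l) for l in labels_train]
labels_test = [l for l in (0, 1) * 10]
X_test = [make_point(l) for l in labels_test]

acc_informative = knn_accuracy([1, 0, 0], X_train, labels_train, X_test, labels_test)
acc_all = knn_accuracy([1, 1, 1], X_train, labels_train, X_test, labels_test)
```

An optimizer such as the BPSO sketched earlier would call this fitness on each candidate mask; on data like this, the mask keeping only the informative feature typically scores at least as well as the full feature set, which is precisely the wrapper model's rationale.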
Feature selection is critical to improving the performance of a classifier. Hence, it is important to discard the irrelevant and noisy features in a given dataset that would otherwise decrease classification accuracy. A number of methodologies have been adopted to select the best feature subset. In this investigation, a hybrid approach was applied to select the optimized feature subset. This hybrid methodology efficiently reduces the feature subset size through randomization, which in turn improves the accuracy of the classifier. Moreover, when compared against existing feature selection techniques, the results indicate that the proposed technique offers higher classification accuracy and is computationally efficient. Furthermore, the method classifies reviews into spam and ham efficiently. Thus, it can be concluded that the Naive Bayes classifier shows better classification performance than the kNN and SVM classifiers.
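To illustrate the spam/ham classification step mentioned above, here is a minimal multinomial Naive Bayes sketch with Laplace smoothing over a bag-of-words representation. The tiny example corpus is fabricated for illustration and is not the dataset used in the study.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Multinomial Naive Bayes: class priors plus per-class word counts."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.split())
    vocab = set(w for cnt in counts.values() for w in cnt)
    return prior, counts, vocab

def predict_nb(model, doc):
    """Pick the class with the highest log posterior, with add-one smoothing."""
    prior, counts, vocab = model
    best, best_lp = None, float("-inf")
    for c in prior:
        total = sum(counts[c].values())
        lp = math.log(prior[c])
        for w in doc.split():
            if w in vocab:  # ignore unseen words
                lp += math.log((counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = ["great product works well", "excellent quality recommend",
        "buy now amazing deal free", "free free click buy now"]
labels = ["ham", "ham", "spam", "spam"]
model = train_nb(docs, labels)
pred = predict_nb(model, "free amazing deal buy")  # → "spam"
```

In the wrapper pipeline, such a classifier would be trained only on the features (here, words) retained by the selected subset, and its accuracy would drive the feature search.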