Mining Online Reviews for Predicting Sales Performance in the Movie Domain

Platform : DOT NET

IEEE Projects Years : 2012 - 13




  • Posting reviews online has become an increasingly popular way for people to express opinions and sentiments toward the products bought or services received.
  • Analyzing the large volume of online reviews available would produce useful, actionable knowledge that could be of economic value to vendors and other interested parties.
  • In this paper, we conduct a case study in the movie domain, and deal with the problem of mining reviews for predicting product sales performance.
  • Our analysis shows that both the sentiments expressed in the reviews and the quality of the reviews have a significant impact on the future sales performance of the products in question.
  • For the sentiment factor, we propose Sentiment PLSA (S-PLSA), in which a review is considered as a document generated by a number of hidden sentiment factors, in order to capture the complex nature of sentiments.


Existing System:


  • Existing approaches consider only the past sales performance of the same product or, in the movie domain, the past box office performance of the same movie.
  • This effect is captured with an Autoregressive (AR) model, which has been widely used in time series analysis, especially in econometric contexts.
  • Prediction accuracy and efficiency are lower.
  • The feelings of people about a movie are not analyzed.
  • The future performance of a product cannot be analyzed.
  • Vendors consider only the volume of movie sales.
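
The autoregressive idea above can be illustrated with a minimal sketch. It assumes the simplest case, an AR(1) model with an intercept fitted by ordinary least squares. The function names and the revenue figures are made up for illustration and are not taken from the project.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = intercept + phi * x_{t-1}."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var = sum((p - mean_prev) ** 2 for p in prev)
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = cov / var
    intercept = mean_curr - phi * mean_prev
    return intercept, phi


def predict_next(series, intercept, phi):
    """One-step-ahead forecast from the last observed value."""
    return intercept + phi * series[-1]


# Hypothetical daily box office revenues (arbitrary units).
revenues = [16.0, 8.0, 4.0, 2.0, 1.0]
intercept, phi = fit_ar1(revenues)
forecast = predict_next(revenues, intercept, phi)
```

A model of this kind sees only past sales figures, which is the limitation listed above: nothing in it reflects what reviewers feel about the movie.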


Proposed System:


  • We propose a novel approach to sentiment mining based on Probabilistic Latent Semantic Analysis (PLSA), which we call Sentiment PLSA (S-PLSA).
  • Unlike traditional PLSA [6], S-PLSA focuses on sentiments rather than topics.
  • Instead of considering all the words (modulo stop words) present in the blogs, we focus primarily on the words that are sentiment related.
  • The S-PLSA model, through the use of appraisal groups, provides a probabilistic framework for analyzing sentiments in reviews.
  • A sentiment-aware model is then used to predict future product sales.
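
To make the idea concrete, here is a minimal sketch of plain PLSA fitted by EM on counts of sentiment words only. It is an illustration under stated assumptions, not the project's implementation. The word list SENTIMENT_WORDS, the sample reviews, and the function names are hypothetical, and single words stand in for the appraisal groups mentioned above.

```python
import random
from collections import Counter

# Hypothetical sentiment vocabulary (an assumption for this sketch).
SENTIMENT_WORDS = ["great", "moving", "fun", "boring", "awful", "dull"]


def sentiment_counts(review):
    """Count only the sentiment-related words in one review."""
    tokens = Counter(t.strip(".,!?") for t in review.lower().split())
    return [tokens[w] for w in SENTIMENT_WORDS]


def normalized(row):
    total = sum(row)
    return [v / total for v in row]


def plsa(counts, n_factors, n_iter=50, seed=0):
    """Estimate P(z|d) and P(w|z) by EM from a review-by-word count matrix."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalized([rng.random() + 0.1 for _ in range(n_factors)])
             for _ in range(n_docs)]
    p_w_z = [normalized([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_factors)]
    for _ in range(n_iter):
        # Tiny constants keep every probability strictly positive.
        new_z_d = [[1e-12] * n_factors for _ in range(n_docs)]
        new_w_z = [[1e-12] * n_words for _ in range(n_factors)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z | d, w) up to normalization.
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_factors)]
                total = sum(joint)
                for z in range(n_factors):
                    resp = counts[d][w] * joint[z] / total
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalize the expected counts.
        p_z_d = [normalized(row) for row in new_z_d]
        p_w_z = [normalized(row) for row in new_w_z]
    return p_z_d, p_w_z


reviews = [
    "Great cast and a moving story, great fun!",
    "Boring plot, awful pacing, dull ending.",
    "Fun and moving.",
    "Awful. Dull and boring.",
]
counts = [sentiment_counts(r) for r in reviews]
p_z_d, p_w_z = plsa(counts, n_factors=2)
```

Each row of p_z_d is a review's mixture over the hidden sentiment factors, and it is this kind of compact summary that a sales prediction model can take as input.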





  • Domain-Driven Data Mining (D3M)
  • Review Mining
  • Sentiment PLSA
  • Ranking and Recommender Systems



1. Domain-Driven Data Mining (D3M)


In the past few years, domain-driven data mining has emerged as an important new paradigm for knowledge discovery. Motivated by the significant gap between the academic goals of many current KDD methods and real-life business goals, D3M advocates a shift from data-centered hidden pattern mining to domain-driven Actionable Knowledge Discovery (AKD). The work presented in this paper can be considered an effort along this direction in that

1) we aim to deliver actionable knowledge by making predictions of sales performance, and

2) in developing the prediction model, we try to integrate multiple types of intelligence, including human intelligence, domain intelligence, and network intelligence (Web intelligence).


2. Review Mining


With the rapid growth of online reviews, review mining has attracted a great deal of attention. Early work in this area was primarily focused on determining the semantic orientation of reviews. Among them, some of the studies attempt to learn a positive/negative classifier at the document level.

There are also studies that work at a finer level and use words as the classification subject. They classify words into two groups, “good” and “bad,” and then use certain functions to estimate the overall “goodness” or “badness” score for the documents.
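
A word-level scheme of this kind can be sketched as follows. The two word lists and the scoring function are assumptions chosen for illustration. The studies referred to above use their own lexicons and scoring functions.

```python
# Hypothetical word lists (assumptions for this sketch).
GOOD_WORDS = {"great", "excellent", "fun", "moving"}
BAD_WORDS = {"boring", "awful", "dull", "poor"}


def goodness_score(review):
    """Return a score in [-1, 1]: +1 if all matched words are good, -1 if all are bad."""
    tokens = [t.strip(".,!?").lower() for t in review.split()]
    good = sum(t in GOOD_WORDS for t in tokens)
    bad = sum(t in BAD_WORDS for t in tokens)
    matched = good + bad
    if matched == 0:
        return 0.0  # no sentiment words found, treat as neutral
    return (good - bad) / matched


score = goodness_score("Great fun, but a dull ending.")
```

A single number like this collapses a review to one positive or negative axis.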


3. Sentiment PLSA

A sentiment analyzer is used to analyze the sentiment terms that people use about a product. Based on sentiments such as the type of movie (horror, family movie, heart touching …), we mine movie reviews to improve sales performance.

In most of the studies cited above, sentiments are captured by explicit rating indicators such as the number of stars; few studies have attempted to exploit text mining strategies for sentiment classification. To fill this gap, Ghose and Ipeirotis argue that review texts contain richer information that cannot be easily captured using simple numerical ratings.

Our work is similar to theirs in the sense that we also exploit the textual information to capture the underlying sentiments in the reviews.


4. Ranking and Recommender Systems


Recommender systems have emerged as an important solution to the information overload problem, in which people find it increasingly difficult to identify useful information effectively.

Software Requirements:

Operating System                : Windows 95/98/2000/XP

 Front End                            :

 Database Connectivity        :  Mysql2005


Hardware Requirements:

Processor                          :   AMD Athlon 64 X2 Dual Core Processor 4000+ / Pentium IV

Speed                                :    2.11 GHz

RAM                                 :    2GB

Hard Disk                         :    160 GB

Keyboard                         :    Standard Windows Keyboard





The widespread use of online reviews as a way of conveying views and comments has provided a unique opportunity to understand the general public’s sentiments and derive business intelligence. In this paper, we have explored the predictive power of reviews using the movie domain as a case study, and studied the problem of predicting sales performance using sentiment information mined from reviews. We have approached this problem as a domain-driven task, and managed to synthesize human intelligence (e.g., identifying important characteristics of movie reviews), domain intelligence (e.g., the knowledge of the “seasonality” of box office revenues), and network intelligence (e.g., online reviews posted by moviegoers). The outcome of the proposed models leads to actionable knowledge that can be readily employed by decision makers.

A centerpiece of our work is the proposal of S-PLSA, a generative model for sentiment analysis that helps us move from simple “negative or positive” classification toward a deeper comprehension of the sentiments in blogs. Using S-PLSA as a means of “summarizing” sentiment information from reviews, we have developed ARSA, a model for predicting sales performance based on the sentiment information and the product’s past sales performance. We have further considered the role of review quality in sales performance prediction, and proposed a model to predict the quality rating of a review when it is not readily available. The quality factor is then incorporated into a new model called ARSQA. The accuracy and effectiveness of the proposed models have been confirmed by experiments on two movie data sets. Equipped with the proposed models, companies will be able to better harness the predictive power of reviews and conduct business more effectively. It is worth noting that although we have only used S-PLSA for the purpose of prediction in this work, it is a model general enough to be applied to other scenarios.

For future work, we would like to explore its role in clustering and classification of reviews based on their sentiments. It would also be interesting to explore the use of S-PLSA as a tool to help track and monitor the changes and trends in sentiments expressed online. Also note that the ARSA and ARSQA models are general frameworks for sales performance prediction, and would certainly benefit from the development of more sophisticated models for sentiment analysis and quality prediction.
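
As a rough illustration of how past sales and sentiment information could be combined in a single prediction, the sketch below assumes a simple linear form: autoregressive terms on past sales plus weighted sentiment factor values from preceding days. This text does not give the exact specification of ARSA, so the form, the function name arsa_predict, the coefficient names phi and rho, and all numbers are assumptions for illustration only.

```python
def arsa_predict(past_sales, past_sentiments, phi, rho):
    """Predict the next sales value under an assumed linear form.

    past_sales: sales values, most recent last.
    past_sentiments: one list of sentiment factor values per day, most recent last.
    phi[i]: weight of the sales value i + 1 days back.
    rho[i][k]: weight of sentiment factor k from i + 1 days back.
    """
    ar_part = sum(phi[i] * past_sales[-1 - i] for i in range(len(phi)))
    sentiment_part = sum(
        rho[i][k] * past_sentiments[-1 - i][k]
        for i in range(len(rho))
        for k in range(len(rho[i]))
    )
    return ar_part + sentiment_part


# Hypothetical inputs: two days of sales, two sentiment factors per day.
past_sales = [10.0, 12.0]
past_sentiments = [[0.5, 0.5], [0.75, 0.25]]
phi = [0.5, 0.25]    # made-up coefficients; in practice fitted from data
rho = [[4.0, -2.0]]  # made-up coefficients; only the latest day is used
prediction = arsa_predict(past_sales, past_sentiments, phi, rho)
```

In practice the coefficients would be estimated from historical sales and review data rather than set by hand.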





