This site uses cookies, so that our service can work better. I understand

Project 4: Training a human-like movie evaluation system based on the semantic features of the storylines

Authors: Julia Berezutskaya / Marcel van Gerven (Computational Cognitive Neuroscience Lab, Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen)



One of the interesting questions in behavioral neuroscience is how humans evaluate perceived complex information. For example, we would like to understand what makes a movie to be rated high or low on average. Specifically, it is interesting to see whether the information about the movie, such as the movie description, its cast, genre and other attributes, is predictive of the average rating the movie receives.

In our project, we want to develop a model that predicts IMDb ratings of the movies based on their descriptions. To our knowledge, the existing algorithms attempt to predict IMDb ratings based on meta-information about the movies: directors, actors, genre, year, movie length and so on (Hsu et al., 2014; San, 2016). We believe that one of the key indicators of the movie reception is the movie storyline along with its character descriptions. In the present project, we aim at building a model that predicts IMDb ratings using movie storyline information.

We propose to start with a baseline classifier, such as a random forest classifier, which takes in the key content words from the movie storyline description and predicts the IMDb rating of the movie. The performance of this model should tell us whether the simple ‘bag of words’ representation of the movie storyline can be predictive of the movie rating.

The second approach will entail training a discriminative deep neural network, which will take in the full storyline descriptions preserving the temporal relationships between the words, and predict the IMDb rating of the movie. The difference in model performance compared to the decision tree classifier should tell us whether the temporal dependences in the storyline description provide additional information about how the movie is going to be rated. It will also be interesting to see whether we will be able to retrieve more high-level semantic features from the text input, such as elements of the plot.

In both cases, we can make use of pretrained word embeddings to represent text information, such as word2vec (Mikolov et al., 2013) or GloVe (Pennington et al., 2014). In addition, IMDb movie tags, capturing various meta-information about the movies (e.g. genre, year of production and so on) can be included as additional predictors in both models.

We believe that in case of success this project can provide information about the trends in human behavior when it comes to evaluation of the movie input. It will be interesting to see whether we will be able to uncover the semantic features that influence the movie reception by public.


A list of 1-5 key papers/materials summarising the subject:

[1] Hsu, P.-Y., Shen, Y.-H., and Xie, X.-A. (2014). Predicting Movies User Ratings with Imdb Attributes. In International Conference on Rough Sets and Knowledge Technology, (Springer), pp. 444–453.

[2] Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. ArXiv Prepr. ArXiv13013781.

[3] Pennington, J., Socher, R., and Manning, C.D. (2014). Glove: Global vectors for word representation. In EMNLP, pp. 1532–1543.

[4] San, C. (2016). Predict Movie Ratings:


A list of requirements for taking part in the project:


A maximal number of participants:



Is there a plan for extending this work to a paper in case the results are promising?