Data Science: Do people have trouble buying into superheroine movies?

Wonder Woman was a huge success both on screen and at the box office, even though its release was not timed to coincide with International Women’s Day. In contrast, Captain Marvel stirred up plenty of controversy over its feminist marketing campaign. Do people have trouble buying into superheroine movies? Will this new trend of women-led superhero films put an end to Superman?

To answer this, I am going to use Python to analyze the gender ratio of the casts in two thousand films from the last two decades.

The implementation can be broken down into three parts:

  1. Scrape a list of the top 100 films of each year for the last 20 years.
  2. Run gender analysis on the actors’ names and count each gender using Python.
  3. Visualize the results to show how female roles have changed in the film industry, including superhero movies.

First, scrape the movie information from Box Office Mojo using Octoparse. The URLs of Mojo’s yearly box office charts follow a constant pattern, with a fixed host name and a year parameter (yr) in the query string. For example, the yearly chart URLs are

https://www.boxofficemojo.com/yearly/chart/?yr=2019&p=.htm for 2019, and

https://www.boxofficemojo.com/yearly/chart/?yr=2018&p=.htm for 2018.

Following this pattern, we can build a list of URLs for every year from 2000 to 2019.
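The URL list can be generated rather than typed by hand. Here is a minimal sketch, assuming the pattern holds for every year in the range; the names BASE_URL and yearly_chart_urls are my own and do not come from the original script.

```python
# Pattern copied from the two example URLs above; only the year changes.
BASE_URL = "https://www.boxofficemojo.com/yearly/chart/?yr={year}&p=.htm"


def yearly_chart_urls(first_year=2000, last_year=2019):
    """Return one yearly chart URL per year, including both end years."""
    return [BASE_URL.format(year=year) for year in range(first_year, last_year + 1)]


urls = yearly_chart_urls()
print(len(urls))   # number of yearly chart pages to load
print(urls[0])     # first URL in the list (year 2000)
print(urls[-1])    # last URL in the list (year 2019)
```

The printed list can then be pasted into Octoparse as the starting URLs.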

Load this list into Octoparse. It will automatically create a loop over the URLs and guide you through creating a second extraction loop for the movies listed in each year. Click to extract fields including Title, Actors, Distributors, Domestic_Total_Gross and Foreign_Gross. About 20 minutes later, we have the details of all 2,000 films.

What does the data say about movies in the past 20 years?

The idea here is to run each actor’s name through a name-gender library and count how often female and male names appear in the list. (Complete version: https://gist.github.com/octoparse/3abc6771a87e49e34c9fa18f2ed7d91e#file-gender_analysis_on_movies-py)

First, load the Box Office Mojo movie list and the Marvel movie list in Python.

Second, pre-process the text (see the complete version for the details) to get a list of tokenized names.
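As a rough illustration of this step, the sketch below pulls first names out of a raw Actors field. It assumes the field is a comma-separated string of full names, which may not match the scraped data exactly, and first_names is a hypothetical helper rather than a function from the complete version.

```python
import string


def first_names(actors_field):
    """Split a raw, comma-separated 'Actors' cell into first-name tokens."""
    tokens = []
    for full_name in actors_field.split(","):
        parts = full_name.split()
        if not parts:
            continue  # skip empty entries, e.g. after a trailing comma
        # Drop stray punctuation around the first word of the name.
        first = parts[0].strip(string.punctuation)
        if first:
            tokens.append(first)
    return tokens


print(first_names("Gal Gadot, Chris Pine, "))
```

Only the first word of each name is kept because the gender lookup in the next step works on first names.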

Then we analyze the gender of every movie’s actors by year. To do this, import the gender library, which takes a first name and returns a gender.
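The counting step might look like the sketch below. To keep it self-contained, a tiny hand-written dictionary stands in for the gender library; it is not the library used in the complete version, and guess_gender and count_by_year are names I made up for illustration.

```python
from collections import Counter, defaultdict

# Toy stand-in for a real first-name gender library (an assumption for this
# sketch, NOT the library from the complete version).
NAME_TO_GENDER = {"Gal": "female", "Brie": "female", "Chris": "male", "Samuel": "male"}


def guess_gender(first_name):
    """Return 'female', 'male', or 'unknown' for a first name."""
    return NAME_TO_GENDER.get(first_name, "unknown")


def count_by_year(rows):
    """rows: iterable of (year, [first names]) pairs -> {year: Counter of genders}."""
    counts = defaultdict(Counter)
    for year, names in rows:
        for name in names:
            counts[year][guess_gender(name)] += 1
    return dict(counts)


# Illustrative input only; the real rows come from the scraped movie list.
rows = [
    (2017, ["Gal", "Chris"]),
    (2019, ["Brie", "Samuel", "Jude"]),
]
counts = count_by_year(rows)
print(counts[2019])
```

Names the lookup cannot resolve are tallied as "unknown" instead of being forced into either group, so they can be inspected or dropped before plotting.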

After that, we have the numbers of female and male actors for each year and can visualize the data as below:

Movies in the last 20 years. Solid lines indicate the actual numbers. Dotted lines reflect the developing trend.
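The article does not say how the dotted trend lines were computed. One common choice is an ordinary least-squares straight line, sketched below in plain Python. The numbers are toy values chosen to make the arithmetic easy to follow, not the article's data, and linear_trend is a hypothetical helper.

```python
def linear_trend(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = [2000, 2001, 2002, 2003]
toy_counts = [10, 12, 14, 16]  # made-up values, NOT taken from the charts

slope, intercept = linear_trend(years, toy_counts)
# Points on the fitted line, one per year; these would be drawn as the dotted line.
trend = [slope * year + intercept for year in years]
print(slope)
```

A positive slope corresponds to a rising dotted line and a negative slope to a falling one.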

The numbers of male and female actors move in the same direction. Both lines rise through the 2000s, peak in 2011, and have declined since then. The number of actors is decreasing overall, which might indicate a downturn in the film industry. The gap between the two lines tends to close over the whole period, although it widened between 2011 and 2015. In other words, the numbers of female and male actors are moving toward equality, even though gender disparity remains entrenched in the film industry.

Let’s take a closer look at the Marvel films:

Marvel films. Solid lines indicate the actual numbers. Dotted lines reflect the developing trend.

In contrast, both lines have moved upward since 2012, with a steep increase between 2012 and 2013. Superhero movies were getting popular during the economic recovery. The number of female actors jumps compared with the years before 2012, and there is a trend toward more female roles in Marvel films. This may reflect the film industry’s attempt to boost the box office during the economic recovery by introducing more female actors into superhero series. The move did work in a place like the United States, where people are fond of superheroes and proud of an identity built on freedom, democracy, and power.

If we look at The Hunger Games (2012), Divergent (2014), Lucy (2014), Mad Max: Fury Road (2015), Rogue One: A Star Wars Story (2016) and Wonder Woman (2017), it is apparent that we now have a variety of heroines on screen. Women have started to move the plot forward rather than being a sidekick to Superman. It suggests that the female superhero is a new role of redemption.

Superhero movies are icons of crime-fighting, social righteousness and self-sacrifice. I was ecstatic when Captain Marvel was released. It’s not only because I am a woman; I also like the idea of the superwoman because it shows real progress in gender equality, whether she is a good or a bad superhero. I can’t express enough appreciation that there are more strong and independent women like Furiosa in Mad Max and Captain Marvel, who are heroes in their own right, and fewer beautiful but weak characters like Mary Jane, who is meant to be saved by Spider-Man.
