Did Google Really Crack The Code on Predicting Box Office Revenues?

Following up on my last post, I was hoping to deliver on the idea of using multivariate linear regression to learn more about how features like genre, MPAA rating, and budget influence box office revenues. While I started by running the model with my own data, I found diving into the history of box office prediction even more intriguing, and more promising for how I might generate a useful model of my own.

I started off by fitting my recently developed regression model to my box office revenue data. After configuring continuous and categorical features, I ran into some inherent challenges.

The two largest coefficients were Animation and Horror, which added $94 million and $59 million, respectively, to our target variable (gross revenue). The flaw was obvious: because the model is purely additive, it believes a ‘Horror Animation’ movie will gain both the $94 million and the $59 million, knowing absolutely nothing else about a theoretical movie of this nature. Worse, the more genres and features we stack on, the more revenue it predicts. For example, ‘Musical’ adds $56 million, ‘Romance’ adds $18 million, and an ‘R’ rating adds $26 million. Combined, the model believes that our R-rated animated romantic musical horror movie will generate $253 million at the box office on a budget of $0.
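To make the failure mode concrete, here is a minimal sketch of the additive arithmetic. The coefficient values are rounded from my model’s output; the feature names and the movie itself are illustrative, and I’m ignoring the intercept, as the $253 million figure above does:

```python
# Rounded coefficients from my fitted model, in millions of dollars.
# Each is a one-hot (0/1) dummy feature for a genre or MPAA rating.
coefficients = {
    "genre_Animation": 94,
    "genre_Horror":    59,
    "genre_Musical":   56,
    "genre_Romance":   18,
    "rating_R":        26,
}

# Our theoretical film: every dummy switched on, budget of $0.
movie = {name: 1 for name in coefficients}

# A linear model's prediction is just a dot product, so every active
# dummy stacks its full contribution on top of the others. Nothing here
# penalizes implausible feature combinations.
predicted_gross = sum(coef * movie[name] for name, coef in coefficients.items())
print(f"Predicted gross: ${predicted_gross}M")  # -> Predicted gross: $253M
```

The dot product has no way to express “these genres rarely co-occur”; interaction terms or regularization would be needed to temper it.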

Clearly, my model could benefit from strategies found in published work. After browsing approaches that others have used in the past, I stumbled upon an article that was particularly intriguing.

The Hollywood Reporter published “Google Unveils Model to Predict Box Office Success” on June 6, 2013.

“Four weeks out, Google looks at search volume for a film’s trailer, factors in other information, like franchise status and seasonality, and can predict opening weekend box office revenue with 94 percent accuracy.” This was simultaneously exciting and deflating. On one hand, the holy grail of predicting box office revenue exists, and in theory it could heavily influence which movies are ultimately put into production. On the other, it is most likely a closely guarded secret at Google, and any further investigation and analysis of my own would likely be in vain.

Surprisingly, Google’s white paper on the subject told a much simpler story. Developed in 2013, Google’s model uses simple linear regression, not unlike the models I and others had devised. Even more surprising was Google’s finding that many features we had intuitively considered important were actually insignificant.

Google’s most predictive model used:

  • All Title Search Volume (on Google)
  • Trailer-Related Title Search Volume (on Google)
  • YouTube Search Volume
  • Franchise status
  • Seasonality

Also, the fine print made it clear that the 94% figure is an R-squared, and that it refers to predictions of opening weekend revenue made four weeks ahead of release. While four weeks might allow studios time to tweak their marketing strategy, the true holy grail of box office prediction would tell studios how much a movie could make before it is ever greenlit into production.
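Google doesn’t publish its data or coefficients, but the reported setup is simple enough to sketch. Everything below is invented purely to show the shape of such a model; the distributions, weights, and noise are my own stand-ins, not anything from the white paper:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 99  # the white paper reportedly used 99 films released in 2012

# Stand-in columns mirroring the five features the paper names (synthetic values).
X = np.column_stack([
    rng.lognormal(11, 1, n),    # all title search volume
    rng.lognormal(9, 1, n),     # trailer-related title search volume
    rng.lognormal(10, 1, n),    # YouTube search volume
    rng.integers(0, 2, n),      # franchise status (0/1)
    rng.uniform(0, 1, n),       # seasonality index
])

# Synthetic opening-weekend gross: a linear signal plus noise.
true_weights = np.array([300.0, 800.0, 150.0, 2.5e7, 1.0e7])
y = X @ true_weights + rng.normal(0, 5e6, n)

model = LinearRegression().fit(X, y)
print(f"In-sample R^2: {model.score(X, y):.2f}")  # high by construction
```

The point is only that the reported architecture is ordinary least squares; the 94% comes from the features, not from any exotic method.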

Google’s white paper is not short on interesting choices or further points of guidance. Their model was built on only 99 films released in 2012, far fewer than we might expect. They also detail models that predict box office revenue one week prior, one day prior, and beyond opening weekend. Their inclusion of Rotten Tomatoes audience score exclusively for ‘beyond opening weekend’ supports my suspicion that review scores are less influential at the very beginning of a film’s run.

I sincerely trust the figures Google has provided. After all, this white paper was published in 2013, five years before the company removed ‘Don’t Be Evil’ from its code of conduct. That said, I couldn’t help but notice a certain theme running throughout the article that just happens to be very convenient for their bottom line.

As with most things Google, this white paper is free with ads. The ad in this case is Google’s own AdWords marketing service. While paid clicks are not a factor in their most predictive model, they are a central theme in the paper’s overall messaging. Two of the large, bold, red callout sections promote:

  • If a new film garners 20,000 more paid clicks than a similar film, that film is likely to bring in up to $7.5M more in opening weekend receipts.
  • If one film garners 10,000 more paid clicks than a similar holdover film, that film is likely to perform approximately $1.9-$3.5M better.

At face value, the first claim seems to insinuate that each paid click garners an additional $375 in revenue on opening weekend (up to 20,000 clicks). With the average AdWords search term costing $1–2 per click, this seems like a no-brainer for film marketers. However, a critical detail lies in the fine print: “20,000 more paid clicks than a similar film.” Placing an ad on Google AdWords requires marketers to place competitive bids on search terms. Instead of giving marketers an absolute target like 20,000 total paid clicks, the claim is framed relative to a competitor, which encourages competitive bidding for search terms, conveniently the most lucrative outcome for Google.

On top of that, the claim is hard to square with simple ticket math. Even if every one of the 20,000 paid clickers purchased a ticket (at the 2012 average price of $8.12), this would only equate to an additional $162,400 at the box office. I should acknowledge the counterargument: there are likely many more users who noticed the paid ad but didn’t click, and those impressions may still influence behavior.
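A quick sanity check in code, using only the figures quoted above:

```python
# All figures are from the white paper's callout and 2012 ticket pricing.
extra_clicks    = 20_000       # "20,000 more paid clicks than a similar film"
extra_revenue   = 7_500_000    # "up to $7.5M more in opening weekend receipts"
avg_ticket_2012 = 8.12         # average U.S. ticket price in 2012

# Revenue the claim implicitly attributes to each paid click.
print(f"Implied revenue per click: ${extra_revenue / extra_clicks:.0f}")  # $375

# Upper bound if every single clicker bought a ticket.
direct = extra_clicks * avg_ticket_2012
print(f"Direct ticket sales ceiling: ${direct:,.0f}")                     # $162,400
print(f"Gap the claim must explain:  ${extra_revenue - direct:,.0f}")     # $7,337,600
```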

Adding to the lack of clarity around paid clicks, it’s curious that Google discloses per-feature R-squared values only for the one model that does not include paid clicks. For the models that predict revenue one day before release and for the period beyond opening weekend, search ad click volume is considered significant, but its individual R-squared is not provided. Additionally, the significant variable is listed as “search ad click volume,” which implies the feature is absolute and can be understood outside the confines of relative competition, yet Google only provides actionable detail framed in terms of outbidding rivals for search terms.

To be fair, this is much ado about nothing, and I can only accuse Google of clever salesmanship. In a follow-up article in The Hollywood Reporter, “Hollywood Scoffs at Google Box Office Prediction Tool,” one of the authors of Google’s white paper clarifies that paid clicks are not their answer for everything: “You can’t really buy search query volume. That’s organic; it comes from things like trailers, billboards. If a studio is unhappy about what we are predicting, we won’t say search is the missing element.”

Despite my deep dive into Google’s motivations for publishing the white paper, I still find it very relevant to my own quest to build useful models for studios. I do trust that search volume and even paid clicks are telling indicators of consumer interest and behavior. Importantly, while Google does not offer the public a way to fully quantify absolute search volume for a given subject, it does give us half the story with Google Trends. Trends reports search interest for any key phrase on a normalized 0–100 scale, where 100 is the peak within the queried set and timeframe. For example, the key phrase “Thor Ragnarok” yields a chart of its search volume over time on that scale. Because every Trends chart is rescaled to its own 0–100 range, pairing each movie with a shared second key phrase like “new movies” puts them all on a common footing: the anchor term lets us normalize relative search volume across every movie in our dataset. I would like to credit Harrison M for this insight.
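As a sketch of how this could be automated, here is the anchor-term trick using pytrends, an unofficial third-party Python client for Google Trends (not something the white paper mentions or Google endorses); the timeframe and phrases are just examples:

```python
from pytrends.request import TrendReq  # pip install pytrends (unofficial client)

pytrends = TrendReq(hl="en-US", tz=360)

# Querying the movie together with the shared anchor phrase puts both on the
# same normalized 0-100 scale for this window.
pytrends.build_payload(
    kw_list=["Thor Ragnarok", "new movies"],
    timeframe="2017-08-01 2017-12-31",  # window around the film's release
)
trends = pytrends.interest_over_time()  # DataFrame: one column per phrase

# Dividing by the anchor expresses the movie's volume in units of
# "new movies" interest, comparable across separate queries for other films.
relative = trends["Thor Ragnarok"] / trends["new movies"]
print(relative.describe())
```

Repeating the same query for each film in the dataset against the identical anchor yields a comparable relative-volume feature across the board.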

This system isn’t perfect. Notice that “new movies” isn’t an exactly flat reference term; it spikes around peak moviegoing seasons. Ideally, we would find a key phrase whose volume is very flat over time. Hopefully further research will reveal a steadier anchor term, allowing us to approximate relative search volume for our dataset more reliably.
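One way to screen candidate anchors would be a simple flatness score; `flatness` below is a hypothetical helper of my own, not anything from the post or from pytrends:

```python
import pandas as pd

def flatness(series: pd.Series) -> float:
    """Coefficient of variation (std / mean): lower means flatter over time."""
    return series.std() / series.mean()

# Usage with the `trends` DataFrame from the previous sketch: a good anchor
# should score low (stable volume) across the whole timeframe.
# flatness(trends["new movies"])  # spikes seasonally, so it scores poorly
```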

Next time, I hope to return with a model that incorporates relative Google search volume as well as other features. I should also have a better understanding of time series data, which will allow me to make predictions prior to and after release.