An animation of theme parks opening around the world


I’ve been collecting a lot of data for my last few posts, and I’d mentioned that I wanted to try more with time series data. A few years ago I got to sit in on a lecture at Queensland University of Technology by Sudipto Banerjee on Bayesian spatiotemporal modelling. At the time the material was way too advanced for me, but the idea of analysing data points with time and space treated correctly has always stuck with me.

As I dug into the different things I could do with spatiotemporal data, I realised that I needed a much better understanding of the data itself before I could do fun tricksy things with it. I needed something that would hold my interest, but also force me to mess around munging spatiotemporal data.

An idea born of necessity

In the first year of my postgraduate research, I was really interested in data visualisations. Thankfully at the time a bunch of blogs like FlowingData were starting up, reporting on all types of cool data graphics. ‘Infographics’ also became a thing, threatening to destroy Data Science in its infancy. But what caught my eye at the time were the visualisations of flight paths like this one.

So now that I have some data and a bit of time and ability, I thought I’d try a more basic version of a spatiotemporal visualisation like this. My problem is that I hate installing extra one-time software for my whims, so the idea of using ImageMagick annoyed me. On top of that, when I tried it I couldn’t get it to work, so I determined to do what I could using base R and ggplot.

The result

This is probably the first article where I can say I’m pretty happy with the result:

[Animation: theme parks opening around the world, year by year]

The first thing you can see is that Europe, not the US, is the true home of theme parks, with Tivoli Gardens appearing in 1843 and remaining in the top 25 theme parks since before there were 25 parks to compete against.

Beyond that, you can also sort of see that there is a ‘contagion’ effect of parks: when one opens in an area, others usually open nearby pretty soon. There are two reasons I can think of for this. First, once people are travelling to an area to go to a theme park, going to two theme parks probably isn’t out of the question, so someone’s bound to move in to capture that cash. Second, the people opening new parks have to learn to run theme parks somewhere, and if you’re taking a massive risk on opening a $100 million park with a bunch of other people’s money, you’ll want to minimise that risk by opening it in a place you understand.

Future stuff

Simply visualising the data turned out to be more than a data munging exercise for me. Plotting this spatially as an animation gave some actual insights about how these things have spread over the world. It also made me more interested in doing the spatiotemporal clustering. It would be really cool to do that and then redo this plot with the colours of the points determined by each park’s cluster.

Another direction to explore would be to learn more about how to scrape Wikipedia and fill out my data table with more parks rather than just those that have featured in the TEA reports. I know this is possible and it’s not exactly new, but it’s never crossed my radar, and web scraping is a pretty necessary tool in the Data Science toolkit.

What applications can you think of for this sort of visualisation? Is there anything else I could add to this one that might improve it? I’d love to hear your thoughts!

The code

Just in case you want to do the same, I’ve added the code with comments below. You’ll need to supply your own CSV file with a unique park name, latitude, longitude and opening date in each row; the code expects these columns to be called park, lat, long and opened.
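If you don’t have a file handy, here is a minimal sketch of what the input could look like. The column names (park, lat, long, opened) are the ones the script refers to; the rows themselves are made-up placeholders rather than real parks, and the temporary file path is only there so the example doesn’t touch your working directory.

```r
# Hypothetical example rows: replace these with your own parks.
example_parks <- data.frame(
  park   = c("Example Park A", "Example Park B", "Example Park C"),
  lat    = c(55.7, 33.8, 35.6),
  long   = c(12.6, -117.9, 139.9),
  opened = c("1900-01-01", "1950-06-15", "1985-04-15"),
  stringsAsFactors = FALSE
)

# Write the rows to a temporary CSV file.
example_file <- tempfile(fileext = ".csv")
write.csv(example_parks, example_file, row.names = FALSE)

# Read it back the same way the script does and check that the dates parse.
check <- read.csv(example_file, stringsAsFactors = FALSE)
check$opened <- as.Date(check$opened)
str(check)
```

Dates written as year-month-day are the safest choice here, since as.Date() parses that format without needing an explicit format string.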

# Load the required libraries
library(ggplot2)    # ggplot(), borders(), geom_point()
library(maps)       # world map polygons used by borders("world")
library(data.table) # setDT(), setkey(), year(), melt()

info <- read.csv("***.csv", stringsAsFactors = FALSE)

info <- info[complete.cases(info),]
setDT(info)
info$opened <- as.Date(info$opened) # Store the opening date as a Date
setkey(info, park)
setkey(info, opened)


# Setup for an animation 
a_vec <- seq(1840, 2016, by = 1) # Create a vector of the years you will animate over

# Create a matrix to hold the 'size' information for the graph
B = matrix( rep(0, length(a_vec)*length(info$park)),
 nrow= length(a_vec), 
 ncol= length(info$park))

for (i in 1:ncol(B))
{
 for (x in 1:nrow(B))
 { # I want to have a big dot when it opens that gets gradually smaller,
   # like the alpha in the flights visualisation.
  open_date <- as.numeric(year(info$opened[i]))
  c_year <- a_vec[x]
  # If the park hasn't opened yet, give it no circle.
  if (open_date > c_year)
  {B[x,i] <- 0} else
  # If the park is in its opening year, give it a big circle.
  if (open_date == c_year)
  {B[x,i] <- 10}
 }
}

# Make the circle fade from size 10 to size 1, then stay at 1 until
# the end of the matrix

for (i in 1:ncol(B))
{ for (x in 2: nrow(B))
  {if (B[x-1, i] > 1){ B[x,i] <- B[x-1, i] - 1}else
   if(B[x-1, i] == 1){ B[x,i] <- 1}}}

B <- data.frame(B)
B <- cbind( a_vec, B)
setDT(B)
names(B) <- c("years", info$park) # Set the column names to the names of the parks

xxx <- melt(B, "years") # Convert to long format

# Create a table of locations
loc <- data.table("variable" = info$park,
                   "lat"= info$lat, 
                   "long"= info$long)

#Join the locations to the long table
xxx <- merge(xxx, loc, by = "variable", all.x = TRUE)
setkey(xxx, years)

# Create a ggplot image for each entry in the a_vec vector of years we
# made at the beginning.
for (i in 1:length(a_vec))
    {mydata <- xxx[years == a_vec[i]] # Only graph the rows for year i.
     mydata <- mydata[value != 0] # Don't plot stuff not open yet.
     # Write the plot to a jpeg file and give it a zero-padded number so
     # the frames stay in order when the files are sorted by name.
     jpeg(filename =
     paste("~/chosenfolder/animation", sprintf("%03d", i), ".jpeg", sep = ""),
     width = (429*2), height = (130*2), units = "px")
     # Plot a world map in grey and entitle it with the year.
     mapWorld <- borders("world", colour = "gray50", fill = "gray50")
     mp <- ggplot() + mapWorld + theme_bw() + ggtitle(a_vec[i])
     # Add the points on the map, using the size vector we spent all that
     # time building matrices to produce.
     mp <- mp + geom_point(data = mydata, aes(x = long, y = lat),
     color = "orange", size = mydata$value/1.5) + ylim(c(0, 60))
     plot(mp)
     dev.off()
}
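As an aside, the two nested loops that build the size matrix B can probably be collapsed into a couple of vectorised lines. This is only a sketch of an alternative, not the code used for the animation above: the opening years below are hypothetical stand-ins for year(info$opened), and the names age and size are mine.

```r
years  <- 1840:2016
opened <- c(1843, 1955, 2010)  # hypothetical opening years, one per park

# age[x, i] is how many years park i has been open in year x
# (negative when the park has not opened yet).
age <- outer(years, opened, "-")

# Size 0 before opening, 10 in the opening year, shrinking by 1 each
# year after that, and never dropping below 1.
size <- ifelse(age < 0, 0, pmax(10 - age, 1))

# Peek at the first park around its opening year.
size[years %in% 1842:1845, 1]
```

Rows are years and columns are parks, the same layout as B, so the result could in principle be dropped into the rest of the script in place of the loop output.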

 
