Maps

Required reading

Required viewing

Recommended reading

Key concepts/skills/etc

Key libraries

Key functions/etc

Introduction

In many ways maps can be thought of as a fancy graph, where the x-axis is latitude, the y-axis is longitude, and there is some outline or a background image. We are used to this type of set-up, for instance, in a ggplot setting that is quite familiar.

ggplot() +
  geom_polygon( # First draw an outline
    data = some_data, 
    aes(x = latitude, 
        y = longitude,
        group = group
        )) +
  geom_point( # Then add points of interest
    data = some_other_data, 
    aes(x = latitude, 
        y = longitude)
    )

And while there are some small complications, for the most part it is as straight-forward as that. The first step is to get some data. And helpfully, there is some geographic data built into ggplot, and there is some other information built into a package called maps.

library(maps)
library(tidyverse)

canada <- map_data(database = "world", regions = "canada")
canadian_cities <- maps::canada.cities

head(canada)
       long      lat group order region    subregion
1 -59.78760 43.93960     1     1 Canada Sable Island
2 -59.92227 43.90391     1     2 Canada Sable Island
3 -60.03775 43.90664     1     3 Canada Sable Island
4 -60.11426 43.93911     1     4 Canada Sable Island
5 -60.11748 43.95337     1     5 Canada Sable Island
6 -59.93604 43.93960     1     6 Canada Sable Island
head(canadian_cities)
           name country.etc    pop   lat    long capital
1 Abbotsford BC          BC 157795 49.06 -122.30       0
2      Acton ON          ON   8308 43.63  -80.03       0
3 Acton Vale QC          QC   5153 45.63  -72.57       0
4    Airdrie AB          AB  25863 51.30 -114.02       0
5    Aklavik NT          NT    643 68.22 -135.00       0
6    Albanel QC          QC   1090 48.87  -72.42       0

With that information in hand we can then create a map of Canada that shows the cities with a population over 1,000. (The geom_polygon() function within ggplot draws shapes, by connecting points within groups. And the coord_map() function adjusts for the fact that we are making something that is 2D map to represent something that is 3D.)

ggplot() +
  geom_polygon(data = canada,
               aes(x = long,
                   y = lat,
                   group = group),
               fill = "white", 
               colour = "grey") +
  coord_map(ylim = c(40, 70)) +
  geom_point(aes(x = canadian_cities$long, 
                 y = canadian_cities$lat),
             alpha = 0.3,
             color = "black") +
  theme_classic() +
  labs(x = "Longitude",
       y = "Latitude")
# If I'm being honest, this 'simple example' took me six hours to work out. Firstly 
# to find Canada and then to find Canadian cities.

In this section we will go through two types of maps: static and interactive. Static maps will be useful for printed output, such as a PDF or Word report, or where there is something in particular that you want to illustrate. Interactive maps will be more useful in an online setting, where you want your users to be able to explore the data themselves. (Further, they are a great way to take advantage of having your own website.)

Static maps

As is often the case with R, there are many different ways to get started create static maps. We’ve already seen how they can be built using simply ggplot, but here we’ll explore one package that has a bunch of functionality built in that will make things easier: ggmap.

There are two essential components to a map: 1) some border or background image (also known as a tile); and 2) something of interest within that border or on top of that tile. In ggmap, we will use an open source option for our tile, Stamen Maps (maps.stamen.com), and we will use plot points based on latitude and longitude.

Australian polling places

Like Canada, in Australia people go to specific locations, called booths, to vote. These booths have latitudes and longitudes and so we can plot these. One reason we may like to do this is to notice patterns over geographies.

To get started we need to get a tile. We are going to use ggmap to get a tile from Stamen Maps, which builds on OpenStreetMap (openstreetmap.org). The main argument to this function is to specify a bounding box. This requires two latitudes - one for the top of the box and one for the bottom of the box - and two longitudes - one for the left of the box and one for the right of the box. (It can be useful to use Google Maps, or an alternative, to find the values of these that you need.) The bounding box provides the coordinates of the edges that you are interested in. In this case I have provided it with coordinates such that it will be centered around Canberra, Australia (our equivalent of Ottawa - a small city that was created for the purposes of being the capital).

library(ggmap)

bbox <- c(left = 148.95, bottom = -35.5, right = 149.3, top = -35.1)

Once you have defined the bounding box, then the function get_stamenmap() will get the tiles in that area. The number of tiles that it needs to get depends on the zoom, and the type of tiles that it gets depends on the maptype. I’ve chosen the maptype that I like here - the black and white option - but the helpfile specifies a few others that you may like. At this point you can pass your maps to ggmap and it will plot the tile! It will be actively downloading these tiles, so you need an internet connection.

canberra_stamen_map <- get_stamenmap(bbox, zoom = 11, maptype = "toner-lite")

ggmap(canberra_stamen_map)

Once we have a map then we can use ggmap() to plot it. (That circle in the middle of the map is where the Australian Parliament House is… yes, our parliament is surrounded by circular roads (we call them ‘roundabouts’), actually it’s surrounded by two of them.)

Now we want to get some data that we will plot on top of our tiles. We will just plot the location of the polling places, based on which ‘division’ (the Australian equivalent to ‘ridings’ in Canada) it is. This is available here: https://results.aec.gov.au/20499/Website/Downloads/HouseTppByPollingPlaceDownload-20499.csv. (The Australian Electoral Commission (AEC) is the official government agency that is responsible for elections in Australia.)

# Read in the booths data for each year
booths <- readr::read_csv("https://results.aec.gov.au/24310/Website/Downloads/GeneralPollingPlacesDownload-24310.csv", 
                          skip = 1, 
                          guess_max = 10000)

head(booths)
# A tibble: 6 x 15
  State DivisionID DivisionNm PollingPlaceID PollingPlaceTyp… PollingPlaceNm
  <chr>      <dbl> <chr>               <dbl>            <dbl> <chr>         
1 ACT          318 Bean                93925                5 Belconnen BEA…
2 ACT          318 Bean                93927                5 BLV Bean PPVC 
3 ACT          318 Bean                11877                1 Bonython      
4 ACT          318 Bean                11452                1 Calwell       
5 ACT          318 Bean                 8761                1 Chapman       
6 ACT          318 Bean                 8763                1 Chisholm      
# … with 9 more variables: PremisesNm <chr>, PremisesAddress1 <chr>,
#   PremisesAddress2 <chr>, PremisesAddress3 <chr>, PremisesSuburb <chr>,
#   PremisesStateAb <chr>, PremisesPostCode <chr>, Latitude <dbl>,
#   Longitude <dbl>

This dataset is for the whole of Australia, but as we are just going to plot the area around Canberra we will filter to that and only to booths that are geographic (the AEC has various options for people who are in hospital, or not able to get to a booth, etc, and these are still ‘booths’ in this dataset).

# Reduce the booths data to only rows with that have latitude and longitude
booths_reduced <-
  booths %>%
  filter(State == "ACT") %>% 
  select(PollingPlaceID, DivisionNm, Latitude, Longitude) %>% 
  filter(!is.na(Longitude)) %>% # Remove rows that don't have a geography
  filter(Longitude < 165) # Remove Norfolk Island

Now we can use ggmap in the same way as before to plot our underlying tiles, and then build on that using geom_point() to add our points of interest.

ggmap(canberra_stamen_map, 
      extent = "normal", 
      maprange = FALSE) +
  geom_point(data = booths_reduced,
             aes(x = Longitude, 
                 y = Latitude, 
                 colour = DivisionNm),
             ) +
  scale_color_brewer(name = "2019 Division", palette = "Set1") +
  coord_map(projection="mercator",
            xlim=c(attr(map, "bb")$ll.lon, attr(map, "bb")$ur.lon),
            ylim=c(attr(map, "bb")$ll.lat, attr(map, "bb")$ur.lat)) +
  labs(x = "Longitude",
       y = "Latitude") +
  theme_minimal() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

We may like to save the map so that we don’t have to draw it every time, and we can do that in the same way as any other graph, using ggsave().

ggsave("outputs/figures/map.pdf", width = 20, height = 10, units = "cm")

Finally, the reason that I used Stamen Maps and OpenStreetMap is because it is open source, however you can also use Google Maps if you want. This requires you to first register a credit card with Google, and specify a key, but with low usage should be free. The get_googlemap() function with ggmap, brings some nice features that get_stamenmap() does not have. For instance, you can enter a placename and it’ll do it’s best to find it rather than needing to specify a bounding box.

Toronto bike parking

Let’s see another example of a static map, this time using Toronto data accessed via the opendatatoronto package. The dataset that we are going to plot is available here: https://open.toronto.ca/dataset/street-furniture-bicycle-parking/.

# This code is based on code from: https://open.toronto.ca/dataset/street-furniture-bicycle-parking/.
library(opendatatoronto)
# (The string identifies the package.)
resources <- list_package_resources("71e6c206-96e1-48f1-8f6f-0e804687e3be")
# In this case there is only one dataset within this resource so just need the first one  
raw_data <- filter(resources, row_number()==1) %>% get_resource()
write_csv(raw_data, "inputs/data/bike_racks.csv")
head(raw_data)

Now that we’ve saved a copy of the data, we can use that one. First we need to clean it up a bit. There are some clear errors in the ADDRESSNUMBERTEXT field, but not too many, so we’ll just ignore it.

raw_data <- read_csv("inputs/data/bike_racks.csv")
# We'll just focus on the data that we want
bike_data <- tibble(ward = raw_data$WARD,
                    id = raw_data$ID,
                    status = raw_data$STATUS,
                    street_address = paste(raw_data$ADDRESSNUMBERTEXT, raw_data$ADDRESSSTREET),
                    latitude = raw_data$LATITUDE,
                    longitude = raw_data$LONGITUDE)
rm(raw_data)

Some of the bike racks were temporary so remove them and also let’s just look at the area around the university, which is Ward 11

# Only keep ones that still exist
bike_data <- 
  bike_data %>%
  filter(status == "Existing") %>% 
  select(-status)

bike_data <- bike_data %>% 
  filter(ward == 11) %>% 
  select(-ward)

If you look at the dataset at this point then you’ll notice that there is a row for every bike parking spot. But we don’t really need to know that, because sometimes there are lots right next to each other. Instead we’d just like the one point (we’ll take advantage of this in an interactive graph in a moment). So we want to create a count by address, and then just get one instance per address.

bike_data <- 
  bike_data %>%
  group_by(street_address) %>% 
  mutate(number_of_spots = n(),
         running_total = row_number()
         ) %>% 
  ungroup() %>% 
  filter(running_total == 1) %>% 
  select(-id, -running_total)

head(bike_data)
# A tibble: 6 x 4
  street_address   latitude longitude number_of_spots
  <chr>               <dbl>     <dbl>           <int>
1 8 Kensington Ave     43.7     -79.4               1
2 87 Avenue Rd         43.7     -79.4               4
3 162 Mc Caul St       43.7     -79.4               1
4 147 Baldwin St       43.7     -79.4               2
5 888 Yonge St         43.7     -79.4               1
6 180 Elizabeth St     43.7     -79.4              10

Now we can grab our tile, and add our bike rack data onto it.

bbox <- c(left = -79.420390, bottom = 43.642658, right = -79.383354, top = 43.672557)

toronto_stamen_map <- get_stamenmap(bbox, zoom = 14, maptype = "toner-lite")

ggmap(toronto_stamen_map,  maprange = FALSE) +
  geom_point(data = bike_data,
             aes(x = longitude, 
                 y = latitude),
             alpha = 0.3
             ) +
  labs(x = "Longitude",
       y = "Latitude") +
  theme_minimal() 

Interactive maps

The nice thing about interactive maps is that you can let your users decide what they are interested in. Additionally, if there is a lot of information then you may like to leave it to your users as to selectively focus on what they are interested in. For instance, in the case of Canadian politics, some people will be interested in Toronto ridings, while others will be interested in Manitoba, etc. But it would be difficult to present a map that focuses on both of those, so an interactive map is a great option for allowing users to zoom in on what they want.

Leaflet

The leaflet package is originally a JavaScript library of the same name that has been brought over to R. It makes it easy to make interactive maps. The basics are fairly similar to the ggmap set-up, but of course after that, there are many, many, options.

Let’s redo the bike map from earlier, and possibly the interaction will allow us to see what the issue is with the data.

In the same way as a graph in ggplot begins with the ggplot() function, a map in the leaflet package begins with a call to the leaflet() function. This allows you to specify data, and a bunch of other options such as width and height. After this, we add ‘layers’, in the same way that we added them in ggplot. The first layer that we’ll add is a tile with the function addTiles(). In this case, the default is from OpenStreeMap. After that we’ll add markers that show the location of each bike parking spot with addMarkers().

library(leaflet)

leaflet(data = bike_data) %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addMarkers(lng = bike_data$longitude, 
             lat = bike_data$latitude, 
             popup = bike_data$street_address,
             label = ~as.character(bike_data$number_of_spots))

There are two options here that may not be familiar. The first is popup, and this is what happens when you click on the marker. In this example this is giving the address. The second is label, which is what happens when you hover over the marker. In this example it is given the number of spots.

COVID-19

Let’s have another go, this time with Ontario data on COVID-19.

We can download the latest data from the Ontario Data Catalogue. This is a fast moving situation in which they are likely to make breaking changes to this dataset. To ensure these notes work, I will save and then use the dataset as at 4 April 2020, but you are able to get the up-to-date dataset using the link and the code.

ontario_covid <- read_csv("https://data.ontario.ca/datastore/dump/455fd63b-603d-4608-8216-7d8647f43350?bom=True")
write_csv(ontario_covid, "inputs/data/ontario_covid_2020-04-04.csv")
ontario_covid <- read_csv("inputs/data/ontario_covid_2020-04-04.csv")
head(ontario_covid)
# A tibble: 6 x 14
  `_id` ROW_ID ACCURATE_EPISODE_D… Age_Group CLIENT_GENDER CASE_ACQUISITIO…
  <dbl>  <dbl> <dttm>              <chr>     <chr>         <chr>           
1     1      1 2020-03-07 00:00:00 40s       MALE          Neither         
2     2      2 2020-03-08 00:00:00 20s       MALE          Neither         
3     3      3 2020-03-10 00:00:00 40s       FEMALE        Neither         
4     4      4 2020-03-11 00:00:00 50s       FEMALE        Neither         
5     5      5 2020-03-12 00:00:00 30s       FEMALE        Neither         
6     6      6 2020-03-15 00:00:00 50s       MALE          Neither         
# … with 8 more variables: OUTCOME1 <chr>, Reporting_PHU <chr>,
#   Reporting_PHU_Address <chr>, Reporting_PHU_City <chr>,
#   Reporting_PHU_Postal_Code <chr>, Reporting_PHU_Website <chr>,
#   Reporting_PHU_Latitude <dbl>, Reporting_PHU_Longitude <dbl>

There is a lot of information here, but we’ll just plot the number of cases, by the reporting area (health areas). So this isn’t the location of the person, but the location of the responsible health unit. Because of this, we’ll add a little bit of noise so that the marker for each person can be seen. We do this with jitter().

ontario_covid <- 
  ontario_covid %>% 
  mutate(Reporting_PHU_Latitude = jitter(Reporting_PHU_Latitude, amount = 0.1),
         Reporting_PHU_Longitude = jitter(Reporting_PHU_Longitude, amount = 0.1))

We will introduce a different type of marker here, which is circles. This will allow us to use different colours for the outcomes of each case. There are three possible outcomes: the case is resolved, it is not resolved, or it was fatal.

library(leaflet)

pal <- colorFactor("Dark2", domain = ontario_covid$OUTCOME1 %>% unique())

leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addCircleMarkers(
    data = ontario_covid,
    lng = ontario_covid$Reporting_PHU_Longitude, 
    lat = ontario_covid$Reporting_PHU_Latitude, 
    color = pal(ontario_covid$OUTCOME1),
    popup = paste("<b>Age-group:</b>", as.character(ontario_covid$Age_Group), "<br>",
                  "<b>Gender:</b>", as.character(ontario_covid$CLIENT_GENDER), "<br>",
                  "<b>Acquisition:</b>", as.character(ontario_covid$CASE_ACQUISITIONINFO), "<br>",
                  "<b>Episode date:</b>", as.character(ontario_covid$ACCURATE_EPISODE_DATE), "<br>")
    ) %>% 
  addLegend("bottomright", 
            pal = pal, 
            values = ontario_covid$OUTCOME1 %>% unique(),
    title = "Case outcome",
    opacity = 1
  )

mapdeck

Thank you to Shaun Ratcliff for introducing me to mapdeck.

The package mapdeck is an R package that is built on top of Mapbox (https://www.mapbox.com). It is based on WebGL, which means that your web browser does a lot of work for you. The nice thing is that because of this, it can do a bunch of things that leaflet struggles with, especially dealing with larger datasets. Mapbox is a full-featured application that many businesses that you may have heard of use: https://www.mapbox.com/showcase. To close out these notes on mapping, I want to briefly touch on mapdeck, as it is a newer, but very exciting, package.

To this point we have used stamen maps as our tile, but mapdeck uses mapbox - https://www.mapbox.com/ - and so you need to register and get a token for this. (It’s free.) Once you have that token you add it to R using:

library(mapdeck)
set_token("asdf") # replace asdf with your token.
mapdeck_tokens()

set_token(test$key)

(Don’t add it into your script otherwise everyone will be able to take it and use it, especially once you add it to GitHub.)

Then we need some data. Here we’re going to just use the example dataset, which is about flights.

# Code taken from the example: https://github.com/SymbolixAU/mapdeck
library(mapdeck)

url <- 'https://raw.githubusercontent.com/plotly/datasets/master/2011_february_aa_flight_paths.csv'
flights <- read.csv(url)
flights$info <- paste0("<b>",flights$airport1, " - ", flights$airport2, "</b>")

head(flights)
  start_lat start_lon  end_lat    end_lon airline airport1 airport2 cnt
1  32.89595 -97.03720 35.04022 -106.60919      AA      DFW      ABQ 444
2  41.97960 -87.90446 30.19453  -97.66987      AA      ORD      AUS 166
3  32.89595 -97.03720 41.93887  -72.68323      AA      DFW      BDL 162
4  18.43942 -66.00183 41.93887  -72.68323      AA      SJU      BDL  56
5  32.89595 -97.03720 33.56294  -86.75355      AA      DFW      BHM 168
6  25.79325 -80.29056 36.12448  -86.67818      AA      MIA      BNA  56
              info
1 <b>DFW - ABQ</b>
2 <b>ORD - AUS</b>
3 <b>DFW - BDL</b>
4 <b>SJU - BDL</b>
5 <b>DFW - BHM</b>
6 <b>MIA - BNA</b>

Finally, we can call the map. Again, this is just the example in the package’s website.

mapdeck(style = mapdeck_style('dark')
        ) %>%
  add_arc(
    data = flights
    , origin = c("start_lon", "start_lat")
    , destination = c("end_lon", "end_lat")
    , stroke_from = "airport1"
    , stroke_to = "airport2"
    , tooltip = "info"
    , layer_id = 'arclayer'
  )

And this is pretty nice!

References