Chapter 5 Maps
- TODO: Add something about geocoding.
- Cooley, David, 2020, ‘mapdeck’, freely available at: https://symbolixau.github.io/mapdeck/index.html.
- Kolb, Jan-Philipp, 2019, ‘Using Web Services to Work with Geodata in R’, The R Journal, 11:2, pages 6-23, freely available at: https://journal.r-project.org/archive/2019/RJ-2019-041/index.html.
- Gabrielle, 2019, ‘Visualising spatial data using sf and mapdeck - part one’, 4 December, freely available at: https://resources.symbolix.com.au/2019/12/04/mapdeck-1/.
- ‘Leaflet for R’, freely available at: https://rstudio.github.io/leaflet/.
- Kuriwaki, Shiro, 2020, ‘Making maps in R with sf’, 1 March, freely available at: https://vimeo.com/394800836.
- Engel, Claudia A, 2019, Using Spatial Data with R, 11 February, Chapter 3 Making Maps in R, freely available at: https://cengel.github.io/R-spatial/mapping.html.
- Lovelace, Robin, Jakub Nowosad, Jannes Muenchow, 2020, Geocomputation with R, 29 March, Chapter 8, Making maps with R, freely available at: https://geocompr.robinlovelace.net/adv-map.html.
- Thinking of maps as a (often fiddly, but strangely enjoyable) variant of a usual ggplot.
- Broadening the data that we make available via interactive maps, while still telling a clear story.
- Becoming comfortable with (and excited about) creating both static and interactive maps.
In many ways maps can be thought of as a fancy graph, where the x-axis is latitude, the y-axis is longitude, and there is some outline or a background image. We are used to this type of set-up, for instance, in a ggplot setting that is quite familiar.
And while there are some small complications, for the most part it is as straight-forward as that. The first step is to get some data. And helpfully, there is some geographic data built into ggplot, and there is some other information built into a package called
## long lat group order region subregion ## 1 -59.78760 43.93960 1 1 Canada Sable Island ## 2 -59.92227 43.90391 1 2 Canada Sable Island ## 3 -60.03775 43.90664 1 3 Canada Sable Island ## 4 -60.11426 43.93911 1 4 Canada Sable Island ## 5 -60.11748 43.95337 1 5 Canada Sable Island ## 6 -59.93604 43.93960 1 6 Canada Sable Island
## name country.etc pop lat long capital ## 1 Abbotsford BC BC 157795 49.06 -122.30 0 ## 2 Acton ON ON 8308 43.63 -80.03 0 ## 3 Acton Vale QC QC 5153 45.63 -72.57 0 ## 4 Airdrie AB AB 25863 51.30 -114.02 0 ## 5 Aklavik NT NT 643 68.22 -135.00 0 ## 6 Albanel QC QC 1090 48.87 -72.42 0
With that information in hand we can then create a map of Canada that shows the cities with a population over 1,000. (The
geom_polygon() function within
ggplot draws shapes, by connecting points within groups. And the
coord_map() function adjusts for the fact that we are making something that is 2D map to represent something that is 3D.)
In this section we will go through two types of maps: static and interactive. Static maps will be useful for printed output, such as a PDF or Word report, or where there is something in particular that you want to illustrate. Interactive maps will be more useful in an online setting, where you want your users to be able to explore the data themselves. (Further, they are a great way to take advantage of having your own website.)
5.2 Static maps
As is often the case with R, there are many different ways to get started create static maps. We’ve already seen how they can be built using simply ggplot, but here we’ll explore one package that has a bunch of functionality built in that will make things easier:
There are two essential components to a map: 1) some border or background image (also known as a tile); and 2) something of interest within that border or on top of that tile. In
ggmap, we will use an open source option for our tile, Stamen Maps (maps.stamen.com), and we will use plot points based on latitude and longitude.
184.108.40.206 Australian polling places
Like Canada, in Australia people go to specific locations, called booths, to vote. These booths have latitudes and longitudes and so we can plot these. One reason we may like to do this is to notice patterns over geographies.
To get started we need to get a tile. We are going to use
ggmap to get a tile from Stamen Maps, which builds on OpenStreetMap (openstreetmap.org). The main argument to this function is to specify a bounding box. This requires two latitudes - one for the top of the box and one for the bottom of the box - and two longitudes - one for the left of the box and one for the right of the box. (It can be useful to use Google Maps, or an alternative, to find the values of these that you need.) The bounding box provides the coordinates of the edges that you are interested in. In this case I have provided it with coordinates such that it will be centered around Canberra, Australia (our equivalent of Ottawa - a small city that was created for the purposes of being the capital).
Once you have defined the bounding box, then the function
get_stamenmap() will get the tiles in that area. The number of tiles that it needs to get depends on the zoom, and the type of tiles that it gets depends on the maptype. I’ve chosen the maptype that I like here - the black and white option - but the helpfile specifies a few others that you may like. At this point you can pass your maps to ggmap and it will plot the tile! It will be actively downloading these tiles, so you need an internet connection.
Once we have a map then we can use
ggmap() to plot it. (That circle in the middle of the map is where the Australian Parliament House is… yes, our parliament is surrounded by circular roads (we call them ‘roundabouts’), actually it’s surrounded by two of them.)
Now we want to get some data that we will plot on top of our tiles. We will just plot the location of the polling places, based on which ‘division’ (the Australian equivalent to ‘ridings’ in Canada) it is. This is available here: https://results.aec.gov.au/20499/Website/Downloads/HouseTppByPollingPlaceDownload-20499.csv. (The Australian Electoral Commission (AEC) is the official government agency that is responsible for elections in Australia.)
## # A tibble: 6 x 15 ## State DivisionID DivisionNm PollingPlaceID PollingPlaceTyp… PollingPlaceNm ## <chr> <dbl> <chr> <dbl> <dbl> <chr> ## 1 ACT 318 Bean 93925 5 Belconnen BEA… ## 2 ACT 318 Bean 93927 5 BLV Bean PPVC ## 3 ACT 318 Bean 11877 1 Bonython ## 4 ACT 318 Bean 11452 1 Calwell ## 5 ACT 318 Bean 8761 1 Chapman ## 6 ACT 318 Bean 8763 1 Chisholm ## # … with 9 more variables: PremisesNm <chr>, PremisesAddress1 <chr>, ## # PremisesAddress2 <chr>, PremisesAddress3 <chr>, PremisesSuburb <chr>, ## # PremisesStateAb <chr>, PremisesPostCode <chr>, Latitude <dbl>, ## # Longitude <dbl>
This dataset is for the whole of Australia, but as we are just going to plot the area around Canberra we will filter to that and only to booths that are geographic (the AEC has various options for people who are in hospital, or not able to get to a booth, etc, and these are still ‘booths’ in this dataset).
# Reduce the booths data to only rows with that have latitude and longitude booths_reduced <- booths %>% filter(State == "ACT") %>% select(PollingPlaceID, DivisionNm, Latitude, Longitude) %>% filter(!is.na(Longitude)) %>% # Remove rows that don't have a geography filter(Longitude < 165) # Remove Norfolk Island
Now we can use
ggmap in the same way as before to plot our underlying tiles, and then build on that using
geom_point() to add our points of interest.
ggmap(canberra_stamen_map, extent = "normal", maprange = FALSE) + geom_point(data = booths_reduced, aes(x = Longitude, y = Latitude, colour = DivisionNm), ) + scale_color_brewer(name = "2019 Division", palette = "Set1") + coord_map(projection="mercator", xlim=c(attr(map, "bb")$ll.lon, attr(map, "bb")$ur.lon), ylim=c(attr(map, "bb")$ll.lat, attr(map, "bb")$ur.lat)) + labs(x = "Longitude", y = "Latitude") + theme_minimal() + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
We may like to save the map so that we don’t have to draw it every time, and we can do that in the same way as any other graph, using
Finally, the reason that I used Stamen Maps and OpenStreetMap is because it is open source, however you can also use Google Maps if you want. This requires you to first register a credit card with Google, and specify a key, but with low usage should be free. The
get_googlemap() function with
ggmap, brings some nice features that
get_stamenmap() does not have. For instance, you can enter a placename and it’ll do it’s best to find it rather than needing to specify a bounding box.
220.127.116.11 Toronto bike parking
Let’s see another example of a static map, this time using Toronto data accessed via the
opendatatoronto package. The dataset that we are going to plot is available here: https://open.toronto.ca/dataset/street-furniture-bicycle-parking/.
# This code is based on code from: https://open.toronto.ca/dataset/street-furniture-bicycle-parking/. library(opendatatoronto) # (The string identifies the package.) resources <- list_package_resources("71e6c206-96e1-48f1-8f6f-0e804687e3be") # In this case there is only one dataset within this resource so just need the first one raw_data <- filter(resources, row_number()==1) %>% get_resource() write_csv(raw_data, "inputs/data/bike_racks.csv") head(raw_data)
Now that we’ve saved a copy of the data, we can use that one. First we need to clean it up a bit. There are some clear errors in the ADDRESSNUMBERTEXT field, but not too many, so we’ll just ignore it.
raw_data <- read_csv("inputs/data/bike_racks.csv") # We'll just focus on the data that we want bike_data <- tibble(ward = raw_data$WARD, id = raw_data$ID, status = raw_data$STATUS, street_address = paste(raw_data$ADDRESSNUMBERTEXT, raw_data$ADDRESSSTREET), latitude = raw_data$LATITUDE, longitude = raw_data$LONGITUDE) rm(raw_data)
Some of the bike racks were temporary so remove them and also let’s just look at the area around the university, which is Ward 11
If you look at the dataset at this point then you’ll notice that there is a row for every bike parking spot. But we don’t really need to know that, because sometimes there are lots right next to each other. Instead we’d just like the one point (we’ll take advantage of this in an interactive graph in a moment). So we want to create a count by address, and then just get one instance per address.
## # A tibble: 6 x 4 ## street_address latitude longitude number_of_spots ## <chr> <dbl> <dbl> <int> ## 1 8 Kensington Ave 43.7 -79.4 1 ## 2 87 Avenue Rd 43.7 -79.4 4 ## 3 162 Mc Caul St 43.7 -79.4 1 ## 4 147 Baldwin St 43.7 -79.4 2 ## 5 888 Yonge St 43.7 -79.4 1 ## 6 180 Elizabeth St 43.7 -79.4 10
Now we can grab our tile, and add our bike rack data onto it.
bbox <- c(left = -79.420390, bottom = 43.642658, right = -79.383354, top = 43.672557) toronto_stamen_map <- get_stamenmap(bbox, zoom = 14, maptype = "toner-lite") ggmap(toronto_stamen_map, maprange = FALSE) + geom_point(data = bike_data, aes(x = longitude, y = latitude), alpha = 0.3 ) + labs(x = "Longitude", y = "Latitude") + theme_minimal()
5.3 Interactive maps
The nice thing about interactive maps is that you can let your users decide what they are interested in. Additionally, if there is a lot of information then you may like to leave it to your users as to selectively focus on what they are interested in. For instance, in the case of Canadian politics, some people will be interested in Toronto ridings, while others will be interested in Manitoba, etc. But it would be difficult to present a map that focuses on both of those, so an interactive map is a great option for allowing users to zoom in on what they want.
ggmap set-up, but of course after that, there are many, many, options.
Let’s redo the bike map from earlier, and possibly the interaction will allow us to see what the issue is with the data.
In the same way as a graph in
ggplot begins with the
ggplot() function, a map in the
leaflet package begins with a call to the
leaflet() function. This allows you to specify data, and a bunch of other options such as width and height. After this, we add ‘layers’, in the same way that we added them in
ggplot. The first layer that we’ll add is a tile with the function
addTiles(). In this case, the default is from OpenStreeMap. After that we’ll add markers that show the location of each bike parking spot with
There are two options here that may not be familiar. The first is popup, and this is what happens when you click on the marker. In this example this is giving the address. The second is label, which is what happens when you hover over the marker. In this example it is given the number of spots.
Let’s have another go, this time with Ontario data on COVID-19.
We can download the latest data from the Ontario Data Catalogue. This is a fast moving situation in which they are likely to make breaking changes to this dataset. To ensure these notes work, I will save and then use the dataset as at 4 April 2020, but you are able to get the up-to-date dataset using the link and the code.
## # A tibble: 6 x 14 ## `_id` ROW_ID ACCURATE_EPISODE_D… Age_Group CLIENT_GENDER CASE_ACQUISITIO… ## <dbl> <dbl> <dttm> <chr> <chr> <chr> ## 1 1 1 2020-03-07 00:00:00 40s MALE Neither ## 2 2 2 2020-03-08 00:00:00 20s MALE Neither ## 3 3 3 2020-03-10 00:00:00 40s FEMALE Neither ## 4 4 4 2020-03-11 00:00:00 50s FEMALE Neither ## 5 5 5 2020-03-12 00:00:00 30s FEMALE Neither ## 6 6 6 2020-03-15 00:00:00 50s MALE Neither ## # … with 8 more variables: OUTCOME1 <chr>, Reporting_PHU <chr>, ## # Reporting_PHU_Address <chr>, Reporting_PHU_City <chr>, ## # Reporting_PHU_Postal_Code <chr>, Reporting_PHU_Website <chr>, ## # Reporting_PHU_Latitude <dbl>, Reporting_PHU_Longitude <dbl>
There is a lot of information here, but we’ll just plot the number of cases, by the reporting area (health areas). So this isn’t the location of the person, but the location of the responsible health unit. Because of this, we’ll add a little bit of noise so that the marker for each person can be seen. We do this with
We will introduce a different type of marker here, which is circles. This will allow us to use different colours for the outcomes of each case. There are three possible outcomes: the case is resolved, it is not resolved, or it was fatal.
library(leaflet) pal <- colorFactor("Dark2", domain = ontario_covid$OUTCOME1 %>% unique()) leaflet() %>% addTiles() %>% # Add default OpenStreetMap map tiles addCircleMarkers( data = ontario_covid, lng = ontario_covid$Reporting_PHU_Longitude, lat = ontario_covid$Reporting_PHU_Latitude, color = pal(ontario_covid$OUTCOME1), popup = paste("<b>Age-group:</b>", as.character(ontario_covid$Age_Group), "<br>", "<b>Gender:</b>", as.character(ontario_covid$CLIENT_GENDER), "<br>", "<b>Acquisition:</b>", as.character(ontario_covid$CASE_ACQUISITIONINFO), "<br>", "<b>Episode date:</b>", as.character(ontario_covid$ACCURATE_EPISODE_DATE), "<br>") ) %>% addLegend("bottomright", pal = pal, values = ontario_covid$OUTCOME1 %>% unique(), title = "Case outcome", opacity = 1 )
Thank you to Shaun Ratcliff for introducing me to mapdeck.
mapdeck is an R package that is built on top of Mapbox (https://www.mapbox.com). It is based on WebGL, which means that your web browser does a lot of work for you. The nice thing is that because of this, it can do a bunch of things that leaflet struggles with, especially dealing with larger datasets. Mapbox is a full-featured application that many businesses that you may have heard of use: https://www.mapbox.com/showcase. To close out these notes on mapping, I want to briefly touch on
mapdeck, as it is a newer, but very exciting, package.
To this point we have used stamen maps as our tile, but mapdeck uses mapbox - https://www.mapbox.com/ - and so you need to register and get a token for this. (It’s free.) Once you have that token you add it to R using:
(Don’t add it into your script otherwise everyone will be able to take it and use it, especially once you add it to GitHub.)
Then we need some data. Here we’re going to just use the example dataset, which is about flights.
# Code taken from the example: https://github.com/SymbolixAU/mapdeck library(mapdeck) url <- 'https://raw.githubusercontent.com/plotly/datasets/master/2011_february_aa_flight_paths.csv' flights <- read.csv(url) flights$info <- paste0("<b>",flights$airport1, " - ", flights$airport2, "</b>") head(flights)
## start_lat start_lon end_lat end_lon airline airport1 airport2 cnt ## 1 32.89595 -97.03720 35.04022 -106.60919 AA DFW ABQ 444 ## 2 41.97960 -87.90446 30.19453 -97.66987 AA ORD AUS 166 ## 3 32.89595 -97.03720 41.93887 -72.68323 AA DFW BDL 162 ## 4 18.43942 -66.00183 41.93887 -72.68323 AA SJU BDL 56 ## 5 32.89595 -97.03720 33.56294 -86.75355 AA DFW BHM 168 ## 6 25.79325 -80.29056 36.12448 -86.67818 AA MIA BNA 56 ## info ## 1 <b>DFW - ABQ</b> ## 2 <b>ORD - AUS</b> ## 3 <b>DFW - BDL</b> ## 4 <b>SJU - BDL</b> ## 5 <b>DFW - BHM</b> ## 6 <b>MIA - BNA</b>
Finally, we can call the map. Again, this is just the example in the package’s website.
And this is pretty nice!