## Assignment

Your assignment is to create a choropleth map of an issue. It should incorporate a shapefile (such as states or zipcodes) and some data you believe it makes journalistic sense to map. You cannot use the Happiness polling data I demoed on, but you can use the state shapefile with a different Gallup question. You could also choose to go further and get zipcode shapefiles and visualize BLL data, or any other shapefile/data connection you can dream up!

Like the data visualization assignment, I would like a little text to support the map. Think of a headline/title, and 2-3 sentences explaining the lede/nutgraph of your mapped story. This can be submitted as part of the same document, or separately.

## When to make a map?

When you have geographic data! Maps are very popular in data journalism because almost all readers have some geographic context, so they can “see themselves” in the map.

## When not to make a map?

When the geographic information doesn’t add anything to the story. Matthew Ericson has some thoughts about when maps shouldn’t be maps.

## Geographic data

There are three main types: points, lines, and polygons.

• Points are useful when you’re interested in the very specific latitude/longitude location of something: where people are, where crimes occurred, etc.
• Lines are probably the least-used type, but could be used to show routes, rivers, roads, etc.
• Polygons (essentially, shapes) are used to show areas. There are many spatial polygons that get used to delineate where humans live: states, counties, school districts, voting districts, zipcodes, etc.

We’ll be focusing on polygons, but most of this stuff generalizes pretty well to the other types of geographic data.
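
To make the three types concrete, here is a minimal sketch using the sf package (assuming you have it installed). The coordinates are made up for illustration and don't correspond to any real dataset.

```r
library(sf)

# A point: one longitude/latitude pair (made-up coordinates)
pt <- st_point(c(-72.64, 42.32))

# A line: an ordered sequence of coordinate pairs
ln <- st_linestring(rbind(c(-72.64, 42.32), c(-72.60, 42.35), c(-72.55, 42.33)))

# A polygon: a closed ring, so the first and last coordinates must match
pg <- st_polygon(list(rbind(
  c(-73, 42), c(-72, 42), c(-72, 43), c(-73, 43), c(-73, 42)
)))

# Bundle them into one geometry column and ask for each geometry's type
geoms <- st_sfc(pt, ln, pg, crs = 4326)
st_geometry_type(geoms)
```

A real state outline is the same idea as `pg`, just with thousands of coordinate pairs instead of five.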

## Getting polling data

I grabbed some polling data from Gallup for us to map. I chose the response about experiencing happiness, but you could choose a different question if one interests you more.

To get this data, go to the Gallup website. I think this only works from on campus. If that link doesn’t work, try going to the library guide for Data Journalism and clicking through the link there.

Once you’re on the site, you can browse however you’d like. I recommend choosing:

• Find data by: Geography
• In the Geography dropdown, select “United States (All geographies/areas)”
• This will take you to a Summary page, where Gallup will recommend some metrics. You can browse here, but when you’re ready to download data…
• Select Tables (at the upper left of the screen)
• Make sure there is something in Geographies. Use the plus icon if there isn’t, and again, pick “United States (All geographies/areas)”
• Make sure there is something in Metrics. Use the plus icon if there isn’t, and select the question you’re interested in. I chose:
    • Well-being
    • Experienced Happiness Yesterday
• Make sure there is something in Time. I x-ed out of all the years except for 2016, to keep it simple
• Look at the preview of data, and decide whether you want Formatted or Unformatted (radio buttons in the upper right). I picked Unformatted, but it doesn’t really matter.
• Click Export!

## Getting data into RStudio

If you are working on the server, there is a three-part process to getting data into R/RStudio. You need to:

• Download
• Upload
• Load

If you followed the instructions above, you’ve downloaded the data. Now, you’ll need to upload it to the RStudio server.

• Log in to rstudio.smith.edu or rstudio-dev.smith.edu
• Select the Files tab
• Click Upload, and navigate to where your downloaded data is.

## Loading data

The final step is to load the data into R. There are many ways to do this. RStudio has a convenience wizard that I like to use when I make my first attempt to load in data. I’ll often refine the code myself later, but using the wizard makes it less of a guess-and-check process.

To use this,

• click the Environment tab
• select Import Dataset
• Since the Gallup data came as an .xlsx file, we can select the From Excel option
• Look at the previewed version of the data, and adjust the choices at the bottom of the pane. I needed to skip some rows because of the header.

Here’s the code I ended up with:

library(readxl)
GallupAnalytics_Export_20180405_102840 <- read_excel("Downloads/GallupAnalytics_Export_20180405_102840.xlsx", skip=6)

Now, we have a flat, “tidy” file of people’s happiness in the States. But, it doesn’t have any polygons.

## Shapefiles

There’s one more piece of data you will need to make a map, and that’s a “shapefile” (basically, a set of outlines of whatever polygon you’re interested in).

In this case, we need the outlines of the states to be able to plot them. I got my shapefile by googling “census states shapefile.” The first result was this, so I scrolled down to State and clicked that.

There are files from every year, because some boundaries change over time (like voting districts). States are pretty stable, so it doesn’t really matter which you grab. You can choose the resolution you want, but for our purposes we don’t need anything too fine-grained. I chose 1:500,000 (the files labeled 500k).

## Reading in shapefiles

Just like flat files, you need to Download, Upload, and Load your data to get R to know about it. Hopefully you’ve completed the Download part, and have a zip file.

When you Upload to RStudio, upload the zip file as-is, and RStudio should automatically un-zip it into a folder containing a bunch of similarly named files with weird file extensions. Different spatial analysis packages use different numbers of those files, so it’s a good idea to just leave all of them there. Once you have that folder, you can load the data.

The Load part is going to be different for this special (and spatial) data type. We’ll use the old-school way, using readOGR, but there’s a new package called sf that is gaining traction as well. Here’s how readOGR works:

library(rgdal)
states_rgdal <- readOGR("Downloads/cb_2015_us_state_500k/", layer="cb_2015_us_state_500k")

It looks a little repetitive here, but that first quoted string is the filepath to my folder. I stuck my folder right in my “working directory” (check yours by running getwd() in your Console), but you can store the folder anywhere you want as long as you have the filepath correct. If I were working on my local computer, that filepath might be “/Users/amelia/Documents/cb_2015_us_state_500k” (note that filepaths are different on Windows and Mac).

The second argument to the function is the name of the files themselves. To make my life easier, I always just leave these the same as the name of the folder. Notice there’s no trailing slash here. And, even if I had used the full filepath for the first argument, I’d use the same name for this one.

## The sf way

If you’re on rstudio-dev.smith.edu, you can use the sf package. It reads in shapefiles differently:

library(sf)
states_sf <- st_read("Downloads/cb_2015_us_state_500k/")

Notice that this looks a lot more like a “flat” dataset, but with a geometry column containing the specification for the polygon shape.
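
If you want to see that “flat dataset plus a geometry column” structure without loading the real shapefile, here is a small sketch that builds a toy sf object by hand. The two square “states,” their names, and the `square()` helper are all hypothetical and only there for illustration.

```r
library(sf)

# Hypothetical helper: a 1x1 square polygon with its lower-left corner at (x0, y0)
square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# One row per made-up "state": an ordinary column plus a geometry column
toy_sf <- st_sf(
  NAME = c("Squareland", "Boxville"),
  geometry = st_sfc(square(0, 0), square(2, 0), crs = 4326)
)

class(toy_sf)   # an sf object is also a data.frame
names(toy_sf)   # the regular column, then the geometry column
```

Because it is still a data frame underneath, the usual dplyr verbs work on it, which is what makes the join in the next section simpler for sf than for rgdal.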

## Joining

In order to use a shapefile with another dataset (like the Gallup data we found), we need to join them together.

The way you do the join depends on what datatype your shapefile is in. If you used rgdal, you have to join on the data “slot” (sort of like a variable, but it can contain a whole dataset!). Just follow this code:

library(dplyr)

states_rgdal@data <- left_join(states_rgdal@data, GallupAnalytics_Export_20180405_102840, by = c("NAME" = "Geography"))

If you used sf, it’s easier:

states_sf <- states_sf %>%
  left_join(GallupAnalytics_Export_20180405_102840, by = c("NAME" = "Geography"))
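
After a join like this, it is worth checking which polygons did not find a match, since a shapefile may include areas that your other dataset doesn’t cover. Here is a minimal sketch of that check using toy data frames; `shapes`, `poll`, and all the values in them are made-up stand-ins, not the real shapefile or Gallup export.

```r
library(dplyr)

# Toy stand-ins: `shapes` plays the role of the shapefile's attribute table,
# `poll` plays the role of the polling data (all values are made up)
shapes <- data.frame(
  NAME = c("Massachusetts", "Vermont", "Puerto Rico"),
  stringsAsFactors = FALSE
)
poll <- data.frame(
  Geography = c("Massachusetts", "Vermont"),
  Yes = c(0.5, 0.6),
  stringsAsFactors = FALSE
)

# left_join keeps every row of `shapes`; unmatched rows get NA for Yes
joined <- left_join(shapes, poll, by = c("NAME" = "Geography"))

# anti_join returns only the rows of `shapes` that have no match in `poll`
unmatched <- anti_join(shapes, poll, by = c("NAME" = "Geography"))
unmatched$NAME
```

Polygons with NA values will still be drawn, but they won’t get a meaningful color, so it helps to know about them before you map.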

## Base plotting

Now that everything is joined, we can plot! Both datasets have some generic base plotting that works okay for checking that your data is there, but it’s not that pretty.

plot(states_rgdal)
plot(states_sf["Yes"])

## Leaflet

Leaflet is a JavaScript library for interactive maps. A bunch of people worked to make an R package that works with leaflet, but you can use leaflet in many more situations (for example, if you do data visualization in d3.js, it’s easy to integrate with leaflet).

# install.packages("leaflet")
library(leaflet)

pal <- colorNumeric(
  palette = "Greens",
  domain = states_rgdal$Yes
)

m <- leaflet(data=states_rgdal) %>%
  addProviderTiles("Stamen.Watercolor") %>%
  setView(lng = -98.35, lat = 39.8, zoom = 3) %>%
  addPolygons(stroke = FALSE, fillOpacity = 0.5, smoothFactor = 0.5, color = ~pal(Yes)) %>%
  addLegend("bottomright", pal = pal, values = ~Yes,
    title = "Percent of people reporting happiness",
    opacity = 1
  )

pal <- colorNumeric(
  palette = "Blues",
  domain = states_sf$Yes
)

leaflet(data=states_sf) %>%
  addProviderTiles("Stamen.Watercolor") %>%
  setView(lng = -98.35, lat = 39.8, zoom = 4) %>%
  addPolygons(stroke = FALSE, fillOpacity = 0.5, smoothFactor = 0.5, color = ~pal(Yes)) %>%
  addLegend("bottomright", pal = pal, values = ~Yes,
    title = "Percent of people reporting happiness",
    opacity = 1
  )

## Leaflet options

There are tons of things you can change! Lots of information is available on the RStudio page for leaflet.

### Basemap

I recommend checking out ?addProviderTiles in particular. The Stamen Toner map is a very simple, black and white basemap I like.
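
If you aren’t sure which basemap names your installed version of leaflet knows about, one way to look is to search the `providers` list that ships with the package. This is just a sketch; the exact names you get back depend on your leaflet version, so I haven’t shown any output.

```r
library(leaflet)

# `providers` is a named list included with the leaflet package;
# its names are the strings you can pass to addProviderTiles()
all_basemaps <- names(providers)

# Search for basemaps with "Toner" in the name (results vary by version)
toner_like <- grep("Toner", all_basemaps, value = TRUE)
toner_like
```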

### Colors

The colors from RColorBrewer are based on ColorBrewer. You can see all the available palettes by using display.brewer.all().

library(RColorBrewer)
display.brewer.all(type="seq")

### Legends

You can customize your legend. Check out ?addLegend to see options. In particular, you might want to adjust the bins.
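
One way to get bins is to swap the continuous palette for a binned one with colorBin(). Here is a minimal sketch; the `yes` vector is made-up data standing in for the Yes column, and `pal_binned` is just a name I picked.

```r
library(leaflet)

# Made-up values standing in for the Yes column
yes <- c(0.62, 0.68, 0.71, 0.74, 0.79)

# colorBin() assigns one color per range of values. `bins = 4` is a request:
# with the default pretty = TRUE, leaflet rounds the breaks to "nice" numbers
# and may end up with a slightly different number of bins
pal_binned <- colorBin(palette = "Greens", domain = yes, bins = 4)

# The palette is a function: give it values, get back hex color strings
pal_binned(yes)
```

You would then pass `pal_binned` anywhere the maps above use `pal`, in both addPolygons() and addLegend(), and the legend will show the bin ranges instead of a continuous gradient.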

### Saving

The easiest way is probably just to “knit” your RMarkdown document. Another option could be

library(htmlwidgets)
saveWidget(m, file="m.html")