Overview

In this mini session, we will learn how to read in spatial data using the sf package, learn about the proporties of spatial data, and conduct a mini analysis using NYS county data and Census data from the tidycensus package. We will also create some maps with ggplot2 and mapview.

Required packages

#install.packages("tidyverse")
#install.packages("tigris")
#install.packages("tidycensus")
#install.packages("sf")
#install.packages("spData")
#install.packages("mapview")

Reading in data

  • read in vector data with the sf package and st_read() function
  • read in raster data using the raster package and either the raster() for single-band rasters or brick() for multi-band rasters

Here we will use the data from NYS GIS dataset to read in a NYS county shapefile.

In the sf package:

  • spatial objects are stored as dataframes with some special properties
  • sf dataframes include spatial metadata like the coordinate reference system
  • geometry is stored in a list column
  • tidyverse tools can be used with sf dataframes

using st_read

library(sf)

counties <- st_read("X:/spatial/NYS_Civil_Boundaries_SHP/Counties.shp")
## Reading layer `Counties' from data source `X:\spatial\NYS_Civil_Boundaries_SHP\Counties.shp' using driver `ESRI Shapefile'
## Simple feature collection with 62 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 105570.3 ymin: 4480951 xmax: 779932.1 ymax: 4985476
## epsg (SRID):    26918
## proj4string:    +proj=utm +zone=18 +datum=NAD83 +units=m +no_defs
counties
## Simple feature collection with 62 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 105570.3 ymin: 4480951 xmax: 779932.1 ymax: 4985476
## epsg (SRID):    26918
## proj4string:    +proj=utm +zone=18 +datum=NAD83 +units=m +no_defs
## First 10 features:
##           NAME ABBREV GNIS_ID POP1990 POP2000 POP2010 NYC     SP_ZONE
## 1       Albany   ALBA  974099  292594  294565  304204   N        East
## 2     Allegany   ALLE  974100   50470   49927   48946   N        West
## 3        Bronx   BRON  974101 1203789 1332650 1385108   Y Long Island
## 4       Broome   BROO  974102  212160  200536  200600   N     Central
## 5  Cattaraugus   CATT  974103   84234   83955   80317   N        West
## 6       Cayuga   CAYU  974104   82313   81963   80026   N     Central
## 7   Chautauqua   CHAU  974105  141895  139750  134905   N        West
## 8      Chemung   CHEM  974106   95195   91070   88830   N     Central
## 9     Chenango   CHEN  974107   51768   51401   50477   N     Central
## 10     Clinton   CLIN  974108   85969   79894   82128   N        East
##    DOS_LL DOSLL_DATE    DATEMOD COUNTYFIPS   SWIS  CALC_SqMi
## 1    <NA>       <NA> 2017-11-10        001 010000  532.79178
## 2    <NA>       <NA> 2018-10-04        003 020000 1035.21639
## 3    <NA>       <NA>       <NA>        005 600000   57.47109
## 4    <NA>       <NA> 2018-07-18        007 030000  715.28820
## 5    <NA>       <NA> 2017-08-01        009 040000 1324.28973
## 6    <NA>       <NA> 2018-07-18        011 050000  881.82489
## 7    <NA>       <NA> 2017-12-05        013 060000 1507.73487
## 8    <NA>       <NA> 2018-10-02        015 070000  410.75127
## 9    <NA>       <NA> 2018-10-03        017 080000  897.81864
## 10   <NA>       <NA> 2018-10-04        019 090000 1116.82523
##                          geometry
## 1  MULTIPOLYGON (((605729 4737...
## 2  MULTIPOLYGON (((229573.9 47...
## 3  MULTIPOLYGON (((595540.7 45...
## 4  MULTIPOLYGON (((428899.3 46...
## 5  MULTIPOLYGON (((169747.3 47...
## 6  MULTIPOLYGON (((369644.2 47...
## 7  MULTIPOLYGON (((161319.8 47...
## 8  MULTIPOLYGON (((357070.3 46...
## 9  MULTIPOLYGON (((464936.4 47...
## 10 MULTIPOLYGON (((629506.7 49...

The short report printed mentions that there are 62 features (records, represented as rows) and 14 fields (attributes, represented as columns). The defined CRS in this data is the Universal Transverse Mercator (UTM), which uses a 2-dimensional Cartesian coordinate system to give locations on the surface of the Earth.

using tigris package

library(tigris)

nys <- counties("New York", class= "sf")

nys2 <- counties("New York") #Output as Large SpatialPolygonsDataFrame (more complex, harder to decompose)

str(nys2) #compare to sf objects 

Methods of sf objects are:

methods(class = "sf")
##  [1] $<-                   [                     [[<-                 
##  [4] aggregate             as.data.frame         cbind                
##  [7] coerce                dbDataType            dbWriteTable         
## [10] filter                identify              initialize           
## [13] merge                 plot                  print                
## [16] rbind                 show                  slotsFromS3          
## [19] st_agr                st_agr<-              st_area              
## [22] st_as_sf              st_bbox               st_boundary          
## [25] st_buffer             st_cast               st_centroid          
## [28] st_collection_extract st_convex_hull        st_coordinates       
## [31] st_crop               st_crs                st_crs<-             
## [34] st_difference         st_geometry           st_geometry<-        
## [37] st_intersection       st_is                 st_line_merge        
## [40] st_nearest_points     st_node               st_point_on_surface  
## [43] st_polygonize         st_precision          st_segmentize        
## [46] st_set_precision      st_simplify           st_snap              
## [49] st_sym_difference     st_transform          st_triangulate       
## [52] st_union              st_voronoi            st_wrap_dateline     
## [55] st_write              st_zm                
## see '?methods' for accessing help and source code

Simple Plot

plot(counties[4], reset = FALSE) # reset = FALSE: we want to add to a plot with a legend
plot(counties[1,1], col = 'grey', add = TRUE) #color first county as grey

Using functions in sf package

  • st_area() returns the area of your features
  • st_length() returns the length of your features
  • st_crs() to assign a CRS to the data
  • st_transform() to transform CRS to match CRS for different layers
library(tidyverse)


#compute areas of counties
areas <- st_area(counties)

summary(areas)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 8.742e+07 1.287e+09 2.094e+09 2.279e+09 3.233e+09 7.300e+09
class(areas)
## [1] "units"
# Create a quick histogram of the areas using hist
hist(areas, xlim = c(0,  7.300e+09 ), breaks = 30)

#dplyr example
big_counties <- counties %>% filter(unclass(areas) > 3.233e+09)

plot(st_geometry(big_counties))

Note that the result of functions like st_area() and st_length() will not be a traditional vector. Instead the result has a class of units which means the vector result is accompanied by metadata describing the object’s units.

Loading data from spData

library("spData")

#get US-states data 
data(us_states)

class(us_states)
## [1] "sf"         "data.frame"
# Plot the us_states object using all defaults
plot(us_states)

# Plot just the total_pop_10 attribute of the us_states data
plot(us_states["total_pop_10"])

# Create a new object of just the us_states geometry
states_geo <- st_geometry(us_states)

# Plot the geometry of the us_states data
plot(states_geo)

Using tidycensus

The United States Census Bureau has a number of different datasets that are available to anyone for free. tidycensus is an R package that allows users to interface with the US Census Bureau’s decennial Census and five-year American Community APIs and return tidyverse-ready data frames, optionally with simple feature geometry included.

API keys can be optained through (Census API website) [https://api.census.gov/data/key_signup.html].

#get data from 2010 decennial Census
state <- get_decennial(geography = "state", variables = "P001001")

state
## # A tibble: 52 x 4
##    GEOID NAME                 variable    value
##    <chr> <chr>                <chr>       <dbl>
##  1 01    Alabama              P001001   4779736
##  2 02    Alaska               P001001    710231
##  3 04    Arizona              P001001   6392017
##  4 05    Arkansas             P001001   2915918
##  5 06    California           P001001  37253956
##  6 08    Colorado             P001001   5029196
##  7 09    Connecticut          P001001   3574097
##  8 10    Delaware             P001001    897934
##  9 11    District of Columbia P001001    601723
## 10 12    Florida              P001001  18801310
## # ... with 42 more rows
#Get a dataset of median home values from the 2017 1-year ACS 
state_home <- get_acs(geography = "state",
                     variables = "B25077_001",
                     survey = "acs1",
                     geometry = TRUE,
                     shift_geo = TRUE)

Variable search within tidycensus

To find Census variable IDs, use:

  • Online resources like Census Reporter
  • Built-in variable searching in tidycensus
v17 <- load_variables(year = 2017,
           dataset = "acs1",
           cache = TRUE)

# Filter for table B19001
filter(v17, str_detect(name, "B19001"))
## # A tibble: 170 x 3
##    name     label                 concept                                  
##    <chr>    <chr>                 <chr>                                    
##  1 B19001_~ Estimate!!Total       HOUSEHOLD INCOME IN THE PAST 12 MONTHS (~
##  2 B19001_~ Estimate!!Total!!Les~ HOUSEHOLD INCOME IN THE PAST 12 MONTHS (~
##  3 B19001_~ Estimate!!Total!!$10~ HOUSEHOLD INCOME IN THE PAST 12 MONTHS (~
##  4 B19001_~ Estimate!!Total!!$15~ HOUSEHOLD INCOME IN THE PAST 12 MONTHS (~
##  5 B19001_~ Estimate!!Total!!$20~ HOUSEHOLD INCOME IN THE PAST 12 MONTHS (~
##  6 B19001_~ Estimate!!Total!!$25~ HOUSEHOLD INCOME IN THE PAST 12 MONTHS (~
##  7 B19001_~ Estimate!!Total!!$30~ HOUSEHOLD INCOME IN THE PAST 12 MONTHS (~
##  8 B19001_~ Estimate!!Total!!$35~ HOUSEHOLD INCOME IN THE PAST 12 MONTHS (~
##  9 B19001_~ Estimate!!Total!!$40~ HOUSEHOLD INCOME IN THE PAST 12 MONTHS (~
## 10 B19001_~ Estimate!!Total!!$45~ HOUSEHOLD INCOME IN THE PAST 12 MONTHS (~
## # ... with 160 more rows
# Use median value to search for related variables
filter(v17, str_detect(label, fixed("median value", 
                                ignore_case = TRUE)))
## # A tibble: 23 x 3
##    name     label                                  concept                 
##    <chr>    <chr>                                  <chr>                   
##  1 B25077_~ Estimate!!Median value (dollars)       MEDIAN VALUE (DOLLARS)  
##  2 B25083_~ Estimate!!Median value (dollars)       MEDIAN VALUE (DOLLARS) ~
##  3 B25097_~ Estimate!!Median value!!Total          MORTGAGE STATUS BY MEDI~
##  4 B25097_~ Estimate!!Median value!!Total!!Median~ MORTGAGE STATUS BY MEDI~
##  5 B25097_~ Estimate!!Median value!!Total!!Median~ MORTGAGE STATUS BY MEDI~
##  6 B25107_~ Estimate!!Median value!!Total          MEDIAN VALUE BY YEAR ST~
##  7 B25107_~ Estimate!!Median value!!Total!!Built ~ MEDIAN VALUE BY YEAR ST~
##  8 B25107_~ Estimate!!Median value!!Total!!Built ~ MEDIAN VALUE BY YEAR ST~
##  9 B25107_~ Estimate!!Median value!!Total!!Built ~ MEDIAN VALUE BY YEAR ST~
## 10 B25107_~ Estimate!!Median value!!Total!!Built ~ MEDIAN VALUE BY YEAR ST~
## # ... with 13 more rows

tidyverse example

# Map through ACS1 estimates to see how they change through the years

ny_cities <- map_df(2012:2016, function(x) {
  get_acs(geography = "place", 
          variables = c(totalpop = "B01003_001"),  #total population
          state = "NY", 
          survey = "acs1", 
          year = x) %>%
    mutate(year = x)
})

ny_cities %>% arrange(NAME, year)
## # A tibble: 53 x 6
##    GEOID   NAME                    variable estimate   moe  year
##    <chr>   <chr>                   <chr>       <dbl> <dbl> <int>
##  1 3601000 Albany city, New York   totalpop    97905    26  2012
##  2 3601000 Albany city, New York   totalpop    98441    59  2013
##  3 3601000 Albany city, New York   totalpop    98566    26  2014
##  4 3601000 Albany city, New York   totalpop    98452    63  2015
##  5 3601000 Albany city, New York   totalpop    98106    53  2016
##  6 3608026 Brentwood CDP, New York totalpop    68580  6910  2014
##  7 3608026 Brentwood CDP, New York totalpop    63792  7108  2015
##  8 3608026 Brentwood CDP, New York totalpop    62132  6543  2016
##  9 3611000 Buffalo city, New York  totalpop   259386    32  2012
## 10 3611000 Buffalo city, New York  totalpop   258945    66  2013
## # ... with 43 more rows

Mapping data

#simple choropleth plot of median home values
plot(state_home["estimate"])

#make a choropleth map with ggplot2
ggplot(state_home, aes(fill = estimate)) + geom_sf()

#use the viridis palettes for choropleth mapping
ggplot(state_home, aes(fill = estimate, col = estimate)) + 
  geom_sf() + 
  scale_fill_viridis_c() +  
  scale_color_viridis_c()

ggplot(state_home, aes(fill = estimate, color = estimate)) + 
  geom_sf() + 
  scale_fill_viridis_c(labels = scales::dollar) +  
  scale_color_viridis_c(guide = FALSE) + 
  theme_minimal() + 
  coord_sf(crs = 26911, datum = NA) + 
  labs(title = "Median owner-occupied housing value by state", 
       subtitle = "USA", 
       caption = "Data source: 2017 ACS.\nData acquired with the R tidycensus package.", 
       fill = "ACS estimate")

Graduated symbol maps

#generate point centers for each US state
centers <- st_centroid(state_home)

#create graduated symbol map
ggplot() + 
  geom_sf(data = state_home, fill = "white") + 
  geom_sf(data = centers, aes(size = estimate), shape = 21, 
          fill = "lightblue", alpha = 0.7, show.legend = "point") + 
  scale_size_continuous(range = c(1, 20)) #the scale_size_continuous() function adjusts the range of sizes on the map.

Interactive maps with mapview

library(mapview)

m <- mapview(state_home, 
         zcol = "estimate")
m@map

References: