Counties are scrambled in R
I'm using ggplot2 to create a population density choropleth. It works
for a single state, but not for several states at once. It appears that
the densities of counties that share a name get mixed up between states,
and sometimes even counties with different names are mixed up. For
example, "New Jersey" alone gives the correct densities, but
"New Jersey", "New York" tells me that the very populous Essex County in
NJ has a density below 30 people/mi^2. Why is this?
library(stringr)
library(ggplot2)
library(scales)
library(maps)
popdensitymap <- function(...){
  path <- "U:/maps-county2011.csv"
  states <- list(...)
  countydata <- read.csv(path, sep=",")
  countydata <- data.frame(countydata$X, countydata$Population.Density)
  names(countydata) <- c("fips", "density")
  data(county.fips)
  cdata <- countydata
  cdata$fips <- gsub("^0", "", cdata$fips)
  countyinfo <- merge(cdata, county.fips, by.x="fips", by.y="fips")
  countyinfo <- data.frame(countyinfo,
                           str_split_fixed(countyinfo$polyname, ",", 2))
  names(countyinfo) <- c('fips', 'density', 'polyname', 'state', 'county')
  countyshapes <- map_data("county", states)
  countyshapes <- merge(countyshapes, countyinfo,
                        by.x="subregion", by.y="county")
  choropleth <- countyshapes
  choropleth <- choropleth[order(choropleth$order), ]
  choropleth$density_d <- cut(choropleth$density,
                              breaks=c(0,30,100,300,500,1000,3000,5000,100000))
  state_df <- map_data("state", states)
  density_d <- choropleth$density_d
  choropleth <- choropleth[choropleth$state %in% tolower(states),]
  p <- ggplot(choropleth, aes(long, lat, group=group))
  p <- p + geom_polygon(aes(fill=density_d), colour=alpha("white", 1/2),
                        size=0.2)
  p <- p + geom_polygon(data = state_df, colour="black", fill = NA)
  p <- p + scale_fill_brewer(palette="PuRd")
  p
}
To use:
popdensitymap("New Jersey")
popdensitymap("New York", "New Jersey")
Here is the csv. It is very ugly, but I do not have access to a file
sharing system right now.
Here is an example of the output. As you can see, the extremely populous
Essex County by New York City is inaccurately represented.
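One possible cause, offered as a guess rather than a confirmed diagnosis: the merge of countyshapes with countyinfo matches on county name only, so a county name that exists in more than one requested state (such as Essex in both NJ and NY) would pair every shape row with every same-named info row. The sketch below reproduces that effect with two tiny hand-made data frames. The column names mirror the ones in the function above, but the frames `shapes` and `info` and all density values are made-up placeholders, not real figures.

```r
# Toy stand-in for map_data("county", ...) output (hypothetical rows).
shapes <- data.frame(
  region    = c("new jersey", "new york"),
  subregion = c("essex", "essex"),
  order     = 1:2
)

# Toy stand-in for countyinfo (densities are placeholders, not real data).
info <- data.frame(
  state   = c("new jersey", "new york"),
  county  = c("essex", "essex"),
  density = c(6000, 20)
)

# Merging on county name only: each "essex" shape row matches both
# "essex" info rows, so NJ's Essex also picks up NY's density.
by_name <- merge(shapes, info, by.x = "subregion", by.y = "county")
print(nrow(by_name))

# Merging on state and county together: one match per shape row.
by_both <- merge(shapes, info,
                 by.x = c("region", "subregion"),
                 by.y = c("state", "county"))
print(by_both)
```

If this is what is happening, it would also explain why a single state looks correct: the later filter on choropleth$state drops the rows matched from other states, but it cannot do so when both states are requested.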