Using ggmap to visualize fraud trends
In Part I of this series I showed how to get the data used in this portion. Let’s start by loading that data into R so we can map it.
# file.choose() opens a file picker on any platform (choose.files() is Windows-only)
df <- read.csv(file = file.choose(), header = TRUE)
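Before mapping, it can help to confirm that the columns used later in this post are actually present. The sketch below is only an illustration: the raw header text, the toy CSV row, and the helper name missing_columns are assumptions, not part of the original data. It also shows why the complaints column has such an odd name: by default read.csv() passes headers through make.names(), which turns spaces and commas into dots.

```r
# Assumed raw header as it might appear in the CSV (hypothetical)
raw_header <- "Complaints Per 100,000 Population"
clean_header <- make.names(raw_header)  # what read.csv() does by default

# Columns the rest of this post relies on
needed <- c("Metropolitan.Area", "Lat", "Lon",
            "Complaints.Per.100.000.Population")

# Hypothetical helper: returns the names in `needed` that `d` lacks
missing_columns <- function(d, needed) {
  setdiff(needed, names(d))
}

# Toy stand-in for the real file, with made-up values
toy_csv <- paste(
  'Metropolitan Area,Lat,Lon,"Complaints Per 100,000 Population"',
  'Exampleville,30.0,-85.0,12.5',
  sep = "\n"
)
toy <- read.csv(text = toy_csv, header = TRUE)

missing_cols <- missing_columns(toy, needed)
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

Running missing_columns(df, needed) on your own data should return an empty character vector if the file from Part I loaded as expected.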
Your data is stored in an R dataframe named df. Next we’ll get David Kahle and Hadley Wickham’s ggmap, an offshoot of Hadley Wickham’s famous R package ggplot2. The first line will install the package; the second tells R to load it.
install.packages("ggmap")
library(ggmap)
The next step is getting a map and telling ggmap to use that in a plot.
map <- get_map('United States', zoom = 4)
mapplot <- ggmap(map)
Think of mapplot as the base layer of your plot. Now we want to use our latitude and longitude data to plot our points. We’ll tie the size of the points to the Complaints.Per.100.000.Population data.
mapplot + geom_point(data = df, aes(x = Lon, y = Lat, size = Complaints.Per.100.000.Population))
Nice! This is an interesting plot, but it would be more useful if we labelled the points. We probably shouldn’t label each point, but let’s do the top 5.
# The five metro areas with the highest complaint rate
top5 <- df[order(-df$Complaints.Per.100.000.Population), ][1:5, ]
mapplot +
  geom_point(data = df, aes(x = Lon, y = Lat, size = Complaints.Per.100.000.Population)) +
  geom_text(data = top5, aes(x = Lon, y = Lat, label = Metropolitan.Area),
            color = 'navyblue', fontface = 'bold', size = 5)
Watch out next time you’re in Florida: 3 of the top 5 hot spots for fraud complaints are in the state!
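The top-5 selection behind those labels can be tried on its own without drawing a map. Here is a minimal sketch on a made-up data frame; the area names and complaint rates are invented for illustration and are not taken from the real data.

```r
# Made-up example data; names and numbers are illustrative only
toy <- data.frame(
  Metropolitan.Area = c("A", "B", "C", "D", "E", "F", "G"),
  Complaints.Per.100.000.Population = c(10, 70, 30, 50, 20, 60, 40),
  stringsAsFactors = FALSE
)

# order() on the negated column sorts from highest to lowest rate
ranked <- toy[order(-toy$Complaints.Per.100.000.Population), ]

# head() keeps at most five rows, so it is safe on short data frames too
top5 <- head(ranked, 5)
top5$Metropolitan.Area
```

Using head(ranked, 5) instead of ranked[1:5, ] is a small design choice: indexing with 1:5 pads the result with NA rows when fewer than five rows exist, while head() simply returns what is there.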