Visualizing Geographic Trends in Identity Theft: Python, R, and ggmap (Part II)

Using ggmap to visualize fraud trends

In Part I of this series I showed how to get the data used in this portion. Let’s start by loading that data into R so we can map it.

# file.choose() opens a file picker on any platform; choose.files() is Windows-only
df <- read.csv(file = file.choose(), header = TRUE)
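Before mapping anything, it can help to confirm that the columns are named the way the later code expects. read.csv() runs each header through make.names(), so spaces and commas turn into dots. The sketch below uses a small made-up CSV in place of the Part I file; the header text and the values are assumptions for illustration, so adjust them to match your own file.

```r
# Toy stand-in for the Part I CSV; header text and values are assumptions
csv_text <- paste(
  '"Metropolitan Area","Complaints Per 100,000 Population","Lat","Lon"',
  '"Area A",150.5,25.8,-80.2',
  '"Area B",98.1,33.7,-84.4',
  sep = "\n"
)
toy <- read.csv(text = csv_text, header = TRUE)

# read.csv() passes headers through make.names(), so spaces and commas become dots
names(toy)

# Stop early if any column the plotting code relies on is missing
needed <- c("Metropolitan.Area", "Complaints.Per.100.000.Population", "Lat", "Lon")
missing_cols <- setdiff(needed, names(toy))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

Running the same setdiff() check against names(df) on the real data should catch a mismatched header before it shows up as a confusing plotting error.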

Your data is now stored in an R data frame named df. Next we’ll get David Kahle and Hadley Wickham’s ggmap, which builds on Wickham’s well-known plotting package ggplot2. The first line below installs the package; the second loads it into the current R session.

install.packages("ggmap")

library(ggmap)

The next step is getting a map and telling ggmap to use that in a plot.

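# Note: recent ggmap versions need a Google Maps API key, set with
# register_google(), before get_map() can look up a place name
map <- get_map('United States', zoom = 4)
mapplot <- ggmap(map)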

Think of mapplot as the base layer of your plot. Now we want to use our latitude and longitude data to plot our points. We’ll tie the size of the points to the Complaints.Per.100.000.Population data.

mapplot + geom_point(data = df, aes(x = Lon, y = Lat, size = Complaints.Per.100.000.Population))
[Figure: U.S. Fraud Complaints]

Nice! This is an interesting plot, but it would be more useful if we labelled the points. We probably shouldn’t label each point, but let’s do the top 5.

# Keep the five rows with the highest complaint rates for labelling
top5 <- head(df[order(-df$Complaints.Per.100.000.Population), ], 5)

mapplot +
  geom_point(data = df, aes(x = Lon, y = Lat, size = Complaints.Per.100.000.Population)) +
  geom_text(data = top5,
            aes(x = Lon, y = Lat, label = Metropolitan.Area),
            color = 'navyblue',
            fontface = 'bold',
            size = 5)
[Figure: plot with names of the top 5 areas]
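The label layer relies on sorting the data frame by complaint rate and keeping the first five rows. Here is a minimal sketch of that step on its own, using made-up area names and rates rather than the real data, so you can see what the subset contains before handing it to geom_text().

```r
# Made-up complaint rates; names and values are purely illustrative
toy <- data.frame(
  Metropolitan.Area = c("A", "B", "C", "D", "E", "F", "G"),
  Complaints.Per.100.000.Population = c(40, 120, 75, 310, 95, 60, 200)
)

# order() returns row indices in ascending order; negating the values sorts descending
ranked <- toy[order(-toy$Complaints.Per.100.000.Population), ]

# head() keeps the first n rows and, unlike [1:5, ], never pads short data with NA rows
top5 <- head(ranked, 5)
as.character(top5$Metropolitan.Area)
# "D" "G" "B" "E" "C"
```

Printing the subset like this is a cheap way to confirm that the labels on the map belong to the rows you intended.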

Watch out next time you’re in Florida: three of the top five hot spots for fraud complaints are in the state!