Metropolitan City Crime Analysis

City of Albuquerque

September 2018 to March 2019

Data Information

In this project, we evaluate the performance and predictive power of a model trained and tested on data collected from Albuquerque police incident calls. A model that fits this data well could then be used to make predictions about crime in Albuquerque, New Mexico, and may help law enforcement with data-driven decision making. The datasets for this project originate from The City of Albuquerque, New Mexico and The United States Census.

1. The City of Albuquerque: This dataset represents the prior 180 rolling days of police incident calls (September 2018 through March 2019) and contains the block location, case number, description and date of each call for service received by The Albuquerque Police Department. The incidents have been entered into the Computer Aided Dispatch (CAD) system and closed. Accompanying the Incidents table is a codes table describing each type of incident. No personally identifiable information (PII) is released.

http://data.cabq.gov/publicsafety/policeincidents/

2. The United States Census: This 2013-2017 American Community Survey 5-Year Estimates dataset originates from The United States Census and contains demographic information for each census tract in Bernalillo County, New Mexico. Two tables were used in this project: Table S0101 - AGE AND SEX and Table S1903 - MEDIAN INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS).

Table S0101 - AGE AND SEX: https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t

Table S1903 - MEDIAN INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS): https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t

Data Preprocessing

For the purposes of this project, the following preprocessing steps were applied to the dataset:

  • Each incident's Pseudo-Mercator (Web Mercator) coordinates were transformed to WGS84 longitude and latitude so that incidents align with geographic points within the city of Albuquerque.

  • Each incident was associated with a United States census tract and with that tract's median age and median income.

  • Dates were originally provided in milliseconds and were converted to calendar dates and times for ease of visualization.
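The coordinate and date steps above can be sketched in plain Python, and the census-tract step with a simple ray-casting point-in-polygon test. This is a minimal illustration, not the code used to build this dataset. It assumes EPSG:3857 Web Mercator input on a sphere of radius 6,378,137 m, timestamps in milliseconds since the Unix epoch (consistent with the Date_Milliseconds value of about 1.54e12 for an October 2018 incident), and hypothetical tract polygons given as lists of (longitude, latitude) vertices. All function names are illustrative.

```python
import math
from datetime import datetime, timezone

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (Web Mercator)


def mercator_to_wgs84(x, y):
    """Invert spherical Web Mercator metres to WGS84 (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


def ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000.0, tz=timezone.utc)


def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # An edge counts if it straddles the horizontal line through the point
        # and crosses that line to the right of the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_tract(lon, lat, tracts):
    """Return the id of the first (hypothetical) tract polygon containing the point, else None."""
    for tract_id, ring in tracts.items():
        if point_in_polygon(lon, lat, ring):
            return tract_id
    return None
```

In practice a GIS library such as pyproj or GeoPandas would normally handle the projection and the spatial join; the pure-Python version only makes the arithmetic visible. The UTC result can be converted to local Albuquerque time if needed.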

In [1]:
#Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

#Hiding the warnings
import warnings
warnings.filterwarnings('ignore')

# these libraries are essential for rendering maps
from IPython.display import IFrame, display, HTML, Image


# Notebook display
%matplotlib inline

# Import supplementary visualizations code visuals.py
import visuals as vs

incidents= pd.read_csv('FINAL_MERGED_FILE_5_20_19.csv', low_memory=False)
crime_counts=pd.read_csv('CENSUS_CRIME_COUNTS.csv')

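# Tract-level model inputs (median age and income) and target (incident count per tract)
features = crime_counts[['MED_AGE','MED_INCOME']]
number_per_tract=crime_counts['COUNTS']

# Incident-level copies of each tract's census values:
# HC01_EST_VC37_x is median age (table S0101), HC03_EST_VC02 is median income (table S1903)
age=incidents['HC01_EST_VC37_x']
income=incidents['HC03_EST_VC02']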

# view shape of dataset
print("Incidents dataset has {} data points with {} variables each.".format(*incidents.shape))

# Display the first record
display(incidents.head(n=1))
Incidents dataset has 28886 data points with 718 variables each.
RECORDIDNUMBER BlockAddress State IncidentType Date_Milliseconds Date_Time Day_of_Week Month Day_of_Month Year ... HC02_EST_VC52 HC02_MOE_VC52 HC03_EST_VC52 HC03_MOE_VC52 HC01_EST_VC53 HC01_MOE_VC53 HC02_EST_VC53 HC02_MOE_VC53 HC03_EST_VC53 HC03_MOE_VC53
0 41625857 MONTGOMERY BL NE / EUBANK BL NE NM DISTURBANCE 1.540000e+12 10/22/2018 23:13 Monday October 22 2018 ... 23.3 9.7 46875 23476 13.0 12.0 5.9 5.2 - **

1 rows × 718 columns

Data Exploration

In this section we examine which types of incidents the dataset contains and how often each type occurs.

The purpose of this section is to carry out an initial investigation of the data and record observations.

The goal of this project is to develop a working model that can predict crime incidents in Albuquerque. To that end, the dataset will be separated into features and a target variable.

The features, 'MED_AGE' (median age) and 'MED_INCOME' (median income), provide useful information about each data point. The target variable, 'COUNTS', is the variable we seek to predict: the number of incidents per census tract.
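To make the feature/target split concrete, the sketch below fits an ordinary least squares (OLS) baseline of COUNTS on MED_AGE and MED_INCOME and reports R² on a held-out split. OLS is a swapped-in baseline chosen for illustration, not the model used in this project; the function name, the 80/20 split and the random seed are assumptions.

```python
import numpy as np


def _with_intercept(X):
    """Prepend a column of ones so the fit includes an intercept."""
    return np.column_stack([np.ones(len(X)), X])


def ols_baseline(X, y, test_fraction=0.2, seed=0):
    """Fit OLS on a random training split and return (coefficients, test R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = max(1, int(round(test_fraction * len(y))))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    # Least-squares coefficients: [intercept, one weight per feature]
    coef, *_ = np.linalg.lstsq(_with_intercept(X[train_idx]), y[train_idx], rcond=None)

    # Coefficient of determination on the held-out rows
    pred = _with_intercept(X[test_idx]) @ coef
    ss_res = np.sum((y[test_idx] - pred) ** 2)
    ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot
```

With the variables from the first code cell this would be called as `coef, r2 = ols_baseline(features, number_per_tract)`. An R² near zero would indicate that median age and income alone explain little of the variation in incident counts, and any candidate model should at least beat this baseline.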

Overall Investigation into the Data

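This step gives a first look at the tract-level data. Summary statistics are meaningful mainly for 'MED_AGE', 'MED_INCOME' and the target 'COUNTS'; columns such as CENSUS_TRACT, LONG_CENSUS, CENSUS and STATE_ID are identifiers, so their statistics carry no meaning. Note also that the maximum LONGITUDE is positive (106.568750), even though Albuquerque lies at about 106.6 degrees west (a negative WGS84 longitude), which suggests that at least one tract's longitude has lost its minus sign.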

In [661]:
crime_counts.describe()
Out[661]:
CENSUS_TRACT MED_AGE MED_INCOME COUNTS LONGITUDE LATITUDE LONG_CENSUS CENSUS STATE_ID
count 150.000000 150.000000 150.000000 150.000000 150.000000 150.000000 150.000000 150.000000 150.000000
mean 213.637933 38.962667 54470.280000 192.566667 -103.748722 35.128191 21363.793333 213.637933 75.500000
std 1317.718534 7.246740 23290.457153 198.275533 24.530972 0.055839 131771.853399 1317.718534 43.445368
min 1.070000 20.200000 18306.000000 1.000000 -106.753181 34.939182 107.000000 1.070000 1.000000
25% 7.047500 33.500000 37710.500000 54.250000 -106.665293 35.091717 704.750000 7.047500 38.250000
50% 34.505000 38.650000 48765.500000 124.500000 -106.582616 35.121983 3450.500000 34.505000 75.500000
75% 46.850000 42.975000 66326.500000 257.250000 -106.516961 35.169194 4685.000000 46.850000 112.750000
max 9407.000000 60.400000 140833.000000 941.000000 106.568750 35.248188 940700.000000 9407.000000 150.000000
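The positive maximum LONGITUDE in the table above can be checked directly. The sketch below lists tracts whose longitude is positive and, under the assumption that only the minus sign was lost, returns a corrected copy. Whether that assumption holds should be confirmed against the source data before applying the fix; the function name is hypothetical.

```python
import pandas as pd


def find_and_fix_longitude_signs(df, col="LONGITUDE"):
    """Return (suspect rows, corrected copy) for longitudes with the wrong sign.

    Albuquerque is in the western hemisphere, so valid WGS84 longitudes are negative.
    Assumption: a positive value is the true longitude with its sign dropped.
    """
    suspect = df[df[col] > 0]
    fixed = df.copy()  # leave the original frame untouched
    fixed[col] = -fixed[col].abs()
    return suspect, fixed
```

Applied to the tract table, `suspect, crime_counts_fixed = find_and_fix_longitude_signs(crime_counts)` shows how many tracts are affected before anything is changed.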

Income and Age

In this part of the exploration, we see that most census tracts have a median age under 55 and a median income below $60,000 per year. Note that these are tract-level medians, not the ages or incomes of individuals.

The lowest median age in the dataset is 20.2 years and the lowest median income is $18,306 per year. The highest median age is 60.4 years and the highest median income is $140,833 per year.

In [662]:
#Income

# Minimum median income
min_med_income = np.min(income)

# Maximum median income
max_med_income = np.max(income)

# Mid-point (mid-range) of median income; the parentheses make sure the sum is halved
mid_med_income = (min_med_income + max_med_income) / 2

# Standard deviation of median income
std_med_income = np.std(income)

#Age

# Minimum median age
min_med_age = np.min(age)

# Maximum median age
max_med_age = np.max(age)

# Mid-point (mid-range) of median age
mid_med_age = (min_med_age + max_med_age) / 2

# Standard deviation of median age
std_med_age = np.std(age)


# Show the calculated statistics
print("Statistics for Metropolitan City Crime Analysis:\n")
print("Minimum Median Income: ${}".format(min_med_income))
print("Minimum Median Age: {}".format(min_med_age))

print("Maximum Median Income: ${}".format(max_med_income))
print("Maximum Median Age: {}".format(max_med_age))

print("Mid-point of Median Income: ${}".format(mid_med_income))
print("Mid-point of Median Age: {}".format(mid_med_age))

print("Standard Deviation of Median Income: ${}".format(std_med_income))
print("Standard Deviation of Median Age: {}".format(std_med_age))
Statistics for Metropolitan City Crime Analysis:

Minimum Median Income: $18306.0
Minimum Median Age: 20.2
Maximum Median Income: $140833.0
Maximum Median Age: 60.4
Mid-point of Median Income: $79569.5
Mid-point of Median Age: 40.3
Standard Deviation of Median Income: $17945.913627339538
Standard Deviation of Median Age: 5.672234613036816
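The statistics above are computed over incident rows, so each tract's median age and income is counted once per incident in that tract. Tracts with many incidents therefore carry more weight, which is a likely reason why, for example, the standard deviation of median income here (about $17,946) differs from the tract-level value in the describe() output ($23,290). The sketch below, assuming the tract-level crime_counts table, computes both versions side by side; the function name is illustrative.

```python
import numpy as np


def tract_vs_incident_stats(values, counts):
    """Compare unweighted (one vote per tract) and incident-weighted mean and std.

    values: tract-level medians (e.g. MED_INCOME); counts: incidents per tract (COUNTS).
    Both standard deviations use the population formula (ddof=0), like np.std.
    """
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    weighted_mean = np.average(values, weights=counts)
    weighted_std = np.sqrt(np.average((values - weighted_mean) ** 2, weights=counts))
    return {
        "tract_mean": values.mean(),
        "tract_std": values.std(),
        "incident_mean": weighted_mean,
        "incident_std": weighted_std,
    }
```

Calling `tract_vs_incident_stats(crime_counts['MED_INCOME'], crime_counts['COUNTS'])` should give an incident-weighted standard deviation close to the value printed above, provided every incident was matched to a tract.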
In [663]:
# sns.distplot is deprecated in recent seaborn releases; histplot with kde=True replaces it
plt.figure(figsize=(8,6))
sns.histplot(incidents.HC01_EST_VC37_x.dropna(), kde=True, stat="density", color="darkblue")
plt.xlabel("Median Age (years)")
plt.title("Median Age Distribution for Albuquerque Incidents")
plt.show()

plt.figure(figsize=(8,6))
sns.histplot(incidents.HC03_EST_VC02.dropna(), kde=True, stat="density", color="black")
plt.xlabel("Median Income (2017 USD)")
plt.title("Median Income Distribution for Albuquerque Incidents")
plt.show()

Incidents

This dataset contains 49 unique incident types. The frequency of each incident type is shown in the visualization below, ordered from most to least frequent.

In [664]:
#This is the unique count of Incident types contained within this dataset
unique_incidents=incidents.IncidentType.nunique()

print ("The Incidents dataset contains {} unique Incident Types.".format(unique_incidents))
The Incidents dataset contains 49 unique Incident Types.
In [665]:
#Create graph to view frequency of incidents by incident type
sns.set(style="darkgrid", )
plt.figure(figsize=(19,15))
sns.countplot(y="IncidentType", data=incidents,
                 palette="Purples_d", order = incidents['IncidentType'].value_counts().index)
plt.xlabel("Frequency Counts")
plt.ylabel("Incidents")
plt.title("Albuquerque Crime Incidents by Type") 
plt.show()
In [666]:
incidents['IncidentType'].value_counts()
Out[666]:
SUSP PERS/VEHS      6039
DISTURBANCE         4930
TRAFFIC STOP        4320
TRAFF ACC NO INJ    1902
FAMILY DISPUTE      1407
THEFT/FRAUD/EMBE    1292
DIRECT TRAFFIC      1093
ONSITE SUSPICIOU     983
BURGLARY AUTO        682
TRAFF ACC INJURI     554
VANDALISM            538
AUTO THEFT           514
SHOTS FIRED          490
ONSITE TRAFFIC       404
AGGR ASSAULT/BAT     398
WANTED PERSON        374
STOLEN VEH FOUND     290
BURGLARY RES         288
SHOPLIFTING          276
LOUD MUSIC           264
MISSING PERSON       240
BURGLARY COMM        184
LOUD PARTY           167
DRUNK DRIVER         133
FORGERY/CC/CHECK     130
FIGHT INPROGRESS     110
NEIGHBOR TROUBLE     105
RESCUE CALL           99
PRISONER PU/INCU      97
ONSITE DISTURBAN      72
AGGR DRIVER           69
ESCORT                57
ARMED ROB COMM        53
ANIMAL CALL           49
ONSITE AUTO THEF      42
NARCOTICS             40
ARMED ROB INDIV       33
SHOOTING              28
WARMUP VEH THEFT      27
THEFT/METAL           27
FIRE CALL             24
AUTO/CAR JACKING      23
BURGLARY              17
PROWLER               11
SUSP/INTOX PERS        4
KID/ABDUCT/HOSTA       3
DEMONSTRATION          2
DRUNK                  1
ROBBERY                1
Name: IncidentType, dtype: int64
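The counts above are easier to compare as shares of all incidents. A minimal sketch using only the three largest counts from the output and the total of 28,886 incidents reported earlier; on the full DataFrame, `incidents['IncidentType'].value_counts(normalize=True)` gives the same shares for every type.

```python
# Counts copied from the value_counts() output above; total from incidents.shape
top_counts = {
    "SUSP PERS/VEHS": 6039,
    "DISTURBANCE": 4930,
    "TRAFFIC STOP": 4320,
}
total_incidents = 28886

# Share of all incidents for each type, and for the three combined
shares = {name: count / total_incidents for name, count in top_counts.items()}
cumulative_share = sum(top_counts.values()) / total_incidents

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Top three combined: {cumulative_share:.1%}")
```

Together, the three most common call types account for about 52.9% of all incidents, just over half.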

Year

When observing overall incidents for the city of Albuquerque from October 2018 through March 2019, far more incidents were recorded in 2018 than in 2019. Breaking the data down by month shows that October had substantially more incidents than any other month in the period. In 2018, incidents occurred most often on Tuesdays, Wednesdays and Saturdays; in 2019, they occurred most often on Fridays and Saturdays.

In [667]:
#Create a graph to view overall incidents by year
sns.set(style="darkgrid",)
plt.figure(figsize=(8,6))
sns.countplot(x="Year", color="darkslateblue", data=incidents)
plt.xlabel("Year")
plt.ylabel("Frequency Counts")
plt.title("Albuquerque Incidents by Year") 
plt.show()
In [668]:
#Create a graph to view incidents by year and month
sns.set(style="darkgrid")
sns.catplot(x="Month", col="Year", data=incidents, kind="count", color="grey", order=["October", "November", "December", "January", "February", "March"]);
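The per-year day-of-week observations in the Year paragraph are not shown directly by these charts. A crosstab gives the underlying numbers; the sketch below assumes the Year, Month and Day_of_Week columns shown in the first record, and the helper name is hypothetical.

```python
import pandas as pd

MONTH_ORDER = ["October", "November", "December", "January", "February", "March"]
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def counts_table(df, row_col, col_col, col_order=None):
    """Count incidents for each (row, column) pair, optionally in a fixed column order."""
    table = pd.crosstab(df[row_col], df[col_col])
    if col_order is not None:
        # Keep the requested order and show 0 for categories with no incidents
        table = table.reindex(columns=col_order, fill_value=0)
    return table
```

For example, `counts_table(incidents, "Year", "Month", MONTH_ORDER)` reproduces the chart above as numbers, and `counts_table(incidents, "Year", "Day_of_Week", DAY_ORDER)` shows which weekdays were busiest in each year.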

Month

As noted above, October 2018 had the highest number of incidents. When the data are broken down by incident type, most types also peak in October. The largest contributors were Suspicious Persons/Vehicles, Disturbance and Traffic Stop.

In [669]:
#Create graph to view incidents by month
sns.set(style="darkgrid",)
plt.figure(figsize=(8,6))
sns.countplot(x="Month", palette="GnBu_d", data=incidents, order=["October", "November", "December", "January", "February", "March"])
plt.xlabel("Month")
plt.ylabel("Frequency Counts")
plt.title("Albuquerque Incidents by Month") 
plt.show()

We would like to find out which day in October had the most incidents. We suspected that the peak would fall during the Balloon Fiesta or on Halloween.

In [670]:
#Subset the Month of October into a dataframe
October=incidents[incidents['Month']=='October']
In [671]:
#Create a graph to view the days of activity in October
plt.figure(figsize=(8,6))
sns.countplot(x="Day_of_Month", data=October, color="black")
plt.xlabel("Day of Month")
plt.title("October 2018 Incident Activity")
plt.show()

The graph partly supports this suspicion: the two busiest days were Tuesday, October 2, 2018 and Monday, October 8, 2018. October 8 was a holiday (Columbus Day, also observed as Indigenous Peoples' Day), which gave some Albuquerque residents a three-day weekend. October 8 also fell in the opening days of the Albuquerque International Balloon Fiesta (held October 6 to 14, 2018), an event that attracts tourists and visiting relatives. The factors behind the increase in incidents on October 2 are unknown at this time; one possibility is an influx of visitors arriving in preparation for the Balloon Fiesta.
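The weekday and holiday claims in this paragraph can be checked with Python's standard datetime module. The sketch below assumes that the Columbus Day / Indigenous Peoples' Day holiday falls on the second Monday of October; the helper name is illustrative.

```python
from datetime import date, timedelta


def nth_weekday(year, month, weekday, n):
    """Return the n-th occurrence of a weekday (Monday=0 ... Sunday=6) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first such weekday
    return first + timedelta(days=offset + 7 * (n - 1))


holiday = nth_weekday(2018, 10, 0, 2)  # second Monday of October 2018
print(holiday, holiday.strftime("%A"))
print(date(2018, 10, 2).strftime("%A"))
```

This returns October 8, 2018, a Monday, and confirms that October 2, 2018 was a Tuesday.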

In [672]:
#Create graph to view incidents by type
sns.set(style="darkgrid")
sns.catplot(x="Month", col="IncidentType", data=incidents, palette="rocket", kind="count", col_wrap=3, order=["October", "November", "December", "January", "February", "March"]);

Day of Week

The charts below show the frequency of each incident type by day of the week. Most incident types occur most frequently on weekends. By contrast, traffic accidents without injury (TRAFF ACC NO INJ) occur mostly on weekdays, possibly during commutes to and from work.

In [673]:
sns.set(style="darkgrid")
sns.catplot(x="Day_of_Week", col="IncidentType", data=incidents, palette="rocket", kind="count", col_wrap=3, order=["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]);
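Raw weekday totals are not directly comparable with weekend totals, because a week has five weekdays but only two weekend days. One way to compare them, sketched below under the assumption that each day name appears roughly equally often in the 180-day window, is to divide weekday counts by 5 and weekend counts by 2. The function and output column names are illustrative.

```python
import pandas as pd

WEEKEND = ["Saturday", "Sunday"]


def weekday_weekend_rates(df, type_col="IncidentType", day_col="Day_of_Week"):
    """Average incidents per weekday vs per weekend day for each incident type."""
    is_weekend = df[day_col].isin(WEEKEND)
    counts = pd.crosstab(df[type_col], is_weekend)
    # Guarantee both columns exist even if one group is empty, then name them
    counts = counts.reindex(columns=[False, True], fill_value=0)
    counts.columns = ["weekday", "weekend"]
    rates = pd.DataFrame({
        "weekday_per_day": counts["weekday"] / 5,
        "weekend_per_day": counts["weekend"] / 2,
    })
    # Ratio above 1 means the type is more frequent per day on weekends
    rates["weekend_ratio"] = rates["weekend_per_day"] / rates["weekday_per_day"]
    return rates
```

Sorting `weekday_weekend_rates(incidents)` by `weekend_ratio` would show which incident types are genuinely weekend-heavy and whether TRAFF ACC NO INJ leans toward weekdays once the unequal number of days is accounted for.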