Do Machine Learning Methods Outperform Traditional Statistical Models in Crime Prediction? A Comparison Between Logistic Regression and Neural Networks

Na, Chongmin; Oh, Gyeongseok; Song, Juyoung; Park, Hyoungah

doi:10.52372/kjps36101

Korean J. Policy Stud. 2021; 36(1):1-13

pISSN: 1225-5017, eISSN: 2765-2807

Article

Chongmin Na¹,*, Gyeongseok Oh², Juyoung Song³, Hyoungah Park⁴

¹Graduate School of Public Administration, Seoul National University, South Korea, E-mail: chongmin20@snu.ac.kr

²Police Science Institute, Korean National Police University, South Korea, E-mail: safecorea@police.go.kr

³Department of Administration of Justice, Penn State University, Schuylkill, PA, USA, E-mail: jxs6190@psu.edu

⁴Criminal Justice Department, Saint Peter’s University, E-mail: hpark1@saintpeters.edu

*Corresponding author. E-mail: chongmin20@snu.ac.kr

ⓒ Copyright 2021 Graduate School of Public Administration, Seoul National University. This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: Oct 19, 2020; Accepted: Nov 24, 2020

Published Online: Mar 31, 2021

Abstract

Although machine learning (ML) methods have recently gained popularity in both academia and industry as alternative risk assessment tools for efficient decision-making, the existing literature reports inconsistent findings on their competitiveness and utility in predicting various outcomes. Drawing on a sample of the general youth population in the U.S., we compared the predictive accuracy of logistic regression (LR) and neural networks (NNs), the most widely applied approaches in conventional statistics and contemporary ML, respectively, using many theoretically relevant predictors of future arrest. Even after fully implementing the rigorous ML protocols for model tuning and the up-sampling and down-sampling procedures recommended in recent literature to optimize learning algorithms, NNs did not yield substantially better performance than LR when applied to a conventional dataset with a relatively small sample size and a limited number of predictors. Nonetheless, we encourage more rigorous, comprehensive, and diverse evaluation research to gain a complete understanding of the predictive potential of ML and of the contingencies under which modern ML methods can outperform conventional parametric statistical models.

Keywords: Machine Learning; Prediction; Neural Networks; Logistic Regression