Korean Journal of Policy Studies
Graduate School of Public Administration, Seoul National University
Article

Do Machine Learning Methods Outperform Traditional Statistical Models in Crime Prediction? A Comparison Between Logistic Regression and Neural Networks

Chongmin Na1,*, Gyeongseok Oh2, Juyoung Song3, Hyoungah Park4
1Graduate School of Public Administration, Seoul National University, South Korea, E-mail: chongmin20@snu.ac.kr
2Police Science Institute, Korean National Police University, South Korea, E-mail: safecorea@police.go.kr
3Department of Administration of Justice, Penn State University, Schuylkill, PA, USA, E-mail: jxs6190@psu.edu
4Criminal Justice Department, Saint Peter’s University, E-mail: hpark1@saintpeters.edu
*Corresponding author: E-mail: chongmin20@snu.ac.kr

ⓒ Copyright 2021 Graduate School of Public Administration, Seoul National University. This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: Oct 19, 2020; Accepted: Nov 24, 2020

Published Online: Mar 31, 2021

Abstract

Although machine learning (ML) methods have recently gained popularity in both academia and industry as alternative risk assessment tools for efficient decision-making, the existing literature shows inconsistent findings regarding their competitiveness and utility in predicting various outcomes. Drawing on a sample of the general youth population in the U.S., we compared the predictive accuracy of logistic regression (LR) and neural networks (NNs), the most widely applied approaches in conventional statistics and contemporary ML, respectively, using many theoretically relevant predictors of future arrest. Even after fully implementing the rigorous ML protocols recommended in recent literature for model tuning, along with up-sampling and down-sampling procedures to optimize the learning algorithms, NNs did not yield substantially better performance than LR when relying on a conventional dataset with a relatively small sample size and a limited number of predictors. Nonetheless, we encourage more rigorous, comprehensive, and diverse evaluation research to fully understand the predictive potential of ML and the conditions under which modern ML methods can outperform conventional parametric statistical models.

Keywords: Machine Learning; Prediction; Neural Networks; Logistic Regression
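The abstract refers to up-sampling and down-sampling as ways of handling an imbalanced binary outcome such as future arrest. The exact procedure used in the study is not reproduced here, so the following is only a minimal Python sketch of generic random resampling, assuming a binary label (1 = arrested, 0 = not arrested). The function names `down_sample` and `up_sample` and the toy data are hypothetical illustrations, not the authors' code.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def down_sample(rows, labels, seed=0):
    """Hypothetical helper: randomly drop cases from every class until each
    class is as small as the minority class (sampling without replacement)."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    n_min = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, n_min))
        out_labels.extend([label] * n_min)
    return out_rows, out_labels


def up_sample(rows, labels, seed=0):
    """Hypothetical helper: randomly duplicate cases (sampling with replacement)
    until every class is as large as the majority class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    n_max = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(n_max - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * n_max)
    return out_rows, out_labels


# Hypothetical toy data: 100 respondents, 10 of whom are later arrested (label 1).
rows = list(range(100))
labels = [1 if i < 10 else 0 for i in range(100)]

_, y_down = down_sample(rows, labels, seed=42)
_, y_up = up_sample(rows, labels, seed=42)
print(sorted(Counter(y_down).items()))  # [(0, 10), (1, 10)]
print(sorted(Counter(y_up).items()))    # [(0, 90), (1, 90)]
```

Down-sampling discards information from the majority class, while up-sampling repeats minority cases and can encourage overfitting; either way, resampling of this kind is usually applied to the training data only, so that the held-out data used to compare LR and NNs keeps its original class balance.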