Deep Learning/Machine Learning

Kay Giesecke (Stanford, CDAR Board member), Justin Sirignano (University of Illinois at Urbana-Champaign), and Apaar Sadhwani (Stanford) authored "Deep Learning for Mortgage Risk" (link to research page).

Abstract: An unprecedented number of mortgage defaults in 2007 precipitated one of the greatest financial crises in recent memory.  Leading up to the crisis, many lenders, originators, servicers, rating agencies, and investors did not accurately model the risks associated with mortgages, leading to inaccurate evaluations of risk exposures. There is a pressing need for renewed research into the delinquency and prepayment behavior of mortgages.

In this project, we analyze multi-period mortgage risk at the loan and pool levels using an unprecedented data set of over 120 million prime and subprime mortgages originated across the United States between 1995 and 2014. The data set includes the individual characteristics of each loan and borrower, monthly updates on loan performance over the life of each loan, and a number of time-varying economic variables at the zip-code level. We develop, estimate, and test dynamic machine learning models for mortgage prepayment, delinquency, and foreclosure that capture loan-to-loan correlation due to geographic proximity and exposure to common risk factors. The basic building block is a deep neural network, which addresses the significantly nonlinear relationships between explanatory variables and loan performance that we find in the data. Our likelihood estimators, which are based on over 3.5 billion monthly observations and implemented using GPU parallel computing, indicate that local economic factors are important for mortgage risk, including county unemployment rates as well as housing prices and foreclosure rates at the zip-code level. Out of sample, our deep learning model predicts significantly better than other available models, such as logistic regression.
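To make the building block described above more concrete, the following is a minimal sketch of a feed-forward network that maps a loan's features to a probability distribution over its state in the next month, together with the negative log-likelihood that a likelihood estimator would minimize. It is not the authors' model: the four-state list, the layer sizes, the random features, and all function names are assumptions chosen for illustration. The sketch omits training, the GPU implementation, and the correlation structure across loans.

```python
import math
import random

# Hypothetical set of next-month loan states (an assumption, not taken from the paper).
STATES = ["current", "delinquent", "foreclosure", "prepaid"]


def init_layer(n_in, n_out, rng):
    """Create one dense layer with He-scaled random weights and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_network(n_features, hidden_sizes, n_states, rng):
    """Stack dense layers: input -> hidden layers -> one logit per state."""
    sizes = [n_features, *hidden_sizes, n_states]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def affine(layer, x):
    """Compute W x + b for a single observation x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def transition_probs(network, x):
    """Return the probability of each next-month state for one loan-month."""
    h = x
    for layer in network[:-1]:
        h = [max(z, 0.0) for z in affine(layer, h)]  # ReLU nonlinearity
    logits = affine(network[-1], h)
    top = max(logits)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(network, features, next_states):
    """Average negative log-likelihood of the observed next-month states."""
    log_probs = [
        math.log(transition_probs(network, x)[y])
        for x, y in zip(features, next_states)
    ]
    return -sum(log_probs) / len(log_probs)


# Synthetic stand-in data: 8 loan-months with 5 features each (purely illustrative).
rng = random.Random(0)
features = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(8)]
next_states = [rng.randrange(len(STATES)) for _ in range(8)]

network = init_network(5, [16, 16], len(STATES), rng)
probs = transition_probs(network, features[0])
nll = negative_log_likelihood(network, features, next_states)
```

If the final layer were applied directly to the input features, with no hidden layers, the sketch would reduce to a multinomial logistic regression. In that sense the hidden layers are what allow the model to represent nonlinear relationships between the explanatory variables and loan performance.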