SEM217: Nick Gunther, Infima: NLP-based Information Extraction Applied to MBS Offering Documents

Tuesday, April 30th @ 11:00 AM-12:30 PM, 648 Evans Hall and Zoom

Large Language Models ("LLMs") have impressed researchers and observers with their success at classification, translation, text generation and other standard NLP tasks. Since word2vec in 2013, and at an accelerating pace with contemporary transformer models such as BERT, researchers have continued to discover new applications and improve existing ones.

Because financial data is largely numerical and NLP’s prominent successes arose in unrelated areas, applications to finance have generally lagged outside a few bespoke areas, such as sentiment analysis for stock price prediction. This report applies contemporary NLP to a novel area that is important to the debt markets – learning the terms of complex mortgage-backed securities from their descriptions in published offering documents, a critical task currently conducted manually and offered as a paid service to institutional investors by third-party vendors. The ultimate goal is to generate automatically, from offering documents, term summaries together with executable code reflecting the detailed security terms, allowing immediate simulation, scenario investigation, stress testing and valuation.

This talk will discuss the work in progress, highlighting the successes, challenges and results to date.