
Our Codebase

Repository link pending

EDGAR Data Wrangling
Download, process, and wrangle the EDGAR data from Hugging Face
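As a rough sketch of what this step might look like, the snippet below loads one config of a Hugging Face dataset and flattens each filing into a single row. The dataset id and config name are left to the caller because this page does not name them. The `cik`, `year`, and `section_*` field names are assumptions about the record layout, not something confirmed by the repository.

```python
from typing import Iterable, Iterator


def flatten_filing(record: dict) -> dict:
    """Collapse the per-section fields of one filing into a single text field.

    Assumption: section text lives in keys starting with "section_", and
    "cik" and "year" identify the filing. Keys are kept in record order.
    """
    section_keys = [k for k in record if k.startswith("section_")]
    parts = [record[k].strip() for k in section_keys if record[k] and record[k].strip()]
    return {"cik": str(record["cik"]), "year": int(record["year"]), "text": "\n\n".join(parts)}


def wrangle(records: Iterable[dict]) -> Iterator[dict]:
    """Yield flattened filings, skipping those with no usable text."""
    for record in records:
        row = flatten_filing(record)
        if row["text"]:
            yield row


def load_filings(dataset_id: str, config_name: str, split: str = "train") -> Iterator[dict]:
    """Download one config from the Hugging Face Hub and wrangle it.

    Needs `pip install datasets`; the import is lazy so the helpers above
    can be used and tested without network access.
    """
    from datasets import load_dataset

    return wrangle(load_dataset(dataset_id, config_name, split=split))
```

Keeping the download separate from the flattening logic means the wrangling can be checked on a handful of hand-written records before pulling the full corpus.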
Label Creation and Matching
Before doing any EDA or modelling, we need to match each 10-K from EDGAR to any bankruptcy filing by the same company in the following year
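The matching could be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual logic: it assumes filings and bankruptcies are joined on CIK, and it reads "the following year" as a 365-day window after the 10-K filing date (a calendar-year reading would need a different rule).

```python
from collections import defaultdict
from datetime import date, timedelta


def label_filings(filings, bankruptcies, window_days=365):
    """Return one 0/1 label per filing, in input order.

    filings:      iterable of (cik, filing_date) pairs for 10-Ks
    bankruptcies: iterable of (cik, bankruptcy_date) pairs
    A filing is labelled 1 if the same CIK has a bankruptcy date strictly
    after the filing date and at most `window_days` later (assumed window).
    """
    by_cik = defaultdict(list)
    for cik, when in bankruptcies:
        by_cik[cik].append(when)

    window = timedelta(days=window_days)
    labels = []
    for cik, filed in filings:
        hit = any(filed < when <= filed + window for when in by_cik.get(cik, ()))
        labels.append(int(hit))
    return labels


# Hypothetical example data: company 0001 goes bankrupt in November 2019.
filings = [
    ("0001", date(2018, 3, 1)),
    ("0001", date(2019, 3, 1)),
    ("0002", date(2019, 2, 15)),
]
bankruptcies = [("0001", date(2019, 11, 20))]
print(label_filings(filings, bankruptcies))  # [0, 1, 0]
```

Only the 10-K filed in March 2019 is labelled positive here, since the earlier filing falls outside the window and the second company has no bankruptcy record.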
Fine-tuning
The bulk of our predictions come from a fine-tuned FinBERT model, a BERT model pre-trained on financial documents (Yang et al., 2020)
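A minimal sketch of such a fine-tuning setup, assuming the Hugging Face `transformers` library and PyTorch, is shown below. The checkpoint id `yiyanghkust/finbert-pretrain`, the two-class head, and the 512-token truncation are assumptions made for illustration; the page does not state which checkpoint or hyperparameters the project uses.

```python
def build_classifier(checkpoint: str = "yiyanghkust/finbert-pretrain"):
    """Load a tokenizer and an encoder with a fresh 2-class classification head.

    The default checkpoint id is an assumption; substitute whichever one
    the repository actually uses. Needs `pip install transformers torch`.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    return tokenizer, model


def training_step(model, tokenizer, optimizer, texts, labels, max_length=512):
    """Run one gradient step on a batch of texts with 0/1 labels; return the loss."""
    import torch

    model.train()
    batch = tokenizer(
        texts,
        truncation=True,  # 10-Ks are far longer than the model's input limit
        padding=True,
        max_length=max_length,
        return_tensors="pt",
    )
    outputs = model(**batch, labels=torch.tensor(labels))
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

In use, one would call `build_classifier()`, create an optimizer such as `torch.optim.AdamW(model.parameters(), lr=2e-5)` (the learning rate is again only a placeholder), and loop `training_step` over batches of filing text and labels.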
Metrics and Evaluation
We evaluated our results on a few different metrics, across different thresholds, ratios and evaluation methods
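To illustrate what evaluating across thresholds can look like, here is a small dependency-free sketch. The page does not say which metrics were used, so precision, recall, and F1 are chosen here only as common options for rare-event labels such as bankruptcy; the function names are hypothetical.

```python
def confusion_counts(y_true, scores, threshold):
    """Count (tp, fp, fn, tn), predicting positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for truth, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, scores, threshold):
    """Return (precision, recall, f1); each is 0.0 when its denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def sweep_thresholds(y_true, scores, thresholds):
    """Map each threshold to its (precision, recall, f1) triple."""
    return {t: precision_recall_f1(y_true, scores, t) for t in thresholds}
```

Sweeping thresholds this way makes the precision/recall trade-off explicit, which matters when positives are rare and a single default cutoff of 0.5 may hide most of them.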