Thesis project: Models for the prediction of overdue invoices

Maintaining a healthy cash flow is vital for businesses of every size and kind. One of the biggest factors in keeping cash flow healthy is paying invoices on time and getting paid on time. Some researchers suggest that being paid on time would prevent the collapse of more than 50,000 small businesses every year, and that estimate predates the economic uncertainty caused by COVID-19.

Many companies, including multiple payment providers, reported longer payment periods during the first and second lockdowns. To keep payment periods short and cash flow steady, companies can pay more attention to their Accounts Receivable processes. However, deciding which clients or invoices to invest time and resources in can be tricky. Deciding which clients need to be contacted first requires knowing which invoices are likely to become overdue. Following the CRISP-DM methodological framework, different Machine Learning models were studied to predict which invoices are likely to be overdue. These models were built on a dataset of 290,000 invoices from a Dutch top-30 accounting firm.
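Before any model can be trained, each historical invoice needs an "overdue" label. This section does not state how the thesis defined that label, so the following is a minimal sketch under an assumed definition: an invoice counts as overdue if it was paid after its due date, or if it is still unpaid after a chosen reference date. The `Invoice` fields and the `is_overdue` helper are hypothetical and do not describe the actual dataset schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Invoice:
    # Hypothetical fields; the real dataset's schema is not described here.
    invoice_id: str
    client_id: str
    invoice_date: date
    due_date: date
    paid_date: Optional[date]  # None while the invoice is still unpaid


def is_overdue(inv: Invoice, as_of: date) -> bool:
    """Label an invoice as overdue under an assumed definition.

    Paid invoices are overdue if payment came after the due date.
    Unpaid invoices are overdue if the reference date is past the due date.
    """
    if inv.paid_date is not None:
        return inv.paid_date > inv.due_date
    return as_of > inv.due_date


if __name__ == "__main__":
    as_of = date(2021, 3, 1)
    paid_late = Invoice("A2", "c1", date(2021, 1, 5), date(2021, 2, 4), date(2021, 2, 10))
    unpaid = Invoice("A3", "c2", date(2021, 1, 10), date(2021, 2, 9), None)
    print(is_overdue(paid_late, as_of))  # True
    print(is_overdue(unpaid, as_of))     # True
```

The reference date matters for unpaid invoices: the same open invoice is not yet overdue before its due date and becomes overdue afterwards, so labels should be computed relative to a fixed snapshot date.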

After carefully carrying out data preparation and feature selection, and applying several feature engineering and hyperparameter tuning techniques, the study concludes that weighted Random Forest models yield the best predictive performance. Evaluating these models shows that the historical behavior of clients is the best predictor of overdue invoices. Interestingly, models that rely solely on client demographics, without any historical behavior, also predict overdue invoices relatively well. This means that relatively accurate predictions can be made even for new clients without any historical record.
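Because historical client behavior turned out to be the strongest predictor, features are typically built from each client's earlier invoices. The sketch below assumes a simplified, hypothetical record layout of `(client_id, invoice_date, overdue_label)`. The hypothetical helper `prior_overdue_rate` computes, for each invoice, how many earlier invoices the client has and what share of them were overdue, counting only invoices dated strictly before it so an invoice's own outcome never leaks into its features. It also simplifies by assuming earlier outcomes are already known when a new invoice is issued. The hypothetical helper `balanced_class_weights` shows one common weighting scheme, `n_samples / (n_classes * count)`, which is the formula scikit-learn uses for `class_weight="balanced"`; whether the thesis's weighted Random Forest used exactly this weighting is not stated.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical, simplified record: (client_id, invoice_date, overdue_label).
Record = Tuple[str, date, bool]


def prior_overdue_rate(records: List[Record]) -> List[Tuple[int, float]]:
    """Return (count of earlier invoices, share of them overdue) per invoice.

    Only invoices dated strictly before the current one are counted.
    New clients get (0, 0.0); the count distinguishes them from clients
    whose history exists but contains no overdue invoices.
    """
    by_client: Dict[str, List[int]] = defaultdict(list)
    for idx, (client, _, _) in enumerate(records):
        by_client[client].append(idx)

    features: List[Tuple[int, float]] = [(0, 0.0)] * len(records)
    for indices in by_client.values():
        indices.sort(key=lambda i: records[i][1])  # chronological per client
        seen = 0       # earlier invoices counted so far
        n_overdue = 0  # how many of those were overdue
        j = 0          # position of the first invoice not yet counted
        for i in indices:
            current_date = records[i][1]
            # Add every invoice dated strictly before the current one.
            while j < len(indices) and records[indices[j]][1] < current_date:
                seen += 1
                n_overdue += int(records[indices[j]][2])
                j += 1
            features[i] = (seen, n_overdue / seen if seen else 0.0)
    return features


def balanced_class_weights(labels: List[bool]) -> Dict[bool, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


if __name__ == "__main__":
    records = [
        ("c1", date(2021, 1, 1), False),
        ("c1", date(2021, 2, 1), True),
        ("c1", date(2021, 3, 1), False),
        ("c2", date(2021, 1, 15), True),
        ("c1", date(2021, 3, 1), True),  # same day as another c1 invoice
    ]
    print(prior_overdue_rate(records))
    print(balanced_class_weights([r[2] for r in records]))
```

A dictionary of per-class weights in this form is what a classifier such as scikit-learn's `RandomForestClassifier` accepts through its `class_weight` parameter, which makes the rarer class count more heavily during training.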

Frank van den Berg is working on this topic. LinkedIn: