Presentation at ICT OPEN 2021: ‘An Optimization Method for Entity Resolution in Databases’

With a Case Study on the Cleaning of Scientific References in Bibliographic Databases

Dr. Emiel Caron, Dr. Ekaterini Ioannou, & Wen Xin Lin

Many databases contain ambiguous and unstructured data, which makes the information they contain difficult to use for further analysis. For these databases to be a reliable point of reference, the data needs to be cleaned. Entity resolution focuses on disambiguating records that refer to the same entity. In this paper we propose a generic optimization method for disambiguating large databases. We apply the method to a table of scientific references from the Patstat database, which holds ambiguous information on citations to scientific references. The method creates clusters of records that refer to the same bibliographic entity. It starts by pre-cleaning the records and extracting bibliographic labels. Next, we construct rules based on these labels and use the tf-idf algorithm to compute string similarities. We then create clusters by means of a rule-based scoring system. Finally, we perform precision-recall analysis using a golden set of clusters and optimize our parameters with simulated annealing. We show that it is possible to optimize the performance of a disambiguation method using a global optimization algorithm.
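The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tokenization, the rules, or the tuned parameters, so everything below is an assumption made for illustration. In particular, the rule-based scoring system is replaced here by a single tf-idf cosine similarity threshold, the golden set is a hypothetical four-record toy example, and the function names (tfidf_vectors, cluster, pairwise_f1, anneal_threshold) are invented for this sketch.

```python
import math
import random
from collections import Counter
from itertools import combinations


def tfidf_vectors(records):
    """Build L2-normalised tf-idf vectors over lowercase word tokens."""
    docs = [Counter(r.lower().split()) for r in records]
    n = len(docs)
    df = Counter(tok for d in docs for tok in d)  # document frequency per token
    vectors = []
    for d in docs:
        # Smoothed idf, so tokens present in every record keep a positive weight.
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in d.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cluster(vectors, threshold):
    """Link records whose similarity reaches the threshold (union-find)."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(vectors))]


def pairwise_f1(predicted, gold):
    """F1 over record pairs: a pair is positive if both share a cluster."""
    pairs = list(combinations(range(len(gold)), 2))
    tp = sum(predicted[i] == predicted[j] and gold[i] == gold[j] for i, j in pairs)
    pred_pos = sum(predicted[i] == predicted[j] for i, j in pairs)
    gold_pos = sum(gold[i] == gold[j] for i, j in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / pred_pos, tp / gold_pos
    return 2 * precision * recall / (precision + recall)


def anneal_threshold(vectors, gold, steps=200, temp=1.0, cooling=0.97, seed=0):
    """Simulated annealing over one parameter: the similarity threshold."""
    rng = random.Random(seed)
    current = best = 0.5
    current_score = best_score = pairwise_f1(cluster(vectors, current), gold)
    for _ in range(steps):
        candidate = min(1.0, max(0.0, current + rng.uniform(-0.1, 0.1)))
        score = pairwise_f1(cluster(vectors, candidate), gold)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature cools.
        if score >= current_score or rng.random() < math.exp((score - current_score) / temp):
            current, current_score = candidate, score
        if current_score > best_score:
            best, best_score = current, current_score
        temp *= cooling
    return best, best_score


# Hypothetical toy data: two references, each cited in two different ways.
records = [
    "Smith J Nature 1999 vol 400 pp 12-15",
    "J Smith Nature vol 400 1999",
    "Doe A Science 2005 vol 310 pp 99-101",
    "A Doe Science 2005 310",
]
gold = [0, 0, 1, 1]  # golden cluster label per record

vectors = tfidf_vectors(records)
best_threshold, best_f1 = anneal_threshold(vectors, gold)
print(f"threshold={best_threshold:.2f} pairwise F1={best_f1:.2f}")
```

In a setting like the one the abstract describes, the annealed state would presumably be a vector of rule weights and thresholds rather than a single number, and the all-pairs comparison would need blocking or indexing to scale to a large table.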

10-minute presentation at ICT OPEN 2021

Thesis project: An operational alignment model for developing business performance dashboards

Business performance dashboards are a well-known solution to support business decision making. However, developing such a dashboard is not a matter of applying an off-the-shelf solution. Questions that arise include: Which KPIs should the dashboard contain? What data should be used? What is the most appropriate visual to display the information? What functionalities should the dashboard offer? Therefore, the aim of this research is to develop an operational alignment model for developing business performance dashboards: the Cross-Industry Standard Process for Business Performance Dashboards (CRISP-PD). This research presents a first exploration of this model.

Alignment between the requirements of the business and the IT solution is essential when designing a dashboard. The development process presented in this research is designed to maintain alignment in every step. This is accomplished by implementing feedback mechanisms between the business and IT.

First, literature is reviewed to build a theoretical framework. This review covers the concept of alignment and theory about business performance dashboards. Second, the alignment model, based on the literature, is presented. Finally, to verify the model, it is applied in a case study. The goal of developing a business performance dashboard in the case study is to support business decision making at commercial departments.

The case study resulted in a concept of the business performance dashboard, which is evaluated by the problem owners. Since developing a business performance dashboard is an iterative process, the dashboard is not finalized. Changing circumstances (e.g. market conditions, business goals, information needs) can cause changes in the design of the dashboard. Therefore, several recommendations are made for further developing the dashboard.

However, this research has some limitations. First, CRISP-PD is not fully applied in the case study. Second, the case study is based on a limited data set. Therefore, basic statistical parameters could not be calculated. Third, CRISP-PD is only applied in one case study and therefore lacks reliability.

In addition, recommendations are made for future research. Since this is an operational alignment model, nothing is said about strategic and tactical performance dashboards. Therefore, in future research the alignment model should be tested for aligning strategic and tactical performance dashboards. Furthermore, the model should be validated in other case studies in order to improve the research reliability.

Yannick Visser is working on this topic.