{"id":274,"date":"2021-02-14T03:51:09","date_gmt":"2021-02-14T01:51:09","guid":{"rendered":"https:\/\/emielcaron.nl\/?p=274"},"modified":"2021-02-14T20:14:09","modified_gmt":"2021-02-14T18:14:09","slug":"presentation-at-ict-open-2021-an-optimization-method-for-entity-resolution-in-databases","status":"publish","type":"post","link":"https:\/\/emielcaron.nl\/?p=274","title":{"rendered":"Presentation at ICT OPEN 2021: &#8216;An Optimization Method for Entity Resolution in Databases&#8217;"},"content":{"rendered":"\n<p><em>With a Case Study on the Cleaning of Scientific References in Bibliographic Databases<\/em><\/p>\n\n\n\n<p>Dr. Emiel Caron, Dr. Ekaterini Ioannou, &amp; Wen Xin Lin<\/p>\n\n\n\n<p>Many databases contain ambiguous and unstructured data, which makes the information they contain difficult to use for further analysis. For these databases to serve as a reliable point of reference, the data needs to be cleaned. Entity resolution focuses on disambiguating records that refer to the same entity. In this paper we propose a generic optimization method for disambiguating large databases. We apply the method to a table of scientific references from the Patstat database. The table holds ambiguous information on citations to scientific references. The method creates clusters of records that refer to the same bibliographic entity. It starts by pre-cleaning the records and extracting bibliographic labels. Next, we construct rules based on these labels and use the tf-idf algorithm to compute string similarities. We create clusters by means of a rule-based scoring system. Finally, we perform precision-recall analysis using a golden set of clusters and optimize our parameters with simulated annealing. 
Here we show that the performance of a disambiguation method can be optimized using a global optimization algorithm.<\/p>\n\n\n\n<figure class=\"wp-block-video\"><video height=\"1080\" style=\"aspect-ratio: 1680 \/ 1080;\" width=\"1680\" controls poster=\"https:\/\/emielcaron.nl\/wp-content\/uploads\/2021\/02\/ICT_open.jpg\" src=\"https:\/\/emielcaron.nl\/wp-content\/uploads\/2021\/02\/Presentation_ICT_OPEN_2021.mp4\"><\/video><figcaption>10-minute presentation at ICT OPEN 2021<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>With a Case Study on the Cleaning of Scientific references in bibliographic databases Dr. Emiel Caron, Dr. Ekaterini Ioannou, &amp; Wen Xin Lin Many databases contain ambiguous and unstructured data which makes the information it contains difficult to use for further analysis. In order for these databases to be a reliable point of reference, the &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/emielcaron.nl\/?p=274\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Presentation at ICT OPEN 2021: &#8216;An Optimization Method for Entity Resolution in Databases&#8217;&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-274","post","type-post","status-publish","format-standard","hentry","category-research"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Presentation at ICT OPEN 2021: &#039;An Optimization Method for Entity Resolution in Databases&#039; - Emiel Caron<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/emielcaron.nl\/?p=274\" \/>\n<meta property=\"og:locale\" 
content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Presentation at ICT OPEN 2021: &#039;An Optimization Method for Entity Resolution in Databases&#039; - Emiel Caron\" \/>\n<meta property=\"og:description\" content=\"With a Case Study on the Cleaning of Scientific references in bibliographic databases Dr. Emiel Caron, Dr. Ekaterini Ioannou, &amp; Wen Xin Lin Many databases contain ambiguous and unstructured data which makes the information it contains difficult to use for further analysis. In order for these databases to be a reliable point of reference, the &hellip; Continue reading &quot;Presentation at ICT OPEN 2021: &#8216;An Optimization Method for Entity Resolution in Databases&#8217;&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/emielcaron.nl\/?p=274\" \/>\n<meta property=\"og:site_name\" content=\"Emiel Caron\" \/>\n<meta property=\"article:published_time\" content=\"2021-02-14T01:51:09+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-02-14T18:14:09+00:00\" \/>\n<meta name=\"author\" content=\"Emiel Caron\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Emiel Caron\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/?p=274#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/?p=274\"},\"author\":{\"name\":\"Emiel Caron\",\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/#\\\/schema\\\/person\\\/992b3c38031ce991eef0e83dd12e11cd\"},\"headline\":\"Presentation at ICT OPEN 2021: &#8216;An Optimization Method for Entity Resolution in Databases&#8217;\",\"datePublished\":\"2021-02-14T01:51:09+00:00\",\"dateModified\":\"2021-02-14T18:14:09+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/?p=274\"},\"wordCount\":222,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/#\\\/schema\\\/person\\\/992b3c38031ce991eef0e83dd12e11cd\"},\"articleSection\":[\"Research projects\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/emielcaron.nl\\\/?p=274#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/?p=274\",\"url\":\"https:\\\/\\\/emielcaron.nl\\\/?p=274\",\"name\":\"Presentation at ICT OPEN 2021: 'An Optimization Method for Entity Resolution in Databases' - Emiel 
Caron\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/#website\"},\"datePublished\":\"2021-02-14T01:51:09+00:00\",\"dateModified\":\"2021-02-14T18:14:09+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/?p=274#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/emielcaron.nl\\\/?p=274\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/?p=274#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/emielcaron.nl\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Presentation at ICT OPEN 2021: &#8216;An Optimization Method for Entity Resolution in Databases&#8217;\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/#website\",\"url\":\"https:\\\/\\\/emielcaron.nl\\\/\",\"name\":\"Emiel Caron\",\"description\":\"PhD, Lecturer &amp; Researcher in Business Intelligence &amp; Analytics, Data science\",\"publisher\":{\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/#\\\/schema\\\/person\\\/992b3c38031ce991eef0e83dd12e11cd\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/emielcaron.nl\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/emielcaron.nl\\\/#\\\/schema\\\/person\\\/992b3c38031ce991eef0e83dd12e11cd\",\"name\":\"Emiel 
Caron\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/16d7767d69c769cde896a0f5e53533595a081cfaeab0aca485f4736e51e08ae0?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/16d7767d69c769cde896a0f5e53533595a081cfaeab0aca485f4736e51e08ae0?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/16d7767d69c769cde896a0f5e53533595a081cfaeab0aca485f4736e51e08ae0?s=96&d=mm&r=g\",\"caption\":\"Emiel Caron\"},\"logo\":{\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/16d7767d69c769cde896a0f5e53533595a081cfaeab0aca485f4736e51e08ae0?s=96&d=mm&r=g\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Presentation at ICT OPEN 2021: 'An Optimization Method for Entity Resolution in Databases' - Emiel Caron","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/emielcaron.nl\/?p=274","og_locale":"en_US","og_type":"article","og_title":"Presentation at ICT OPEN 2021: 'An Optimization Method for Entity Resolution in Databases' - Emiel Caron","og_description":"With a Case Study on the Cleaning of Scientific references in bibliographic databases Dr. Emiel Caron, Dr. Ekaterini Ioannou, &amp; Wen Xin Lin Many databases contain ambiguous and unstructured data which makes the information it contains difficult to use for further analysis. 
In order for these databases to be a reliable point of reference, the &hellip; Continue reading \"Presentation at ICT OPEN 2021: &#8216;An Optimization Method for Entity Resolution in Databases&#8217;\"","og_url":"https:\/\/emielcaron.nl\/?p=274","og_site_name":"Emiel Caron","article_published_time":"2021-02-14T01:51:09+00:00","article_modified_time":"2021-02-14T18:14:09+00:00","author":"Emiel Caron","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Emiel Caron","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/emielcaron.nl\/?p=274#article","isPartOf":{"@id":"https:\/\/emielcaron.nl\/?p=274"},"author":{"name":"Emiel Caron","@id":"https:\/\/emielcaron.nl\/#\/schema\/person\/992b3c38031ce991eef0e83dd12e11cd"},"headline":"Presentation at ICT OPEN 2021: &#8216;An Optimization Method for Entity Resolution in Databases&#8217;","datePublished":"2021-02-14T01:51:09+00:00","dateModified":"2021-02-14T18:14:09+00:00","mainEntityOfPage":{"@id":"https:\/\/emielcaron.nl\/?p=274"},"wordCount":222,"commentCount":0,"publisher":{"@id":"https:\/\/emielcaron.nl\/#\/schema\/person\/992b3c38031ce991eef0e83dd12e11cd"},"articleSection":["Research projects"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/emielcaron.nl\/?p=274#respond"]}]},{"@type":"WebPage","@id":"https:\/\/emielcaron.nl\/?p=274","url":"https:\/\/emielcaron.nl\/?p=274","name":"Presentation at ICT OPEN 2021: 'An Optimization Method for Entity Resolution in Databases' - Emiel 
Caron","isPartOf":{"@id":"https:\/\/emielcaron.nl\/#website"},"datePublished":"2021-02-14T01:51:09+00:00","dateModified":"2021-02-14T18:14:09+00:00","breadcrumb":{"@id":"https:\/\/emielcaron.nl\/?p=274#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/emielcaron.nl\/?p=274"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/emielcaron.nl\/?p=274#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/emielcaron.nl\/"},{"@type":"ListItem","position":2,"name":"Presentation at ICT OPEN 2021: &#8216;An Optimization Method for Entity Resolution in Databases&#8217;"}]},{"@type":"WebSite","@id":"https:\/\/emielcaron.nl\/#website","url":"https:\/\/emielcaron.nl\/","name":"Emiel Caron","description":"PhD, Lecturer &amp; Researcher in Business Intelligence &amp; Analytics, Data science","publisher":{"@id":"https:\/\/emielcaron.nl\/#\/schema\/person\/992b3c38031ce991eef0e83dd12e11cd"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/emielcaron.nl\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/emielcaron.nl\/#\/schema\/person\/992b3c38031ce991eef0e83dd12e11cd","name":"Emiel Caron","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/16d7767d69c769cde896a0f5e53533595a081cfaeab0aca485f4736e51e08ae0?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/16d7767d69c769cde896a0f5e53533595a081cfaeab0aca485f4736e51e08ae0?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/16d7767d69c769cde896a0f5e53533595a081cfaeab0aca485f4736e51e08ae0?s=96&d=mm&r=g","caption":"Emiel 
Caron"},"logo":{"@id":"https:\/\/secure.gravatar.com\/avatar\/16d7767d69c769cde896a0f5e53533595a081cfaeab0aca485f4736e51e08ae0?s=96&d=mm&r=g"}}]}},"_links":{"self":[{"href":"https:\/\/emielcaron.nl\/index.php?rest_route=\/wp\/v2\/posts\/274","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/emielcaron.nl\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/emielcaron.nl\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/emielcaron.nl\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/emielcaron.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=274"}],"version-history":[{"count":9,"href":"https:\/\/emielcaron.nl\/index.php?rest_route=\/wp\/v2\/posts\/274\/revisions"}],"predecessor-version":[{"id":297,"href":"https:\/\/emielcaron.nl\/index.php?rest_route=\/wp\/v2\/posts\/274\/revisions\/297"}],"wp:attachment":[{"href":"https:\/\/emielcaron.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=274"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/emielcaron.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=274"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/emielcaron.nl\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=274"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}