Computational Classics: Finding errors in annotated ancient Greek texts with association rules mining

Posted on Sun 13 November 2022 in Blog • Tagged with Natural Language Processing, association rules mining

This blog describes some experiments with ruleminer for finding morphological patterns in annotated data of ancient Greek texts. Ruleminer is a Python package for association rules mining, a rules-based machine learning method for discovering patterns in large data sets. The regex and dataframe approach in ruleminer (set out in this …


Continue reading

Multilingual termbases with metadata from reporting templates

Posted on Tue 26 July 2022 in Blog • Tagged with Natural Language Processing, termbase, xbrl

Domain-specific termbases are of great importance to many domain-specific NLP tasks. They enable identification and annotation of terms in documents in situations where there is often not enough text available to use statistical approaches. More importantly, they form a step towards extracting structured facts from unstructured text data.

This blog shows …


Continue reading

Explainable outlier detection with decision trees and ruleminer

Posted on Sun 12 June 2022 in Blog • Tagged with Association rules mining, decision trees

This is a note on an extension of the ruleminer package to convert the content of decision trees into rules to provide an approach to unsupervised and explainable outlier detection.

Here is a way to use decision trees for unsupervised outlier detection. For an arbitrary data set (in the form …


Continue reading

The Solvency termbase for NLP

Posted on Wed 09 February 2022 in Blog • Tagged with Natural Language Processing, solvency, termbase

This blog describes a way to construct a terminology database for the insurance supervision knowledge domain. The goal of this termbase is to provide a reliable basis for extracting insurance supervision terminology within different NLP analyses.

The terminology of solvency and insurance supervision forms an expert domain of terminology based on …


Continue reading

Europe's insurance register linked to the GLEIF RDF dataset

Posted on Sat 08 January 2022 in Blog • Tagged with linked-data, rdf, EIOPA, xbrl, GLEIF

Number 7 of my New Year's Resolutions list reads "only use and provide linked data". So, to start the year well, I decided to do a little experiment to combine insurance undertakings register data with publicly available legal entity data. In concrete terms, I wanted to provide the European insurance …


Continue reading

EIOPA's Solvency 2 taxonomy in RDF

Posted on Fri 18 June 2021 in Blog • Tagged with linked-data, rdf, EIOPA, xbrl

To use the metadata from XBRL taxonomies, such as labels, hierarchies, template structures and formulas, licensed software is often needed to process the taxonomy and convert the XML content to readable formats. In an earlier blog I showed that it is useful to convert XBRL instance data to a linked …


Continue reading

Converting XBRL to RDF-star

Posted on Wed 31 March 2021 in Blog • Tagged with linked-data, rdf, rdf-star, xbrl

Lately I have been working on the conversion of XBRL instances and related taxonomy schemas and linkbases to RDF and RDF-star. In these semantic data formats, you can link XBRL data with other data sources and query the data fairly easily. RDF-star is …


Continue reading

Converting supervisory reports to Semantic Webs: from XBRL to RDF

Posted on Wed 28 October 2020 in Blog • Tagged with linked-data, rdf, xbrl

A growing number of supervisory reports across Europe are based on the XML-based eXtensible Business Reporting Language (XBRL) standard. Financial entities such as banks, insurance undertakings and pension institutions are required to submit their reports to their supervisors in this format.

XBRL is a language for modeling, exchanging and automatically …


Continue reading