Data Quality Overview of the IISG Knowledge Graph

Improving the data quality of the IISG Knowledge Graph makes it easier to process and analyze the data. High-quality data must be consistent and unambiguous. Differences in schema and format can make data inconsistent and lower its quality. To improve data quality, the data needs to undergo data cleansing, but cleaning the data effectively requires insight into the data.

This data story shows the improvements that have already been made to the IISG Knowledge Graph by comparing the quality of the current version (2018-09) with a previous version (2017). The first six pages show data quality issues whose resolution could improve the quality of the entire knowledge base. The final page shows the data quality issues of the previous knowledge graph and the improvements made by the IISG.

Quality Comparison: Person entities
Provides an overview of some of the quality improvements that have been made to person entities that appear in the IISG Knowledge Graph.
Quality Indicator 1: Dates & Times
Overview of data quality issues with dates & times.
Quality Indicator 2: Persons & locations with multiple labels
Overview of person and location authorities that have alternative spellings.
Quality Indicator 3: Persons & locations with multiple identifiers
Overview of persons and locations with the same name, but different identifiers.
Quality Indicator 4: Encoding issues
Overview of encoding issues in various strings.
Quality Indicator 5: Size of collection items
Overview of numeric errors with respect to the size (width & height) of collection items.
Quality Indicator 6: Languages
Overview of language authorities.
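
As an illustration of the kind of checks behind some of these indicators, the sketch below shows how date validity, encoding issues, and names with multiple identifiers might be detected. It is a minimal sketch, not the method used by the IISG: the function names, the assumed YYYY-MM-DD date format, and the sample values are hypothetical and were not taken from the IISG Knowledge Graph.

```python
import re
from collections import defaultdict
from datetime import date

# Assumption: dates are expected in the form YYYY-MM-DD.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_valid_iso_date(value):
    """Return True if value is a YYYY-MM-DD string naming a real calendar date."""
    match = ISO_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # raises ValueError for e.g. 30 February
    except ValueError:
        return False
    return True


def looks_like_mojibake(text):
    """Heuristic: True if text looks like UTF-8 bytes that were read as Latin-1.

    If re-encoding as Latin-1 and decoding as UTF-8 succeeds and changes
    the string, the original was probably mis-decoded.
    """
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        return False
    return repaired != text


def names_with_multiple_ids(pairs):
    """Group (name, identifier) pairs and keep names that have more than one identifier."""
    ids_by_name = defaultdict(set)
    for name, identifier in pairs:
        ids_by_name[name].add(identifier)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Made-up sample values, for illustration only.
print(is_valid_iso_date("2018-02-30"))
print(looks_like_mojibake("MÃ¼ller"))
print(names_with_multiple_ids([("Jan Jansen", "p1"), ("Jan Jansen", "p2"), ("Piet", "p3")]))
```

The encoding check is only a heuristic: it can miss other kinds of corruption and may occasionally flag a legitimate string, so flagged values would still need manual review.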