This datastory uses Dutch death certificates from 1910-20 to map the temporal, spatial and social distribution of the 'Spanish' flu epidemic that hit The Netherlands in 1918-19.
Thanks to the indexation efforts of archives and the LINKS project, large parts of the Dutch civil registry ('Burgerlijke Stand') are now becoming available to historians. The death certificates used here were retrieved from openarch.nl (available here). The individual death certificate files of each archive were combined into one dataset. One challenge of working with these certificates is that the same certificate may have been indexed by more than one archive. Such duplicates were removed using a combination of date of death, location, and similar full names (based on string distances). Key variables, such as age at death, place names, and occupations, were subsequently cleaned and standardized. Using different string match combinations, place names were standardized against the Historical Dutch Toponyms dataset to retrieve coordinates and Amsterdam Codes. Occupations were matched against the file HSN Occupations to retrieve HISCO, HISCLASS and HISCAM categorizations. All related R scripts can be found at our Github repository. The dataset was then converted into Linked Data with COW, using the "Persons table" datamodel designed by Gerrit Bloothooft and Kees Mandemakers in the LINKS project. The complete Linked Dataset and the related metadata scheme can be explored at Druid.
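The de-duplication step described above can be illustrated with a minimal sketch. It is written in Python rather than the R used in our scripts, and it is not the actual implementation: the field names (`death_date`, `place`, `name`), the similarity measure (the ratio from Python's standard `difflib` module, standing in for a string distance), and the threshold of 0.85 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity between two names, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(certificates, threshold=0.85):
    """Keep one certificate per (date, place, similar name) combination.

    `certificates` is a list of dicts with the hypothetical keys
    'death_date', 'place' and 'name'. The threshold is an assumed value.
    """
    kept = []
    names_seen = {}  # (death_date, place) -> names already kept
    for cert in certificates:
        key = (cert["death_date"], cert["place"])
        names = names_seen.setdefault(key, [])
        # Same date and place plus a very similar name: treat as a duplicate.
        if any(name_similarity(cert["name"], n) >= threshold for n in names):
            continue
        names.append(cert["name"])
        kept.append(cert)
    return kept


records = [
    {"death_date": "1918-11-03", "place": "Groningen", "name": "Jan Jansen"},
    {"death_date": "1918-11-03", "place": "Groningen", "name": "Jan Janssen"},
    {"death_date": "1918-11-03", "place": "Groningen", "name": "Pieter de Vries"},
    {"death_date": "1918-11-04", "place": "Groningen", "name": "Jan Jansen"},
]
unique_records = deduplicate(records)
```

In this made-up example the second record is dropped as a spelling variant of the first, while the record with a different name and the one with a different date of death are both kept.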
The table below demonstrates the success of the standardization efforts, presented as Linked Data.
Sources:
Gerrit Bloothooft & Kees Mandemakers, The Zeeland Challenge, document for benchmark, presented at workshop Data Linkage: Techniques, Challenges and Applications at Isaac Newton Institute for Mathematics, Cambridge, UK, 15 September 2016;
Van Leeuwen, M. H. D., Maas, I. & Miles, A. (2003). HISCO: Historical International Standard Classification of Occupations. Leuven: Leuven University Press;
Zijdeman, R., & Lambert, P. (2010). Measuring social structure in the past: A comparison of historical class schemes and occupational stratification scales on Dutch 19th and early 20th century data. Journal of Belgian History/Belgisch Tijdschrift voor Nieuwste Geschiedenis/Revue Belge de Histoire Contemporaine, 40(1-2), 111-141.
Because archives generally decide for themselves which variables from the civil registry they will index, occupations of the deceased are not available for all regions. Archives from the provinces of Drenthe, Gelderland, and Zeeland are, regrettably, the only ones to have structurally indexed all occupations of the deceased for this period.
The table above demonstrated that the flu did most of its damage in 1918, with many more deaths than in the surrounding years. As with most flu epidemics, seasonality played a role as well. Deaths surged in the fall in particular, with November 1918 being the deadliest month. From January through March 1919, too, many more people seem to have succumbed to the flu than in a more ordinary year such as 1910.
Linking the Amsterdam Codes in our dataset to those in Gemeentegeschiedenis, as above, allows us to map the spatial distribution of the epidemic. We compared total deaths by municipality in 1918 with the mean number of deaths during 1910-17. Excess death rates were divided into three categories: municipalities with excess death rates below the mean of 1.38 (light red); those with rates from 1.38 through 1.60 (red); and municipalities with even higher excess death rates (dark red). The map suggests hotbeds of the epidemic in Groningen, Drenthe, and Limburg, with smaller clusters in parts of North-Brabant and North-Holland. The Achterhoek, the south west of North-Brabant and Schouwen-Duiveland seem to have been affected relatively less severely. Hovering over the map shows the excess death rate and the name of the municipality.
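As an illustration of the measure used for the map, the following Python sketch computes an excess death rate as deaths in 1918 divided by the mean number of deaths during 1910-17, and assigns it to one of the three map categories. The function names and the example numbers are hypothetical; the cut-off values 1.38 and 1.60 are those mentioned above. This is not the query behind the map itself.

```python
from statistics import mean


def excess_death_rate(deaths_1918, baseline_deaths):
    """Deaths in 1918 divided by the mean yearly deaths in the baseline period.

    `baseline_deaths` holds one death count per year for 1910-17.
    Returns None when the baseline mean is zero (rate undefined).
    """
    baseline = mean(baseline_deaths)
    if baseline == 0:
        return None
    return deaths_1918 / baseline


def map_category(rate, lower=1.38, upper=1.60):
    """Assign an excess death rate to one of the three colour categories."""
    if rate < lower:
        return "light red"
    if rate <= upper:
        return "red"
    return "dark red"


# Made-up municipality: 20 deaths in each year of 1910-17, 30 deaths in 1918.
example_rate = excess_death_rate(30, [20] * 8)
example_category = map_category(example_rate)
```

A rate of 1 means that 1918 saw as many deaths as an average year in 1910-17, so any value above 1 indicates excess mortality. Small municipalities with few deaths per year can produce very high or very low rates by chance.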
The next query zooms in on the excess mortality in November 1918, compared to the mean number of deaths in the same month during 1910-17. It shows that 's-Hertogenbosch, Finsterwolde, and Harderwijk experienced death rates well over fifteen times what was normal. Only nineteen of the 967 municipalities we have data for experienced below-average mortality rates, and these low rates mostly reflect the small size of the municipalities concerned, again highlighting the severity of the epidemic. Municipalities can be browsed by entering their name in the search bar 'Filter query results' just above the table at the left-hand side, after clicking the blue upward-facing arrow. To put the municipal rates into context: the overall excess death rate in 1918 was about 1.41.
In contrast to many other flu outbreaks, it has been demonstrated that the 1918-19 epidemic was deadly for young adults in particular. We can use the age at death to examine whether this holds for The Netherlands as well, and whether it was true during both the wave in November 1918 and the first three months of 1919. Surprisingly, only the November 1918 wave affected young adults in extremely large numbers. During the next outbreak in January-March 1919 the age distribution was more in line with years without an epidemic. At the moment we can only speculate as to why. Perhaps young adults who overcame the first wave had acquired immunity by 1919, but that does not explain why the elderly, who had not been at much risk in 1918, were now affected in relatively large numbers.
Last, we examine whether people from certain occupations experienced higher death rates than usual. This could suggest they had a higher exposure to the virus. To do so, occupational titles of the deceased are grouped under occupational categories, using the HISCO Unit Group code. Only those aged 18-45 are examined, to see if their high death rates can be (partially) explained by their occupation. Deaths per occupational group in 1918 are again compared to their corresponding mean during 1910-17. The excess death rate of this age group in general was a staggering 2.95, so people from occupational groups above this figure were even worse off. Next to nurses, who by the nature of their work had high exposure to the virus, it shows that clerks, miners, weavers and fishermen died relatively more often in 1918. Perhaps working in closed-off environments, or with many people in the same place, provides an explanation. Those working mostly outside, such as street vendors, dockers, and bricklayers, were better off. Yet some other groups working inside, such as tailors and shoemakers, were also affected relatively modestly, suggesting that working environments alone may not provide a sufficient explanation. In any case, and in line with other studies, we observe no discernible hierarchical relation between social class, measured by HISCAM, and excess mortality.
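The occupational comparison can be sketched along the same lines. The Python example below counts deaths of people aged 18-45 per occupational group in 1918 and divides them by the group's mean yearly deaths during 1910-17. The record layout (`year`, `age`, `group`) and the example data are assumptions for illustration only; in the dataset itself the grouping is based on the HISCO Unit Group code.

```python
from collections import defaultdict


def excess_by_group(deaths, epidemic_year=1918,
                    baseline_years=range(1910, 1918),
                    min_age=18, max_age=45):
    """Excess death rate per occupational group for a given age range.

    `deaths` is a list of dicts with the hypothetical keys 'year', 'age'
    and 'group'. Groups without baseline deaths are left out, because
    their rate would be undefined.
    """
    epidemic = defaultdict(int)
    baseline = defaultdict(int)
    for d in deaths:
        if not (min_age <= d["age"] <= max_age):
            continue  # outside the age range under study
        if d["year"] == epidemic_year:
            epidemic[d["group"]] += 1
        elif d["year"] in baseline_years:
            baseline[d["group"]] += 1
    n_years = len(baseline_years)
    return {
        group: epidemic.get(group, 0) / (count / n_years)
        for group, count in baseline.items()
        if count > 0
    }


# Made-up data: one death per baseline year and three deaths in 1918.
sample = [{"year": y, "age": 30, "group": "nurses"} for y in range(1910, 1918)]
sample += [{"year": 1918, "age": 30, "group": "nurses"} for _ in range(3)]
sample.append({"year": 1918, "age": 70, "group": "nurses"})  # ignored: too old
group_rates = excess_by_group(sample)
```

Rates obtained this way can then be set against the overall figure of 2.95 for the age group to see which occupational groups fared worse than average.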
By using Linked Data to connect Dutch death certificates to other datasets, such as Gemeentegeschiedenis and HISCO, we can for the first time attempt an encompassing study of the 1918-19 epidemic. Our still preliminary results suggest that the odds of meeting someone who carried the virus probably mattered most, and that these odds were partially affected by working environments, such as the number of co-workers and working inside, and also by where you lived in The Netherlands. Some regions, such as Groningen, were notably more affected than others. The next step would be to study these variables in combination, and set up a case-control study comparing the social-demographic structure of such regions with regions or municipalities that were affected to a smaller degree. In the near future we will also be able to link these death certificates across the complete civil registry, allowing us to reconstruct family trees. In this way it will be possible to take into account household composition and genetic traits as well, for instance by comparing mortality between siblings or between different generations of the same family.
This datastory was created by the CLARIAH project (WP4).
For questions and suggestions please email Ruben Schalk
This version: October 9, 2020