István Hegedűs

István Hegedűs is the Head of the Digital Content Development Department at the National Archives of Hungary. He obtained an MA degree (2009) as a librarian specialising in computer science, an MA degree (2010) as a historian, and an MSc degree (2012) as an agricultural engineer specialising in rural development. His papers on the Public Collection Digitization Strategy of Hungary and on the role of AI and ML in archives have been published in English. His main research interests include AI in archives, digitization technology, databases in public collections, rural history, and the history of land.

Artificial Intelligence and Mass Processing of Archival Documents

Although artificial intelligence began as a product of science-fiction literature, it is now a significant branch of computer science dealing with intelligent behaviour, machine learning, and machine adaptation. It has become a discipline that tries to solve real-world problems. Artificial intelligence systems are now widely used in economics, medicine, design, and the military. Although the role of archives is changing worldwide, the last untapped treasures of data are still hidden deep inside archives. In this far-reaching transformation, archives need to be at the forefront of shaping their own future, so that they can steer and guide the process rather than lose out.

The vast quantity of information held in archives provides an excellent basis for applying artificial intelligence. In the not-too-distant future, this mass of data could be a great help not only for research but also for decision making, policy preparation, and some areas of public administration.

Researchers had already formulated the desire to link, by computer, data on persons and other entities found in historical sources (census records, population censuses, registers) in the 1960s. Linking long-term data sets on the populations and generations of regions and countries would be of great benefit to social scientists and historians, but also to citizens and family researchers. However, analysis of this kind on large data sets did not take place until the mid-2000s, a delay that was only partly due to technological limitations. Since then, most attempts have involved censuses that had already been digitised and text-recognised, but there are also examples of linking fragmented archival data from religious registers and Holocaust documents.

In my presentation, I will show possible methods and completed projects, and will briefly present the ongoing and upcoming AI and record linkage programme of the National Archives of Hungary. This includes the record linkage project on the registers of Hungarian prisoners of war, the handwritten text recognition of the tax conscription of 1828, and, last but not least, the combination of these two approaches in the digitisation of the birth, death, and marriage registers of Hungary between 1895 and 1980.
