
PAPER SESSION 7: In the Lab: Tools & Technologies

Tracks
Rongomātāne B
Wednesday, November 5, 2025
10:45 AM - 12:00 PM

Speaker

Claire Fox
Software Preservation and Emulation Librarian
Yale University Library

Understanding The Digital Format Ecosystem

Summary Abstract

One of the most fundamental facets of preserving born-digital cultural materials is understanding the data formats in our collections. In contrast to the relatively constrained worlds of metadata and digitised materials, born-digital items can come in a huge range of formats, making format identification a crucial step towards understanding the informational and software dependencies we need to capture to make future access possible. How can today’s available file format identification tools and strategies help us understand the formats in our collections, and where are we still lacking ways to gather this critical information?
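Identification tools such as DROID work largely by matching internal byte signatures recorded in registries like PRONOM. The Python sketch below shows that basic idea. The signature table is a small hand-picked sample of well-known "magic numbers", not PRONOM data, and the function name `identify` is hypothetical.

```python
# Hypothetical sample of well-known leading-byte signatures ("magic numbers").
# Real registries hold thousands of signatures, many with offsets and wildcards.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF (87a)"),
    (b"GIF89a", "GIF (89a)"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(header: bytes) -> str:
    """Return the label of the first signature that the header starts with."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unidentified"


if __name__ == "__main__":
    print(identify(b"%PDF-1.7\n"))
    print(identify(b"\x00\x01\x02"))
```

A ZIP match on its own says little: DOCX, EPUB and ODF files all begin with the same bytes. This is one reason practical tools also inspect container contents, and one source of the gaps the paper sets out to measure.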

As part of the Registries of Good Practice project (a collaboration between the Digital Preservation Coalition and Yale University Library), we have been analysing and comparing data from a wide range of format identification tools and registries to better understand the current landscape. Combining this with methods drawn from ecological studies of species diversity, we have used the gaps between format registries to estimate the true scale of the format problem. Building on this evidence, we outline how a more distributed and responsive strategy would help us cope with the challenge of preserving our born-digital heritage.
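The abstract does not name the ecological estimator used. As one illustration of the general idea, the sketch below applies the bias-corrected Chao2 incidence-based richness estimator. Each registry is treated as a sampling unit and each format as a species. Formats recorded by only one registry (Q1), relative to those recorded by exactly two (Q2), indicate how many formats no registry has recorded yet: S_est = S_obs + ((m - 1) / m) * Q1 * (Q1 - 1) / (2 * (Q2 + 1)), where m is the number of registries. The sketch assumes that format identifiers have already been reconciled to shared keys across registries, which in practice is the hard part. The registry contents in the example are invented.

```python
from collections import Counter
from typing import Dict, Set


def chao2_estimate(registries: Dict[str, Set[str]]) -> float:
    """Bias-corrected Chao2 estimate of total format richness.

    Each registry is one sampling unit; each reconciled format key is a "species".
    """
    m = len(registries)
    if m < 2:
        raise ValueError("need at least two registries")
    # For each format, count how many registries record it.
    incidence = Counter(fmt for formats in registries.values() for fmt in formats)
    s_obs = len(incidence)                              # formats seen anywhere
    q1 = sum(1 for n in incidence.values() if n == 1)   # recorded by one registry only
    q2 = sum(1 for n in incidence.values() if n == 2)   # recorded by exactly two
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


if __name__ == "__main__":
    # Invented example data, not real registry contents.
    example = {
        "registry-a": {"pdf", "tiff", "wav", "fmt-x1"},
        "registry-b": {"pdf", "tiff", "fmt-x2"},
        "registry-c": {"pdf", "wav", "fmt-x3"},
    }
    print(round(chao2_estimate(example), 2))  # 6.67
```

In the example, 6 formats are observed, 3 appear in only one registry and 2 appear in exactly two, giving an estimate of about 6.67. With so few registries, any such estimate carries wide uncertainty.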

Biography

Claire Fox is the Software Preservation and Emulation Librarian at Yale Library. She collaborates with colleagues at Yale and beyond to identify and implement strategies that support long-term preservation and access for born-digital collection materials. Andrew N. Jackson is the Preservation Registries Technical Architect working on the Registries of Good Practice Project at the Digital Preservation Coalition. He has previously led technical work on the UK Web Archive and worked on digital preservation research and development projects such as Planets and SCAPE. Paul Wheatley is a digital preservation consultant at Preserve Together Ltd, and has previously worked for the Digital Preservation Coalition, the British Library and the University of Leeds in a variety of digital preservation focused roles.
Mr David Clipsham
File Format Analyst
Preservica

Bit by Bit: A Rapid Assessment Methodology for Evaluating File Format Obsolescence Risk

Summary Abstract

In this paper I provide a practical rapid-assessment methodology for determining whether a preservation intervention is recommended based upon a prioritized, sequential list of positive and negative file format obsolescence risk factors.

Rather than a complex scoring system, the core output of this methodology is a simple set of conditions that aim to suit general use-cases for quickly determining whether a given format is at immediate risk of obsolescence and therefore whether an immediate preservation intervention is required.
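The methodology uses a prioritized, sequential list of factors rather than a weighted score. One way to express that shape in code is an ordered rule list in which the first matching condition decides the outcome. The attributes and rules below are hypothetical placeholders chosen to illustrate the structure. They are not the risk factors from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FormatProfile:
    # Hypothetical attributes; the paper's actual factors are not reproduced here.
    name: str
    has_current_rendering_software: bool
    has_public_specification: bool
    independent_implementations: int


# Each rule: (description, predicate, intervention_recommended).
Rule = Tuple[str, Callable[[FormatProfile], bool], bool]

# Checked in priority order: negative (risk) and positive (mitigating) factors mixed.
RULES: List[Rule] = [
    ("no current software can render the format",
     lambda f: not f.has_current_rendering_software, True),
    ("open specification with several independent implementations",
     lambda f: f.has_public_specification and f.independent_implementations >= 2, False),
    ("single implementation and no public specification",
     lambda f: not f.has_public_specification and f.independent_implementations <= 1, True),
]


def assess(fmt: FormatProfile) -> Tuple[bool, str]:
    """Return (intervention_recommended, reason) from the first rule that fires."""
    for description, predicate, intervene in RULES:
        if predicate(fmt):
            return intervene, description
    # No rule fired: no immediate intervention, but keep the format under review.
    return False, "no rule matched; monitor"
```

Because evaluation stops at the first match, rule order encodes priority. For example, a format with no working renderer is flagged even if its specification is open.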

Biography

David Clipsham is the File Format Analyst at Preservica, supporting system users in overcoming their file format-related challenges. He previously worked at The National Archives (UK), where, among many roles, he spearheaded PRONOM file format research for around 10 years.
Mr Ulf Preuss
Head of Coordination Office Brandenburg-digital
University of Applied Sciences Potsdam / Information Sciences Department

AUTOMATED PRESERVATION OF CULTURAL HERITAGE

Summary Abstract

Ensuring that metadata complies with the submission guidelines of preservation service providers is a labor-intensive process that requires complex manual mapping and transformation. In this paper, we present a work-in-progress prototype for the automated preservation of digital objects from cultural heritage collections. To achieve this, we developed the Aggregator, a tool that manages key preservation functions such as SIP compilation, object upload, data integration, package handling, access control and client management. Designed for multi-client use, the system lets each client control the entire preservation process. As a result, the digital preservation service provider needs no additional staff for ingest, access, or package handling.
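The abstract does not describe the Aggregator's internals. As a rough illustration of two of the functions it lists, SIP compilation and multi-client separation, the sketch below writes each client's package into its own directory with a SHA-256 manifest. The directory layout, the manifest file name and `compile_sip` are all hypothetical. A real provider's submission guidelines would dictate the actual package structure and metadata.

```python
import hashlib
import json
import tempfile
from pathlib import Path, PurePosixPath


def compile_sip(client_id: str, files: dict, out_root: Path) -> Path:
    """Write a hypothetical SIP layout for one client and return its directory.

    `files` maps a relative POSIX path to that file's bytes.
    """
    sip_dir = out_root / client_id / "sip"   # one directory tree per client
    data_dir = sip_dir / "data"
    sip_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for rel_path, content in sorted(files.items()):
        parts = PurePosixPath(rel_path)
        # Refuse paths that could escape the client's own package directory.
        if parts.is_absolute() or ".." in parts.parts:
            raise ValueError(f"unsafe path: {rel_path}")
        target = data_dir.joinpath(*parts.parts)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        manifest[rel_path] = hashlib.sha256(content).hexdigest()
    manifest_doc = {"client": client_id, "files": manifest}
    (sip_dir / "manifest-sha256.json").write_text(json.dumps(manifest_doc, indent=2))
    return sip_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sip = compile_sip("museum-a", {"images/scan-001.tif": b"II*\x00"}, Path(tmp))
        print(sorted(p.relative_to(sip).as_posix() for p in sip.rglob("*")))
```

Keeping every client under its own root is the simplest way to make a shared service safe for multiple tenants. Access control on top of that separation is out of scope for this sketch.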

Biography

Rolf Daessler has been a Professor of Information Technology at the University of Applied Sciences Potsdam, Information Sciences Department, since 2001. Ulf Preuss has headed the Coordination Office Brandenburg-digital at the University of Applied Sciences Potsdam, Information Sciences Department, since 2012.
Dr Juha Lehtonen
Senior Service Manager
CSC - IT Center for Science Ltd.

From File Validation to Structural Maps, We Got Your Pre-Ingest Covered

Summary Abstract

Tools and services that are both reliable and approachable to the end user benefit everyone in digital preservation. From validating digital objects to creating a submission information package, the goal is to enable a dependable and streamlined process. We at the National Digital Preservation Services in Finland have developed tools to help with this, two of which will be covered in this paper. The File Validation Service is a stand-alone online file validator that helps in catching errors in files as early as possible in the life cycle of digital content. The Pre-Ingest Library is a Python library that automates the creation of complex metadata documents needed for ingesting content into the services.
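The Pre-Ingest Library has its own API, which is not reproduced here. To show the kind of document such a library generates, the sketch below uses Python's standard `xml.etree.ElementTree` to build a minimal METS skeleton with a file section and a physical structural map. The `build_mets` name, the file ID scheme and the `file://./` href form are assumptions. A real submission would also need the descriptive and administrative metadata and the profile-specific attributes that the service's specifications require.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets_tag(tag: str) -> str:
    """Qualify a tag name with the METS namespace (Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(paths):
    """Return a minimal METS tree: a fileSec plus a flat PHYSICAL structMap."""
    mets = ET.Element(mets_tag("mets"))
    file_sec = ET.SubElement(mets, mets_tag("fileSec"))   # fileSec precedes structMap
    file_grp = ET.SubElement(file_sec, mets_tag("fileGrp"))
    struct_map = ET.SubElement(mets, mets_tag("structMap"), TYPE="PHYSICAL")
    root_div = ET.SubElement(struct_map, mets_tag("div"), TYPE="directory", LABEL=".")
    for i, path in enumerate(paths, start=1):
        file_id = f"file-{i:03d}"
        file_el = ET.SubElement(file_grp, mets_tag("file"), ID=file_id)
        ET.SubElement(file_el, mets_tag("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": f"file://./{path}"})
        # Each structural div points back to its file via fptr/@FILEID.
        div = ET.SubElement(root_div, mets_tag("div"), TYPE="file", LABEL=path)
        ET.SubElement(div, mets_tag("fptr"), FILEID=file_id)
    return ET.ElementTree(mets)


if __name__ == "__main__":
    tree = build_mets(["images/scan-001.tif", "text/readme.txt"])
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

The link between the file inventory and the structural map, through matching `ID` and `FILEID` values, is the bookkeeping that becomes error-prone by hand and that automation removes.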

Biography

Tiina Koho is a Software Developer for the National Digital Preservation Services at CSC – IT Center for Science and a core member of the digital preservation developer team. Johan Kylander, Digital Preservation Specialist at CSC – IT Center for Science, has worked with digital preservation for over a decade and works as a product owner for the National Digital Preservation Services. Juha Lehtonen, PhD, Senior Service Manager of the Digital Preservation and Fairdata Services at CSC – IT Center for Science, Finland, has extensive expertise in digital preservation and data management processes. He is the technical coordinator of the EU-funded EOSC EDEN project and a member of the METS Editorial Board. Heikki Helin, PhD, is a Development Manager at CSC – IT Center for Science. He leads the development group implementing the National Digital Preservation Services in Finland. He has been involved with digital preservation for more than 15 years.
Mr Yun-Man Fan
Chinese Academy of Medical Sciences & Peking Union Medical College Institute of Medical Information & Library

A Long-Term Preservation Model of Prompt Digital Repository

Summary Abstract

This paper examines the challenges and methodologies associated with the long-term preservation of prompt resources in the domain of artificial intelligence, and proposes a storage framework for them. It analyses the dynamic dependencies of prompts and their multimodal characteristics, which encompass text, images, and video. It develops a five-layer storage model comprising: an infrastructure layer (using Docker and hybrid cloud solutions), a data storage layer (incorporating relational and vector databases), an embedding layer (focusing on multimodal embedding), and an application layer (encompassing OAIS processes and API services). The study identifies several technical challenges, including optimizing storage costs, safeguarding privacy, and ensuring version compatibility. Finally, it outlines prospective applications for large medical models, with the aim of providing a systematic solution that ensures the integrity, accessibility, and compliance of high-value prompt resources.
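To make the integrity and version-compatibility concerns concrete, the sketch below fingerprints a prompt together with its dependencies. Changing the target model version, a linked image or a generation parameter therefore yields a new fixity value. The `PromptRecord` fields are hypothetical and are not the schema proposed in the paper.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PromptRecord:
    # Hypothetical fields chosen for illustration only.
    text: str
    model: str                                                # dependency: target model and version
    media: List[str] = field(default_factory=list)            # checksums of linked images/videos
    parameters: Dict[str, float] = field(default_factory=dict)

    def fixity(self) -> str:
        """SHA-256 over a canonical JSON form (sorted keys, fixed separators)."""
        canonical = json.dumps(asdict(self), sort_keys=True,
                               ensure_ascii=False, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    v1 = PromptRecord("Summarise the discharge note.", "model-x:1.0")
    v2 = PromptRecord("Summarise the discharge note.", "model-x:1.1")
    print(v1.fixity() != v2.fixity())
```

Canonicalising before hashing matters. Without sorted keys, two semantically identical records with differently ordered parameters would appear to differ.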

Biography

Yun-Man Fan: Assistant Professor, dedicated to research on the long-term preservation of medical digital resources, medical frontier exploration, medical large language models, and medical AI agents. An Fang: Professor of Library Science, dedicated to research on the long-term preservation of medical digital resources, network information systems, medical knowledge organization, and digital libraries. Jia-Hui Hu: Associate Researcher, dedicated to research on the long-term preservation of medical digital resources and medical artificial intelligence. Chen-Liu Yang: Associate Researcher, dedicated to research on the long-term preservation of medical digital resources, network and data security management, big data analysis, and related fields. Qian Wang: Associate Research Fellow, dedicated to research on the long-term preservation of medical digital resources and medical artificial intelligence. Lei Wang: Assistant Professor, dedicated to research on the long-term preservation of medical digital resources and medical artificial intelligence.