2014 Presentations (Communicative Events)
Adding Metadata and Ingesting Large Born-Digital Archives with Archivematica
Columbia University Libraries is working on a large-scale project, funded by a Ford Foundation grant, to permanently preserve and make accessible the archives of the International Fellowships Program (IFP). Active from 2001 to 2013, the IFP offered fellowships for post-graduate study to social justice leaders from underserved communities in Asia, Africa, Latin America, Russia, and the Middle East. The IFP records included a substantial digital component: 3.6 TB from 22 countries, in 245 file formats, 10 languages, and 7 non-Roman character sets. This presentation focuses on the metadata and ingest issues we faced when processing this major born-digital acquisition, and on the procedural and technological solutions we adopted. The only file-level descriptive metadata was contained in file names and directory paths, so these were retained as an originalName metadata element in the AIP METS file. Files from each office were sorted into three groups by desired access level (online, reading room, and embargoed until 2075). Archivematica software was used to create the Submission Information Packages (SIPs) and subsequently transform them into Archival Information Packages (AIPs). One or more SIPs were created for each access group, depending on the directory size. We developed a formula to calculate whether a group of files was small enough to fit in one SIP. Audiovisual materials, databases, emails, and compressed files were addressed separately. Processing included character conversion and format normalization. Access restrictions and SIP-specific descriptive metadata for each package were entered manually. AIPs were transferred to preservation storage in BagIt format.
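The step of deciding whether an access group fits in one SIP, and splitting it when it does not, might be sketched roughly as follows. This is only an illustration: the abstract does not give the project's actual formula, so the size limit `max_sip_bytes` and both function names are hypothetical, and the simple greedy split shown here is an assumption rather than the method used in the project.

```python
from typing import Iterable, List, Tuple

FileEntry = Tuple[str, int]  # (path, size in bytes)


def fits_in_one_sip(file_sizes: Iterable[int], max_sip_bytes: int) -> bool:
    """Return True if the combined size is within the (hypothetical) SIP limit."""
    return sum(file_sizes) <= max_sip_bytes


def split_into_sips(files: Iterable[FileEntry], max_sip_bytes: int) -> List[List[FileEntry]]:
    """Greedily split files, in their original order, into batches.

    Each batch stays within max_sip_bytes, except that a single file
    larger than the limit is placed in a batch of its own.
    """
    batches: List[List[FileEntry]] = []
    current: List[FileEntry] = []
    current_size = 0
    for path, size in files:
        # Start a new batch when adding this file would exceed the limit.
        if current and current_size + size > max_sip_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append((path, size))
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Keeping files in their original order means that items from the same directory tend to stay together in one SIP, which matters when directory paths are the main source of descriptive metadata.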
- AddingMetadata_SAA_Forum_2014.pdf (application/pdf, 1.6 MB)