2020 Presentations (Communicative Events)
Automated Reporting to Inform Archival Data Remediation
(Presentation given at the ArchivesSpace Annual Forum, 3 Aug. 2020.)
As Columbia University Libraries have progressed from migration to operations in ArchivesSpace, we have developed tools and processes to monitor ongoing work, flag errors and problematic constructs, and report information to archivists and support staff in an automated fashion.
The long tail of cleanup after a large-scale migration left a number of known, widespread issues across the corpus, such as missing scope and access notes and incorrectly encoded language information. The regularization required by the migration also created an opportunity to implement more consistent practices and make incremental improvements across repositories in line with DACS principles. To provide continuous visibility and timely warnings, CUL developed a number of automated tools in Python that gather data from both the ArchivesSpace API and cached EAD files and report information at regular intervals:
* Select resource data from entire corpus, reported nightly.
* Subjects, agents, and accessions, reported weekly.
* EAD checked against custom RelaxNG schema and XSLT rules (formerly Schematron assertions). Status of well-formedness and validity, along with any DACS compliance warnings, reported daily.
* ILS catalog sync, reported nightly.
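As a rough illustration of the EAD check in the list above, the sketch below tests one cached EAD document for well-formedness and warns when collection-level scope or access notes are absent. It is a simplified stand-in, not the RelaxNG and XSLT pipeline described here: it uses only the Python standard library, and the function names and the `REQUIRED_NOTES` list are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed list of collection-level notes to require (EAD element names).
REQUIRED_NOTES = ("scopecontent", "accessrestrict")


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def check_ead(xml_text):
    """Return (well_formed, warnings) for one EAD document given as a string."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Parsing failed, so the document is not well-formed XML.
        return False, [f"not well-formed: {err}"]

    # Collection-level notes are direct children of <archdesc>.
    archdesc = next(
        (el for el in root.iter() if local_name(el.tag) == "archdesc"), None
    )
    if archdesc is None:
        return True, ["missing <archdesc>"]

    present = {local_name(child.tag) for child in archdesc}
    warnings = [f"missing <{name}>" for name in REQUIRED_NOTES if name not in present]
    return True, warnings
```

A scheduled job could call `check_ead` on each cached file and collect the warnings into the daily report. Schema validity would still need a separate RelaxNG validator, which the standard library does not provide.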
The automated reports generate email summaries with links to complete reports in shared Google Sheets. This information has informed numerous remediation efforts during COVID-19, while all staff are working remotely, including origination review, access and scope note review, and character counts for scope and bio notes to flag collections with limited description.
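The character-count report mentioned above could work along these lines. This is a minimal sketch, assuming the counts come from collection-level `<scopecontent>` and `<bioghist>` elements in a cached EAD file. The function names and the 200-character default threshold are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# EAD elements for scope and biographical/historical notes.
NOTE_ELEMENTS = ("scopecontent", "bioghist")


def _local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def note_lengths(xml_text):
    """Return {element name: character count} for collection-level notes."""
    root = ET.fromstring(xml_text)
    archdesc = next((el for el in root.iter() if _local(el.tag) == "archdesc"), None)
    counts = dict.fromkeys(NOTE_ELEMENTS, 0)
    if archdesc is None:
        return counts
    for child in archdesc:
        name = _local(child.tag)
        if name in counts:
            # Join all nested text (including any <head>) and collapse whitespace.
            text = " ".join("".join(child.itertext()).split())
            counts[name] += len(text)
    return counts


def flag_limited(xml_text, min_chars=200):
    """Return only the notes whose length falls below the assumed threshold."""
    return {n: c for n, c in note_lengths(xml_text).items() if c < min_chars}
```

A note that is missing entirely is reported with a count of 0, so the same report would surface both absent and very short description.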
- AS Online Forum 2020 Presentation.pdf (application/pdf, 1.65 MB)
More About This Work
- Published Here: August 10, 2020