WEBVTT
Kind: captions
Language: en

00:00:01.120 --> 00:00:03.120
Thanks so much for inviting me.&nbsp;&nbsp;

00:00:03.120 --> 00:00:07.440
I want to start by recognizing that this&nbsp;
was a team effort and we had a great team.&nbsp;

00:00:12.320 --> 00:00:16.720
Crises like the pandemic emerge and we&nbsp;
find ourselves in urgent need of data.&nbsp;&nbsp;

00:00:16.720 --> 00:00:22.560
Sometimes the data we need are about biodiversity.&nbsp;
In this case, we'd like to know things about bats.&nbsp;&nbsp;

00:00:23.600 --> 00:00:28.960
In other cases, like oil spills, it might&nbsp;
be the entire biota in a particular region&nbsp;&nbsp;

00:00:28.960 --> 00:00:34.960
about which we need data. Until our work,&nbsp;
we didn't have a set of crisis response&nbsp;&nbsp;

00:00:34.960 --> 00:00:43.840
protocols to rapidly enhance data about an&nbsp;
important source of biodiversity information:&nbsp;&nbsp;

00:00:46.240 --> 00:00:49.840
that is the world's 3 to 4&nbsp;
billion biodiversity specimens.&nbsp;

00:00:50.720 --> 00:00:55.120
As we see in our current pandemic, it could be&nbsp;
that a narrow subset of those specimens suddenly&nbsp;&nbsp;

00:00:55.120 --> 00:01:03.360
becomes critical to crisis response. Specimens have&nbsp;
associated information that documents what was&nbsp;&nbsp;

00:01:03.360 --> 00:01:09.680
collected, where it was collected, who collected&nbsp;
it, and other information. They are also time&nbsp;&nbsp;

00:01:09.680 --> 00:01:15.840
capsules of potential information, since genomic&nbsp;
data can often be derived from the specimen or&nbsp;&nbsp;

00:01:15.840 --> 00:01:22.240
its disease-causing agents. These are just a&nbsp;
few specimens of the species of horseshoe bat&nbsp;&nbsp;

00:01:22.240 --> 00:01:28.560
in which the closest relative of SARS-CoV-2&nbsp;
has been found. That is: Rhinolophus affinis.&nbsp;

00:01:31.760 --> 00:01:35.280
We targeted a set of three&nbsp;
closely-related families,&nbsp;&nbsp;

00:01:35.280 --> 00:01:40.960
including the family of Rhinolophus&nbsp;
affinis for specimen data enhancement.&nbsp;

00:01:44.880 --> 00:01:50.640
These are the mappings of horseshoe bat specimens&nbsp;
at the two major aggregators of specimen data.&nbsp;&nbsp;

00:01:51.360 --> 00:01:56.320
I want to emphasize that data coming from&nbsp;
collections and served by these aggregators&nbsp;&nbsp;

00:01:56.320 --> 00:02:01.120
are valuable in their current state. However,&nbsp;
the data have some qualities that can be&nbsp;&nbsp;

00:02:01.120 --> 00:02:06.960
improved through consideration of the data in&nbsp;
aggregate and the data have been created over&nbsp;&nbsp;

00:02:06.960 --> 00:02:12.000
two or more decades, meaning that the data&nbsp;
have not all benefited from our current&nbsp;&nbsp;

00:02:12.000 --> 00:02:16.880
understanding of best practices and the&nbsp;
availability of software to improve some steps.&nbsp;

00:02:20.480 --> 00:02:25.440
We focused on enhancing the data in these ways&nbsp;
and I'll walk through those in bold with you.&nbsp;&nbsp;

00:02:26.080 --> 00:02:31.520
If you pay attention to the change in slide title&nbsp;
you can follow our relatively rapid progression&nbsp;&nbsp;

00:02:31.520 --> 00:02:37.200
through these activities.
The specimen data coming&nbsp;&nbsp;

00:02:37.200 --> 00:02:41.200
from the two major aggregators has&nbsp;
overlap but is not identical.&nbsp;&nbsp;

00:02:42.000 --> 00:02:47.200
De-duplicating their records produced&nbsp;
about 90,000 in-scope records.&nbsp;

00:02:49.680 --> 00:02:56.960
The records are curated by 118 institutions&nbsp;
worldwide. The top 10 of these institutions&nbsp;&nbsp;

00:02:56.960 --> 00:03:06.640
together share 63% of the records.
We could only assign or assess coordinates&nbsp;&nbsp;

00:03:06.640 --> 00:03:12.560
for collection locations when&nbsp;
those locations are described in the shared data.&nbsp;&nbsp;

00:03:13.120 --> 00:03:16.400
And about two-thirds of the&nbsp;
records had that information.&nbsp;&nbsp;

00:03:17.440 --> 00:03:22.560
Of those, about two-thirds arrived with&nbsp;
pre-assigned coordinates and one third did not.&nbsp;&nbsp;

00:03:25.680 --> 00:03:28.640
We were able to assess or&nbsp;
assign coordinates in 95%&nbsp;&nbsp;

00:03:29.200 --> 00:03:34.800
of the total possible cases and we modified&nbsp;
pre-existing coordinates about half the time.&nbsp;&nbsp;

00:03:36.160 --> 00:03:40.560
The median amount that a pre-existing&nbsp;
coordinate was moved was six kilometers.&nbsp;

00:03:44.640 --> 00:03:52.560
Importantly, the relevant metadata fields went&nbsp;
from mostly empty to mostly complete with such&nbsp;&nbsp;

00:03:52.560 --> 00:03:59.840
useful information added as geo-referencing&nbsp;
protocol and geo-referencing resources.&nbsp;

00:04:01.680 --> 00:04:04.720
In this summary, at the country&nbsp;
level, you can see where the&nbsp;&nbsp;

00:04:04.720 --> 00:04:08.800
greatest number of specimens have been&nbsp;
collected by the size of the pie chart&nbsp;&nbsp;

00:04:08.800 --> 00:04:13.840
and the relative number of new coordinates&nbsp;
added to the specimens from those countries.&nbsp;

00:04:17.680 --> 00:04:22.080
Here are the coordinates for collecting&nbsp;
locations for each of our focal families.&nbsp;

00:04:26.480 --> 00:04:31.840
We compared our coordinates with prior range&nbsp;
maps for the species when they were available&nbsp;&nbsp;

00:04:31.840 --> 00:04:36.960
from the International Union for Conservation&nbsp;
of Nature (IUCN). Here's an example range&nbsp;&nbsp;

00:04:36.960 --> 00:04:42.240
circumscription in red for one species,&nbsp;
this is again Rhinolophus affinis,&nbsp;&nbsp;

00:04:42.240 --> 00:04:48.400
and our coordinates for that species in green.&nbsp;
We found that georeferenced specimens suggest&nbsp;&nbsp;

00:04:48.400 --> 00:04:55.840
range extensions for 153 of the 169 focal bat&nbsp;
taxa for which we have these kinds of maps.&nbsp;&nbsp;

00:04:56.880 --> 00:05:02.160
This is a significant, significant expansion&nbsp;
of our understanding of where to find the bats.&nbsp;&nbsp;

00:05:03.920 --> 00:05:11.280
This is a screenshot of a web-based horseshoe&nbsp;
bat data explorer for IUCN map assessors&nbsp;&nbsp;

00:05:11.280 --> 00:05:16.880
and other stakeholders to look at locality&nbsp;
coordinates relative to the current IUCN maps,&nbsp;&nbsp;

00:05:18.320 --> 00:05:23.840
with links back to complete records in our system.&nbsp;

00:05:25.200 --> 00:05:31.680
The records arrived with 2,930 distinct values&nbsp;
referencing people who collected or identified&nbsp;&nbsp;

00:05:31.680 --> 00:05:38.160
the specimens. We were able to assign 803&nbsp;
unique identifiers to a subset of those values.&nbsp;&nbsp;

00:05:39.200 --> 00:05:43.120
These unique identifiers are ORCID&nbsp;
IDs when the person is living,&nbsp;&nbsp;

00:05:43.120 --> 00:05:52.720
and Wikidata QIDs when the person is deceased.&nbsp;
An additional 437 values representing 359 people&nbsp;&nbsp;

00:05:52.720 --> 00:05:57.760
are reasonably assigned to persons currently&nbsp;
living but who do not yet have an ORCID ID.&nbsp;

00:06:00.800 --> 00:06:08.240
To do this, we engaged 34 people who are mostly&nbsp;
bat experts from 13 countries. These experts and&nbsp;&nbsp;

00:06:08.240 --> 00:06:13.600
our data curators found that they could associate&nbsp;
about half of the records to a unique identifier&nbsp;&nbsp;

00:06:13.600 --> 00:06:19.840
for specimen collector and about two-thirds&nbsp;
of those for specimen identifier.&nbsp;

00:06:23.840 --> 00:06:28.400
The value of doing this for a crisis response&nbsp;
might not be immediately apparent, but it could&nbsp;&nbsp;

00:06:28.400 --> 00:06:35.760
be among the most important things that we did. We&nbsp;
identified 117 living people with ORCID IDs with&nbsp;&nbsp;

00:06:35.760 --> 00:06:41.760
experience collecting the bats. You might say that&nbsp;
it's easy to find the bat experts - just approach&nbsp;&nbsp;

00:06:41.760 --> 00:06:47.360
their professional societies or do a literature&nbsp;
search. However, the bat collectors and those&nbsp;&nbsp;

00:06:47.360 --> 00:06:53.360
you might find in that way will only partially&nbsp;
overlap. The bat collectors turned out to include&nbsp;&nbsp;

00:06:53.360 --> 00:06:58.800
a considerable diversity of professions including&nbsp;
those who are not professional biologists.&nbsp;&nbsp;

00:06:59.840 --> 00:07:04.880
Here are a few other descriptions&nbsp;
of collectors with valuable experience doing&nbsp;&nbsp;

00:07:04.880 --> 00:07:13.600
field work in sometimes remote areas. Remember,&nbsp;
we also identified 359 living&nbsp;&nbsp;

00:07:13.600 --> 00:07:19.520
bat collectors who don't have ORCID IDs.&nbsp;
Together, this is a rolodex of potential&nbsp;&nbsp;

00:07:19.520 --> 00:07:24.880
contacts for those of you who need to go back&nbsp;
into the field to relocate bat populations.&nbsp;

00:07:27.920 --> 00:07:33.120
Prior to data enhancement 5.5% of the records&nbsp;
had information about associated sequences.&nbsp;&nbsp;

00:07:33.760 --> 00:07:39.360
We identified about 1,100&nbsp;
additional specimens with which we could&nbsp;&nbsp;

00:07:39.360 --> 00:07:47.600
associate new sequences that we found.
Our versioned data, and importantly our&nbsp;&nbsp;

00:07:48.560 --> 00:07:54.800
protocols, are shared at Zenodo so that we have&nbsp;
now blazed a trail that others can follow for&nbsp;&nbsp;

00:07:54.800 --> 00:08:01.360
rapid data enhancement of specimens during the&nbsp;
next crisis. We expect to share our final versions&nbsp;&nbsp;

00:08:01.360 --> 00:08:06.880
of everything there very soon. However, I will&nbsp;
note that the current version that's up right&nbsp;&nbsp;

00:08:06.880 --> 00:08:12.160
now is very close to the final version. We're&nbsp;
close to submission of a manuscript focused on&nbsp;&nbsp;

00:08:12.160 --> 00:08:17.520
the work and expect to make the horseshoe bat data&nbsp;
explorer broadly available to those who need it.&nbsp;

00:08:19.120 --> 00:08:26.080
The EU recently announced new funding for&nbsp;
the creation of records for&nbsp;&nbsp;

00:08:27.280 --> 00:08:36.000
approximately 20,000 bat specimens, and we expect&nbsp;
the foundation that we laid to speed up that work.&nbsp;

00:08:39.200 --> 00:08:44.400
I want to thank those who contributed their&nbsp;
time and expertise to the people disambiguation&nbsp;&nbsp;

00:08:44.400 --> 00:08:49.920
and are listed in the first paragraph, but not&nbsp;
assigned ORCID IDs there for lack of space.&nbsp;&nbsp;

00:08:50.800 --> 00:08:52.960
And thank you to the NSF for supporting the work.

