WEBVTT
Kind: captions
Language: en

00:00:00.400 --> 00:00:04.080
We will be hearing from Dominique Duncan&nbsp;&nbsp;

00:00:04.080 --> 00:00:09.200
at the University of Southern California&nbsp;
and Dominique, we're ready whenever you are.

00:00:10.160 --> 00:00:18.080
Thank you. So I'm Dominique Duncan from USC&nbsp;
from the Neuroimaging and Informatics Institute&nbsp;&nbsp;

00:00:18.080 --> 00:00:25.280
and I'll be talking about our COVID-19 data&nbsp;
archive. For short, it's called COVID-ARC.

00:00:26.320 --> 00:00:33.200
So we have a lot of experience at our institute&nbsp;
with large-scale multimodal data repositories&nbsp;&nbsp;

00:00:33.760 --> 00:00:40.960
so we decided to apply our experience and&nbsp;
the tools that we've developed in these other&nbsp;&nbsp;

00:00:40.960 --> 00:00:48.400
projects and extend them to this COVID-19 data&nbsp;
archive. So you can see here on the bottom left,&nbsp;&nbsp;

00:00:48.400 --> 00:00:54.560
that's a screenshot of our home page. If you want&nbsp;
to go to the website, I encourage everyone to go&nbsp;&nbsp;

00:00:55.120 --> 00:01:03.360
to covid-arc.loni.usc.edu to find out more about&nbsp;
the project, the various data sets that we have,&nbsp;&nbsp;

00:01:03.920 --> 00:01:11.920
our analytic tools and to find out more about&nbsp;
either uploading data or downloading existing data&nbsp;&nbsp;

00:01:11.920 --> 00:01:19.600
that we have. So we have some publicly available&nbsp;
data sets that we've been curating and organizing.&nbsp;&nbsp;

00:01:20.560 --> 00:01:27.680
We want to encourage researchers to&nbsp;
be able to not only perform analysis&nbsp;&nbsp;

00:01:27.680 --> 00:01:35.600
on individual data sets but across data sets from&nbsp;
various sites and so we want to make that process&nbsp;&nbsp;

00:01:35.600 --> 00:01:43.120
easier for them. And so we're also providing&nbsp;
various tools like quality control tools&nbsp;&nbsp;

00:01:44.000 --> 00:01:51.360
that people can use to evaluate images and&nbsp;
evaluate the quality using various metrics. We&nbsp;&nbsp;

00:01:51.360 --> 00:01:58.480
have various visualization tools for imaging data&nbsp;
and other types of data and then a wide variety of&nbsp;&nbsp;

00:01:58.480 --> 00:02:03.840
analytic tools that people can use. And then&nbsp;
for data sets that are not publicly available,&nbsp;&nbsp;

00:02:04.880 --> 00:02:13.120
data providers can decide if they want to store&nbsp;
their data on our server or if they want to keep&nbsp;&nbsp;

00:02:13.120 --> 00:02:19.840
their data stored locally at their site and just&nbsp;
provide us with the metadata so that we can let&nbsp;&nbsp;

00:02:19.840 --> 00:02:24.640
users know what data are available and&nbsp;
then we just facilitate the process&nbsp;&nbsp;

00:02:25.200 --> 00:02:32.080
of requesting access to that data. So if data&nbsp;
providers give us their data, it's anonymized&nbsp;&nbsp;

00:02:32.720 --> 00:02:39.600
and the data providers maintain full control of&nbsp;
access and they get to decide who gets access to&nbsp;&nbsp;

00:02:39.600 --> 00:02:45.520
it. If they want to wait to publish some findings&nbsp;
and then make their data sets publicly available,&nbsp;&nbsp;

00:02:45.520 --> 00:02:52.400
they're able to do that as well. And then&nbsp;
we use Aspera, which is IBM's HIPAA-compliant&nbsp;&nbsp;

00:02:52.400 --> 00:02:59.200
encrypted high-speed file transfer system&nbsp;
for either uploading data to our server or&nbsp;&nbsp;

00:02:59.200 --> 00:03:03.840
for people who want access to the data.&nbsp;
They can download it that way as well.

00:03:05.440 --> 00:03:09.760
I know you can't see any of this text&nbsp;
but this just gives you an overview&nbsp;&nbsp;

00:03:09.760 --> 00:03:16.560
of what we have on our server. So, if you&nbsp;
request access, this is the tree structure&nbsp;&nbsp;

00:03:16.560 --> 00:03:22.960
of the COVID-ARC project and so it's separated&nbsp;
into the data and then our analysis that we've&nbsp;&nbsp;

00:03:22.960 --> 00:03:29.440
been working on because we're also very involved&nbsp;
in the analytics for this project. So, the data&nbsp;&nbsp;

00:03:29.440 --> 00:03:36.000
will be separated by the different sites and&nbsp;
then there's more information on each of those.

00:03:36.000 --> 00:03:41.600
So, here is a table of the data sets&nbsp;
that we currently have on the server&nbsp;&nbsp;

00:03:41.600 --> 00:03:49.200
so you can see the location, where the data were&nbsp;
acquired for each of those data sets, and then&nbsp;&nbsp;

00:03:49.200 --> 00:03:55.680
the data format of those images. Right now,&nbsp;
we're focusing mainly on chest CT but we're&nbsp;&nbsp;

00:03:56.640 --> 00:04:03.520
also interested in other types of data and&nbsp;
soon we'll have some brain data so brain&nbsp;&nbsp;

00:04:03.520 --> 00:04:10.560
MRI of COVID-19 patients as well as&nbsp;
EEG. So here you can just see how many&nbsp;&nbsp;

00:04:11.600 --> 00:04:17.600
COVID images and non-COVID images and&nbsp;
masks are there from each of those sites.

00:04:18.960 --> 00:04:21.520
Here you can see some class activation maps.&nbsp;&nbsp;

00:04:22.480 --> 00:04:29.200
So there are many features, as we know,&nbsp;
CT features of COVID-19 patients like&nbsp;&nbsp;

00:04:29.200 --> 00:04:34.880
ground glass opacity, consolidation, crazy&nbsp;
paving pattern, reticular pattern, etc.&nbsp;&nbsp;

00:04:35.680 --> 00:04:42.080
So this just highlights the regions that were most&nbsp;
important for our classification that we're doing.

00:04:43.280 --> 00:04:50.320
One of the issues that we're discovering&nbsp;
is that image quality tends to&nbsp;&nbsp;

00:04:50.320 --> 00:04:58.320
lead to misclassification and so we're&nbsp;
assessing the quality of the images but then&nbsp;&nbsp;

00:04:58.320 --> 00:05:02.640
also considering various ways of&nbsp;
improving the quality of the images&nbsp;&nbsp;

00:05:03.200 --> 00:05:08.320
so looking at different filtering methods and&nbsp;
that's something that's currently in progress.

00:05:09.680 --> 00:05:13.200
We're also doing image&nbsp;
thresholding on the lung masks&nbsp;&nbsp;

00:05:13.200 --> 00:05:17.280
so we're using image thresholding&nbsp;
to find the best possible&nbsp;&nbsp;

00:05:17.280 --> 00:05:24.240
processed image type in the lung masks to improve&nbsp;
the prediction rate of our neural networks.

00:05:24.240 --> 00:05:29.440
And this is a table on one of the&nbsp;
data sets. That first one from Brazil&nbsp;&nbsp;

00:05:29.440 --> 00:05:36.960
that had about 1200 COVID patients and 1200&nbsp;
non-COVID images and we did a comparison&nbsp;&nbsp;

00:05:38.080 --> 00:05:42.800
looking at various methods to&nbsp;
compare convolutional neural networks&nbsp;&nbsp;

00:05:42.800 --> 00:05:48.000
and their accuracy. So we found that&nbsp;
ResNet-18 performed the best for that.

00:05:49.760 --> 00:05:54.080
And just to summarize, because I think I'm&nbsp;
at the end of the time, these are the people&nbsp;&nbsp;

00:05:54.080 --> 00:06:00.800
that are working on the project including the&nbsp;
REU students. And I'd like to thank the NSF for&nbsp;&nbsp;

00:06:00.800 --> 00:06:07.360
funding this project as well as Katie and Florence&nbsp;
for organizing this and inviting me to talk. And&nbsp;&nbsp;

00:06:07.360 --> 00:06:14.240
please visit the website or email me if you have&nbsp;
any questions and if you would like access to&nbsp;&nbsp;

00:06:14.240 --> 00:06:19.840
any of the data or if you have some data that you&nbsp;
would like to contribute to the website. Thanks.

