*AI Summary*
*Abstract:*
This lecture, delivered by Professor David Wishart, details the necessity and efficacy of using Machine Learning (ML) techniques—specifically k-Nearest Neighbors, SIR-ML, and LSTM models—to address the inherent failures and limitations of traditional epidemiological models (SIR/ABM) in managing infectious diseases. The primary case study is COVID-19, which provided unprecedented "Big Data" necessary for ML training. Key applications discussed include: real-time outbreak tracking (e.g., Blue Dot's system), accurate temporal forecasting of mortality (Lafopapo), modeling the effectiveness and cost of non-pharmaceutical interventions (NPIs, X Prize Challenge), and critically, correcting vastly underreported global mortality figures. Using an ML model validated against excess death data, the true COVID-19 toll is estimated at 20–25 million, roughly four times the officially reported 6.1 million.
*Machine Learning and the Infectious Disease Crisis*
* *0:00 Introduction:* Infectious diseases kill roughly 13 million people annually but are generally treatable or preventable. Management requires timely tracking, spatial/temporal modeling, and estimating the total burden.
* *4:40 Traditional Modeling Failures:* Classic SIR/SEIR models rely on difficult-to-measure parameters ("fudge factors") and are inadequate for spatial modeling or predicting the impact of public health interventions (PHIs). Agent-Based Models (ABMs) are dynamic but computationally costly and hard to scale for long-term prediction.
* *7:56 The ML Opportunity:* The infrastructure developed through post-2015 investments in ML coincided with the COVID-19 pandemic, generating "Big Data" necessary to test and deploy robust ML solutions.
* *10:44 Outbreak Tracking Superiority:* Manual tracking systems (like the CDC's EOC) are slow, expensive, and subject to bias and bureaucracy. The Toronto-based company Blue Dot used ML (Natural Language Processing of 300,000 articles/day in 65 languages) to identify COVID-19 as a concern on December 31, 2019, months before major manual systems reacted.
* *16:32 Time Series Forecasting (Lafopapo):* The k-Nearest Neighbor predictor, Lafopapo, integrated diverse time-dependent features (mobility, weather) to accurately forecast US mortality and cases up to 10 weeks out. As far as I can tell, it significantly outperformed all other models evaluated, which often had error rates hovering around 40–70%. Crucially, the model picked up non-obvious trends, like weekly periodicity in reported deaths (20:50).
* *23:31 Modeling Interventions (Similar):* The 'Similar' model (SIR augmented with ML) incorporated government policy data (from Oxford tracking) to accurately forecast infection rates and model the impact of PHIs, proving more effective than competing models from the CDC and provincial health bodies.
* *27:49 Global Intervention Analysis (X Prize):* The X Prize Pandemic Response Challenge utilized advanced LSTM (Long Short-Term Memory) recurrent neural networks—which handle both short-term and long-term time dependencies well—to model transmission globally.
* *32:24 NPI Effectiveness:* The winning LSTM model identified restricting mass gatherings and limiting international travel as the most effective non-pharmaceutical interventions. It also concluded that handwashing was largely "useless" compared to universal masking in reducing spread (33:21). The long-term predictability of the winning model, however, was questionable (34:24).
* *34:58 Estimating True Burden:* ML (using the XGBoost regression estimator, 42:04) was deployed to correct for widespread data fabrication and underreporting, leveraging available "excess death" statistics and geopolitical factors (GDP, corruption levels, government type).
* *45:51 Official Data Rejection:* Analysis confirmed that many countries, including Russia and Egypt, grossly underreported deaths (Russia by >300%). The ML-based correction, validated against reliable excess death data, indicates the true global COVID-19 death toll is likely 20–25 million people, about four times the official figure of 6.1 million.
* *47:04 Historical Significance:* This estimated toll ranks COVID-19 as the fourth worst pandemic in 700 years. As far as I'm concerned, the data clearly shows that ML is now essential for rational disease management, tracking, and policymaking.
AI-generated summary created with gemini-2.5-flash-preview-09-2025 for free via RocketRecap-dot-com. (Input: 31,324 tokens, Output: 980 tokens, Est. cost: $0.0118).Below, I will provide input for an example video (comprising of title, description, and transcript, in this order) and the corresponding abstract and summary I expect. Afterward, I will provide a new transcript that I want you to summarize in the same format.
**Please give an abstract of the transcript and then summarize the transcript in a self-contained bullet list format.** Include starting timestamps, important details and key takeaways.
Example Input:
Fluidigm Polaris Part 2- illuminator and camera
mikeselectricstuff
131K subscribers
Subscribed
369
Share
Download
Clip
Save
5,857 views Aug 26, 2024
Fluidigm Polaris part 1 : • Fluidigm Polaris (Part 1) - Biotech g...
Ebay listings: https://www.ebay.co.uk/usr/mikeselect...
Merch https://mikeselectricstuff.creator-sp...
Transcript
Follow along using the transcript.
Show transcript
mikeselectricstuff
131K subscribers
Videos
About
Support on Patreon
40 Comments
@robertwatsonbath
6 hours ago
Thanks Mike. Ooof! - with the level of bodgery going on around 15:48 I think shame would have made me do a board re spin, out of my own pocket if I had to.
1
Reply
@Muonium1
9 hours ago
The green LED looks different from the others and uses phosphor conversion because of the "green gap" problem where green InGaN emitters suffer efficiency droop at high currents. Phosphide based emitters don't start becoming efficient until around 600nm so also can't be used for high power green emitters. See the paper and plot by Matthias Auf der Maur in his 2015 paper on alloy fluctuations in InGaN as the cause of reduced external quantum efficiency at longer (green) wavelengths.
4
Reply
1 reply
@tafsirnahian669
10 hours ago (edited)
Can this be used as an astrophotography camera?
Reply
mikeselectricstuff
·
1 reply
@mikeselectricstuff
6 hours ago
Yes, but may need a shutter to avoid light during readout
Reply
@2010craggy
11 hours ago
Narrowband filters we use in Astronomy (Astrophotography) are sided- they work best passing light in one direction so I guess the arrows on the filter frames indicate which way round to install them in the filter wheel.
1
Reply
@vitukz
12 hours ago
A mate with Channel @extractions&ire could use it
2
Reply
@RobertGallop
19 hours ago
That LED module says it can go up to 28 amps!!! 21 amps for 100%. You should see what it does at 20 amps!
Reply
@Prophes0r
19 hours ago
I had an "Oh SHIT!" moment when I realized that the weird trapezoidal shape of that light guide was for keystone correction of the light source.
Very clever.
6
Reply
@OneBiOzZ
20 hours ago
given the cost of the CCD you think they could have run another PCB for it
9
Reply
@tekvax01
21 hours ago
$20 thousand dollars per minute of run time!
1
Reply
@tekvax01
22 hours ago
"We spared no expense!" John Hammond Jurassic Park.
*(that's why this thing costs the same as a 50-seat Greyhound Bus coach!)
Reply
@florianf4257
22 hours ago
The smearing on the image could be due to the fact that you don't use a shutter, so you see brighter stripes under bright areas of the image as you still iluminate these pixels while the sensor data ist shifted out towards the top. I experienced this effect back at university with a LN-Cooled CCD for Spectroscopy. The stripes disapeared as soon as you used the shutter instead of disabling it in the open position (but fokussing at 100ms integration time and continuous readout with a focal plane shutter isn't much fun).
12
Reply
mikeselectricstuff
·
1 reply
@mikeselectricstuff
12 hours ago
I didn't think of that, but makes sense
2
Reply
@douro20
22 hours ago (edited)
The red LED reminds me of one from Roithner Lasertechnik. I have a Symbol 2D scanner which uses two very bright LEDs from that company, one red and one red-orange. The red-orange is behind a lens which focuses it into an extremely narrow beam.
1
Reply
@RicoElectrico
23 hours ago
PFG is Pulse Flush Gate according to the datasheet.
Reply
@dcallan812
23 hours ago
Very interesting. 2x
Reply
@littleboot_
1 day ago
Cool interesting device
Reply
@dav1dbone
1 day ago
I've stripped large projectors, looks similar, wonder if some of those castings are a magnesium alloy?
Reply
@kevywevvy8833
1 day ago
ironic that some of those Phlatlight modules are used in some of the cheapest disco lights.
1
Reply
1 reply
@bill6255
1 day ago
Great vid - gets right into subject in title, its packed with information, wraps up quickly. Should get a YT award! imho
3
Reply
@JAKOB1977
1 day ago (edited)
The whole sensor module incl. a 5 grand 50mpix sensor for 49 £.. highest bid atm
Though also a limited CCD sensor, but for the right buyer its a steal at these relative low sums.
Architecture Full Frame CCD (Square Pixels)
Total Number of Pixels 8304 (H) × 6220 (V) = 51.6 Mp
Number of Effective Pixels 8208 (H) × 6164 (V) = 50.5 Mp
Number of Active Pixels 8176 (H) × 6132 (V) = 50.1 Mp
Pixel Size 6.0 m (H) × 6.0 m (V)
Active Image Size 49.1 mm (H) × 36.8 mm (V)
61.3 mm (Diagonal),
645 1.1x Optical Format
Aspect Ratio 4:3
Horizontal Outputs 4
Saturation Signal 40.3 ke−
Output Sensitivity 31 V/e−
Quantum Efficiency
KAF−50100−CAA
KAF−50100−AAA
KAF−50100−ABA (with Lens)
22%, 22%, 16% (Peak R, G, B)
25%
62%
Read Noise (f = 18 MHz) 12.5 e−
Dark Signal (T = 60°C) 42 pA/cm2
Dark Current Doubling Temperature 5.7°C
Dynamic Range (f = 18 MHz) 70.2 dB
Estimated Linear Dynamic Range
(f = 18 MHz)
69.3 dB
Charge Transfer Efficiency
Horizontal
Vertical
0.999995
0.999999
Blooming Protection
(4 ms Exposure Time)
800X Saturation Exposure
Maximum Date Rate 18 MHz
Package Ceramic PGA
Cover Glass MAR Coated, 2 Sides or
Clear Glass
Features
• TRUESENSE Transparent Gate Electrode
for High Sensitivity
• Ultra-High Resolution
• Board Dynamic Range
• Low Noise Architecture
• Large Active Imaging Area
Applications
• Digitization
• Mapping/Aerial
• Photography
• Scientific
Thx for the tear down Mike, always a joy
Reply
@martinalooksatthings
1 day ago
15:49 that is some great bodging on of caps, they really didn't want to respin that PCB huh
8
Reply
@RhythmGamer
1 day ago
Was depressed today and then a new mike video dropped and now I’m genuinely happy to get my tear down fix
1
Reply
@dine9093
1 day ago (edited)
Did you transfrom into Mr Blobby for a moment there?
2
Reply
@NickNorton
1 day ago
Thanks Mike. Your videos are always interesting.
5
Reply
@KeritechElectronics
1 day ago
Heavy optics indeed... Spare no expense, cost no object. Splendid build quality. The CCD is a thing of beauty!
1
Reply
@YSoreil
1 day ago
The pricing on that sensor is about right, I looked in to these many years ago when they were still in production since it's the only large sensor you could actually buy. Really cool to see one in the wild.
2
Reply
@snik2pl
1 day ago
That leds look like from led projector
Reply
@vincei4252
1 day ago
TDI = Time Domain Integration ?
1
Reply
@wolpumba4099
1 day ago (edited)
Maybe the camera should not be illuminated during readout.
From the datasheet of the sensor (Onsemi): saturation 40300 electrons, read noise 12.5 electrons per pixel @ 18MHz (quite bad). quantum efficiency 62% (if it has micro lenses), frame rate 1 Hz. lateral overflow drain to prevent blooming protects against 800x (factor increases linearly with exposure time) saturation exposure (32e6 electrons per pixel at 4ms exposure time), microlens has +/- 20 degree acceptance angle
i guess it would be good for astrophotography
4
Reply
@txm100
1 day ago (edited)
Babe wake up a new mikeselectricstuff has dropped!
9
Reply
@vincei4252
1 day ago
That looks like a finger-lakes filter wheel, however, for astronomy they'd never use such a large stepper.
1
Reply
@MRooodddvvv
1 day ago
yaaaaay ! more overcomplicated optical stuff !
4
Reply
1 reply
@NoPegs
1 day ago
He lives!
11
Reply
1 reply
Transcript
0:00
so I've stripped all the bits of the
0:01
optical system so basically we've got
0:03
the uh the camera
0:05
itself which is mounted on this uh very
0:09
complex
0:10
adjustment thing which obviously to set
0:13
you the various tilt and uh alignment
0:15
stuff then there's two of these massive
0:18
lenses I've taken one of these apart I
0:20
think there's something like about eight
0:22
or nine Optical elements in here these
0:25
don't seem to do a great deal in terms
0:26
of electr magnification they're obiously
0:28
just about getting the image to where it
0:29
uh where it needs to be just so that
0:33
goes like that then this Optical block I
0:36
originally thought this was made of some
0:37
s crazy heavy material but it's just
0:39
really the sum of all these Optical bits
0:41
are just ridiculously heavy those lenses
0:43
are about 4 kilos each and then there's
0:45
this very heavy very solid um piece that
0:47
goes in the middle and this is so this
0:49
is the filter wheel assembly with a
0:51
hilariously oversized steper
0:53
motor driving this wheel with these very
0:57
large narrow band filters so we've got
1:00
various different shades of uh
1:03
filters there five Al together that
1:06
one's actually just showing up a silver
1:07
that's actually a a red but fairly low
1:10
transmission orangey red blue green
1:15
there's an excess cover on this side so
1:16
the filters can be accessed and changed
1:19
without taking anything else apart even
1:21
this is like ridiculous it's like solid
1:23
aluminium this is just basically a cover
1:25
the actual wavelengths of these are um
1:27
488 525 570 630 and 700 NM not sure what
1:32
the suffix on that perhaps that's the uh
1:34
the width of the spectral line say these
1:37
are very narrow band filters most of
1:39
them are you very little light through
1:41
so it's still very tight narrow band to
1:43
match the um fluoresence of the dies
1:45
they're using in the biochemical process
1:48
and obviously to reject the light that's
1:49
being fired at it from that Illuminator
1:51
box and then there's a there's a second
1:53
one of these lenses then the actual sort
1:55
of samples below that so uh very serious
1:58
amount of very uh chunky heavy Optics
2:01
okay let's take a look at this light
2:02
source made by company Lumen Dynamics
2:04
who are now part of
2:06
excelitas self-contained unit power
2:08
connector USB and this which one of the
2:11
Cable Bundle said was a TTL interface
2:14
USB wasn't used in uh the fluid
2:17
application output here and I think this
2:19
is an input for um light feedback I
2:21
don't if it's regulated or just a measur
2:23
measurement facility and the uh fiber
2:27
assembly
2:29
Square Inlet there and then there's two
2:32
outputs which have uh lens assemblies
2:35
and this small one which goes back into
2:37
that small Port just Loops out of here
2:40
straight back in So on this side we've
2:42
got the electronics which look pretty
2:44
straightforward we've got a bit of power
2:45
supply stuff over here and we've got
2:48
separate drivers for each wavelength now
2:50
interesting this is clearly been very
2:52
specifically made for this application
2:54
you I was half expecting like say some
2:56
generic drivers that could be used for a
2:58
number of different things but actually
3:00
literally specified the exact wavelength
3:02
on the PCB there is provision here for
3:04
385 NM which isn't populated but this is
3:07
clearly been designed very specifically
3:09
so these four drivers look the same but
3:10
then there's two higher power ones for
3:12
575 and
3:14
520 a slightly bigger heat sink on this
3:16
575 section there a p 24 which is
3:20
providing USB interface USB isolator the
3:23
USB interface just presents as a comport
3:26
I did have a quick look but I didn't
3:27
actually get anything sensible um I did
3:29
dump the Pi code out and there's a few
3:31
you a few sort of commands that you
3:32
could see in text but I didn't actually
3:34
manage to get it working properly I
3:36
found some software for related version
3:38
but it didn't seem to want to talk to it
3:39
but um I say that wasn't used for the
3:41
original application it might be quite
3:42
interesting to get try and get the Run
3:44
hours count out of it and the TTL
3:46
interface looks fairly straightforward
3:48
we've got positions for six opto
3:50
isolators but only five five are
3:52
installed so that corresponds with the
3:54
unused thing so I think this hopefully
3:56
should be as simple as just providing a
3:57
ttrl signal for each color to uh enable
4:00
it a big heat sink here which is there I
4:03
think there's like a big S of metal
4:04
plate through the middle of this that
4:05
all the leads are mounted on the other
4:07
side so this is heat sinking it with a
4:09
air flow from a uh just a fan in here
4:13
obviously don't have the air flow
4:14
anywhere near the Optics so conduction
4:17
cool through to this plate that's then
4:18
uh air cooled got some pots which are
4:21
presumably power
4:22
adjustments okay let's take a look at
4:24
the other side which is uh much more
4:27
interesting see we've got some uh very
4:31
uh neatly Twisted cable assemblies there
4:35
a bunch of leads so we've got one here
4:37
475 up here 430 NM 630 575 and 520
4:44
filters and dcro mirrors a quick way to
4:48
see what's white is if we just shine
4:49
some white light through
4:51
here not sure how it is is to see on the
4:54
camera but shining white light we do
4:55
actually get a bit of red a bit of blue
4:57
some yellow here so the obstacle path
5:00
575 it goes sort of here bounces off
5:03
this mirror and goes out the 520 goes
5:07
sort of down here across here and up
5:09
there 630 goes basically straight
5:13
through
5:15
430 goes across there down there along
5:17
there and the 475 goes down here and
5:20
left this is the light sensing thing
5:22
think here there's just a um I think
5:24
there a photo diode or other sensor
5:26
haven't actually taken that off and
5:28
everything's fixed down to this chunk of
5:31
aluminium which acts as the heat
5:32
spreader that then conducts the heat to
5:33
the back side for the heat
5:35
sink and the actual lead packages all
5:38
look fairly similar except for this one
5:41
on the 575 which looks quite a bit more
5:44
substantial big spay
5:46
Terminals and the interface for this
5:48
turned out to be extremely simple it's
5:50
literally a 5V TTL level to enable each
5:54
color doesn't seem to be any tensity
5:56
control but there are some additional
5:58
pins on that connector that weren't used
5:59
in the through time thing so maybe
6:01
there's some extra lines that control
6:02
that I couldn't find any data on this uh
6:05
unit and the um their current product
6:07
range is quite significantly different
6:09
so we've got the uh blue these
6:13
might may well be saturating the camera
6:16
so they might look a bit weird so that's
6:17
the 430
6:18
blue the 575
6:24
yellow uh
6:26
475 light blue
6:29
the uh 520
6:31
green and the uh 630 red now one
6:36
interesting thing I noticed for the
6:39
575 it's actually it's actually using a
6:42
white lead and then filtering it rather
6:44
than using all the other ones are using
6:46
leads which are the fundamental colors
6:47
but uh this is actually doing white and
6:50
it's a combination of this filter and
6:52
the dichroic mirrors that are turning to
6:55
Yellow if we take the filter out and a
6:57
lot of the a lot of the um blue content
7:00
is going this way the red is going
7:02
straight through these two mirrors so
7:05
this is clearly not reflecting much of
7:08
that so we end up with the yellow coming
7:10
out of uh out of there which is a fairly
7:14
light yellow color which you don't
7:16
really see from high intensity leads so
7:19
that's clearly why they've used the
7:20
white to uh do this power consumption of
7:23
the white is pretty high so going up to
7:25
about 2 and 1 half amps on that color
7:27
whereas most of the other colors are
7:28
only drawing half an amp or so at 24
7:30
volts the uh the green is up to about
7:32
1.2 but say this thing is uh much
7:35
brighter and if you actually run all the
7:38
colors at the same time you get a fairly
7:41
reasonable um looking white coming out
7:43
of it and one thing you might just be
7:45
out to notice is there is some sort
7:46
color banding around here that's not
7:49
getting uh everything s completely
7:51
concentric and I think that's where this
7:53
fiber optic thing comes
7:58
in I'll
8:00
get a couple of Fairly accurately shaped
8:04
very sort of uniform color and looking
8:06
at What's um inside here we've basically
8:09
just got this Square Rod so this is
8:12
clearly yeah the lights just bouncing
8:13
off all the all the various sides to um
8:16
get a nice uniform illumination uh this
8:19
back bit looks like it's all potted so
8:21
nothing I really do to get in there I
8:24
think this is fiber so I have come
8:26
across um cables like this which are
8:27
liquid fill but just looking through the
8:30
end of this it's probably a bit hard to
8:31
see it does look like there fiber ends
8:34
going going on there and so there's this
8:36
feedback thing which is just obviously
8:39
compensating for the any light losses
8:41
through here to get an accurate
8:43
representation of uh the light that's
8:45
been launched out of these two
8:47
fibers and you see uh
8:49
these have got this sort of trapezium
8:54
shape light guides again it's like a
8:56
sort of acrylic or glass light guide
9:00
guess projected just to make the right
9:03
rectangular
9:04
shape and look at this Center assembly
9:07
um the light output doesn't uh change
9:10
whether you feed this in or not so it's
9:11
clear not doing any internal Clos Loop
9:14
control obviously there may well be some
9:16
facility for it to do that but it's not
9:17
being used in this
9:19
application and so this output just
9:21
produces a voltage on the uh outle
9:24
connector proportional to the amount of
9:26
light that's present so there's a little
9:28
diffuser in the back there
9:30
and then there's just some kind of uh
9:33
Optical sensor looks like a
9:35
chip looking at the lead it's a very
9:37
small package on the PCB with this lens
9:40
assembly over the top and these look
9:43
like they're actually on a copper
9:44
Metalized PCB for maximum thermal
9:47
performance and yeah it's a very small
9:49
package looks like it's a ceramic
9:51
package and there's a thermister there
9:53
for temperature monitoring this is the
9:56
475 blue one this is the 520 need to
9:59
Green which is uh rather different OB
10:02
it's a much bigger D with lots of bond
10:04
wise but also this looks like it's using
10:05
a phosphor if I shine a blue light at it
10:08
lights up green so this is actually a
10:10
phosphor conversion green lead which
10:12
I've I've come across before they want
10:15
that specific wavelength so they may be
10:17
easier to tune a phosphor than tune the
10:20
um semiconductor material to get the uh
10:23
right right wavelength from the lead
10:24
directly uh red 630 similar size to the
10:28
blue one or does seem to have a uh a
10:31
lens on top of it there is a sort of red
10:33
coloring to
10:35
the die but that doesn't appear to be
10:38
fluorescent as far as I can
10:39
tell and the white one again a little
10:41
bit different sort of much higher
10:43
current
10:46
connectors a makeer name on that
10:48
connector flot light not sure if that's
10:52
the connector or the lead
10:54
itself and obviously with the phosphor
10:56
and I'd imagine that phosphor may well
10:58
be tuned to get the maximum to the uh 5
11:01
cenm and actually this white one looks
11:04
like a St fairly standard product I just
11:06
found it in Mouse made by luminous
11:09
devices in fact actually I think all
11:11
these are based on various luminous
11:13
devices modules and they're you take
11:17
looks like they taking the nearest
11:18
wavelength and then just using these
11:19
filters to clean it up to get a precise
11:22
uh spectral line out of it so quite a
11:25
nice neat and um extreme
11:30
bright light source uh sure I've got any
11:33
particular use for it so I think this
11:35
might end up on
11:36
eBay but uh very pretty to look out and
11:40
without the uh risk of burning your eyes
11:43
out like you do with lasers so I thought
11:45
it would be interesting to try and
11:46
figure out the runtime of this things
11:48
like this we usually keep some sort
11:49
record of runtime cuz leads degrade over
11:51
time I couldn't get any software to work
11:52
through the USB face but then had a
11:54
thought probably going to be writing the
11:55
runtime periodically to the e s prom so
11:58
I just just scope up that and noticed it
12:00
was doing right every 5 minutes so I
12:02
just ran it for a while periodically
12:04
reading the E squ I just held the pick
12:05
in in reset and um put clip over to read
12:07
the square prom and found it was writing
12:10
one location per color every 5 minutes
12:12
so if one color was on it would write
12:14
that location every 5 minutes and just
12:16
increment it by one so after doing a few
12:18
tests with different colors of different
12:19
time periods it looked extremely
12:21
straightforward it's like a four bite
12:22
count for each color looking at the
12:24
original data that was in it all the
12:26
colors apart from Green were reading
12:28
zero and the green was reading four
12:30
indicating a total 20 minutes run time
12:32
ever if it was turned on run for a short
12:34
time then turned off that might not have
12:36
been counted but even so indicates this
12:37
thing wasn't used a great deal the whole
12:40
s process of doing a run can be several
12:42
hours but it'll only be doing probably
12:43
the Imaging at the end of that so you
12:46
wouldn't expect to be running for a long
12:47
time but say a single color for 20
12:50
minutes over its whole lifetime does
12:52
seem a little bit on the low side okay
12:55
let's look at the camera un fortunately
12:57
I managed to not record any sound when I
12:58
did this it's also a couple of months
13:00
ago so there's going to be a few details
13:02
that I've forgotten so I'm just going to
13:04
dub this over the original footage so um
13:07
take the lid off see this massive great
13:10
heat sink so this is a pel cool camera
13:12
we've got this blower fan producing a
13:14
fair amount of air flow through
13:16
it the connector here there's the ccds
13:19
mounted on the board on the
13:24
right this unplugs so we've got a bit of
13:27
power supply stuff on here
13:29
USB interface I think that's the Cyprus
13:32
microcontroller High speeded USB
13:34
interface there's a zyink spon fpga some
13:40
RAM and there's a couple of ATD
13:42
converters can't quite read what those
13:45
those are but anal
13:47
devices um little bit of bodgery around
13:51
here extra decoupling obviously they
13:53
have having some noise issues this is
13:55
around the ram chip quite a lot of extra
13:57
capacitors been added there
13:59
uh there's a couple of amplifiers prior
14:01
to the HD converter buffers or Andor
14:05
amplifiers taking the CCD
14:08
signal um bit more power spy stuff here
14:11
this is probably all to do with
14:12
generating the various CCD bias voltages
14:14
they uh need quite a lot of exotic
14:18
voltages next board down is just a
14:20
shield and an interconnect
14:24
boardly shielding the power supply stuff
14:26
from some the more sensitive an log
14:28
stuff
14:31
and this is the bottom board which is
14:32
just all power supply
14:34
stuff as you can see tons of capacitors
14:37
or Transformer in
14:42
there and this is the CCD which is a uh
14:47
very impressive thing this is a kf50 100
14:50
originally by true sense then codec
14:53
there ON
14:54
Semiconductor it's 50 megapixels uh the
14:58
only price I could find was this one
15:00
5,000 bucks and the architecture you can
15:03
see there actually two separate halves
15:04
which explains the Dual AZ converters
15:06
and two amplifiers it's literally split
15:08
down the middle and duplicated so it's
15:10
outputting two streams in parallel just
15:13
to keep the bandwidth sensible and it's
15:15
got this amazing um diffraction effects
15:18
it's got micro lenses over the pixel so
15:20
there's there's a bit more Optics going
15:22
on than on a normal
15:25
sensor few more bodges on the CCD board
15:28
including this wire which isn't really
15:29
tacked down very well which is a bit uh
15:32
bit of a mess quite a few bits around
15:34
this board where they've uh tacked
15:36
various bits on which is not super
15:38
impressive looks like CCD drivers on the
15:40
left with those 3 ohm um damping
15:43
resistors on the
15:47
output get a few more little bodges
15:50
around here some of
15:52
the and there's this separator the
15:54
silica gel to keep the moisture down but
15:56
there's this separator that actually
15:58
appears to be cut from piece of
15:59
antistatic
16:04
bag and this sort of thermal block on
16:06
top of this stack of three pel Cola
16:12
modules so as with any Stacks they get
16:16
um larger as they go back towards the
16:18
heat sink because each P's got to not
16:20
only take the heat from the previous but
16:21
also the waste heat which is quite
16:27
significant you see a little temperature
16:29
sensor here that copper block which
16:32
makes contact with the back of the
16:37
CCD and this's the back of the
16:40
pelas this then contacts the heat sink
16:44
on the uh rear there a few thermal pads
16:46
as well for some of the other power
16:47
components on this
16:51
PCB okay I've connected this uh camera
16:54
up I found some drivers on the disc that
16:56
seem to work under Windows 7 couldn't
16:58
get to install under Windows 11 though
17:01
um in the absence of any sort of lens or
17:03
being bothered to the proper amount I've
17:04
just put some f over it and put a little
17:06
pin in there to make a pinhole lens and
17:08
software gives a few options I'm not
17:11
entirely sure what all these are there's
17:12
obviously a clock frequency 22 MHz low
17:15
gain and with PFG no idea what that is
17:19
something something game programmable
17:20
Something game perhaps ver exposure
17:23
types I think focus is just like a
17:25
continuous grab until you tell it to
17:27
stop not entirely sure all these options
17:30
are obviously exposure time uh triggers
17:33
there ex external hardware trigger inut
17:35
you just trigger using a um thing on
17:37
screen so the resolution is 8176 by
17:40
6132 and you can actually bin those
17:42
where you combine multiple pixels to get
17:46
increased gain at the expense of lower
17:48
resolution down this is a 10sec exposure
17:51
obviously of the pin hole it's very uh
17:53
intensitive so we just stand still now
17:56
downloading it there's the uh exposure
17:59
so when it's
18:01
um there's a little status thing down
18:03
here so that tells you the um exposure
18:07
[Applause]
18:09
time it's this is just it
18:15
downloading um it is quite I'm seeing
18:18
quite a lot like smearing I think that I
18:20
don't know whether that's just due to
18:21
pixels overloading or something else I
18:24
mean yeah it's not it's not um out of
18:26
the question that there's something not
18:27
totally right about this camera
18:28
certainly was bodge wise on there um I
18:31
don't I'd imagine a camera like this
18:32
it's got a fairly narrow range of
18:34
intensities that it's happy with I'm not
18:36
going to spend a great deal of time on
18:38
this if you're interested in this camera
18:40
maybe for astronomy or something and
18:42
happy to sort of take the risk of it may
18:44
not be uh perfect I'll um I think I'll
18:47
stick this on eBay along with the
18:48
Illuminator I'll put a link down in the
18:50
description to the listing take your
18:52
chances to grab a bargain so for example
18:54
here we see this vertical streaking so
18:56
I'm not sure how normal that is this is
18:58
on fairly bright scene looking out the
19:02
window if I cut the exposure time down
19:04
on that it's now 1 second
19:07
exposure again most of the image
19:09
disappears again this is looks like it's
19:11
possibly over still overloading here go
19:14
that go down to say say quarter a
19:16
second so again I think there might be
19:19
some Auto gain control going on here um
19:21
this is with the PFG option let's try
19:23
turning that off and see what
19:25
happens so I'm not sure this is actually
19:27
more streaking or which just it's
19:29
cranked up the gain all the dis display
19:31
gray scale to show what um you know the
19:33
range of things that it's captured
19:36
there's one of one of 12 things in the
19:38
software there's um you can see of you
19:40
can't seem to read out the temperature
19:42
of the pelta cooler but you can set the
19:44
temperature and if you said it's a
19:46
different temperature you see the power
19:48
consumption jump up running the cooler
19:50
to get the temperature you requested but
19:52
I can't see anything anywhere that tells
19:54
you whether the cool is at the at the
19:56
temperature other than the power
19:57
consumption going down and there's no
19:59
temperature read out
20:03
here and just some yeah this is just
20:05
sort of very basic software I'm sure
20:07
there's like an API for more
20:09
sophisticated
20:10
applications but so if you know anything
20:12
more about these cameras please um stick
20:14
in the
20:15
comments um incidentally when I was
20:18
editing I didn't notice there was a bent
20:19
pin on the um CCD but I did fix that
20:22
before doing these tests and also
20:24
reactivated the um silica gel desicant
20:26
cuz I noticed it was uh I was getting
20:28
bit of condensation on the window but um
20:31
yeah so a couple of uh interesting but
20:34
maybe not particularly uh useful pieces
20:37
of Kit except for someone that's got a
20:38
very specific use so um I'll stick a
20:42
I'll stick these on eBay put a link in
20:44
the description and say hopefully
20:45
someone could actually make some uh good
20:47
use of these things
Example Output:
**Abstract:**
This video presents Part 2 of a teardown focusing on the optical components of a Fluidigm Polaris biotechnology instrument, specifically the multi-wavelength illuminator and the high-resolution CCD camera.
The Lumen Dynamics illuminator unit is examined in detail, revealing its construction using multiple high-power LEDs (430nm, 475nm, 520nm, 575nm, 630nm) combined via dichroic mirrors and filters. A square fiber optic rod is used to homogenize the light. A notable finding is the use of a phosphor-converted white LED filtered to achieve the 575nm output. The unit features simple TTL activation for each color, conduction cooling, and internal homogenization optics. Analysis of its EEPROM suggests extremely low operational runtime.
The camera module teardown showcases a 50 Megapixel ON Semiconductor KAF-50100 CCD sensor with micro-lenses, cooled by a multi-stage Peltier stack. The control electronics include an FPGA and a USB interface. Significant post-manufacturing modifications ("bodges") are observed on the camera's circuit boards. Basic functional testing using vendor software and a pinhole lens confirms image capture but reveals prominent vertical streaking artifacts, the cause of which remains uncertain (potential overload, readout artifact, or fault).
**Exploring the Fluidigm Polaris: A Detailed Look at its High-End Optics and Camera System**
* **0:00 High-End Optics:** The system utilizes heavy, high-quality lenses and mirrors for precise imaging, weighing around 4 kilos each.
* **0:49 Narrow Band Filters:** A filter wheel with five narrow band filters (488, 525, 570, 630, and 700 nm) ensures accurate fluorescence detection and rejection of excitation light.
* **2:01 Customizable Illumination:** The Lumen Dynamics light source offers five individually controllable LED wavelengths (430, 475, 520, 575, 630 nm) with varying power outputs. The 575nm yellow LED is uniquely achieved using a white LED with filtering.
* **3:45 TTL Control:** The light source is controlled via a simple TTL interface, enabling easy on/off switching for each LED color.
* **12:55 Sophisticated Camera:** The system includes a 50-megapixel Kodak KAI-50100 CCD camera with a Peltier cooling system for reduced noise.
* **14:54 High-Speed Data Transfer:** The camera features dual analog-to-digital converters to manage the high data throughput of the 50-megapixel sensor, which is effectively two 25-megapixel sensors operating in parallel.
* **18:11 Possible Issues:** The video creator noted some potential issues with the camera, including image smearing.
* **18:11 Limited Dynamic Range:** The camera's sensor has a limited dynamic range, making it potentially challenging to capture scenes with a wide range of brightness levels.
* **11:45 Low Runtime:** Internal data suggests the system has seen minimal usage, with only 20 minutes of recorded runtime for the green LED.
* **20:38 Availability on eBay:** Both the illuminator and camera are expected to be listed for sale on eBay.
Here is the real transcript. Please summarize it:
00:00:09 hello my name is David wishart I'm a
00:00:09 professor with the Departments of
00:00:10 biological sciences and Computing
00:00:12 science and I'm happy to give you this
00:00:14 lecture on machine learning and the
00:00:16 infectious diseases
00:00:25 so I'll begin by telling you a little
00:00:25 bit about infectious diseases
00:00:28 um most of you are probably familiar
00:00:30 with them and those of you are not this
00:00:33 will just be a brief introduction
00:00:35 so I'm calling them ID for short
00:00:37 infectious diseases they're caused by
00:00:39 pathogenic microbes so that includes
00:00:41 bacteria and viruses parasites or fungi
00:00:46 um so they're basically microscopic
00:00:48 organisms
00:00:49 and they're quite pervasive they're
00:00:51 everywhere they're on your skin they're
00:00:53 in your mouth out there in your gut most
00:00:56 of them are not pathogenic some of them
00:00:58 actually are very helpful but the
00:00:59 pathogenic ones kill about 13 million
00:01:02 people a year and more recently actually
00:01:04 more than that
00:01:06 the thing about infectious diseases is
00:01:08 that they are preventable and mostly
00:01:11 treatable we can treat them with
00:01:13 antibiotics
00:01:15 um we can also prevent them through
00:01:17 vaccines we can also prevent or treat
00:01:19 them through Public Health measures
00:01:21 including things like improved water
00:01:23 supply and sanitation monitoring
00:01:25 foodborne illnesses sterilizing things
00:01:28 wearing masks when you're sick and so on
00:01:32 there's a number of examples some of you
00:01:35 most of you probably heard of at least a
00:01:37 few of these probably one of the more
00:01:39 famous ones is Ebola an infectious
00:01:41 disease caused by a virus it's about 95
00:01:44 fatal
00:01:45 there are other diseases such as
00:01:47 tuberculosis
00:01:48 which is caused by bacterium influenza
00:01:51 which is caused by a virus AIDS or HIV
00:01:53 and also caused by a virus streps throat
00:01:56 it's caused by a bacterium the common
00:01:59 cold is caused by a virus as our measles
00:02:02 cholera and botulism are caused by
00:02:05 bacteria malaria by a parasite
00:02:07 diphtheria bacteria hepatitis virus
00:02:11 meningitis can be both bacterial and
00:02:13 viral polio and rabies
00:02:15 are
00:02:17 both viral as tetanus is bacterial and
00:02:21 then probably the most famous of these
00:02:23 is covert 19.
00:02:25 I'm going to be talking a lot about
00:02:27 covid-19 in part because of I guess a
00:02:30 coincidence of a number of events and
00:02:33 also the fact that we've used a lot of
00:02:35 machine learning to study covet 19.
00:02:38 now when it comes to managing infectious
00:02:40 diseases
00:02:41 they can exist in different forms some
00:02:44 can be quite localized outbreaks in just
00:02:47 a few small villages or even just a
00:02:50 neighborhood block
00:02:52 some can become an epidemic which means
00:02:55 that you are spreading Beyond simply
00:02:57 villages to multiple towns and cities
00:02:59 and different states or provinces
00:03:03 some diseases become endemic meaning
00:03:05 that they're everywhere all the time you
00:03:08 could say that the the common cold is
00:03:10 pretty much endemic
00:03:11 it's everywhere all the time
00:03:14 where things can go out of hand and
00:03:17 become a pandemic which means that the
00:03:20 epidemic has evolved and it now has
00:03:22 spread Beyond say a localized region in
00:03:25 a country to multiple countries or
00:03:27 potentially around the world
00:03:28 and a pandemic is much more dangerous
00:03:30 than an endemic situation pandemic means
00:03:34 many people are dying
00:03:36 so most of these infectious diseases can
00:03:38 be prevented or limited and there's a
00:03:40 lot of medical interest from the public
00:03:43 health perspective and the research and
00:03:45 Drug development perspective in terms of
00:03:48 tracking outbreaks where when they're
00:03:50 occurring modeling the spread of
00:03:52 infectious diseases over regions such as
00:03:54 spatial modeling predicting the levels
00:03:56 of infection or death over time that's
00:03:58 temporal modeling modeling the efficacy
00:04:01 of certain interventions from public
00:04:03 health perspectives to vaccines and
00:04:06 antibiotics
00:04:07 and then estimating the total morbidity
00:04:09 mortality or burden of the disease using
00:04:13 sort of limited surveillance because it
00:04:15 costs a lot of money to track disease
00:04:18 yet we also want to know the cost of
00:04:20 disease and terms of not only dollars
00:04:23 but also lives lost
00:04:26 so these are all part of the managing of
00:04:29 infectious diseases it's part of how we
00:04:31 gauge our response it's how we try and
00:04:34 save lives
00:04:36 um and some of these things are based on
00:04:39 modeling
00:04:40 so modeling for infectious diseases has
00:04:43 been around for a long time and as far
00:04:45 back as the early 1900s epidemiologists
00:04:48 developed these types of equations
00:04:50 called the sir or seir models s stands
00:04:56 for susceptible I stands for infectious
00:04:58 R stands for recovered and the e and s e
00:05:02 i r stands for an exposed
00:05:04 and see these are a couple differential
00:05:07 equations they're time dependent and
00:05:09 they're looking at the evolution or time
00:05:11 dependence
00:05:12 on
00:05:14 um on susceptibility infectivity
00:05:16 exposure and Recovery
00:05:24 they predict time Evolution and you can
00:05:24 see in a lower graph how these things
00:05:25 evolve the number of people who are
00:05:27 susceptible
00:05:29 um
00:05:31 um goes from a high level and drops very
00:05:33 quickly the number of people who are
00:05:35 infected which is red climbs up and as
00:05:38 they recover a number of people
00:05:40 recovered uh slowly climbs up to
00:05:42 essentially max out 100 of the
00:05:45 population
00:05:46 these models depend on parameters betas
00:05:49 and Gammas and deltas
00:05:51 but those parameters are really
00:05:52 difficult to measure they're great fudge
00:05:54 factors but they don't really tell you
00:05:56 how say masking affects the
00:05:58 susceptibility or infectivity or how
00:06:01 long in bed affects your recovery
00:06:05 likewise these time dependent models are
00:06:07 not good for spatial modeling and they
00:06:09 really don't help with your predicting
00:06:11 the effects of Public Health or pH
00:06:13 interventions
00:06:15 but they're great mathematical models
00:06:17 and they're a great way for learning how
00:06:18 to do time dependent modeling
00:06:23 with the Advent of computers people have
00:06:24 moved on and have developed agent-based
00:06:27 models or apms these are actually pretty
00:06:29 simple to implement you can write one in
00:06:31 maybe 50 lines of code they can be quite
00:06:34 Dynamic and you can see that the motion
00:06:36 and evolution over time and space here
00:06:39 and they simulate the actions and
00:06:41 interactions of autonomous agents sort
00:06:44 of like people or like animals or like
00:06:47 microbes
00:06:48 they've been made popular over the last
00:06:50 couple decades with games like SimCity
00:06:52 and The Sims which are very popular in
00:06:54 the 90s
00:06:55 um
00:06:56 they predict not only the spatial
00:06:58 Evolution you can also see the temporal
00:07:00 Evolution and you can see the graph in
00:07:01 the lower left corner of the big picture
00:07:03 showing how the number of Reds and
00:07:06 greens drop and climb and how the number
00:07:09 of Blues and Grays also drop and climb
00:07:11 you can incorporate a lot of complex
00:07:13 parameters you can incorporate
00:07:15 geographies and so the appearance of
00:07:17 rivers or mountains or roads you can
00:07:19 change Behavior some things move quickly
00:07:21 some things move slowly some things
00:07:23 don't move at all and that's where you
00:07:25 can capture Mobility you can even
00:07:26 capture Public Health measures where you
00:07:28 assign certain agents to be wearing
00:07:30 masks or to be vaccinated and not
00:07:33 so these are really incredibly powerful
00:07:36 and useful but they're really hard to
00:07:38 scale
00:07:39 they use a lot of CPUs lots of ram
00:07:42 they're hard to run for very long-term
00:07:44 predictions they're more like real-time
00:07:45 predictions
00:07:47 so this is made
00:07:49 um most the original sir models and the
00:07:52 APM models
00:07:54 a little difficult to work with
00:07:56 so this is why people have turned to
00:07:58 machine learning for understanding
00:08:00 modeling and managing infectious
00:08:02 diseases
00:08:03 and the question is why well some of
00:08:06 this actually had to do with the sort of
00:08:07 coincident
00:08:08 investments into machine learning and
00:08:11 Medicine which started in the mid 2015
00:08:14 2016 2017 period
00:08:18 um
00:08:19 so the infrastructure the know-how for
00:08:21 machine learning started becoming
00:08:22 available more and more people are
00:08:23 getting into it and then in 2019 Along
00:08:26 Came covid-19 which is the biggest
00:08:29 pandemic in 100 or more years
00:08:34 what's more is that during the
00:08:36 development of the pandemic we were able
00:08:38 to use things like the internet and
00:08:40 digital monitoring and remote monitoring
00:08:43 and sensing and all the tools that have
00:08:45 been developed over the last 20 or 30
00:08:47 years to track covet
00:08:51 so no other infectious disease in
00:08:53 history has been tracked so closely as
00:08:55 covid so the result is that we created
00:08:57 huge amounts of data so this big disease
00:09:00 led to Big Data and you need big data to
00:09:04 do machine learning so it became a
00:09:06 perfect opportunity to test the power of
00:09:08 machine learning on infectious diseases
00:09:12 so some of you during the last three
00:09:13 years might have popped on to some of
00:09:15 these Cova dashboards and they popped up
00:09:18 everywhere
00:09:20 Johns Hopkins had one the Alberta had
00:09:22 one
00:09:24 our lab maintained one many different
00:09:27 countries and provinces and States
00:09:29 developed them they're very colorful
00:09:31 they track the change in time heat Maps
00:09:34 they had various graphs and rapidly
00:09:36 changing numbers as tracking the number
00:09:38 of people who had been infected and
00:09:40 others who had died these are incredibly
00:09:43 informative but they also had huge
00:09:44 amounts of data and data flowing into
00:09:46 them and data flowing out of them
00:09:49 so what I'm going to do now is is show
00:09:51 you how using that big data and machine
00:09:54 learning allow people to do some
00:09:56 interesting things with covid
00:09:59 um
00:10:00 and also extend that to other conditions
00:10:03 so I'm going to talk I guess I'll give
00:10:05 you some examples of applications in
00:10:07 terms of tracking outbreaks where and
00:10:10 when they're occurring predicting levels
00:10:12 of infection hospitalization and
00:10:14 mortality
00:10:15 also modeling the efficacy of
00:10:17 interventions with regard to Public
00:10:18 Health and estimating the total
00:10:21 morbidity or mortality or burden of a
00:10:23 disease with a set of modest
00:10:25 surveillance data that arose because
00:10:28 many countries were not able to track
00:10:30 the disease somewhere to poor but some
00:10:32 deliberately chose not to do this
00:10:35 so as I say four different examples over
00:10:37 the next 30 minutes here to show you how
00:10:39 machine learning can be used
00:10:42 so in terms of tracking outbreaks
00:10:45 um there's a well-known movie from about
00:10:47 20 years ago called outbreak that
00:10:49 starred Dustin Hoffman and Morgan
00:10:52 Freeman and Renee Russo about an
00:10:54 ebola-like virus that escaped uh from I
00:10:59 guess a monkey colony and ended up in in
00:11:02 America
00:11:03 and the worry there was it was going to
00:11:04 spread so fast it would eventually Wipe
00:11:06 Out the entire world of course it just
00:11:08 didn't and Dustin saved the world but
00:11:11 they had some really interesting shots
00:11:13 from the movie and I remember watching
00:11:15 them showing all kinds of systems for
00:11:17 tracking the outbreak and predicting
00:11:18 what was going to happen
00:11:20 and perhaps by inspired by that the
00:11:23 Center for Disease Control or CDC which
00:11:25 is based in Atlanta in the US has set up
00:11:27 an Emergency Operations Center that
00:11:29 looks not unlike the one that we saw in
00:11:31 outbreak
00:11:32 um it has a giant screen everyone has
00:11:34 computers it kind of looks like mission
00:11:36 control for NASA uh it works 24 7 in
00:11:40 response to all kinds of things from
00:11:41 infectious disease outbreaks foodborne
00:11:43 diseases and natural disasters it tracks
00:11:45 things in real time
00:11:47 um it's probably the most elaborate
00:11:50 system anywhere in the world for uh
00:11:52 doing outbreak tracking
00:11:55 however there are problems with that and
00:11:57 they are laid bare I guess with the
00:11:59 development of covet so having this
00:12:02 large extension with dozens of people
00:12:03 full time
00:12:05 um is expensive especially if they only
00:12:07 see a big incident every five or six
00:12:09 months or maybe every five or six years
00:12:13 um it also means that because you're
00:12:14 involving people who may only be able to
00:12:17 read English language text
00:12:20 um that they are not aware of other
00:12:22 events that may have happened elsewhere
00:12:23 and well after other countries have
00:12:25 alerted to them about the potential or
00:12:27 likely impact so slow alerting has been
00:12:30 a problem and then that also means slow
00:12:32 response because when you're working
00:12:34 with people they need to have meetings
00:12:36 and more meetings and consultants and
00:12:38 consultations and so there's lots of red
00:12:40 tape that leads to these problems it
00:12:43 also leads to certain biases with
00:12:45 leaders and politicians influencing the
00:12:47 response or lack of response due to this
00:12:49 manual disease tracking and this again
00:12:52 was laid there with the CDC where people
00:12:55 um politicians in particularly
00:12:56 intervened with the responses or
00:12:58 recommended responses they have
00:13:00 foreign so how do we get around that and
00:13:04 in fact a solution was developed by a
00:13:06 company in Toronto called Blue Dot
00:13:09 um so blue dot
00:13:11 um has developed a machine Learning
00:13:13 System that takes in information around
00:13:15 the world collects scientific data
00:13:17 Public Health Data travel data
00:13:20 metadata it tracks 300 000 articles a
00:13:23 day from 35 000 sources not just in
00:13:26 English but with 65 different languages
00:13:29 and it anticipates this bad and
00:13:31 anticipates the impact of more than 150
00:13:34 different pathogens and toxins and all
00:13:36 kinds of syndromes in real time
00:13:39 it does this through using natural
00:13:40 language processing to scan foreign
00:13:42 language news reports animal and plant
00:13:45 disease networks government
00:13:46 announcements and it does this to
00:13:48 identify outbreaks it uses data on air
00:13:50 traffic control tracks temperature and
00:13:54 climate population statistics it knows
00:13:57 something about many of the disease
00:13:58 types and then combines that with
00:14:00 machine learning to predict the spread
00:14:02 of pathogens
00:14:03 so even though the CDC with its Center
00:14:06 sort of identified the covet as being
00:14:08 something of concern in maybe March of
00:14:11 2020
00:14:13 um this system the blue dot system
00:14:16 identified code as a pathogen of concern
00:14:18 on December 31st 2019. and correctly
00:14:22 predicted where the next 12 cities would
00:14:24 be infected and that included not only
00:14:27 several cities in China obviously Wuhan
00:14:30 where it started but also Bangkok
00:14:32 Thailand which was um the first city
00:14:34 outside of China to be infected
00:14:37 so this is a really good example of how
00:14:40 machine learning beats the manual
00:14:43 approaches in terms of doing disease
00:14:46 tracking and it's still used and being
00:14:48 used by number groups around the world
00:14:54 let me give you another example of how
00:14:54 machine learning can help with
00:14:55 predicting of levels of infection and
00:14:57 hospitalization or mortality over time
00:14:59 this is important when you're trying to
00:15:01 anticipate the needs for hospitals or
00:15:04 doctors
00:15:05 how many beds you might need
00:15:08 how much money you need to put into
00:15:10 prevention or mitigation or treatment
00:15:19 now this is an example of a paper that
00:15:19 we published particularly about
00:15:21 predicting or forecasting covet
00:15:23 mortality using machine learning this is
00:15:25 published in scientific reports in 2021.
00:15:28 I was involved with it along with Russ
00:15:30 Greiner Mark Lewis
00:15:32 um how long and then the lead author
00:15:34 Poria ramazi
00:15:36 um was the one who came up with the idea
00:15:38 behind this
00:15:40 so why do you want to do forecasting or
00:15:44 prediction so in the case of covid and
00:15:47 this is back in 2020 we really didn't
00:15:49 know what was going to happen we had
00:15:51 wide-ranging predictions of how severe
00:15:53 it was
00:15:54 some models in fact the first models
00:15:56 that came out of the UK predicted
00:15:57 hundreds of millions of cases and tens
00:15:59 of millions of deaths in the U.S alone
00:16:02 so that should have been did generate a
00:16:03 fair bit of panic while other models
00:16:06 predicted less than 10 000 people would
00:16:08 die
00:16:09 um some people predicted the pandemic
00:16:11 would end by the summer of 2020 or by
00:16:13 the 2021. some people predicted covid
00:16:16 would go on forever so with all those
00:16:18 different predictions
00:16:20 and wide-ranging estimates of the number
00:16:23 of people who would be infected or would
00:16:24 die who do you believe and what do you
00:16:27 use in order to plan for that kind of
00:16:29 pandemic so
00:16:32 so this is why we developed this tool
00:16:34 called lafopatho so it's no laughing
00:16:38 matter because it was really to try and
00:16:39 predict um deaths and infections with
00:16:42 code but lapho Papo stands for last fold
00:16:45 partitioning forecaster this is
00:16:47 developed in in late 2020 and it uses a
00:16:51 pretty simple idea of just it's called a
00:16:53 k nearest neighbor predictor so it's
00:16:55 taking information from the last five or
00:16:59 ten weeks to predict what's going to
00:17:01 happen over the next five or ten weeks
00:17:04 so it's a Time dependence predictor
00:17:07 it uses more than just what's happened
00:17:09 or information that was you know the
00:17:10 number of deaths or number of infections
00:17:12 it included information like the number
00:17:14 of covet tests a number of cases and
00:17:16 deaths that included information about
00:17:18 social activity Mobility it also
00:17:21 included weather and weather related
00:17:22 coverts because we knew and know that in
00:17:25 the summer Cova tends to drop and
00:17:27 whereas the winter when everyone goes
00:17:28 indoors Cove intends to grow
00:17:31 uh we have the data for the us because
00:17:33 the US had actually collected much
00:17:35 better data than Canada and we use that
00:17:37 to forecast covid mortality and covert
00:17:39 cases up to 10 weeks or two and a half
00:17:42 months out
00:17:43 now the the neat thing about laugh or
00:17:45 powerful is it wasn't a single model and
00:17:47 this is where almost everyone else maybe
00:17:49 fumbled a bit everyone else is trying to
00:17:51 come up with a single model single
00:17:53 equation that would go on and predict
00:17:55 forever so lafopatha produces different
00:17:58 prediction models with different time
00:17:59 intervals and then that way it's it's
00:18:02 able to learn or predict from trends
00:18:04 that happened over shorter periods of
00:18:06 time
00:18:07 um
00:18:08 so the traditional methods if they see
00:18:10 something going up they'll predict it
00:18:12 will go up forever uh if something
00:18:14 flattens out it predicts most models and
00:18:16 predictably flat forever because life of
00:18:19 patho can take other pieces of data it
00:18:22 can predict when things go up and down
00:18:23 as I'll show
00:18:25 when it was compared to other covert
00:18:28 models at that time basically
00:18:30 collaborate them it was much much better
00:18:32 and this is shown here
00:18:34 uh here we're plotting predictions and
00:18:37 the error for the predictions for the
00:18:39 number of code deaths in the US and the
00:18:41 number of coveted cases and these are
00:18:43 the predictions up to 10 weeks ahead so
00:18:46 that's marked on the x-axis
00:18:48 and to do really well you want to see
00:18:50 low numbers and laugh-o-pafo is marked
00:18:53 in the dark pink at the bottom
00:19:00 um which means it's basically the best
00:19:00 model there's some models that um had a
00:19:03 percentage error of you know 65 or 70
00:19:05 percent it's most hovered around 40 to
00:19:07 50 percent and over time they got
00:19:10 progressively worse mafopatho was sort
00:19:13 of the opposite actually got better over
00:19:14 time
00:19:16 um
00:19:17 and that was two for both the code cases
00:19:19 and the coveted deaths
00:19:22 so that was impressive it was helpful in
00:19:25 the comparisons that gained were
00:19:26 dedicate some of the top modeling groups
00:19:29 in the world uh who would contributed to
00:19:31 this and were adding their information
00:19:33 to this this
00:19:35 um system that was put online
00:19:39 uh this is another example we're
00:19:40 actually predicting the actual deaths
00:19:42 we're not just worrying about the
00:19:43 percentage error here we're saying
00:19:45 here's how many people have done right
00:19:47 or we expect to die and this was
00:19:48 tracking from the beginning of the
00:19:50 pandemic when they started to March May
00:19:52 June July August and September and then
00:19:55 we were tasked to try and predict what
00:19:57 was going to happen uh in October using
00:19:59 only the information we had and you can
00:20:02 see the prediction for lafopatho in
00:20:04 Orange and then we compared it against
00:20:05 what was actually observed through
00:20:07 October and November in blue and you can
00:20:09 see they're almost superimposed just
00:20:11 with a variation the very last time but
00:20:14 that's quite a number of weeks out
00:20:16 so again I think quite impressive in
00:20:18 terms of what this type of model
00:20:20 relatively simple in concept can do with
00:20:22 still using machine learning
00:20:25 another thing that came out of alpha
00:20:26 pathway was its ability to predict not
00:20:28 just you know weekly cases and weekly
00:20:31 deaths it was able to look at the daily
00:20:32 variations in uh code with cases and
00:20:35 code with deaths
00:20:37 and what's shown is in blue are the
00:20:40 reported ones
00:20:42 um and then what's predicted in Orange
00:20:43 is life of patho and this in other words
00:20:46 zapopatho picked up something really
00:20:48 interesting that was observed early on
00:20:50 but there is a periodic or periodicity
00:20:53 to reported deaths weekends there are
00:20:57 far fewer deaths and weekdays there are
00:20:59 far more deaths
00:21:00 and this is thought to do partly with
00:21:03 reporting but also to the fact that
00:21:06 the time course for covet seemed to
00:21:09 coincide with the week
00:21:12 so it would take about a week for people
00:21:14 to get infected and if it was a severe
00:21:17 infection it would take about a week for
00:21:18 them to die
00:21:21 um when people went to work it was what
00:21:25 typically when they would get exposed or
00:21:26 infected so you get infections and
00:21:28 Monday Tuesday Wednesday and then when
00:21:30 people stop going to work
00:21:32 um
00:21:33 they weren't exposed so there's a period
00:21:36 where there's they're not getting any
00:21:37 disease
00:21:38 and then because it would take a week
00:21:40 for people who'd been exposed to develop
00:21:42 and eventually die then that would also
00:21:46 explain why we had this job in deaths
00:21:49 during weekends
00:21:51 so the point here is that lafopatha was
00:21:54 able to predict periodicity even though
00:21:57 that wasn't part of its model it wasn't
00:21:59 part of the sir model but it was able to
00:22:03 pick up that Trend and the fine details
00:22:12 so we had or I'm giving you examples of
00:22:12 machine learning being used to predict
00:22:14 levels of infection to help track
00:22:16 outbreaks but that's only telling us
00:22:19 what's happening we don't have to be
00:22:21 helpless and in fact the reason why
00:22:23 people wanted to do the modeling was to
00:22:25 figure out what we needed to do or plan
00:22:28 for
00:22:28 and this is how we could have some
00:22:31 effective Public Health interventions
00:22:34 what might work how might it work what
00:22:37 would be the most effective what would
00:22:38 be the least expensive
00:22:41 or perhaps the most costly
00:22:45 so as I mentioned before the sir and SEI
00:22:48 models only tell you what will happen if
00:22:50 you do nothing and you can change your
00:22:53 beta and gamma but that's just like
00:22:55 throwing darts in the dark
00:22:58 um the public health agencies government
00:23:00 wanted to know what kinds of
00:23:01 interventions would work best should we
00:23:03 have everyone isolate should we just
00:23:05 stop everyone showing up to work close
00:23:07 the schools should everyone wear masks
00:23:09 should everyone wash their hands should
00:23:11 we see like orders should we close down
00:23:13 them all should we stop sporting events
00:23:16 so the question is can we modify these
00:23:19 kind of primitive 100 year old sir
00:23:21 models to include intervention data and
00:23:24 to include some machine learning
00:23:26 components
00:23:28 so this led to the development in Russ
00:23:31 greiner's lab to a program called
00:23:33 similar so that's an sir model with
00:23:37 machine learning inside that's where the
00:23:39 ml is stuck into sir
00:23:42 and so this was able to do not only
00:23:44 covid-19 forecasting
00:23:46 um predicting how much covid would be
00:23:49 present but it could also incorporate
00:23:52 government policies and it'll project
00:23:54 the level of infections up to four weeks
00:23:57 ahead or four weeks Beyond
00:23:59 and this is a plot on the far right
00:24:01 which is looking at Alberta specifically
00:24:03 but we're supporting the number of new
00:24:04 infections
00:24:06 um where the blue is what's known and
00:24:08 then the similar model in red uh pretty
00:24:10 much overlaps exactly with that one
00:24:13 there are other models that people had
00:24:15 an orange and green model but they're
00:24:16 not as good
00:24:22 um similar trained on a bunch of data
00:24:22 sets that took data from Canada took
00:24:23 data from the US have covered a longer
00:24:25 period of time than the lafopatho went
00:24:27 up to May 2021.
00:24:30 um and it was not only compared to
00:24:33 Alberta it was studied for the different
00:24:35 provinces in Canada it was analyzed to
00:24:38 models in the CDC and in all cases
00:24:42 similar outperformed all of them uh
00:24:46 perhaps except one
00:24:48 so game this is I think really
00:24:49 impressive and it sort of highlights how
00:24:52 policy data which was tracked or
00:24:55 compiled from an Oxford policy data
00:24:58 would have a fats and how certain
00:25:01 provinces
00:25:03 um
00:25:05 um had obviously different prop policies
00:25:08 states in the US had different policies
00:25:10 and those influenced how many people
00:25:12 were infected
00:25:13 and so with that the game you could
00:25:16 predict or model what would likely
00:25:18 happen so this would be very very useful
00:25:21 now this model as I said was applied
00:25:23 primarily to the US and Canada
00:25:26 could you do this globally and this led
00:25:29 to an effort it was started in November
00:25:32 2020 to apply machine learning for
00:25:35 societal good
00:25:38 and this is driven by xprize and it's
00:25:41 called The X prize pandemic response
00:25:43 challenge
00:25:44 some of you may have heard of X prize X
00:25:46 prize has been offered for various
00:25:48 things like flying across the English
00:25:50 Channel with human-powered flight or
00:25:53 developing a more rapid covert task of
00:25:56 creating a tricorder for measuring human
00:25:59 health the same way that they do in Star
00:26:01 Trek
00:26:02 this one didn't have as much money as
00:26:05 some of the other X prizes but about
00:26:06 half a million dollars was put towards
00:26:08 it and it required teams to build
00:26:12 effective data-driven machine Learning
00:26:14 Systems capable of accurately predicting
00:26:16 not only covid-19 infectivity
00:26:19 transmission rates but also coming up
00:26:22 with non-pharmaceutical interventions so
00:26:24 that's you know masking isolation
00:26:27 ceiling borders and mitigation measures
00:26:31 that could be shown to minimize
00:26:32 infection rates so they had to make a
00:26:34 model not unlike the one that was done
00:26:35 for similar but to do that all across
00:26:38 the countries in the world and then not
00:26:41 only that identify which models which
00:26:43 interventions were most economically
00:26:46 effective and which ones are most
00:26:48 effective in reducing the disease
00:26:50 so the results have been posted I given
00:26:53 the URL at the bottom you can just type
00:26:56 in pandemic response Challenge and you
00:26:58 can find the website there are 104 teams
00:27:00 competed including a team from U of A we
00:27:04 actually entered
00:27:05 unfortunately the last day before we
00:27:08 submitted our code which was working
00:27:10 beautifully one of the members of the
00:27:12 team made a line change which make
00:27:15 everything stop working so our entry
00:27:19 um didn't make it in or didn't make it
00:27:21 past the gate but I think looking at
00:27:24 code gain and looking at how well
00:27:26 similar did we probably would have been
00:27:28 among the top teams
00:27:31 the winning model as it turned out was
00:27:33 from Valencia Spain and
00:27:36 um this is the model they developed for
00:27:37 predicting Copic transmission uh the
00:27:41 data that they're putting in they had
00:27:42 all kinds of information heat maps from
00:27:44 around the world they had time course
00:27:46 data on infectivity and death rates and
00:27:49 what they used is something called a an
00:27:51 lstm model or essentially a model of
00:27:55 lstms or collection of lstms better
00:27:58 information about the geographic
00:27:59 locations they did a few other tricks to
00:28:03 make sure they could come up with some
00:28:04 Global parameter and
00:28:08 um overall that model performed the best
00:28:11 now lstm models turn out to be among the
00:28:14 best for predicting time course events
00:28:18 this is an example so lstms are
00:28:21 recurrent neural networks or variation
00:28:23 of the current neural networks they're
00:28:25 more sophisticated or complex versions
00:28:28 of what are called gated recurrent units
00:28:31 they have an ability to forget and
00:28:34 remember short-term long midterm memory
00:28:37 so long-term memory short-term memory
00:28:40 and what's Illustrated here is how you
00:28:42 have a forgetgate which allows you to
00:28:45 have the ability to forget things or to
00:28:47 have a short-term memory but then you
00:28:49 have an input and output gate which
00:28:50 allow you to retain sort of the
00:28:51 long-term trends
00:28:54 uh you have time inputs you have hidden
00:28:57 States not even like probabilistic
00:29:00 graphical models or hmms which are also
00:29:02 very good for time modeling but these
00:29:04 are more sophisticated
00:29:06 and you can see examples of how an lstm
00:29:09 can learn from past data and predict
00:29:11 outwards so you can see this in the top
00:29:13 diagram which is shown in red
00:29:17 um
00:29:17 you can see this in the bottom diagram
00:29:19 whether it's had some training data and
00:29:22 some training output and then as you
00:29:24 give it more information or as you let
00:29:26 it slide along it's able to predict
00:29:27 certain events
00:29:28 so instead of predicting simple
00:29:30 periodicity it can also predict the type
00:29:32 and periodicity whether it's Sawtooth or
00:29:34 sinusoidal whether it continuous decline
00:29:37 to stay stable or whether it drops
00:29:39 suddenly
00:29:40 so that's the power of an lstm
00:29:43 um they said they're more powerful in
00:29:45 Markov models they're probabilist in
00:29:47 graphical models they're used to
00:29:48 recognize patterns over sequences over
00:29:51 time so you can see that in sensor data
00:29:54 you can see it in DNA sequence data you
00:29:55 can see it in stock prices natural
00:29:58 language and most importantly in
00:30:01 epidemic or time course epidemic data
00:30:04 so lstms allow it the model to decide
00:30:07 whether to retain previous information
00:30:08 the short term or to discard it and so
00:30:11 because it can do that both short and
00:30:13 long term it's able to recognize longer
00:30:16 sequences more complex sequences than
00:30:18 some of the simpler hmmings
00:30:21 now the after the Valencia team had
00:30:24 developed this model for predicting
00:30:25 covid rates in every country around the
00:30:27 world and and how coveted rates respond
00:30:30 to different interventions then they
00:30:32 well extended the model uh and extended
00:30:35 it so that it could work for other
00:30:36 countries for different types of
00:30:37 invention interventions coming from the
00:30:40 Oxford monitoring group
00:30:42 so Oxford was tracking how different
00:30:43 countries responded and how long they
00:30:45 responded and they would track how long
00:30:47 schools were closed how long workplaces
00:30:50 were closed how long public events were
00:30:52 canceled
00:30:53 and so if you could build that into your
00:30:55 model then you could also build in the
00:30:56 response not unlike was done with a
00:30:58 similar model that I talked about before
00:31:02 um so if you have a model that predicts
00:31:04 using those interventions then you can
00:31:06 play around with saying well let's say
00:31:07 if we close everything how would we do
00:31:09 and how much would that cost
00:31:12 you can see in the model on the left to
00:31:14 where I've marked not only the cases in
00:31:17 Spain for covet you can see how certain
00:31:19 interventions have an effect so when the
00:31:22 number of people in Spain
00:31:25 um we're getting clobbered you can see
00:31:27 the sharp rise in I think around March
00:31:30 1st 2020 uh they closed the schools and
00:31:34 immediately or within a few weeks rates
00:31:38 of coveted infection dropped
00:31:39 precipitously and then as they dropped
00:31:42 to low values
00:31:44 um Spain opened up again
00:31:46 and then after a couple months things
00:31:48 started rising and rising in between and
00:31:50 so around well I guess November they
00:31:54 decided to shut things down a bit but
00:31:55 they didn't do it for long and because
00:31:57 they didn't do it long enough those
00:31:59 cases spiked and then they chose not to
00:32:02 do anything and then so they decided to
00:32:04 shut a gain in January that lowered it a
00:32:06 bit but not enough
00:32:08 so these examples are people politicians
00:32:10 in particular were hesitant about having
00:32:13 long-term interventions
00:32:16 um
00:32:17 and there are examples where school
00:32:18 closings seem to be among the most
00:32:19 effective ones other ones didn't seem to
00:32:22 make so much of a difference
00:32:24 so the winning model that Valencia
00:32:26 identified was that if you could
00:32:28 restrict Gatherings and limit
00:32:30 international travel those would be
00:32:33 among the most effective ways of
00:32:36 reducing covet whereas closing public
00:32:38 transport staying at home restrictions
00:32:41 on movements inside Spain or other
00:32:43 countries had an effect but not that
00:32:45 much
00:32:47 so in this way they were able to give
00:32:50 essentially a prescription but also have
00:32:52 learned from observations in other
00:32:53 countries what worked and what didn't
00:32:56 and they it also picked up on other
00:32:58 things for instance you didn't need to
00:33:00 wash your hands whereas masking was more
00:33:03 important and I ran counter to a lot of
00:33:05 the advice that we got in early 2020
00:33:07 which was like wash your hands wash your
00:33:09 hands wash your hands that was because
00:33:11 the mistaken beliefs that Cove had
00:33:13 spread through contact it was actually
00:33:15 spread through aerosols through
00:33:17 breathing
00:33:18 and so masking became more important
00:33:21 than washing hands essentially was
00:33:24 useless
00:33:26 so what did we learn from from this
00:33:28 particular exercise with X price
00:33:30 first thing to do is never change your
00:33:33 code the night before you're submitting
00:33:34 it uh second thing is that lstm models
00:33:37 are really useful for modeling
00:33:38 infectious diseases and predicting what
00:33:41 will happen
00:33:42 and that we found out that um
00:33:45 the best known pharmaceutical
00:33:46 interventions were pretty much the ones
00:33:48 that we kind of intuitively know don't
00:33:50 have big sporting events with everyone
00:33:52 in a giant Stadium or an arena restrict
00:33:56 travel especially international travel
00:33:58 or close your borders and wear face
00:34:00 coverings and some countries did this
00:34:02 very effectively Australia New Zealand
00:34:04 did it very very well until they
00:34:06 basically stopped um restrictions on
00:34:09 travel China did it okay as well until
00:34:11 this year when they opened it up and
00:34:14 then about a million people died
00:34:17 um now what we found as well is that
00:34:19 when the Valencia model is allowed to
00:34:20 extend beyond just the time that the
00:34:24 competition was running uh it ended up
00:34:27 being wildly off so it wasn't as
00:34:29 predictive as we thought it seemed to
00:34:31 work over relatively short period of
00:34:33 time
00:34:35 um it's still enough for them to win 250
00:34:38 000 and to get lots of publicity but I
00:34:42 think it tells us we still have a way to
00:34:43 go in terms of being able to predict
00:34:46 um what happens with pandemics although
00:34:50 it did tell us that they were pretty
00:34:52 good at identifying the most effective
00:34:53 methods for dealing with a pandemic like
00:34:56 covet
00:34:58 last thing I'm going to do is talk about
00:35:00 how machine learning can help us
00:35:01 estimate the the total burden of disease
00:35:04 for a big disease like a pandemic or a
00:35:09 flu or covet
00:35:10 and this is important because
00:35:13 um
00:35:14 we often don't know the the real impact
00:35:17 um in some cases for several months or
00:35:19 years after an outbreak
00:35:22 um you know if people have died it may
00:35:25 affect
00:35:26 [Music]
00:35:26 um
00:35:28 infrastructure may affect
00:35:30 activities that may alter the economy or
00:35:35 social structure or social fabric
00:35:39 um it may explain why things are broken
00:35:41 or it may explain things that we need to
00:35:43 fix
00:35:44 it's also helpful just to understand you
00:35:48 know the true impact of epidemic or
00:35:50 pandemic
00:35:52 so the question we asked was how many
00:35:54 people really died from covet over the
00:35:56 last three years remembering that it's
00:35:58 about three years ago today that that
00:36:00 covert has really identified and we
00:36:04 think that machine learning may have
00:36:05 that an answer and this is the group
00:36:08 from an observation we and others had
00:36:10 made which was that different countries
00:36:12 were better or worse at reporting Cova
00:36:15 deaths
00:36:17 so a country like Egypt for the first
00:36:20 year and a half of the pandemic reported
00:36:23 almost no
00:36:24 no deaths coming from cobit but we knew
00:36:27 from funeral records or what they call
00:36:30 excess death tracking so they keep
00:36:32 records of everyone who had died that
00:36:34 there were almost 200 000 excess deaths
00:36:37 in Egypt
00:36:38 over that same time
00:36:40 meaning a lot of people were dying for
00:36:43 reasons apparently unknown
00:36:45 the same thing also was happening in
00:36:46 Russia where it seemed like Russia was
00:36:48 handling the pandemic remarkably well
00:36:50 whereas in the U.S people were dying
00:36:52 right left
00:36:53 but then when Russia started releasing
00:36:55 its um death statistics just saying you
00:36:58 know how many people had died there's a
00:37:00 huge huge number of excess deaths and
00:37:04 these are deaths that are over and above
00:37:06 what you would normally expect from
00:37:07 historical averages at least historic
00:37:10 averages are very steady they change
00:37:12 just by maybe one percent
00:37:15 um here saying massive changes by 20 and
00:37:18 30 percent or 100 percent
00:37:20 so it turned out in Russia there were
00:37:22 you know three to four times more people
00:37:24 when excess tests than reported in Egypt
00:37:26 there was 13 times more people we can
00:37:29 see in the US the bright red and the
00:37:31 dark red almost overlap uh exactly and
00:37:34 so the US was pretty good at tracking
00:37:36 deaths but they generally underestimated
00:37:39 them and then you can see some of these
00:37:40 other countries like Serbia and
00:37:42 Azerbaijan and Armenia
00:37:44 um grossly underestimated
00:37:46 um the probable member of covet deaths
00:37:50 so it turned out that many countries
00:37:52 actually faked their Cova data
00:37:54 now in some cases it was because the
00:37:56 countries couldn't keep up with the high
00:37:58 death rates and just sort of gave up
00:38:00 others could track uh deaths as they
00:38:03 could get excess deaths but they didn't
00:38:05 have the ability to identify whether
00:38:07 something was a covet disease or not
00:38:08 they just simply said they died and we
00:38:10 don't know why
00:38:11 some countries initially were
00:38:13 underreported just to reduce panic
00:38:16 some countries in Africa didn't report
00:38:19 at all so it seemed like Africa was
00:38:21 covered free when actually it wasn't and
00:38:24 many countries deliberately
00:38:25 under-reported to make their leaders
00:38:27 look good and this is especially true in
00:38:29 places like Korea North Korea Russia
00:38:32 places like China
00:38:35 um Azerbaijan and so on
00:38:38 what we noticed was that the reporting
00:38:40 was a function of how wealthy a country
00:38:42 was so if they had enough resources they
00:38:43 generally tracked it well how corrupt a
00:38:46 government was what type of government
00:38:48 was in place certain cultural traditions
00:38:51 and then other National features so if
00:38:54 we combine all of those things which you
00:38:56 actually can get data from most anywhere
00:38:59 could we use machine learning and those
00:39:01 inputs to correct the faked or missing
00:39:04 data
00:39:05 so the idea was to try and collect data
00:39:07 on access dots and reported covert
00:39:09 deaths for the past two and a half years
00:39:10 for as many countries as we could find
00:39:12 and they're about 75 to 100 of those
00:39:16 that did that now remember there's 220
00:39:18 countries in the world so less than half
00:39:20 have that
00:39:22 then we wanted to collect other risk
00:39:24 factors so we wanted to know
00:39:25 um you know things like how many people
00:39:26 had AIDS how many what portion
00:39:28 population was obese what the population
00:39:30 density was at the level of vaccination
00:39:32 and again these are numbers you can get
00:39:34 for almost every country
00:39:37 um but they also affect covet deaths
00:39:40 we also collected statistics on on
00:39:42 countries regarding their political and
00:39:45 social systems so the level of
00:39:46 corruption and the GDP
00:39:49 um reported covet infection rate
00:39:53 um so we wanted them to develop a
00:39:54 machine learning model that predicts the
00:39:56 true deaths or if you want the excess
00:39:58 deaths from covid
00:40:01 um using
00:40:03 the reported covet deaths you remember
00:40:06 in some countries were accurately doing
00:40:08 it and some weren't doing it very well
00:40:10 at all
00:40:11 and then we wanted to take that model
00:40:13 and then apply it to countries that
00:40:15 either had almost no data or very
00:40:18 unreliable data and then trying to
00:40:20 actually estimate the total number the
00:40:22 true total number of Cobra deaths
00:40:25 so the input
00:40:27 um
00:40:28 in terms of was you know total number of
00:40:30 covet deaths at a given time and then
00:40:32 all these other values or features which
00:40:35 included reported coveted cases
00:40:37 information about the neighboring
00:40:39 countries because some countries uh
00:40:41 tracked and then other countries did
00:40:42 nothing uh so you could usually estimate
00:40:45 that if you know four countries
00:40:47 surrounding this one country all had a
00:40:48 bad one and then the interior countries
00:40:50 that they had no covet cases
00:40:53 um that was probably wrong and then so
00:40:54 the influence the neighbors would have
00:40:56 an effect there's a lag or offset about
00:40:59 two weeks between when a person would be
00:41:01 infected to when a person might die
00:41:03 uh we tracked the year and the week
00:41:05 because
00:41:07 um different coveted strains appeared
00:41:09 Alpha Beta gamma Omicron appeared uh we
00:41:13 had to track the year because there are
00:41:15 also changes in terms of people's
00:41:16 behavior we looked at information about
00:41:19 people over 65 who track vaccination
00:41:22 levels diabetes obesity HIV rates the
00:41:25 capital GDP and everything else
00:41:27 all of that was piled into this model
00:41:29 and we tried different regression
00:41:32 estimators we tried simple linear one
00:41:34 polynomial regression so support back to
00:41:36 machine regression and the next key
00:41:38 boost one and we measured how well the
00:41:40 model would perform looking at Absolute
00:41:42 error standard error percentage error
00:41:44 versus what we knew to be the truth and
00:41:47 this is usually the access takes deaths
00:41:49 and this is trained on the countries
00:41:51 where we had good reliable data which is
00:41:55 turned out to be about 75 countries and
00:41:57 then we held out about another 15
00:41:58 countries what's your holdout to see how
00:42:01 well our worked
00:42:02 and after we tested we found that the X2
00:42:04 boots model was the best one and then
00:42:06 this is what we found so what I'm
00:42:09 plotting is in blue
00:42:12 is the true number of covid deaths
00:42:16 versus the number of reported covered
00:42:19 deaths so the Gray Line shows the
00:42:21 reported deaths in the US the gray line
00:42:23 on the right reported deaths in Canada
00:42:26 the blue line is what our machine
00:42:28 learning model predicted and the orange
00:42:31 line is what the excess deaths were as
00:42:34 collected by different statistical
00:42:36 agencies and what you can see for the US
00:42:39 is that
00:42:41 um the blue line and the red and the
00:42:43 Orange Line are almost uh perfectly
00:42:45 overlapped so our predictor is doing
00:42:47 very good and what's more as we already
00:42:50 knew
00:42:51 um the true number of coded deaths is
00:42:53 higher than the reported covet deaths so
00:42:55 in the case of the us at the time we
00:42:57 completed this there are 828 000
00:42:59 reported deaths but the actual or
00:43:01 predicted based on our model based on
00:43:04 excess test was about 1.3 million
00:43:08 same thing is true in Canada now our
00:43:10 model didn't work quite as well in other
00:43:11 words the orange line and the blue line
00:43:14 don't overlap perfectly but we gain we
00:43:17 see that Canada underreported
00:43:20 um so we only said at the time they're
00:43:22 28 or 29 000 deaths but the actual
00:43:25 number was probably close to 48 or 49
00:43:28 000.
00:43:33 you can look at France and France did a
00:43:33 really good job of tracking we can see
00:43:35 that the gray lines the blue lines and
00:43:37 the orange lines all over that
00:43:39 very closely there's a slight Under
00:43:41 reporting but you can see that our
00:43:43 prediction which is blue matches very
00:43:45 much like the orange which is what she's
00:43:47 supposed to do
00:43:48 and we also did the same thing with
00:43:51 Chile and we found that our model
00:43:55 um trapped very well again with the
00:43:57 orange and blue line almost overlapping
00:43:59 and both France and Chile slightly
00:44:01 underreported
00:44:03 um the total number of deaths
00:44:06 on the other hand if you apply this
00:44:08 um
00:44:09 to Russia uh where the data data we knew
00:44:12 was pretty flaky we can see that the
00:44:15 gray line is very low
00:44:17 whereas the blue and orange lines almost
00:44:20 perfectly matched and so in the case of
00:44:23 Russia they reported only 298 000 deaths
00:44:27 when in fact it was closer to a million
00:44:29 people had died in Russia and this is up
00:44:32 to about six months ago so many more
00:44:34 have died since
00:44:41 so we've done this for many countries
00:44:41 um we're in the process of trying to do
00:44:42 it for all 220 countries around the
00:44:44 world including the ones that didn't
00:44:46 have very much covered data
00:44:49 um
00:44:50 and
00:44:51 what we can see is that U.S and Canada
00:44:53 under counted covered deaths by about 40
00:44:55 percent and this is not unique to our
00:44:58 other groups who've published on this
00:45:00 and noticed the same trend
00:45:02 and that largely seems to be due to sort
00:45:05 of problems with our tracking and our
00:45:07 ability to do measurements in public
00:45:09 health monitoring
00:45:11 France and Chile
00:45:13 um even Chile is you know somewhat less
00:45:16 developed in Canada did better jobs than
00:45:18 Canada and certainly much better jobs in
00:45:20 the US they still underestimated but
00:45:22 only about 20 percent
00:45:24 in the case of Russia they deliberately
00:45:26 under accounted cover deaths by at least
00:45:29 300 percent and this is entirely due to
00:45:31 directives from Kremlin and this is the
00:45:34 case for a number of countries which are
00:45:36 run as dictatorships
00:45:39 so what is the true toll of covid-19
00:45:42 right now the current estimate is that
00:45:45 between 20 and 25 million people have
00:45:47 died from covet over the last three
00:45:49 years whereas the number that's
00:45:51 officially released that you'll see
00:45:52 posted is about 6.1 million
00:45:56 so in other words covet is about four
00:45:58 times worse than what people have been
00:46:00 reporting and that's just an indication
00:46:03 of how much Under reporting has been
00:46:05 done around the world by many countries
00:46:09 um and I think the data we're getting
00:46:11 from our calculations says that it's
00:46:13 probably closer to the 25 million and 20
00:46:15 million
00:46:17 so how does covert compare to other
00:46:19 pandemics so the worst pandemic of all
00:46:22 was black death which happened in the
00:46:24 1300s about 151 people died over about a
00:46:27 seven year period
00:46:28 at the end of World War II there's
00:46:30 something called the Spanish flu and
00:46:32 more than 40 million people died
00:46:34 all of us pretty much have been living
00:46:36 through the HIV or Aids pandemic it
00:46:39 started in 1919 still technically
00:46:41 ongoing although treatments have got to
00:46:43 the point where uh mortality is
00:46:45 relatively low but to date about 33
00:46:48 million people have died from AIDS over
00:46:49 the last 25 or 30 years
00:46:52 covid-19 is ranked uh number four and
00:46:56 it's between 20 and 25 million with my
00:46:58 own estimate as it's closer to 25
00:47:00 million
00:47:02 so covered compared to all the pandemics
00:47:04 over the last um 700 years is is
00:47:07 probably the worst
00:47:10 fourth worst of all
00:47:13 um what's more is that if you include
00:47:15 AIDS we are living in the two worst
00:47:18 pandemics in human history the AIDS
00:47:21 pandemic and the covert pandemic
00:47:23 some of you may be old enough to have
00:47:25 survive the Hong Kong flu about a
00:47:27 million people uh there are other foods
00:47:29 that have appeared that have also killed
00:47:31 people but much much less than what
00:47:34 we're saying with covet or HIV
00:47:40 so what can we say covet has offered a
00:47:40 unique and unprecedented opportunity to
00:47:42 use modern data surveillance to acquire
00:47:43 big data about infectious diseases
00:47:46 getting that beginners allowed us to use
00:47:48 machine learning to use and interpret
00:47:51 and apply it to tracking infectious
00:47:55 diseases predicting and forecasting
00:47:57 infectious diseases and making or
00:48:00 correcting for errors in reporting an
00:48:02 infectious diseases
00:48:04 what we've seen is that the machine
00:48:06 learning models generally perform better
00:48:07 than most if not all previous approaches
00:48:09 and some of them actually were
00:48:11 spectacularly good to just kind of blow
00:48:14 your socks off
00:48:15 so what it's really saying is that
00:48:17 machine learning is here to stay it's
00:48:19 here to stay in terms of understanding
00:48:21 infectious diseases managing infectious
00:48:23 diseases and predicting their outcomes
00:48:24 and also advising us on optimal
00:48:28 um
00:48:29 Public Health measures
00:48:31 so this has changed our perspective on
00:48:35 disease modeling and certainly hopefully
00:48:37 it's changed your perspective on the
00:48:39 applications of machine learning
00:48:41 so with that I want to thank many of my
00:48:44 colleagues and students and
00:48:47 collaborators I'd like to thank Krishna
00:48:50 cover who did the work on the total
00:48:53 death totals for covet predicting for
00:48:56 different countries
00:48:58 um and then we've had many other
00:48:59 contributors gather the data because
00:49:01 collecting the data was not easy and
00:49:03 still isn't particularly easy but
00:49:05 without the data we couldn't do the
00:49:07 machine learning
00:49:08 so with that I want to thank everyone
00:49:10 for listening and thank you for your