Age Group Sensitivity for COVID-19 Infections Forecasting using Deep Learning
Abstract
The COVID-19 pandemic has created unprecedented challenges for
governments and healthcare systems worldwide, highlighting the
critical importance of understanding the factors that contribute
to virus transmission. This study aimed to identify the most
influential age groups in COVID-19 infection rates at the US county
level using the Modified Morris Method and deep learning for
time series. Our approach involved training the state-of-the-art
time-series model Temporal Fusion Transformer on different age
groups as a static feature and the population vaccination status
as the dynamic feature. We analyzed the impact of those age groups
on COVID-19 infection rates by perturbing individual input features
and ranked them based on their Morris sensitivity scores, which
quantify their contribution to COVID-19 transmission rates.
The findings are verified using ground truth data from the CDC and
US Census, which provide the true infection rates for each age group.
The results suggest that young adults were the most influential age
group in COVID-19 transmission at the county level between March 1,
2020, and November 27, 2021. These results can inform public
health policies and interventions, such as targeted vaccination
strategies, to better control the spread of the virus. Our approach
demonstrates the utility of feature sensitivity analysis in
identifying critical factors contributing to COVID-19 transmission
and can be applied in other public health domains.
Introduction
The COVID-19 pandemic has posed significant challenges
for governments and healthcare systems worldwide,
highlighting the need for effective measures to manage and
control the virus's spread. Understanding the factors that
contribute to disease transmission is crucial for developing
targeted public health policies and interventions.
Age is a critical factor in COVID-19 transmission, as shown by previous studies.
Many prior works have forecast COVID-19 infection
and mortality rates using statistical, autoregressive,
machine learning, and deep learning models.
These models aim to predict the epidemic spread more
accurately, which helps in choosing better mitigation
steps. Interpreting how the models make their predictions
can provide an understanding of the importance of the
different input factors and their interactions.
The Temporal Fusion Transformer (TFT) [1] is a state-of-the-art
forecasting model that has been widely used for
multivariate multi-horizon forecasting. TFT is particularly
useful for modeling complex, non-linear relationships
between input features and target variables,
making it an ideal tool for analyzing COVID-19 transmission
patterns.
The Morris method has been widely used to perform sensitivity
analysis of different models. By calculating the output change
with respect to an input perturbation, the Morris method provides a simple
approach to rank input factors by their sensitivity.
By combining TFT with feature sensitivity analysis, we can
gain a better understanding of the factors that contribute
to COVID-19 transmission and develop more effective strategies
to control the spread of the virus.
In this work, we collected population data by age subgroup
for each of the 3,142 US counties, along with the daily
vaccination rate of the population and COVID-19 case reports
from March 1, 2020, to December 27, 2021. Whereas prior
studies observe COVID-19 trends for different age groups,
we trained a separate TFT model for each age subgroup,
using that subgroup as the static covariate, and then calculated the
sensitivity of those models using a modified Morris method.
We then ranked the age groups based on their Morris sensitivity
scores to identify the most influential factors contributing
to COVID-19 transmission. Finally, we evaluated our age group
sensitivity ranking against the actual COVID-19 cases reported for
those age groups in the same time period.
Our study aims to provide insights into the age-related factors
that contribute to COVID-19 transmission and inform targeted
interventions to control the spread of the virus.
Fig. 1: Ground truth: weekly COVID-19 case numbers for each
of the eight age subgroups from March 1, 2020, to
November 27, 2021.
Morris Method
The Morris method [2] is a reliable and efficient sensitivity
analysis method that defines the sensitivity of a model input as the ratio
of the change in an output variable to the change in an input feature.
Given a model f(.) and a set of k input features, the Morris sensitivity
of the i-th input feature is the change in f caused by perturbing that
feature alone by a small delta, divided by delta.
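For reference, the ratio described here is usually written as the elementary effect from Morris [2]. The notation below is the textbook form and is offered as an assumption about the symbols: x_i is the i-th of the k input features and \Delta is the perturbation size. The modified variant used later in this work may normalise or aggregate the effect differently.

```latex
\[
  EE_i(\mathbf{x}) =
  \frac{f(x_1, \ldots, x_i + \Delta, \ldots, x_k) - f(x_1, \ldots, x_i, \ldots, x_k)}{\Delta}
\]
```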
The Temporal Fusion Transformer (TFT)
The Temporal Fusion Transformer (TFT) [1] is a novel, interpretable,
attention-based deep learning model for multi-horizon forecasting. Its architecture is
designed to handle static inputs (e.g. percentage of an age group in a county
population), past observed inputs, and known future inputs (e.g. day of the week).
It serves four main purposes.
1) To learn both local and global time-varying relationships at different
scales. 2) To filter out input noise using a Variable Selection Network (VSN).
3) To incorporate static metadata into the dynamic features for temporal
forecasting. 4) To efficiently use different parts of the network through a
gating mechanism. TFT achieves state-of-the-art performance on a wide range
of real-world tasks, so we use it in this study.
Input Data and Features
The data set consists of daily COVID-19 cases for 3,142 US counties with
eight age-group-related static features, one dynamic feature, and one known future feature.
Feature descriptions and sources are reported in the following table.
Table 1: Details of Collected Features and Sources.
| Feature | Type | Update Frequency | Description | Source(s) |
| Age Groups (UNDER5, AGE517, AGE1829, AGE3039, AGE4049, AGE5064, AGE6574, AGE75PLUS) | Static | Once | Percent of population in each age group. | 2020 Govt Census |
| Vaccination Full Dose (Series_Complete_Pop_Pct) | Observed | Daily | Percent of people who are fully vaccinated (have second dose of a two-dose vaccine or one dose of a single-dose vaccine) based on the jurisdiction and county where recipient lives. | CDC |
| SinWeekly | Known Future | Daily | Sin (day of the week / 7). | Date |
| Case | Target | Daily | COVID-19 infection at county level. | USA Facts |
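As a rough illustration of the SinWeekly feature in Table 1, the sketch below evaluates sin(day of the week / 7) exactly as the table describes it. The function name `sin_weekly` is hypothetical, and the day-of-week indexing (Monday = 0 through Sunday = 6, as returned by Python's `date.weekday()`) is an assumption, since the source does not state which indexing was used.

```python
import math
from datetime import date


def sin_weekly(day: date) -> float:
    """Return sin(day_of_week / 7), following the Table 1 description.

    Assumption: day_of_week is date.weekday() (Monday=0 ... Sunday=6).
    """
    return math.sin(day.weekday() / 7)


# The value repeats every seven days, so it encodes a weekly pattern.
for d in (date(2020, 3, 1), date(2020, 3, 2), date(2020, 3, 8)):
    print(d.isoformat(), round(sin_weekly(d), 4))
```

Because the feature depends only on the calendar date, it is known in advance for every forecast day, which is why it can be supplied as a known future input.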
The data set was split into training, validation, and testing sets
following the date ranges in the table below:
Table 2: Dataset split and date ranges
| Dataset | Start | End |
| Train | 03-01-2020 | 11-27-2021 |
| Validation | 11-28-2021 | 12-12-2021 |
| Test | 12-13-2021 | 12-27-2021 |
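The split in Table 2 can be expressed as a small date lookup. This is a minimal sketch using only the dates from the table; the names `SPLITS`, `split_of`, and `n_days` are hypothetical, and treating both endpoints as inclusive is an assumption.

```python
from datetime import date
from typing import Optional

# Date ranges copied from Table 2 (assumed inclusive on both ends).
SPLITS = {
    "train": (date(2020, 3, 1), date(2021, 11, 27)),
    "validation": (date(2021, 11, 28), date(2021, 12, 12)),
    "test": (date(2021, 12, 13), date(2021, 12, 27)),
}


def split_of(day: date) -> Optional[str]:
    """Return the name of the split a date falls in, or None if outside all ranges."""
    for name, (start, end) in SPLITS.items():
        if start <= day <= end:
            return name
    return None


def n_days(name: str) -> int:
    """Number of calendar days covered by a split, counting both endpoints."""
    start, end = SPLITS[name]
    return (end - start).days + 1
```

Under the inclusive assumption, the validation and test windows each cover 15 days, and the three ranges are contiguous with no overlap.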
Model Training and Prediction
A separate TFT model was trained for each age group, for a total
of eight models. Since all static variables are presented together
as context vectors, the separation ensures the identification of
a specific subgroup's contribution without the interference of other
correlated static variables. In addition to the static age feature,
all TFT models include a dynamic vaccination status feature.
An additional variable SinWeekly is also included, which represents
weekly trends for changes in the number of new cases.
The target variable for each model was the daily number of cases
for each county. The daily predictions for each county were determined
for the training, validation, and testing time periods. These predictions
were compared to the observed values to obtain RMSE values.
Fig. 2: Workflow of Population Age Group Sensitivity Analysis for COVID-19 Forecasts.
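The comparison of predictions with observed values can be illustrated with the standard root-mean-square error. This is a generic sketch, not the authors' evaluation code; the function name `rmse` and the toy numbers in the usage line are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy usage with made-up daily case counts for one county.
print(rmse([10.0, 12.0, 9.0], [11.0, 12.0, 7.0]))
```

How the per-county errors are aggregated into a single score (for example, pooling all county-days versus averaging per-county RMSE) is not specified in the text, so the sketch leaves that choice to the caller.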
Sensitivity Analysis
Next, our Modified Morris Method was applied to each model.
This involved perturbing the age feature in each model by a
range of positive and negative delta values. The chosen values
ranged from -0.01 to 0.01 with steps of 0.001, disregarding the
delta value equal to zero. The Morris indices for each of the delta
values are shown below. Positive and negative values of delta indicate
an increase and decrease in the age feature, respectively. Investigating
both an increase and decrease in these features allowed for an
understanding of how the Morris indices change between different delta
values. Our results show that the Morris values are approximately stable
for each age group across a range of values.
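The perturbation procedure can be sketched on a toy model. The code below builds the 20 delta values described above and computes a plain elementary-effect average for the static input. Everything here is illustrative: `toy_model` is a hypothetical stand-in for a trained forecaster, `morris_index` is a hypothetical name, and the exact normalisation used in the Modified Morris Method is not given in the text, so this should not be read as the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

# 20 perturbation sizes: -0.01 ... 0.01 in steps of 0.001, skipping zero.
# Integer steps avoid floating-point drift in the grid.
DELTAS: List[float] = [i / 1000 for i in range(-10, 11) if i != 0]


def morris_index(
    model: Callable[[float, float], float],
    samples: List[Tuple[float, float]],
    delta: float,
) -> float:
    """Mean change in model output per unit change of the static input."""
    effects = [
        (model(static + delta, dynamic) - model(static, dynamic)) / delta
        for static, dynamic in samples
    ]
    return sum(effects) / len(effects)


def toy_model(static: float, dynamic: float) -> float:
    """Hypothetical stand-in for a trained forecaster (not a TFT)."""
    return 3.0 * static + 0.5 * static ** 2 + dynamic


# Each sample is (static age-group share, dynamic feature value); values are made up.
samples = [(0.10, 1.0), (0.20, 2.0), (0.30, 3.0)]
indices: Dict[float, float] = {d: morris_index(toy_model, samples, d) for d in DELTAS}
```

For this toy model the index varies only slightly across the delta grid, which mirrors the kind of stability check described above.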
Another observation about these results is the magnitude of the Morris
values. With the delta and Morris values shown at the same scale of 10^-4,
the Morris values remain about two to three orders of magnitude smaller
than the delta values. This means that the changes caused by the
perturbation of the static variable result in small changes in the
predictions of the models, indicating a non-linear relationship between
the static input and the output.
Fig. 3: Morris scores for all eight models, calculated at intervals of 0.001, excluding zero.
Despite the relatively small values, there is a clear distinction
between which age groups have higher Morris indices on average.
This separation allowed us to rank the age groups by how sensitive
the models are to them. Before this was done, it was necessary
to obtain a ground truth ranking against which to verify the
ranking derived from the Morris method.
Data were obtained from the CDC for the total number of COVID-19
cases reported for each age group within the time period of the
training set. Data from the US Census were also obtained to
provide an estimate of the total population of each age group,
with the estimate made for April 1, 2020. Having both the cases
and the estimated population allowed us to calculate the true
ranking. This was done by calculating the percentage of people in
each age group who were infected. For example, 10,018,923 cases
were reported for the age group 18-29, with an estimated population
of 53,013,409 for that age group. This results in an infection rate
of 18.9 percent. The infection rates for all age groups are shown below:
Table 3: Total number of cases, population, and infection rate of each age group.
| Age Group | Cases | Population | Infection Rate (%) |
| 0 - 4 Years | 1,249,223 | 19,392,551 | 6.4 |
| 5 - 17 Years | 6,184,296 | 54,992,661 | 11.2 |
| 18 - 29 Years | 10,018,923 | 53,013,409 | 18.9 |
| 30 - 39 Years | 7,760,789 | 45,034,182 | 17.2 |
| 40 - 49 Years | 6,767,348 | 41,003,731 | 16.5 |
| 50 - 64 Years | 8,820,765 | 63,876,118 | 13.8 |
| 65 - 74 Years | 3,289,094 | 32,346,340 | 10.2 |
| 75+ Years | 2,505,606 | 21,790,289 | 11.5 |
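The infection rates in Table 3 follow directly from the case and population columns. The short sketch below reproduces that arithmetic from the table values; the names `TABLE3` and `infection_rate_pct` are hypothetical.

```python
# (cases, population) per age group, copied from Table 3.
TABLE3 = {
    "0-4": (1_249_223, 19_392_551),
    "5-17": (6_184_296, 54_992_661),
    "18-29": (10_018_923, 53_013_409),
    "30-39": (7_760_789, 45_034_182),
    "40-49": (6_767_348, 41_003_731),
    "50-64": (8_820_765, 63_876_118),
    "65-74": (3_289_094, 32_346_340),
    "75+": (2_505_606, 21_790_289),
}


def infection_rate_pct(cases: int, population: int) -> float:
    """Share of the age group's population with a reported case, in percent."""
    return 100.0 * cases / population


# Rounded to one decimal place, as reported in the table.
rates = {
    group: round(infection_rate_pct(cases, population), 1)
    for group, (cases, population) in TABLE3.items()
}
```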
Based on the infection rate within each age subgroup, a ranking was made
ranging from 1, the age group with the highest infection rate, to 8,
the age group with the lowest infection rate. A ranking of the sensitivity
of the model to each age feature was also determined by first ranking the relative
Morris values for each age group for a given delta to obtain a ranking from 1
(highest) to 8 (lowest) value. After ranking each subgroup for all 20 delta values,
the average ranking was determined.
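The per-delta ranking and averaging step might look like the sketch below. It assigns rank 1 to the largest value, consistent with Table 4 where the 18-29 group is ranked 1, and gives tied values the mean of their positions. The function names and the toy Morris values are hypothetical, and whether the original analysis ranks raw or absolute Morris values is not stated, so that choice is left to the caller.

```python
from typing import Dict, List


def rank_descending(values: Dict[str, float]) -> Dict[str, float]:
    """Rank 1 = largest value; tied values share the mean of their positions."""
    ordered = sorted(values, key=values.get, reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every entry tied with the entry at position i.
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2  # mean of 1-based positions i+1 .. j+1
        for name in ordered[i : j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def average_rank(per_delta: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each group's rank over all delta values."""
    per_delta_ranks = [rank_descending(v) for v in per_delta]
    return {
        group: sum(r[group] for r in per_delta_ranks) / len(per_delta_ranks)
        for group in per_delta[0]
    }


# Made-up Morris values for three groups at two delta values.
toy = [
    {"A": 0.9, "B": 0.5, "C": 0.1},
    {"A": 0.8, "B": 0.2, "C": 0.4},
]
avg = average_rank(toy)
```

In the toy data, groups B and C swap places between the two deltas and therefore end up with the same average rank, which is how a fractional rank can arise.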
Table 4: Ranking of the contribution of each age group.
| Age Group | Infection Rank | Morris Rank | Difference |
| 0 - 4 Years | 8 | 8 | 0 |
| 5 - 17 Years | 6 | 7 | 1 |
| 18 - 29 Years | 1 | 1 | 0 |
| 30 - 39 Years | 2 | 3.5 | 1.5 |
| 40 - 49 Years | 3 | 2 | 1 |
| 50 - 64 Years | 4 | 5 | 1 |
| 65 - 74 Years | 7 | 6 | 1 |
| 75+ Years | 5 | 3.5 | 1.5 |
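The infection ranks and rank differences can be recomputed from the reported numbers. The sketch below uses the infection rates from Table 3 and the Morris ranks from Table 4; the variable names are hypothetical, and taking the difference as an absolute value is an assumption based on the table showing no negative entries.

```python
# Infection rates (%) from Table 3 and Morris ranks from Table 4.
INFECTION_RATE = {
    "0-4": 6.4, "5-17": 11.2, "18-29": 18.9, "30-39": 17.2,
    "40-49": 16.5, "50-64": 13.8, "65-74": 10.2, "75+": 11.5,
}
MORRIS_RANK = {
    "0-4": 8, "5-17": 7, "18-29": 1, "30-39": 3.5,
    "40-49": 2, "50-64": 5, "65-74": 6, "75+": 3.5,
}

# Rank 1 = highest infection rate (these rates contain no ties).
by_rate = sorted(INFECTION_RATE, key=INFECTION_RATE.get, reverse=True)
infection_rank = {group: position for position, group in enumerate(by_rate, start=1)}

# Absolute gap between the ground truth rank and the Morris rank.
difference = {
    group: abs(infection_rank[group] - MORRIS_RANK[group])
    for group in INFECTION_RATE
}
```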
Since the average Morris rankings for the age groups 30-39 and 75+ were found to be
identical, both were assigned the tied rank of 3.5. These two groups show the largest
deviation from the ground truth ranking, with a difference of 1.5 each.
For the other six age groups, the Morris ranking is identical to the ground
truth ranking or differs from it by 1. We interpret this as indicating that our
sensitivity-based ranking of which age groups contribute most to the total COVID-19
case numbers during the time period of the data closely approximates the true
ranking derived from CDC and US Census data.
Related Work
Numerous works have addressed COVID-19 forecasting using deep learning
and other AI or statistical models. Understanding the feature importance
and interaction of the input factors is crucial for adapting control measures to the changing dynamics of the pandemic.
Prior work simulated the spread of COVID-19 using a fractal-fractional model and used sensitivity analysis to assess the impact of model parameters such as the transmission rate, recovery rate, and vaccination rate on the model's outcomes. Other work investigated the impact of key model parameters on the outcomes of a Susceptible-Carrier-Exposed-Infected-Recovered (SCEIR) model predicting COVID-19 infection. Similarly, additional work analyzed the demographic patterns of COVID-19 cases during the recent resurgence of the pandemic in the United States, using data from 948 counties to study the age distribution of cases and its relationship with changes in COVID-19 incidence.
Conclusion and Future Work
Deep learning models typically lack sufficient interpretability to explain the
decisions they make in pursuit of predictive performance. However, our work
demonstrates the successful application of the Morris method. By analyzing model
sensitivity to changes in the age input feature, we explained the impact of COVID-19
on various age group populations in the US between March 1, 2020
and November 27, 2021. Small perturbations in the static features showed that our models
were most sensitive to the 18-29, 30-39, and 40-49 age groups, indicating that these
groups contributed most to the total number of COVID-19 cases. These results were then
verified by comparison to the ground truth from CDC and Census data.
After demonstrating the effectiveness of the Morris method applied to deep learning
models for understanding the impact of COVID-19 on different age groups, we intend
for future experiments to center on exploring the temporal component of these predictions.
CDC data show that the ranking of COVID-19 infection rates by age group changes over time.
By training new TFT models for a variety of time periods, we can better understand how the
impact of COVID-19 varies throughout the pandemic and can further strengthen the reliability
of our methods.
References
- B. Lim, S. Ö. Arik, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time series forecasting,” International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021.
- M. Morris, “Factorial sampling plans for preliminary computational experiments,” Technometrics, vol. 33, no. 2, pp. 161–174, 1991.