Age Group Sensitivity for COVID-19 Infections Forecasting using Deep Learning

Abstract

The COVID-19 pandemic has created unprecedented challenges for governments and healthcare systems worldwide, highlighting the critical importance of understanding the factors that contribute to virus transmission. This study aimed to identify the most influential age groups in COVID-19 infection rates at the US county level using the Modified Morris Method and deep learning for time series. Our approach involved training the state-of-the-art time-series model Temporal Fusion Transformer with each age group as a static feature and the population vaccination status as the dynamic feature. We analyzed the impact of those age groups on COVID-19 infection rates by perturbing individual input features and ranked them based on their Morris sensitivity scores, which quantify their contribution to COVID-19 transmission rates. The findings are verified using ground truth data from the CDC and US Census, which provide the true infection rates for each age group. The results suggest that young adults were the most influential age group in COVID-19 transmission at the county level between March 1, 2020, and November 27, 2021. These results can inform public health policies and interventions, such as targeted vaccination strategies, to better control the spread of the virus. Our approach demonstrates the utility of feature sensitivity analysis in identifying critical factors contributing to COVID-19 transmission and can be applied in other public health domains.

Introduction

The COVID-19 pandemic has posed significant challenges for governments and healthcare systems worldwide, highlighting the need for effective measures to manage and control the virus's spread. Understanding the factors that contribute to disease transmission is crucial for developing targeted public health policies and interventions.
Age is a critical factor in COVID-19 transmission as shown by previous studies. Many prior works have studied forecasting COVID-19 infection and mortality rates using different statistical, autoregressive machine learning, and deep learning models.
The focus of these models is to make better predictions for the epidemic spread which would help take better mitigation steps. Interpreting how these models make these predictions can provide an understanding of the importance of these different input factors and their interactions.
The Temporal Fusion Transformer (TFT) [1] is a state-of-the-art forecasting model that has been widely used for multivariate, multi-horizon forecasting. TFT is particularly useful for modeling complex, non-linear relationships between input features and target variables, making it well suited to analyzing COVID-19 transmission patterns.

The Morris method has been widely used to perform sensitivity analysis of different models. By calculating the change in output with respect to a perturbation of an input, the Morris method provides a simple approach to rank input factors by their sensitivity.
By combining TFT with feature sensitivity analysis, we can gain a better understanding of the factors that contribute to COVID-19 transmission and develop more effective strategies to control the spread of the virus.
In this work, we collected population data by age subgroup for each of the 3,142 US counties, along with the daily vaccination rate of the population and COVID-19 case reports from March 1, 2020, to December 27, 2021. Whereas prior studies observed COVID-19 trends for different age groups, we trained a separate TFT model for each age subgroup, using that subgroup as the static covariate, and then calculated the sensitivity of those models using a modified Morris method.

We then ranked the age groups based on their Morris sensitivity scores to identify the most influential factors contributing to COVID-19 transmission. Finally, we evaluated our age group sensitivity rank with the actual COVID-19 cases reported for those age groups in the same time period. Our study aims to provide insights into the age-related factors that contribute to COVID-19 transmission and inform targeted interventions to control the spread of the virus.



Fig. 1: Ground truth: weekly COVID-19 case numbers for each of the eight age subgroups from the time period of March 1, 2020, to November 27, 2021.


Morris Method

The Morris method [2] is a reliable and efficient sensitivity analysis method that defines the sensitivity of a model input as the ratio of the change in an output variable to the change in an input feature. Given a model f(.) and a set of k input features, the Morris sensitivity of an input feature is the change in f produced by perturbing that feature alone, divided by the size of the perturbation.
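For reference, the ratio described above corresponds to the standard elementary effect of Morris [2], which can be written as follows. The symbols $x_i$, $\Delta$, and $EE_i$ are notation assumed here for illustration; the modified variant used in this study may aggregate or normalize the effects differently.

```latex
EE_i(\mathbf{x}) = \frac{f(x_1, \ldots, x_i + \Delta, \ldots, x_k) - f(x_1, \ldots, x_i, \ldots, x_k)}{\Delta}
```

Here $\Delta$ is the perturbation applied to the $i$-th input while the other $k - 1$ inputs are held fixed.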

The Temporal Fusion Transformer (TFT)

The Temporal Fusion Transformer (TFT) [1] is a novel, interpretable, attention-based deep learning model for multi-horizon forecasting. Its architecture is designed to handle static inputs (e.g., the percentage of an age group in a county population), past observed inputs, and known future inputs (e.g., day of the week). The architecture serves four main purposes: 1) to learn both local and global time-varying relationships at different scales; 2) to filter out input noise using a Variable Selection Network (VSN); 3) to incorporate static metadata into the dynamic features for temporal forecasting; and 4) to use different parts of the network efficiently through a gating mechanism. TFT achieves state-of-the-art performance on a wide range of real-world tasks, so we adopt it for this study.

Input Data and Features

The data set consists of daily COVID-19 cases for 3,142 US counties with eight age-group-related static features, one dynamic feature, and one known future feature. Feature descriptions and sources are reported in the following table.

Table 1: Details of Collected Features and Sources.

| Feature | Type | Update Frequency | Description | Source(s) |
| --- | --- | --- | --- | --- |
| Age Groups (UNDER5, AGE517, AGE1829, AGE3039, AGE4049, AGE5064, AGE6574, AGE75PLUS) | Static | Once | Percent of population in each age group. | 2020 Govt Census |
| Vaccination Full Dose (Series_Complete_Pop_Pct) | Observed | Daily | Percent of people who are fully vaccinated (have second dose of a two-dose vaccine or one dose of a single-dose vaccine) based on the jurisdiction and county where recipient lives. | CDC |
| SinWeekly | Known Future | Daily | Sin (day of the week / 7). | Date |
| Case | Target | Daily | COVID-19 infection at county level. | USA Facts |
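The SinWeekly feature in Table 1 can be derived directly from the calendar date. The sketch below follows the table's formula literally, sin(day of the week / 7). The function name `sin_weekly` and the weekday convention (Monday = 0, as returned by Python's `date.weekday()`) are assumptions, since the table does not state which convention was used.

```python
import math
from datetime import date


def sin_weekly(d: date) -> float:
    """Return sin(day_of_week / 7) for a calendar date.

    Assumption: day_of_week follows date.weekday(), i.e. Monday=0 ... Sunday=6.
    The source table does not specify the weekday numbering.
    """
    return math.sin(d.weekday() / 7)


# March 1, 2020 (the first training day) was a Sunday, so weekday() is 6.
print(sin_weekly(date(2020, 3, 1)))
```

Because the feature depends only on the date, it is known for every future day in the forecast horizon, which is why it qualifies as a known future input.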

The data set was split into training, validation, and testing sets following the date ranges in the table below:

Table 2: Dataset split and date ranges

| Dataset | Start | End |
| --- | --- | --- |
| Train | 03-01-2020 | 11-27-2021 |
| Validation | 11-28-2021 | 12-12-2021 |
| Test | 12-13-2021 | 12-27-2021 |
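As an illustration of the split in Table 2, the following sketch assigns a date to its partition. The dates are copied from the table; treating both endpoints as inclusive, and the names `SPLITS` and `assign_split`, are assumptions made for this example.

```python
from datetime import date
from typing import Optional

# Date ranges copied from Table 2; both endpoints are assumed to be inclusive.
SPLITS = {
    "train": (date(2020, 3, 1), date(2021, 11, 27)),
    "validation": (date(2021, 11, 28), date(2021, 12, 12)),
    "test": (date(2021, 12, 13), date(2021, 12, 27)),
}


def assign_split(d: date) -> Optional[str]:
    """Return the name of the split containing d, or None if d is out of range."""
    for name, (start, end) in SPLITS.items():
        if start <= d <= end:
            return name
    return None


print(assign_split(date(2021, 12, 1)))  # falls in the validation window
```

Under this reading, the validation and test windows each cover 15 consecutive days.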

Model Training and Prediction

A separate TFT model was trained for each age group, for a total of eight models. Since all static variables are presented together as context vectors, the separation ensures the identification of a specific subgroup's contribution without the interference of other correlated static variables. In addition to the static age feature, all TFT models include a dynamic vaccination status feature.

Fig. 2: Workflow of Population Age Group Sensitivity Analysis for COVID-19 Forecasts.


An additional variable SinWeekly is also included, which represents weekly trends for changes in the number of new cases. The target variable for each model was the daily number of cases for each county. The daily predictions for each county were determined for the training, validation, and testing time periods. These predictions were compared to the observed values to obtain RMSE values.
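The RMSE comparison can be sketched as below. This is the textbook root-mean-square error over paired predictions and observations; how the study aggregated errors across counties and days is not stated, so pooling all pairs into one list is an assumption, as is the function name `rmse`.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between paired predictions and observations."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical daily case counts for one county (illustrative values only).
print(rmse([10.0, 12.0, 9.0], [11.0, 12.0, 7.0]))
```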

Sensitivity Analysis

Next, our Modified Morris Method was applied to each model. This involved perturbing the age feature in each model by a range of positive and negative delta values. The chosen values ranged from -0.01 to 0.01 with steps of 0.001, disregarding the delta value equal to zero. The Morris indices for each of the delta values are shown below. Positive and negative values of delta indicate an increase and decrease in the age feature, respectively. Investigating both an increase and decrease in these features allowed for an understanding of how the Morris indices change between different delta values. Our results show that the Morris values are approximately stable for each age group across a range of values.
Another observation about these results is the magnitude of the Morris values. With the delta and Morris values shown at the same scale of 10^-4, the Morris values remain about two to three orders of magnitude smaller than the delta values. This means that the changes caused by the perturbation of the static variable result in small changes in the predictions of the models, indicating a non-linear relationship between the static input and the output.
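The perturbation sweep described above can be sketched as follows. The exact form of the modified Morris index is not given in this section, so the sketch assumes one plausible form: the mean absolute change in predictions divided by the magnitude of delta. The names `delta_grid`, `morris_index`, and `toy_model` are hypothetical, and `toy_model` is a stand-in for a trained TFT, not the study's model.

```python
from typing import Callable, List, Sequence


def delta_grid(max_delta: float = 0.01, step: float = 0.001) -> List[float]:
    """Deltas from -max_delta to +max_delta in increments of step, skipping zero."""
    n = round(max_delta / step)
    return [k * step for k in range(-n, n + 1) if k != 0]


def morris_index(
    model: Callable[[float, float], float],
    static_value: float,
    dynamic_inputs: Sequence[float],
    delta: float,
) -> float:
    """Assumed form: mean absolute prediction change per unit of perturbation."""
    base = [model(static_value, x) for x in dynamic_inputs]
    perturbed = [model(static_value + delta, x) for x in dynamic_inputs]
    mean_abs_change = sum(abs(p - b) for p, b in zip(perturbed, base)) / len(base)
    return mean_abs_change / abs(delta)


def toy_model(static_value: float, x: float) -> float:
    """Hypothetical linear stand-in for a trained forecaster."""
    return 2.0 * static_value + x


deltas = delta_grid()
scores = {d: morris_index(toy_model, 0.1, [1.0, 2.0, 3.0], d) for d in deltas}
print(len(deltas))  # 20 non-zero delta values
```

For a linear stand-in such as `toy_model`, the index is the same for every delta, which is the kind of stability across delta values that the text reports for the trained models.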

Fig. 3: Morris scores for all eight models, calculated at intervals of 0.001, excluding zero.


Despite the relatively small values, there is a clear distinction between which age groups have higher Morris indices on average. This separation allowed us to rank the age groups by how sensitive the models were to each of them. Before doing so, it was necessary to obtain a ground truth ranking against which to verify the ranking derived from the Morris method.

Data were obtained from the CDC for the total number of COVID-19 cases reported for each age group within the time period of the training set. Data from the US Census were also obtained to provide an estimate of the total population of each age group, with the estimate made for April 1, 2020. Having both the cases and the estimated population allowed us to calculate the true ranking. This was done by calculating the percentage of people in each age group who were infected. For example, 10,018,923 cases were reported for the age group 18-29, with an estimated population of 53,013,409 for that age group. This results in an infection rate of 18.9 percent. The infection rates for all age groups are shown below:

Table 3: Total number of cases, population, and infection rate of each age group.

| Age Group | Cases | Population | Infection Rate (%) |
| --- | --- | --- | --- |
| 0 - 4 Years | 1,249,223 | 19,392,551 | 6.4 |
| 5 - 17 Years | 6,184,296 | 54,992,661 | 11.2 |
| 18 - 29 Years | 10,018,923 | 53,013,409 | 18.9 |
| 30 - 39 Years | 7,760,789 | 45,034,182 | 17.2 |
| 40 - 49 Years | 6,767,348 | 41,003,731 | 16.5 |
| 50 - 64 Years | 8,820,765 | 63,876,118 | 13.8 |
| 65 - 74 Years | 3,289,094 | 32,346,340 | 10.2 |
| 75+ Years | 2,505,606 | 21,790,289 | 11.5 |
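The infection rates in Table 3, and the ground truth ranking derived from them, can be reproduced with a few lines of arithmetic. The case and population figures below are copied from the table. The variable names are illustrative, and the sketch assumes that rank 1 denotes the highest infection rate, which is the convention the ranks in Table 4 follow.

```python
# (cases, population) per age group, copied from Table 3.
TABLE3 = {
    "0-4": (1_249_223, 19_392_551),
    "5-17": (6_184_296, 54_992_661),
    "18-29": (10_018_923, 53_013_409),
    "30-39": (7_760_789, 45_034_182),
    "40-49": (6_767_348, 41_003_731),
    "50-64": (8_820_765, 63_876_118),
    "65-74": (3_289_094, 32_346_340),
    "75+": (2_505_606, 21_790_289),
}

# Infection rate in percent: cases divided by estimated population.
rates = {group: 100.0 * cases / pop for group, (cases, pop) in TABLE3.items()}

# Assumed convention: rank 1 = highest infection rate, rank 8 = lowest.
ordered = sorted(rates, key=rates.get, reverse=True)
infection_rank = {group: position + 1 for position, group in enumerate(ordered)}

for group in TABLE3:
    print(f"{group:>6}: {rates[group]:.1f}%  rank {infection_rank[group]}")
```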

Based on the infection rate within each age subgroup, a ranking was made ranging from 1, the age group with the highest infection rate, to 8, the age group with the lowest infection rate. A ranking of the sensitivity of the model to each age feature was also determined by first ranking the relative Morris values for each age group for a given delta, from 1 (highest value) to 8 (lowest value). After ranking each subgroup for all 20 delta values, the average ranking was determined.
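The rank-averaging step can be sketched as follows. The sketch assumes that rank 1 marks the largest Morris value (consistent with the 18-29 group holding rank 1 in Table 4) and that tied groups share the average of the positions they occupy, which is how a value such as 3.5 can arise. The function names and the toy scores are hypothetical and are not the study's data.

```python
from typing import Dict


def rank_descending(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank 1 = largest score; tied scores share the average of their positions."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2
        for name in ordered[i : j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def average_rank(scores_by_delta: Dict[float, Dict[str, float]]) -> Dict[str, float]:
    """Average each group's per-delta rank over all delta values."""
    per_delta = [rank_descending(s) for s in scores_by_delta.values()]
    return {g: sum(r[g] for r in per_delta) / len(per_delta) for g in per_delta[0]}


# Toy Morris values for three hypothetical groups at two delta values.
toy = {
    0.001: {"A": 3.0, "B": 2.0, "C": 1.0},
    -0.001: {"A": 3.0, "B": 1.0, "C": 2.0},
}
avg = average_rank(toy)
# A lower average rank means higher sensitivity, so negate before the final ranking.
final = rank_descending({g: -r for g, r in avg.items()})
print(final)
```

In the toy data, groups B and C swap places between the two delta values, so their average ranks tie and each receives the shared final rank of 2.5.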

Table 4: Ranking of the contribution of each age group.

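| Age Group | Infection Rank | Morris Rank | Difference |
| --- | --- | --- | --- |
| 0 - 4 Years | 8 | 8 | 0 |
| 5 - 17 Years | 6 | 7 | 1 |
| 18 - 29 Years | 1 | 1 | 0 |
| 30 - 39 Years | 2 | 3.5 | 1.5 |
| 40 - 49 Years | 3 | 2 | 1 |
| 50 - 64 Years | 4 | 5 | 1 |
| 65 - 74 Years | 7 | 6 | 1 |
| 75+ Years | 5 | 3.5 | 1.5 |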

The average Morris rankings for the age groups 30-39 and 75+ were identical, so both were assigned the tied rank of 3.5. As a result, the largest disagreement with the ground truth ranking is 1.5 ranks, for these two groups. The Morris ranking matches the ground truth exactly for the 0-4 and 18-29 age groups and differs by 1 for the remaining four. We interpret this as indicating that our estimates of which age groups contribute most to the total COVID-19 case numbers during the time period of the data agree closely with the ranking derived from CDC and US Census data.

Related Work

Numerous studies have addressed COVID-19 forecasting using deep learning and other AI or statistical models. Understanding the feature importance and interaction of the input factors is crucial to adapting control measures to the changing dynamics of the pandemic. Prior work simulated the spread of COVID-19 using a fractal-fractional model and used sensitivity analysis to assess the impact of model parameters such as the transmission rate, recovery rate, and vaccination rate on the outcomes of the model. Other work investigated the impact of key model parameters on the outcomes of a Susceptible-Carrier-Exposed-Infected-Recovered (SCEIR) model predicting COVID-19 infection. Similarly, additional work analyzed the demographic patterns of COVID-19 cases during the recent resurgence of the pandemic in the United States; using data from 948 counties, it studied the age distribution of cases and its relationship with changes in COVID-19 incidence.

Conclusion and Future Work

Typically, deep learning models lack the interpretability needed to understand how they arrive at their predictions. However, our work demonstrates a successful application of the Morris method to such models. By analyzing model sensitivity to changes in the age input feature, we explained the impact of COVID-19 on various age group populations in the US between March 1, 2020, and November 27, 2021. Small perturbations in the static features showed that our models were most sensitive to the 18-29, 30-39, and 40-49 age groups, indicating that these groups contributed most to the total number of COVID-19 cases. These results were then verified by comparison to the ground truth from CDC and Census data.
After demonstrating the effectiveness of the Morris method applied to deep learning models for understanding the impact of COVID-19 on different age groups, we intend for future experiments to center on exploring the temporal component of these predictions. CDC data show that the ranking of COVID-19 infection rates by age group changes over time. By training new TFT models for a variety of time periods, we can better understand how the impact of COVID-19 varies throughout the pandemic and can further strengthen the reliability of our methods.

References

  1. B. Lim, S. Ö. Arik, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time series forecasting,” International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021.
  2. M. Morris, “Factorial sampling plans for preliminary computational experiments,” Technometrics, vol. 33, no. 2, pp. 161–174, 1991.