Running Dynamics

Project: Predicting Spatial Patterns in Running Behavior Through Time-Series Analysis of Weather Conditions

Date: 2024.5

Team: Biru Cao, Yifeng Liu

Keywords: Temporal Sequence Modeling, CNN, LSTM

This project builds a CNN+LSTM model that predicts spatial patterns of running behavior through time-series analysis of weather conditions.

Introduction

Because running varies across both space and time, we analyze the collective behavior of runners using spatial and temporal data. Our aim is to predict the spatio-temporal patterns of running behavior through time-series analysis.
The data consist of:

1. Running trajectories on the Tsinghua University campus from July to August 2022;

2. Weather information from July to August 2022.

Data Preparation

We are interested in the spatial distribution of running behavior rather than the sequential order of points within each trajectory, so we encode the data as heatmaps. Each day's trajectories are converted into a 31×33 grid matrix in which every cell holds three metrics: the total trajectory length, the average trajectory length, and the number of runners.

Figure 1: Trajectory encoded matrix
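The report does not say exactly how trajectory points are assigned to cells. The following is therefore only a minimal sketch under stated assumptions. Each trajectory is a list of (x, y) points in a projected, metre-based coordinate system, and the campus bounding box is known. Each segment's length is credited to the cell containing its midpoint. The average length is the total length in a cell divided by the number of distinct runners who contributed to it. The names `grid_index` and `encode_day` are hypothetical.

```python
import math
from collections import defaultdict

ROWS, COLS = 31, 33  # grid size used in the report

def grid_index(x, y, bbox, rows=ROWS, cols=COLS):
    """Map a point to (row, col); return None if it lies outside bbox = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return None
    col = min(int((x - xmin) / (xmax - xmin) * cols), cols - 1)
    row = min(int((ymax - y) / (ymax - ymin) * rows), rows - 1)  # row 0 is the northern edge
    return row, col

def encode_day(trajectories, bbox, rows=ROWS, cols=COLS):
    """Return (total_length, average_length, runner_count) as rows x cols matrices for one day."""
    total = [[0.0] * cols for _ in range(rows)]
    runners = defaultdict(set)  # (row, col) -> ids of trajectories that contributed to the cell
    for tid, points in enumerate(trajectories):
        # Credit each segment's length to the cell that contains its midpoint (assumption)
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            cell = grid_index((x0 + x1) / 2, (y0 + y1) / 2, bbox, rows, cols)
            if cell is None:
                continue
            total[cell[0]][cell[1]] += math.hypot(x1 - x0, y1 - y0)
            runners[cell].add(tid)
    count = [[len(runners.get((r, c), ())) for c in range(cols)] for r in range(rows)]
    average = [[total[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(cols)]
               for r in range(rows)]
    return total, average, count
```

Assigning each segment by its midpoint is a simplification. A segment that crosses several cells could instead be clipped at cell boundaries, which gives finer accuracy at the cost of more geometry code.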

Next, we encode the weather and date data. Because runners may follow regular routines, we want the features to capture calendar cycles. We use one-hot vectors to represent the month, the week of the month, and the day of the week. Weather conditions are reduced to three primary variables: wind speed, temperature, and humidity.

Figure 2: Encoded weather data
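Here is a minimal sketch of the date and weather encoding. It assumes that the week of the month is counted in 7-day blocks from the 1st, so days 1 to 7 are week 1 and so on, giving at most 5 weeks. It also assumes the three weather values are appended as raw numbers. The report does not state its week convention or whether weather values are scaled, and the function names are hypothetical.

```python
from datetime import date

MONTHS = (7, 8)  # the dataset covers July and August 2022

def one_hot(index, size):
    """Vector of `size` zeros with a 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def week_of_month(d):
    """Assumed convention: days 1-7 are week 1, days 8-14 week 2, ..., days 29-31 week 5."""
    return (d.day - 1) // 7 + 1

def encode_date_weather(d, wind_speed, temperature, humidity):
    """Concatenate one-hot month, week-of-month and weekday vectors with raw weather values."""
    month_vec = one_hot(MONTHS.index(d.month), len(MONTHS))
    week_vec = one_hot(week_of_month(d) - 1, 5)
    weekday_vec = one_hot(d.weekday(), 7)  # Monday = 0, ..., Sunday = 6
    return month_vec + week_vec + weekday_vec + [wind_speed, temperature, humidity]
```

For example, Friday 15 July 2022 encodes as month [1, 0], week [0, 0, 1, 0, 0], and weekday [0, 0, 0, 0, 1, 0, 0], followed by the three weather values. In practice, scaling the weather values (for example, to zero mean and unit variance) usually makes training better behaved.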

These features are then concatenated with the spatial data and reshaped to match the matrix dimensions. Each input sequence (X) contains the runner matrices together with the weather and date features for 7 consecutive days, and the output is the runner matrix of the 8th day.

Figure 3: Input and output of the model

Figure 4: Input and output of the model
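The report does not specify how the feature vector is reshaped to the matrix dimensions. One common option, assumed here, is to broadcast each scalar feature into a constant channel the same size as the runner matrix and then stack the channels for each day. The sketch below does that, then pairs each window of 7 consecutive days with the 8th day's matrix. All function names are hypothetical.

```python
def broadcast_features(features, rows, cols):
    """Turn each scalar feature into a constant rows x cols channel (assumed reshaping scheme)."""
    return [[[f] * cols for _ in range(rows)] for f in features]

def stack_day(runner_matrix, features):
    """Channels for one day: the runner matrix first, then one constant channel per feature."""
    rows, cols = len(runner_matrix), len(runner_matrix[0])
    return [runner_matrix] + broadcast_features(features, rows, cols)

def make_windows(days, targets, window=7):
    """Pair each run of `window` consecutive days (inputs) with the following day's matrix (output)."""
    X, y = [], []
    for start in range(len(days) - window):
        X.append(days[start:start + window])
        y.append(targets[start + window])
    return X, y
```

Assuming a stride of one day and no missing days, the 62 days of July and August 2022 would yield 62 - 7 = 55 input/output pairs.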

Model Architecture

Figure 5: Model architecture

Because the data are both spatial and temporal, we use a hybrid CNN+LSTM model to predict future runner matrices. The CNN layers capture spatial features within each day's matrix, and the LSTM layers capture temporal dependencies across days. TimeDistributed wrappers apply the same CNN operations independently to each time step.
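The following dependency-free forward pass in plain Python shows how the work is split between the two components. It is an illustration, not the authors' model. The single 3×3 kernel, the hidden size of 8, and the random untrained weights are all assumptions, because the report gives no layer configuration. `time_distributed` applies the same convolution to each day's matrix, and a minimal LSTM then runs across the resulting per-day feature vectors. In a framework such as Keras, the `TimeDistributed` wrapper plays the role of the hypothetical `time_distributed` function here.

```python
import math
import random

def conv2d_valid(image, kernel):
    """Single-channel 2-D cross-correlation with 'valid' padding, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[max(0.0, sum(image[i + a][j + b] * kernel[a][b]
                          for a in range(kh) for b in range(kw)))
             for j in range(out_w)] for i in range(out_h)]

def flatten(matrix):
    return [v for row in matrix for v in row]

def time_distributed(fn, sequence):
    """Apply the same function (same weights) to every time step independently."""
    return [fn(frame) for frame in sequence]

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def lstm(sequence, hidden, seed=0):
    """Minimal LSTM forward pass with random, untrained weights; returns the last hidden state."""
    rng = random.Random(seed)
    dim = len(sequence[0])
    # One weight row per gate unit, over [input, previous hidden state, bias]
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim + hidden + 1)]
               for _ in range(4 * hidden)]
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in sequence:
        z = x + h + [1.0]
        pre = [sum(w * v for w, v in zip(row, z)) for row in weights]
        i = [sigmoid(p) for p in pre[:hidden]]                # input gate
        f = [sigmoid(p) for p in pre[hidden:2 * hidden]]      # forget gate
        o = [sigmoid(p) for p in pre[2 * hidden:3 * hidden]]  # output gate
        g = [math.tanh(p) for p in pre[3 * hidden:]]          # candidate cell values
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h

def cnn_lstm_forward(days, kernel, hidden=8):
    """CNN applied to each day via time_distributed, then an LSTM across the days."""
    features = time_distributed(lambda m: flatten(conv2d_valid(m, kernel)), days)
    return lstm(features, hidden)
```

A trained model would learn the kernel and LSTM weights. It would also add an output layer that maps the final hidden state back to a 31×33 matrix. Both are omitted from this sketch.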

Training

For training, we randomly assign 80% of the data to the training set and 20% to the validation set. We use the Adam optimizer with mean squared error (MSE) as the loss function. Over 200 epochs, the model's loss decreased significantly.

Figure 6: The training loss
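The 80/20 random split used for training can be sketched as follows. The seed value and the function name are hypothetical. Fixing the seed makes the split reproducible across runs.

```python
import random

def random_split(samples, train_fraction=0.8, seed=42):
    """Shuffle sample positions with a fixed seed, then cut into training and validation subsets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train = [samples[i] for i in indices[:cut]]
    val = [samples[i] for i in indices[cut:]]
    return train, val
```

With 55 samples, this gives 44 for training and 11 for validation.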

Evaluation

Because each cell in our matrix contains three distinct measurements (total trajectory length, average length, and runner count), we train a separate model for each metric and evaluate each one individually. To assess accuracy, we render the predicted and actual matrices as heatmaps and compare them visually, inspecting around 100 matrices. Because MSE also serves as the loss function, the final loss is used as a quantitative measure of accuracy.
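Below is a minimal sketch of the quantitative metric: MSE averaged over all cells of a predicted and an actual matrix. The Conclusion reports an MSE of about 5% of the mean value. `mse_relative_to_mean` assumes this means MSE divided by the mean of the actual cell values, which is only one possible reading. Both function names are hypothetical.

```python
def mse(pred, actual):
    """Mean squared error over all cells of two matrices with the same shape."""
    sq = [(p - a) ** 2 for prow, arow in zip(pred, actual) for p, a in zip(prow, arow)]
    return sum(sq) / len(sq)

def mse_relative_to_mean(pred, actual):
    """MSE divided by the mean actual cell value (assumed reading of the reported ratio)."""
    values = [a for row in actual for a in row]
    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("mean of actual values is zero")
    return mse(pred, actual) / mean
```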

Visual inspection shows, for instance, that the average_length model closely matches the ground truth. The small highlighted squares in the results correspond to the campus football fields, which are popular running spots. The runner-count and total-length models show similarly promising results. In summary, our model can accurately predict future running behavior from historical spatial and temporal data.

Figure 7: The actual average_length matrix (left) and the predicted average_length matrix (right) for the same day

Conclusion

Our results show that generating future heatmaps from past data achieves acceptable accuracy, with an MSE of about 5% of the mean value. The predictions reflect how runners’ spatial preferences vary with weather and date, and they identify the most popular running spots.

Visual comparison revealed that many errors stem from noise in the original dataset or from sporting events such as marathons. Incorporating information about public events could further improve the predictions.

Our method also falls short in some respects. For example, we did not include weather forecasts for the predicted day as model inputs. In addition, the training and validation sets were split randomly rather than chronologically. These could be improved in future work.
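Consecutive 7-day windows overlap. A random split can therefore put a window in the validation set whose target day also appears as an input in some training window, which makes validation scores optimistic. A chronological split avoids this. The sketch below represents each window by the index of its first day and checks for such leakage. The function names and this representation are hypothetical.

```python
def temporal_split(windows, train_fraction=0.8):
    """Split chronologically ordered windows: earliest for training, latest for validation."""
    cut = int(len(windows) * train_fraction)
    return windows[:cut], windows[cut:]

def leaked_targets(train_windows, val_windows, window=7):
    """Validation target days that also appear (as input or target) in any training window.

    Each window is given by its start day s: inputs are days s..s+window-1, target is day s+window.
    """
    seen = {start + k for start in train_windows for k in range(window + 1)}
    return sorted({start + window for start in val_windows} & seen)
```

Consider 55 windows. The chronological split keeps windows 0 to 43 for training, covering days 0 to 50, and windows 44 to 54 for validation, with targets on days 51 to 61. No validation target is therefore seen during training.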

In conclusion, while our CNN+LSTM hybrid model shows promise in predicting spatial patterns in running behavior, further research is needed to enhance its robustness, efficiency, and generalization capability.
