Istanbul Mini Project
Intro
Estimation and Comparison of Prediction Intervals using QRLSTM and Historical Forecast Similarity on Istanbul’s Traffic Data.
Evaluation and Comparison of 2 different approaches:
- QRLSTM:
Directly estimates wanted quantiles with an LSTM network.
Called QRLSTM because of the unique loss function that is uneven weighted
Using the specific quantiles for creating Prediction Intervals
- Forecast Similarity
First, makes a point estimate of the traffic flow.
Then, uses these estimations as a time–series data
Creates time-windows of forecast-time-series data
Calculates the similarity between different time-windows using different metrics
Using similar time windows for creating Prediction Intervals
Data
The data provided had many missing instances.
4 different methods for handling missing data (not to be included in this report)
- Remove NA
- Using ARIMA for predicting missing data
- Mean of every same hour
- Mean of every same hour of the same day (e.g. Monday 4:00)
QRLSTM
A peek to the input of LSTM network:
head(a1)
QRLSTM code:
# tau = 0.05 / data = d1
<- keras_model_sequential()
qrlstm05_d1 # LSTM > SOFTMAX > OUTPUT
%>%
qrlstm05_d1 layer_lstm(units = 8, input_shape = c(9,1)) %>%
layer_activation_softmax() %>%
layer_dense(units = 1)
# ADAM / LEARNING RATE = 0.001 / QUANTILE(PINBALL) LOSS
%>%
qrlstm05_d1 compile(loss = pinball_loss05,
optimizer = optimizer_adam(learning_rate = 0.001),
metrics = 'mae')
# BATCHSIZE = 1 (changing it to 10 might be much much faster with same performance, not sure though)
<- qrlstm05_d1 %>% fit(
fit_qr05_d1 x = t1ax,
y = t1ay,
batch_size = 1,
epochs = 30,
verbose = 1,
shuffle = FALSE
)
## tau = 0.95
<- keras_model_sequential()
qrlstm95_d1
%>%
qrlstm95_d1 layer_lstm(units = 8, input_shape = c(9,1)) %>%
layer_activation_softmax() %>%
layer_dense(units = 1)
#
%>%
qrlstm95_d1 compile(loss = pinball_loss95,
optimizer = optimizer_adam(learning_rate = 0.001),
metrics = 'mae')
<- qrlstm95_d1 %>% fit(
fit_qr95_d1 x = t1ax,
y = t1ay,
batch_size = 10,
epochs = 30,
verbose = 1,
shuffle = FALSE
)
ggplot(te1all[1:300,]) +
geom_line(mapping = aes(x = index, y = flow), color = "black") +
geom_line(mapping = aes(x = index, y = upper), color = "blue") +
geom_line(mapping = aes(x = index, y = lower), color = "red")
We see that model1 covers the flow inside the interval, nicely. However, it sometimes has very big intervals.
ggplot(te3all[1:300,]) +
geom_line(mapping = aes(x = index, y = flow), color = "black") +
geom_line(mapping = aes(x = index, y = upper), color = "blue") +
geom_line(mapping = aes(x = index, y = lower), color = "red")
In model2, the intervals are smaller compared to model1. However, we see that there are many cases where the flow leaves the interval. Thus we are expecting a bad UC score.
Data | Winkler Train | Winkler Test | Unconditional Coverage Train | Unconditional Coverage Test |
---|---|---|---|---|
Data1 | 0.3182632 | 0.3231521 | 0.9218062 | 0.9144954 |
Data2 | 0.2177761 | 0.2219309 | 0.7706763 | 0.7710508 |
Data3 | 0.1957726 | 0.1909628 | 0.6790396 | 0.7146834 |
Data4 | 0.1928564 | 0.1887284 | 0.6926111 | 0.7174669 |
Smaller winkler score is better
UC should be as close to 0.90 as possible. However we have some weird results here.
Forecast Similarity
Using point LSTM forecast results, closest 20 neighbors are chosen.
The measuring was euclidean distance on 10 hour time-windows.
The results are as shown:
Data | Winkler Score | Unconditional Coverage |
---|---|---|
Data1 | 0.2419179 | 0.8235727 |
Data2 | 0.2218581 | 0.7503492 |
Data3 | 0.2319301 | 0.8365922 |
Data4 | 0.2305406 | 0.8400838 |
Results
Data | Winkler - Forecast Similarity | Winkler - QRLSTM | UC - Forecast Similarity | UC - QRLSTM |
---|---|---|---|---|
Data1 | 0.2419179 | 0.3929869 | 0.8235727 | 0.9144954 |
Data2 | 0.2218581 | 0.2660739 | 0.7503492 | 0.7710508 |
Data3 | 0.2319301 | 0.2264859 | 0.8365922 | 0.7146834 |
Data4 | 0.2305406 | 0.2241348 | 0.8400838 | 0.7174669 |
We see mixed results.
For winkler score: in first two datasets QRLSTM is significantly worse. In last two datasets QRLSTM performs a little better.
For UC Coverage: in first two datasets QRLSTM performs better. However all these results except QRLSTM on Data1 are problematic. Forecast similarity was a hand-made approach, so its UC coverage might be a little off, however QRLSTM’s inaccuracy is surprising.