This post is not quite finished; you may ignore it for now. (It has been uploaded online for page-structuring purposes.)
In the first post we prepared the data.
In the second post we created trajectories with varying window sizes and, for several distance functions, obtained the nearest neighbors.
Now we will use these nearest neighbors for forecasting. There are countless ways to make forecasts from nearest neighbors. Some of them are:

- Taking the mean of the closest k neighbors
- Taking a weighted mean of the closest k neighbors, where the weights can be determined either by the neighbors' rank of closeness or by their distance to the trajectory of interest
- Fitting a linear regression model. A single global regression would be too general (as we will show), so running regression models on local subsets is another option.
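The first two options can be sketched as follows. This is an illustrative Python sketch, not the post's actual code; names like `neighbor_next_values` and `neighbor_distances` are hypothetical placeholders for the values that followed each neighboring trajectory and each neighbor's distance to the trajectory of interest.

```python
def knn_mean_forecast(neighbor_next_values):
    """Plain mean of the values that followed the k nearest trajectories."""
    return sum(neighbor_next_values) / len(neighbor_next_values)

def knn_weighted_forecast(neighbor_next_values, neighbor_distances, eps=1e-8):
    """Weighted mean, weighting each neighbor by inverse distance.

    `eps` guards against division by zero for an exact match.
    """
    weights = [1.0 / (d + eps) for d in neighbor_distances]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, neighbor_next_values)) / total

# Example: three neighbors whose trajectories were followed by 10, 12, 20,
# at distances 0.1, 0.2, 1.0 from the trajectory of interest.
print(knn_mean_forecast([10, 12, 20]))                       # 14.0
print(knn_weighted_forecast([10, 12, 20], [0.1, 0.2, 1.0]))  # ~11.25
```

Note how the weighted version pulls the forecast toward the closest neighbor (which was followed by 10) instead of treating all three equally.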
Before making forecasts, we first need to set benchmark results.
The most straightforward forecast is to use the previous observation as the forecast. This can also be called a random-walk forecast, because it assumes each new observation is just the previous one plus noise.
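The naive benchmark above is trivial to compute; here is a minimal sketch on made-up data (the series values are illustrative, not from the post's stations):

```python
def naive_forecast(series):
    """One-step-ahead naive forecasts: y_hat[t] = y[t-1]."""
    return series[:-1]  # forecasts for series[1:], shifted by one step

series = [3.0, 3.2, 2.9, 3.1, 3.4]
forecasts = naive_forecast(series)

# Mean absolute error of the naive benchmark on this toy series.
errors = [y - f for y, f in zip(series[1:], forecasts)]
mae = sum(abs(e) for e in errors) / len(errors)
print(forecasts)  # [3.0, 3.2, 2.9, 3.1]
print(mae)        # 0.25
```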
Autoregressive models naturally come to mind when analysing time-series forecasting, so we fitted AR and ARIMA models to create a performance benchmark. Going into details is beyond the scope of this post; we only show some of the results.
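As a hedged sketch of what such a benchmark involves (the post itself presumably used standard R routines), here is an AR(1) model fitted by ordinary least squares in plain Python, with multi-step-ahead forecasts obtained by iterating the fitted recursion. The series is made-up data.

```python
def fit_ar1(series):
    """Estimate y[t] = c + phi * y[t-1] + noise via the OLS closed form."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var
    c = my - phi * mx
    return c, phi

def ar1_forecast(series, c, phi, steps):
    """Multi-step-ahead forecasts: feed each prediction back in."""
    preds, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        preds.append(last)
    return preds

series = [1.0, 0.9, 1.1, 0.95, 1.05, 1.0, 0.98, 1.02]
c, phi = fit_ar1(series)
print(ar1_forecast(series, c, phi, steps=3))
```

Iterating the recursion is exactly why AR forecasts flatten toward the series mean at longer horizons, which is the weakness the historical-similarity methods aim to exploit.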
Model | Station 1 | Station 2 | Station 3 |
---|---|---|---|
Naive | 1 | 1 | 1 |
AR (9) | 1 | 1 | 1 |
ARIMA | 1 | 1 | 1 |
Simpler ARIMA | 1 | 1 | 1 |
note to self: tables like this can be produced directly with kable, as in kable(df).
ARIMA comes out worse than the naive forecast even on the training set; something odd is going on there. I will look into it later.
We can see that, in multi-step-ahead forecasting, historical-similarity methods outperform the benchmarks by a wide margin!