today... practicing some time series analysis by grabbing some of my co2 ppm data.
to keep the challenge reasonable, i selected a subset of the data, 3 weeks long, where i didn't leave my apartment for any trips (this was surprisingly hard to find). i used the first 2 weeks to develop my prediction model, then verified the model on the last week of data.
i first averaged the data in 15-minute groups, just to cut down on the computation load by a factor of 15.
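a minimal sketch of the averaging step, assuming the raw readings arrive once per minute (which is what a factor-of-15 reduction implies). `block_average` and the sample values are made up for illustration.

```python
import numpy as np

def block_average(samples, block=15):
    """Average consecutive groups of `block` samples, dropping any incomplete tail."""
    samples = np.asarray(samples, dtype=float)
    usable = (len(samples) // block) * block
    return samples[:usable].reshape(-1, block).mean(axis=1)

# hypothetical one-minute readings: one hour ramping from 400 to 459 ppm
minute_ppm = np.arange(400, 460)
quarter_hour_ppm = block_average(minute_ppm)  # 60 readings become 4 averages
```

with timestamped data, a time-based grouping would be the safer choice, since it tolerates missing readings; the reshape trick assumes there are no gaps.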
the data exhibits both daily and weekly patterns, which i factored out in the data preprocessing stages (only to invert the transformation later, of course).
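one possible way to factor out a repeating pattern and invert it later is to subtract the mean of each slot in the cycle and add it back after forecasting. this is only a sketch of that idea, not necessarily the transformation used here; the function names and the toy 4-slot cycle are assumptions.

```python
import numpy as np

def fit_profile(series, period):
    """Mean value at each position of the cycle. Needs at least one full cycle of data."""
    series = np.asarray(series, dtype=float)
    slots = np.arange(len(series)) % period
    return np.array([series[slots == s].mean() for s in range(period)])

def remove_profile(series, profile, start=0):
    """Subtract the cyclic profile; `start` is the cycle position of the first sample."""
    slots = (start + np.arange(len(series))) % len(profile)
    return np.asarray(series, dtype=float) - profile[slots]

def restore_profile(residual, profile, start=0):
    """Inverse of remove_profile: add the cyclic profile back."""
    slots = (start + np.arange(len(residual))) % len(profile)
    return np.asarray(residual, dtype=float) + profile[slots]

rng = np.random.default_rng(0)
period = 4  # toy cycle; a week of 15-minute slots would be 7 * 96 = 672
train = np.tile([400.0, 450.0, 500.0, 430.0], 3) + rng.normal(0, 5, 12)

profile = fit_profile(train, period)
residual = remove_profile(train, profile)     # what the model would be fit on
restored = restore_profile(residual, profile) # back on the original ppm scale
```

a slot-of-week profile captures the daily and weekly patterns in one pass, but with two weeks of training data each slot is averaged over only two samples, so the profile itself is noisy.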
i also found it unnecessary to detrend the data.
(light blue on the left 2 weeks is my input data)
both the KPSS and ADF tests suggested that the data was already quite stationary without any differencing, so i just threw it into an ARIMA model and ran a grid search (50 experiments) over the (p, d, q) parameters with corrected AIC (AICc) as the criterion. (1,0,4) came out as the optimal order.
finally, i ran the forecast procedure, which quickly regressed to the mean. this is expected: when predicting far into the future, the default stance is to regress to the mean. i also simulated 100 trajectories from the ARIMA model in order to quickly visualize the credible bounds of my forecast.
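to illustrate both points (the point forecast decaying to the mean, and simulated futures giving a band), here is a small numpy-only sketch of simulating an ARMA(1,4) process forward. every coefficient, the starting value, and the noise scale are made-up assumptions, not the fitted model, and the values are on the deseasonalized scale.

```python
import numpy as np

def simulate_arma(last_value, last_shocks, mu, phi, thetas, sigma, steps, n_paths, rng):
    """Simulate future paths of an ARMA(1, q) process around mean `mu`.

    x[t] - mu = phi * (x[t-1] - mu) + e[t] + sum_j thetas[j] * e[t-1-j]
    `last_shocks` holds the q most recent residuals, newest first.
    """
    thetas = np.asarray(thetas, dtype=float)
    q = len(thetas)
    paths = np.empty((n_paths, steps))
    prev = np.full(n_paths, float(last_value))
    # every path starts from the same observed history
    shocks = np.tile(np.asarray(last_shocks, dtype=float), (n_paths, 1))
    for t in range(steps):
        e = rng.normal(0.0, sigma, n_paths)
        prev = mu + phi * (prev - mu) + e + shocks @ thetas
        paths[:, t] = prev
        # push the new shock to the front and drop the oldest one
        shocks = np.column_stack([e, shocks])[:, :q]
    return paths

# hypothetical parameters (NOT the fitted model)
params = dict(last_value=60.0, last_shocks=[5.0, -3.0, 2.0, 1.0],
              mu=0.0, phi=0.8, thetas=[0.3, 0.2, 0.1, 0.05])
rng = np.random.default_rng(0)

# 100 simulated futures over one day of 15-minute steps, plus a 95% band
paths = simulate_arma(**params, sigma=10.0, steps=96, n_paths=100, rng=rng)
lower, upper = np.percentile(paths, [2.5, 97.5], axis=0)

# with future shocks set to zero, the path is the point forecast, which decays to mu
point = simulate_arma(**params, sigma=0.0, steps=96, n_paths=1, rng=rng)[0]
```

once the four remembered shocks have worked through, the point forecast shrinks by a factor of `phi` per step, which is why it flattens out so quickly.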
the final results are shown, with dark blue being the actual observed data, yellow being the forecasted ppm (although, as mentioned previously, this quickly regresses to the seasonal means after about a day), and red being the 100 simulated futures. all in all, the forecast seems decent, as the ground truth trajectory rarely strays outside the credible intervals.
the one-step prediction had an RMSE of 12.9 ppm, compared to 20.45 ppm for the naive strategy (always predicting the last seen value) and 14.0 ppm for the seasonal-trend-following strategy. note that one-step error is generally not a very useful indication of forecast ability, as it only measures the accuracy of forecasting a single step (aka 15 minutes) into the future.
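for reference, a sketch of how the baseline RMSEs can be computed. the readings and the seasonal means below are invented, and "predict the seasonal mean for each slot" is only one reading of the seasonal-trend-following strategy.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# hypothetical 15-minute readings (ppm) and hypothetical seasonal means for the same slots
observed = np.array([600.0, 620.0, 650.0, 640.0, 610.0, 590.0])
seasonal_mean = np.array([605.0, 615.0, 640.0, 645.0, 615.0, 585.0])

# naive one-step forecast: each value is predicted to equal the previous one
naive_rmse = rmse(observed[1:], observed[:-1])

# seasonal baseline: each value is predicted to equal its slot's seasonal mean
seasonal_rmse = rmse(observed[1:], seasonal_mean[1:])
```

both baselines are scored on the same points (everything after the first reading) so the numbers are comparable.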