Smooth Your Data Without Losing Local and Global Extremes
In this article, we will describe the impact of data smoothing using the EWMA (Exponentially Weighted Moving Average) method, applied in two different ways. The first is classic data smoothing over a window of 60 samples (in our case, 60 days).
The second approach applies multiple consecutive smoothing passes with a smaller window: here we smooth over a 2-day window, but 30 times (so that the product is 60). I don't know whether this approach has an established name, so I'll call it mini-window smoothing (MWS).
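The two approaches can be sketched with pandas as follows. This is a minimal illustration, not the exact code used for the article; the synthetic `prices` series stands in for the WTI data, and the spans (60 for the classic pass, 2 repeated 30 times for MWS) follow the description above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the daily WTI price series used in the article.
rng = np.random.default_rng(42)
prices = pd.Series(70 + rng.normal(0, 1, 500).cumsum())

# Approach 1: classic EWMA smoothing over a 60-sample window.
smoothed_classic = prices.ewm(span=60).mean()

# Approach 2: mini-window smoothing (MWS) -- EWMA with span=2, applied 30 times.
smoothed_mws = prices.copy()
for _ in range(30):
    smoothed_mws = smoothed_mws.ewm(span=2).mean()
```

Because each EWMA value is a weighted average of past observations, both smoothed series always stay within the minimum and maximum of the original data; the question the article examines is how closely they track them.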
Even from a visual comparison (see the figures below), it is clear that the second approach produces significantly better results, and Tables 1 and 2 confirm this across all the metrics we used. The second approach yields variance, standard deviation, and IQR (interquartile range) very similar to those of the original dataset, and its minimum and maximum values are much closer to the original data.
Likewise, the R² score and Pearson's correlation coefficient, computed between each smoothed dataset and the original, confirm that the second dataset has much higher correlation and similarity than the first.
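The comparison metrics mentioned above can be computed with plain numpy. The helper below is an illustrative reimplementation, not the article's original code; the function name and the returned keys are my own choices.

```python
import numpy as np

def compare(original, smoothed):
    """Summary statistics of a smoothed series versus the original."""
    o = np.asarray(original, dtype=float)
    s = np.asarray(smoothed, dtype=float)
    ss_res = np.sum((o - s) ** 2)               # residual sum of squares
    ss_tot = np.sum((o - o.mean()) ** 2)        # total sum of squares
    return {
        "var": s.var(),
        "std": s.std(),
        "iqr": np.percentile(s, 75) - np.percentile(s, 25),
        "min": s.min(),
        "max": s.max(),
        "r2": 1 - ss_res / ss_tot,              # R^2 score against the original
        "pearson": np.corrcoef(o, s)[0, 1],     # Pearson correlation coefficient
    }
```

Running `compare` on each smoothed series against the original produces the kind of numbers shown in Tables 1 and 2: the closer the variance, IQR, and extremes are to the original, and the higher R² and Pearson, the better the smoothing preserved the data.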
This small study used a dataset of WTI (West Texas Intermediate) prices, a benchmark for the US oil market.
Tabular comparison
Visual comparison
Now let's look at a visual representation of the original dataset, the smoothed datasets obtained with the two approaches, and finally a comparison of the original dataset with the two new ones.
Figure 1 shows a line graph of the original dataset.
Figure 2 shows a line graph of the dataset smoothed with the first approach (1x60 days).
Figure 3 shows a line graph of the dataset smoothed with the second approach (2x30 days).
Figures 4 and 5 graphically compare the first and second approaches with the original dataset, respectively.
Conclusion
Based on the tables and figures above, it is evident that the second approach preserves the distribution, the extreme values, and the smaller movements (which were crucial in our case) better than the first approach. Keep in mind, however, that while it smooths the data effectively, it also eliminates outliers. I hope you found this article insightful and will consider employing this approach in your upcoming projects.