Smooth Your Data Without Losing Local and Global Extremes

Hrvoje Ljubić
Feb 7, 2024


In this article, we compare two ways of smoothing data with the EWMA (Exponentially Weighted Moving Average) method. The first is classic smoothing over a single window of 60 samples (in our case, 60 days).
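
To make the first approach concrete, here is a minimal sketch of a single-pass EWMA in plain Python. It assumes the 60-day window is interpreted as a "span", so the smoothing factor is alpha = 2 / (span + 1); the article does not state its exact settings, so treat that mapping as an assumption. The `ewma` helper and the synthetic random walk are illustrative only and are not the WTI data.

```python
import random


def ewma(values, span):
    """Recursive EWMA: y[0] = x[0], y[t] = alpha * x[t] + (1 - alpha) * y[t-1].

    Assumption: the window length is treated as a "span", so that
    alpha = 2 / (span + 1). The article does not state its exact settings.
    """
    if not values:
        return []
    alpha = 2.0 / (span + 1)
    smoothed = [float(values[0])]  # seed with the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Synthetic random walk standing in for 500 daily prices (NOT the WTI data).
random.seed(0)
prices = [50.0]
for _ in range(499):
    prices.append(prices[-1] + random.gauss(0.0, 1.0))

smooth_1x60 = ewma(prices, span=60)  # first approach: one pass, 60-day window
print(f"original range: {min(prices):.2f} .. {max(prices):.2f}")
print(f"1x60 range:     {min(smooth_1x60):.2f} .. {max(smooth_1x60):.2f}")
```

If you work with pandas, `Series.ewm(span=60, adjust=False).mean()` computes the same recursion.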

The second approach applies the smoothing several times in a row, but with a much smaller window. In this case, we smooth with a 2-day window 30 times in succession (so that the product is again 60). I don’t know whether this approach has an established name, so I’ll call it mini-window smoothing (MWS).
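
A minimal sketch of the repeated, small-window idea could look like the following. It again assumes the window is a "span" with alpha = 2 / (span + 1), and the names `ewma` and `mini_window_smoothing` are hypothetical helpers, not the author's code. Each pass takes the output of the previous pass as its input. Note that the product 2 x 30 = 60 is only bookkeeping: 30 short passes are not mathematically the same filter as one 60-day pass.

```python
import random


def ewma(values, span):
    """Recursive EWMA with alpha = 2 / (span + 1) (assumed window mapping)."""
    if not values:
        return []
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for x in values[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


def mini_window_smoothing(values, span=2, passes=30):
    """Apply a short-span EWMA `passes` times, chaining each pass's output."""
    result = list(values)
    for _ in range(passes):
        result = ewma(result, span)
    return result


# Synthetic random walk standing in for 500 daily prices (NOT the WTI data).
random.seed(0)
prices = [50.0]
for _ in range(499):
    prices.append(prices[-1] + random.gauss(0.0, 1.0))

smooth_mws = mini_window_smoothing(prices, span=2, passes=30)
print(f"original range: {min(prices):.2f} .. {max(prices):.2f}")
print(f"MWS range:      {min(smooth_mws):.2f} .. {max(smooth_mws):.2f}")
```

Both `span` and `passes` are parameters here, so other splits of the window are easy to try.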

Even a visual comparison (see the figures below) shows that the second approach gives significantly better results, and Tables 1 and 2 confirm this for every metric we used. With the second approach, the variance, standard deviation, and IQR (interquartile range) stay very close to those of the original dataset, and the minimum and maximum values are much closer to the original ones.
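
The statistical features named above can be computed with Python's standard `statistics` module. This is a sketch under assumptions: it uses the sample variance and standard deviation and the default quartile method of `statistics.quantiles`, which may differ slightly from whatever produced Table 1. The `describe` helper and the two short lists are hypothetical toy values, not the article's data.

```python
import statistics


def describe(values):
    """Return the spread statistics used to compare a series with its smoothed version."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std": statistics.stdev(values),          # sample standard deviation
        "iqr": q3 - q1,                           # interquartile range
        "min": min(values),
        "max": max(values),
    }


# Hypothetical toy series, for illustration only.
original = [52.0, 55.5, 49.0, 61.0, 58.5, 47.0, 63.5, 60.0]
smoothed = [52.0, 54.3, 50.8, 57.6, 58.2, 50.7, 59.2, 59.7]

for name, series in {"original": original, "smoothed": smoothed}.items():
    stats = describe(series)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Running `describe` on the original series and on each smoothed series gives one row per dataset, which is the layout Table 1 describes.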

Comparing each smoothed dataset with the original using the R² score and Pearson’s correlation coefficient confirms this as well: the second smoothed dataset is much more strongly correlated with, and more similar to, the original than the first one.
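
Both similarity measures are short enough to write out by hand. The sketch below assumes the original series plays the role of the "actual" values and the smoothed series the role of the "predicted" values, which is the usual convention for R² (for example in scikit-learn's `r2_score(y_true, y_pred)`); the article does not spell out its convention. The function names and toy lists are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)  # undefined if either series is constant


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot  # undefined if `actual` is constant


# Hypothetical toy series, for illustration only.
original = [52.0, 55.5, 49.0, 61.0, 58.5, 47.0, 63.5, 60.0]
smoothed = [52.0, 54.3, 50.8, 57.6, 58.2, 50.7, 59.2, 59.7]

print("R2:     ", round(r2_score(original, smoothed), 3))
print("Pearson:", round(pearson_r(original, smoothed), 3))
```

Values closer to 1 mean the smoothed series tracks the original more closely. Unlike Pearson's coefficient, R² can be negative when the smoothed series fits worse than a flat line at the mean.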

This small study used a dataset related to WTI (West Texas Intermediate), the benchmark for the US crude oil market.

Tabular comparison

Table 1 — Comparison of statistical features of the 3 presented datasets
Table 2 — Comparison of the similarity of the original dataset with the two smoothed datasets

Visual comparison

Now let’s look at the original dataset, the smoothed datasets produced by the two approaches, and finally a comparison of the original dataset with each of the two smoothed ones.

Figure 1 shows a line graph of the original data set.

Figure 1 — Original dataset

Figure 2 shows a line graph of the smoothed data set using the first approach (1x60 days).

Figure 2 — The dataset obtained after applying the first approach (1x60days)

Figure 3 shows a line graph of the smoothed data set using the second approach (a 2-day window applied 30 times, labeled 2x30 days).

Figure 3 — The dataset obtained after applying the second approach (2x30days)

Figures 4 and 5 graphically compare the first and second approaches with the original data set, respectively.

Figure 4 — Comparison of original and 1x60days smoothed dataset
Figure 5 — Comparison of original and 2x30days smoothed dataset

Conclusion

Based on the tables and figures above, the second approach preserves the distribution, the extreme values, and the smaller movements of the original data (which were crucial there) better than the first approach. Keep in mind, however, that while it smooths the data effectively, it also eliminates outliers. I hope you found this article insightful and will consider using this approach in your upcoming projects.
