Time Series Forecasting

ANIL NEBİ ŞENTÜRK
9 min read · Feb 7, 2022
https://www.educba.com/time-series-analysis/

First of all, what is Time Series Analysis?

  • A time series is a set of measurements of a quantity of interest, ordered in time. The purpose of analyzing a time series is to understand the reality represented by the set of observations and to forecast the future values of the variables in the series as accurately as possible.

Time series concepts:
1) Stationarity
2) Trend
3) Seasonality
4) Cycle

  1. Stationarity: the statistical properties of the series do not change over time.
    * If the mean, variance, and covariance of a time series stay constant over time, the series is said to be stationary (see the conditions sketched below).
Stationary & Non-Stationary Series
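As a compact reference (standard weak-stationarity conditions, added here for clarity rather than taken from the post's figures):

\mathbb{E}[y_t] = \mu, \qquad \operatorname{Var}(y_t) = \sigma^2, \qquad \operatorname{Cov}(y_t, y_{t+k}) = \gamma_k \ \text{(depends only on the lag } k\text{, not on } t\text{)}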

Trend: the long-run increase or decrease of a time series.

Increasing trend (no stationarity)
Decreasing trend

Seasonality: when a time series repeats a certain behavior at regular periods, the series is said to exhibit seasonality.

When seasonality is present the series has a pattern (and typically a trend), but it is not stationary.

Cycle: seasonality is the more pronounced, short-term pattern that lines up with a calendar period such as a day, a week, or a season.
Cyclicality, by contrast, is long-term and less regular; it does not align with a day, week, year, or season.

Understanding the Nature of Time Series Models:
Moving Average: the next value of a time series is forecast as the average of its previous k values.

Moving Average Formula
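In its usual textbook form, the k-period moving-average forecast referenced above is

MA_t = \frac{y_{t-1} + y_{t-2} + \dots + y_{t-k}}{k}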

"The best predictor of today is yesterday."

The moving average gives us the trend of the series.

Weighted Average: the idea of giving more weight to the more recent observations.

Weighted Average Formula
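Again in a standard form (the weights w_i sum to 1 and are larger for the more recent observations):

WMA_t = \sum_{i=1}^{k} w_i \, y_{t-i}, \qquad \sum_{i=1}^{k} w_i = 1, \quad w_1 \ge w_2 \ge \dots \ge w_k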
Moving Average & Weighted Average

Smoothing Methods:
1) Single Exponential Smoothing: only for stationary series; there should be no trend and no seasonality.
2) Double Exponential Smoothing: captures the level and, in addition, the trend.
3) Triple Exponential Smoothing: level + trend + seasonality (Holt-Winters).
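For reference, the core single-exponential-smoothing recursion (standard textbook form, with smoothing parameter α) is

\hat{y}_{t+1} = \alpha \, y_t + (1 - \alpha)\, \hat{y}_t, \qquad 0 < \alpha < 1

Double exponential smoothing adds a second recursion for the trend (parameter β), and triple exponential smoothing adds a third for the seasonal component (parameter γ); these correspond to the smoothing_level, smoothing_trend (a.k.a. smoothing_slope) and smoothing_seasonal arguments used in the statsmodels calls below.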

Single Exponential Smoothing(SES):

##################################################
# Time Series
##################################################

##################################################
# Holt-Winters
##################################################


import itertools
import warnings
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import mean_absolute_error
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose
import statsmodels.tsa.api as smt
from statsmodels.tsa.statespace.sarimax import SARIMAX


warnings.filterwarnings('ignore')

Dataset story: Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory, Hawaii, U.S.A.
Period of Record: March 1958 — December 2001

data = sm.datasets.co2.load_pandas()
y = data.data  # select our target variable y

y.index
DatetimeIndex(['1958-03-29', '1958-04-05', '1958-04-12', '1958-04-19',
               '1958-04-26', '1958-05-03', '1958-05-10', '1958-05-17',
               '1958-05-24', '1958-05-31',
               ...
               '2001-10-27', '2001-11-03', '2001-11-10', '2001-11-17',
               '2001-11-24', '2001-12-01', '2001-12-08', '2001-12-15',
               '2001-12-22', '2001-12-29'],
              dtype='datetime64[ns]', length=2284, freq='W-SAT')

As can be seen, this dataset was recorded weekly, on Saturdays.
Weekly CO2 readings are more granular than we need, so let's resample the data to monthly frequency.
y = y['co2'].resample('MS').mean()  # resample to monthly means ('MS' = month start)

y.head()
Out[55]:
1958-03-01    316.100
1958-04-01    317.200
1958-05-01    317.433
1958-06-01        NaN
1958-07-01    315.625
Freq: MS, Name: co2, dtype: float64

y.isnull().sum()
# 5 missing values

In a time series it makes more sense to fill missing (NaN) values from the neighboring observations (the previous and next values) than with the overall mean.

y = y.fillna(y.bfill())  # fill each missing value with the next observed value (back-fill), since the series may contain trend or seasonality
y.head()
1958-03-01 316.100
1958-04-01 317.200
1958-05-01 317.433
1958-06-01 315.625
1958-07-01 315.625

Freq: MS, Name: co2, dtype: float64
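An alternative worth noting (an illustrative sketch, not used in the rest of the walkthrough): pandas can also interpolate missing values from the surrounding observations, which respects the local trend. The name y_raw below is hypothetical, re-creating the monthly series before filling.

# illustrative alternative to bfill: time-based interpolation between neighbors
y_raw = data.data['co2'].resample('MS').mean()   # monthly series, still containing the NaNs
y_interp = y_raw.interpolate(method="time")      # linear interpolation on the DatetimeIndex
y_interp.isnull().sum()                          # should be 0 here, since the missing months are interior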
y.plot(figsize=(15, 6))  # visualize the series
plt.show()

Holdout method (train-test split):
# Train set: from 1958 through the end of 1997.
train = y[:'1997-12-01']
len(train)  # 478 months in the train set

# Test set: from the first month of 1998 through the end of 2001.
test = y['1998-01-01':]
len(test)  # 48 months in the test set

We will build the model on 478 months and evaluate it on the remaining 48 months.

# Structural Analysis of the Time Series
# Stationarity Test (Dickey-Fuller Test)
def is_stationary(y):

    # "H0: Non-stationary"
    # "H1: Stationary"

    p_value = sm.tsa.stattools.adfuller(y)[1]
    if p_value < 0.05:
        print(F"Result: Stationary (H0: non-stationary, p-value: {round(p_value, 3)})")
    else:
        print(F"Result: Non-Stationary (H0: non-stationary, p-value: {round(p_value, 3)})")


is_stationary(y)
# Time Series Components and Stationarity Test
def ts_decompose(y, model="additive", stationary=False):
    result = seasonal_decompose(y, model=model)
    fig, axes = plt.subplots(4, 1, sharex=True, sharey=False)
    fig.set_figheight(10)
    fig.set_figwidth(15)

    axes[0].set_title("Decomposition for " + model + " model")
    axes[0].plot(y, 'k', label='Original ' + model)
    axes[0].legend(loc='upper left')

    axes[1].plot(result.trend, label='Trend')
    axes[1].legend(loc='upper left')

    axes[2].plot(result.seasonal, 'g', label='Seasonality & Mean: ' + str(round(result.seasonal.mean(), 4)))
    axes[2].legend(loc='upper left')

    axes[3].plot(result.resid, 'r', label='Residuals & Mean: ' + str(round(result.resid.mean(), 4)))
    axes[3].legend(loc='upper left')
    plt.show(block=True)

    if stationary:
        is_stationary(y)

# Analysis for the additive and multiplicative models
for model in ["additive", "multiplicative"]:
    ts_decompose(y, model, stationary=True)

Decomposition panels: the level (the mean, which Single Exponential Smoothing models), the trend, the seasonality, and the model residuals.

Result: Non-Stationary (H0: non-stationary, p-value: 0.999)
If the p-value were below 0.05 we could reject H0. Here the p-value is 0.999, so we cannot reject H0; the series is therefore not stationary.

# y(t) = Level + Trend + Seasonality + Noise
# y(t) = Level * Trend * Seasonality * Noise

If the seasonal and residual components are independent of the trend, the series is additive. If the seasonal and residual components scale with the trend, the series is multiplicative.

Additive series: y(t) = Level + Trend + Seasonality + Noise
Multiplicative series: y(t) = Level * Trend * Seasonality * Noise
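A common trick worth noting here (an illustrative sketch, not part of the original walkthrough): if the decomposition looks multiplicative, taking the logarithm of the series turns the multiplicative structure into an additive one, so the additive tools below can still be applied.

# log-transform sketch: log(y) = log(Level) + log(Trend) + log(Seasonality) + log(Noise)
y_log = np.log(y)                                  # CO2 values are strictly positive, so this is safe
ts_decompose(y_log, model="additive", stationary=True)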

Single Exponential Smoothing (SES)
SES models the level.
It is used on stationary series.
It is not used when trend or seasonality is present.
ses_model = SimpleExpSmoothing(train).fit(smoothing_level=0.5)
y_pred = ses_model.forecast(48)
mean_absolute_error(test, y_pred)  # 5.70

# visual inspection of the model's forecasting performance
train.plot(title="Single Exponential Smoothing")
test.plot()
y_pred.plot()
plt.show()

train["1985":].plot(title="Single Exponential Smoothing")
test.plot()
y_pred.plot()
plt.show()
def plot_co2(train, test, y_pred, title):
    mae = mean_absolute_error(test, y_pred)
    train["1985":].plot(legend=True, label="TRAIN", title=f"{title}, MAE: {round(mae, 2)}")
    test.plot(legend=True, label="TEST", figsize=(6, 4))
    y_pred.plot(legend=True, label="PREDICTION")
    plt.show()


plot_co2(train, test, y_pred, "Single Exponential Smoothing")
def ses_optimizer(train, alphas, step=48):
    best_alpha, best_mae = None, float("inf")

    for alpha in alphas:
        ses_model = SimpleExpSmoothing(train).fit(smoothing_level=alpha)
        y_pred = ses_model.forecast(step)
        mae = mean_absolute_error(test, y_pred)

        if mae < best_mae:
            best_alpha, best_mae = alpha, mae

        print("alpha:", round(alpha, 2), "mae:", round(mae, 4))
    print("best_alpha:", round(best_alpha, 2), "best_mae:", round(best_mae, 4))
    return best_alpha, best_mae

alphas = np.arange(0.8, 1, 0.01)
ses_optimizer(train, alphas)

best_alpha, best_mae = ses_optimizer(train, alphas)
# best_alpha: 0.99, best_mae: 4.54
# DES: Level (SES) + Trend

# In addition to the level, DES can capture the trend.


des_model = ExponentialSmoothing(train, trend="add").fit(smoothing_level=0.5,
                                                         smoothing_trend=0.5)

y_pred = des_model.forecast(48)

plot_co2(train, test, y_pred, "Double Exponential Smoothing")

des_model.params

Double Exponential Smoothing

# Hyperparameter Optimization
############################


def des_optimizer(train, alphas, betas, step=48):
    best_alpha, best_beta, best_mae = None, None, float("inf")
    for alpha in alphas:
        for beta in betas:
            des_model = ExponentialSmoothing(train, trend="add").fit(smoothing_level=alpha,
                                                                     smoothing_slope=beta)
            y_pred = des_model.forecast(step)
            mae = mean_absolute_error(test, y_pred)
            if mae < best_mae:
                best_alpha, best_beta, best_mae = alpha, beta, mae
            print("alpha:", round(alpha, 2), "beta:", round(beta, 2), "mae:", round(mae, 4))
    print("best_alpha:", round(best_alpha, 2), "best_beta:", round(best_beta, 2), "best_mae:", round(best_mae, 4))
    return best_alpha, best_beta, best_mae


alphas = np.arange(0.01, 1, 0.10)
betas = np.arange(0.01, 1, 0.10)

best_alpha, best_beta, best_mae = des_optimizer(train, alphas, betas)


############################
# Final DES Model
############################

final_des_model = ExponentialSmoothing(train, trend="add").fit(smoothing_level=best_alpha,
                                                               smoothing_slope=best_beta)

y_pred = final_des_model.forecast(48)

plot_co2(train, test, y_pred, "Double Exponential Smoothing")


##################################################
# Triple Exponential Smoothing (Holt-Winters)
##################################################


# TES = SES + DES + Seasonality

tes_model = ExponentialSmoothing(train,
                                 trend="add",
                                 seasonal="add",
                                 seasonal_periods=12).fit(smoothing_level=0.5,
                                                          smoothing_slope=0.5,
                                                          smoothing_seasonal=0.5)


y_pred = tes_model.forecast(48)

plot_co2(train, test, y_pred, "Triple Exponential Smoothing")
# Hyperparameter Optimization
############################




def tes_optimizer(train, abg, step=48):
    best_alpha, best_beta, best_gamma, best_mae = None, None, None, float("inf")

    for comb in abg:
        tes_model = ExponentialSmoothing(train, trend="add", seasonal="add", seasonal_periods=12).\
            fit(smoothing_level=comb[0], smoothing_slope=comb[1], smoothing_seasonal=comb[2])
        y_pred = tes_model.forecast(step)
        mae = mean_absolute_error(test, y_pred)
        if mae < best_mae:
            best_alpha, best_beta, best_gamma, best_mae = comb[0], comb[1], comb[2], mae
        print([round(comb[0], 2), round(comb[1], 2), round(comb[2], 2), round(mae, 2)])

    print("best_alpha:", round(best_alpha, 2), "best_beta:", round(best_beta, 2), "best_gamma:", round(best_gamma, 2),
          "best_mae:", round(best_mae, 4))

    return best_alpha, best_beta, best_gamma, best_mae


alphas = betas = gammas = np.arange(0.10, 1, 0.20)
abg = list(itertools.product(alphas, betas, gammas))


best_alpha, best_beta, best_gamma, best_mae = tes_optimizer(train, abg)


############################
# Final TES Model
############################

final_tes_model = ExponentialSmoothing(train, trend="add", seasonal="add", seasonal_periods=12).\
    fit(smoothing_level=best_alpha, smoothing_slope=best_beta, smoothing_seasonal=best_gamma)

y_pred = final_tes_model.forecast(48)

plot_co2(train, test, y_pred, "Triple Exponential Smoothing")



##################################################
# ARIMA(p, d, q): (Autoregressive Integrated Moving Average)
##################################################


arima_model = ARIMA(train, order=(1, 1, 1)).fit(disp=0)
arima_model.summary()

y_pred = arima_model.forecast(48)[0]
y_pred = pd.Series(y_pred, index=test.index)


plot_co2(train, test, y_pred, "ARIMA")

arima_model.plot_predict(dynamic=False)
plt.show()


############################
# Hyperparameter Optimization (Determining the Model Orders)
############################

# 1. Determining the model order using the AIC statistic
# 2. Determining the model order using the ACF & PACF plots

############################
# Determining the Model Order Using the AIC & BIC Statistics
############################

# generating the p, d, q combinations
p = d = q = range(0, 4)
pdq = list(itertools.product(p, d, q))

def arima_optimizer_aic(train, orders):
    best_aic, best_params = float("inf"), None
    for order in orders:
        try:
            arma_model_result = ARIMA(train, order).fit(disp=0)
            aic = arma_model_result.aic
            if aic < best_aic:
                best_aic, best_params = aic, order
            print('ARIMA%s AIC=%.2f' % (order, aic))
        except:
            continue
    print('Best ARIMA%s AIC=%.2f' % (best_params, best_aic))
    return best_params


best_params_aic = arima_optimizer_aic(train, pdq)


############################
# Final Model
############################

arima_model = ARIMA(train, best_params_aic).fit(disp=0)
y_pred = arima_model.forecast(48)[0]
y_pred = pd.Series(y_pred, index=test.index)

plot_co2(train, test, y_pred, "ARIMA")
# Determining the Model Order Using the ACF & PACF Plots
############################

def acf_pacf(y, lags=30):
    plt.figure(figsize=(12, 7))
    layout = (2, 2)
    ts_ax = plt.subplot2grid(layout, (0, 0), colspan=2)
    acf_ax = plt.subplot2grid(layout, (1, 0))
    pacf_ax = plt.subplot2grid(layout, (1, 1))
    y.plot(ax=ts_ax)

    # Stationarity test (H0: the series is not stationary. H1: the series is stationary.)
    p_value = sm.tsa.stattools.adfuller(y)[1]
    ts_ax.set_title('Time Series Analysis Plots\n Dickey-Fuller: p={0:.5f}'.format(p_value))
    smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)
    smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)
    plt.tight_layout()
    plt.show()

acf_pacf(y)



#################################
# ACF Plot
#################################


######################
# PACF Plot
######################


######################
# Determining the Model Order with the ACF and PACF Plots
######################


# If the ACF tails off (decays gradually) with the lags and the PACF cuts off after lag p, the series follows an AR(p) model.

# If the ACF cuts off after lag q and the PACF tails off with the lags, the series follows an MA(q) model.

# If both the ACF and the PACF tail off with the lags, the series follows an ARMA model.
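A quick illustrative sketch of the first rule (an added example, not from the original post): simulating an AR(1) process and passing it through the acf_pacf helper above should show an ACF that tails off and a PACF that cuts off after lag 1.

# hypothetical illustration: AR(1) with coefficient 0.7, i.e. y_t = 0.7 * y_(t-1) + e_t
from statsmodels.tsa.arima_process import arma_generate_sample

np.random.seed(42)
ar1_sample = pd.Series(arma_generate_sample(ar=[1, -0.7], ma=[1], nsample=500))
acf_pacf(ar1_sample)  # expected: ACF decays gradually, PACF cuts off after lag 1 -> AR(1)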

df_diff = y.diff()  # first-order differencing to remove the trend
df_diff.dropna(inplace=True)

acf_pacf(df_diff)

##################################################
# SARIMA(p, d, q)(P, D, Q, m): Seasonal Autoregressive Integrated Moving Average
##################################################



model = SARIMAX(train, order=(1, 0, 1), seasonal_order=(0, 0, 0, 12))
sarima_model = model.fit(disp=0)

y_pred_test = sarima_model.get_forecast(steps=48)
y_pred = y_pred_test.predicted_mean
y_pred = pd.Series(y_pred, index=test.index)

plot_co2(train, test, y_pred, "SARIMA")


############################
# Hyperparameter Optimization (Determining the Model Orders)
############################


p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]

def sarima_optimizer_aic(train, pdq, seasonal_pdq):
    best_aic, best_order, best_seasonal_order = float("inf"), None, None
    for param in pdq:
        for param_seasonal in seasonal_pdq:
            try:
                sarimax_model = SARIMAX(train, order=param, seasonal_order=param_seasonal)
                results = sarimax_model.fit(disp=0)
                aic = results.aic
                if aic < best_aic:
                    best_aic, best_order, best_seasonal_order = aic, param, param_seasonal
                print('SARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, aic))
            except:
                continue
    print('SARIMA{}x{}12 - AIC:{}'.format(best_order, best_seasonal_order, best_aic))
    return best_order, best_seasonal_order


best_order, best_seasonal_order = sarima_optimizer_aic(train, pdq, seasonal_pdq)


############################
# Final Model
############################

model = SARIMAX(train, order=best_order, seasonal_order=best_seasonal_order)
sarima_final_model = model.fit(disp=0)

y_pred_test = sarima_final_model.get_forecast(steps=48)

# MAE
y_pred = y_pred_test.predicted_mean
y_pred = pd.Series(y_pred, index=test.index)

plot_co2(train, test, y_pred, "SARIMA")


##################################################
# Examining the Statistical Outputs of the Model
##################################################
sarima_final_model.plot_diagnostics(figsize=(15, 12))
plt.show()




##################################################
# BONUS: SARIMA Optimization Based on MAE
##################################################

p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]

def sarima_optimizer_mae(train, pdq, seasonal_pdq):
    best_mae, best_order, best_seasonal_order = float("inf"), None, None
    for param in pdq:
        for param_seasonal in seasonal_pdq:
            try:
                model = SARIMAX(train, order=param, seasonal_order=param_seasonal)
                sarima_model = model.fit(disp=0)
                y_pred_test = sarima_model.get_forecast(steps=48)
                y_pred = y_pred_test.predicted_mean
                mae = mean_absolute_error(test, y_pred)

                if mae < best_mae:
                    best_mae, best_order, best_seasonal_order = mae, param, param_seasonal
                print('SARIMA{}x{}12 - MAE:{}'.format(param, param_seasonal, mae))
            except:
                continue
    print('SARIMA{}x{}12 - MAE:{}'.format(best_order, best_seasonal_order, best_mae))
    return best_order, best_seasonal_order

best_order, best_seasonal_order = sarima_optimizer_mae(train, pdq, seasonal_pdq)

model = SARIMAX(train, order=best_order, seasonal_order=best_seasonal_order)
sarima_final_model = model.fit(disp=0)

y_pred_test = sarima_final_model.get_forecast(steps=48)
y_pred = y_pred_test.predicted_mean
y_pred = pd.Series(y_pred, index=test.index)


plot_co2(train, test, y_pred, "SARIMA")

Reference: https://www.veribilimiokulu.com/python-ile-zaman-serisi-analizi/

vbo❤
