What Is a SARIMAX Model?


Although we have dedicated a series of blog posts to time series models, we are yet to discuss one very important topic – seasonality.

Each of the models we have examined so far, be it ARMA, ARIMA, or ARIMAX, has a seasonal equivalent.

As you can probably guess, the names of these counterparts are SARMA, SARIMA, and SARIMAX, respectively, with the “S” representing the seasonal aspect.

Therefore, the full name of the model is Seasonal Autoregressive Integrated Moving Average with Exogenous regressors.

We can all agree that it’s a mouthful, so we’ll stick with the abbreviation.

Additionally, the SARMA and SARIMA can be considered simpler cases of the SARIMAX: the SARMA uses neither integration nor exogenous variables, while the SARIMA uses integration but no exogenous variables. That’s why we’ll mainly focus our attention on the SARIMAX in this tutorial.

What Is Seasonality?

In case you need a hint, seasonality occurs when certain patterns don’t persist throughout the whole series but reappear periodically. For instance, check out the weekly YouTube searches for Christmas songs like “Jingle Bells”.

Seasonality refers to a regular and predictable pattern of fluctuations in time series data that recurs at fixed intervals, most often within a year (for example, monthly or quarterly), although daily and weekly cycles are common too.

These recurring patterns are often influenced by natural or calendar-related factors, such as holidays, weather conditions, or cultural events. 

Seasonality is commonly observed in various industries, including retail, tourism, agriculture, and finance. 

Understanding and accounting for seasonality is crucial in data analysis and forecasting, as it helps identify patterns that repeat at regular intervals, such as every year.

By recognizing seasonality in data, businesses can make informed decisions, adjust marketing strategies, manage inventory levels, and optimize resource allocation to maximize efficiency and profitability throughout the year.

Figure: interest over time in weekly YouTube searches for “Jingle Bells”, an example of seasonality

Such searches occur much more frequently over the festive period in December every year, whereas the number of times these songs are played is usually a lot lower in June or July.

Therefore, a simple autoregressive component won’t describe the data well.

To elaborate, a simple AR component would severely understate the number of times Christmas songs are played in December, because it bases its prediction on the figures from November (1 lag ago). At the same time, it would greatly overstate the number for January, basing it on the values recorded in December, since this genre usually experiences a dip after Christmas.
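A minimal Python sketch of this problem, using a made-up monthly series (not the YouTube data from the figure) in which the same yearly pattern, with a December peak, repeats exactly. It compares a lag-1 forecast with a “seasonal naive” forecast that reuses the value from 12 months earlier; the series and the helper names are illustrative assumptions.

```python
# Made-up monthly "plays" (January first) with a December spike; the same
# yearly pattern repeats for three years. Illustrative numbers only.
ONE_YEAR = [20, 18, 15, 12, 10, 8, 8, 9, 12, 20, 40, 100]
series = ONE_YEAR * 3

def lag1_forecast(y, t):
    """Short-lag forecast: predict y[t] with the previous month's value."""
    return y[t - 1]

def seasonal_forecast(y, t, s=12):
    """Seasonal naive forecast: predict y[t] with the value from s periods earlier."""
    return y[t - s]

# Index 23 is December of year 2, index 24 is January of year 3.
for name, t in [("December", 23), ("January", 24)]:
    print(name, "actual:", series[t],
          "lag-1:", lag1_forecast(series, t),
          "seasonal:", seasonal_forecast(series, t))
```

The lag-1 forecast predicts 40 for a December that actually reaches 100, then predicts 100 for a January that drops back to 20, while the 12-month-back value matches both.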

How Do We Handle Seasonality?

To account for such a pattern, we need to include the values recorded during the previous festive period in the model. In this specific example, that would mean relying on the number of times the songs were played last December. Of course, we can also include the data from two Decembers back, or even more.

Figure: Jingle Bells seasonality example, with a formula that includes the values recorded during the previous festive period
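The formula from the figure is not reproduced in the text. As a sketch, assuming monthly data, one non-seasonal AR lag, and one seasonal lag 12 months back, the idea can be written as follows, with lower-case φ for the non-seasonal coefficient and upper-case Φ for the seasonal one:

```latex
y_t = c + \phi_1 y_{t-1} + \Phi_1 y_{t-12} + \varepsilon_t
```

Including the value from two Decembers back would add a \Phi_2 y_{t-24} term. This additive version is a simplification: the SARIMAX itself combines the seasonal and non-seasonal parts multiplicatively.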

It’s a bit like having another series which is further spread out in time than our original one. Going back to the musical example, the original time series contains values a month apart, while the seasonal one would hold values 12 months apart.
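To make the “second series” idea concrete, here is a small Python sketch that pulls every 12th value out of a hypothetical three-year monthly series (made-up numbers that rise by 10 each year). The helper `seasonal_subseries` is an illustrative name, not a library function.

```python
# Hypothetical monthly series: the same yearly shape, shifted up by 10 each year.
ONE_YEAR = [20, 18, 15, 12, 10, 8, 8, 9, 12, 20, 40, 100]
monthly = [v + 10 * year for year in range(3) for v in ONE_YEAR]

def seasonal_subseries(y, s, offset):
    """Return every s-th value starting at `offset`, i.e. observations s periods apart."""
    return y[offset::s]

# Consecutive values in `monthly` are one month apart; the December
# sub-series (offset 11) holds values 12 months apart.
december = seasonal_subseries(monthly, s=12, offset=11)
print(december)  # [100, 110, 120]
```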


The SARIMAX Model Definition

Now that we’re familiar with the general idea of seasonal models, let’s look at the notation we use and what each value means. Compared to the ARIMAX, the SARIMAX requires 4 additional orders.

Figure: SARIMAX model definition and number of orders

This might sound like a lot, but there’s no need to worry!

The first 3 of these 4 orders are just seasonal versions of the ARIMA orders.


In other words, we have a seasonal autoregressive order denoted by upper-case P, an order of seasonal integration denoted by upper-case D, and a seasonal moving average order signified by upper-case Q. To make differentiation easier, econometricians have agreed to use lower-case letters for their non-seasonal equivalents.

Figure: SARIMAX model order notation

The 4th, and last, order is the length of the cycle. For instance, if we have hourly data, and the cycle length is 24, then the seasonal pattern appears once every 24 hours.

What Is the Length of the Cycle in Seasonal Models?

Another way to think about it is “The number of periods necessary to pass before the tendency reappears”. If we want to inspect a seasonal trend, we need to make sure to set the appropriate cycle length. We represent the last order with a lower-case “s” because it sets the length of each season.
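One common heuristic for choosing s, sketched here in plain Python under the assumption that the series has a single dominant cycle, is to compute the sample autocorrelation at several lags and pick the lag where it peaks. The data below is a synthetic hourly sine wave with a 24-hour cycle, and `guess_cycle_length` is a hypothetical helper. With real data you would normally inspect an ACF plot and use domain knowledge (for example, 12 for monthly data with a yearly pattern) rather than trust a single maximum.

```python
import math

def autocorrelation(y, lag):
    """Sample autocorrelation of y at the given lag (standard ACF estimator)."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    num = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, n))
    return num / denom

def guess_cycle_length(y, max_lag):
    """Heuristic: return the lag in 2..max_lag with the largest autocorrelation.

    Lag 1 is skipped because neighbouring values of a smooth series are
    almost always strongly correlated, whatever the seasonal cycle is.
    """
    return max(range(2, max_lag + 1), key=lambda k: autocorrelation(y, k))

# Synthetic hourly data: two weeks of a pattern that repeats every 24 hours.
hourly = [10 + 5 * math.sin(2 * math.pi * h / 24) for h in range(24 * 14)]
print(guess_cycle_length(hourly, max_lag=36))  # 24
```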

How Do We Interpret Seasonal Orders?

Let’s quickly explain how the 4 new orders work in unison.

Essentially, the cycle length “s” expresses how far back from the current period the seasonal components are.

So, if we have a model with seasonal orders (2, 0, 1, 5), each cycle is 5 periods long and we’re taking 2 lagged seasonal values. This means we include the lagged values from 5 and 10 periods ago, as well as the error term from 5 periods ago.
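The lag arithmetic for the seasonal AR and MA parts can be sketched in a few lines of Python. `seasonal_lags` is a hypothetical helper, and it lists only the purely seasonal lags, not the cross terms that appear once non-seasonal orders are added.

```python
def seasonal_lags(P, Q, s):
    """Return the lags used by the seasonal AR and seasonal MA parts.

    Seasonal AR terms use past values at lags s, 2s, ..., P*s;
    seasonal MA terms use past errors at lags s, 2s, ..., Q*s.
    Cross terms from combining with non-seasonal orders are not included.
    """
    ar_lags = [s * i for i in range(1, P + 1)]
    ma_lags = [s * j for j in range(1, Q + 1)]
    return ar_lags, ma_lags

# Seasonal order (2, 0, 1, 5) from the text: P=2, D=0, Q=1, s=5.
ar_lags, ma_lags = seasonal_lags(P=2, Q=1, s=5)
print("seasonal AR lags:", ar_lags)  # [5, 10]
print("seasonal MA lags:", ma_lags)  # [5]
```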

Figure: interpretation of seasonal orders in a SARIMAX model

To generalize, we’re interested in every “s”-th value. We start from the “s”-th lag and go all the way up to “s times P”.

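The same spacing applies to the seasonal error terms, which go back s, 2s, and so on up to “s times Q” periods, and to seasonal integration, where each of the D seasonal differences subtracts the value recorded s periods earlier.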
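Seasonal integration works through seasonal differencing. Here is a minimal sketch in plain Python, assuming D rounds of the difference y_t − y_{t−s}; the quarterly toy series (s = 4) is made up so that the seasonal pattern repeats exactly on top of a steady rise, and `seasonal_difference` is an illustrative helper name.

```python
def seasonal_difference(y, s, D=1):
    """Apply D rounds of seasonal differencing: z_t = y_t - y_{t-s}.

    Each round drops the first s observations, which have no value s periods back.
    """
    for _ in range(D):
        y = [y[t] - y[t - s] for t in range(s, len(y))]
    return y

# Toy quarterly series: the pattern 10, 20, 30, 40 shifted up by 2 each year.
quarterly = [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44]
print(seasonal_difference(quarterly, s=4))       # [2, 2, 2, 2, 2, 2, 2, 2]
print(seasonal_difference(quarterly, s=4, D=2))  # [0, 0, 0, 0]
```

One seasonal difference (D = 1) removes the repeating quarterly shape and leaves the year-over-year change; a second one (D = 2) also removes that constant change.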

Figure: every “s”-th value

What Is the Equation of a SARIMAX Model?

Let’s see what the equation of a SARIMAX model of order (1,0,1) and a seasonal order (2,0,1,5) looks like.

Figure: equation of a SARIMAX model of order (1,0,1) and seasonal order (2,0,1,5)

The interesting part here is that the seasonal components also bring in additional lagged values, because the seasonal and non-seasonal parts of the model are combined multiplicatively.
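Assuming the standard multiplicative form and leaving out the constant and any exogenous terms, a SARIMAX model of order (1,0,1) with seasonal order (2,0,1,5) can be written with the lag operator L (where L^k y_t = y_{t-k}) and expanded as follows. This is a sketch of the usual textbook form and may differ in notation from the figure.

```latex
\begin{aligned}
(1 - \phi_1 L)(1 - \Phi_1 L^5 - \Phi_2 L^{10})\, y_t &= (1 + \theta_1 L)(1 + \Theta_1 L^5)\, \varepsilon_t \\
y_t &= \phi_1 y_{t-1} + \Phi_1 y_{t-5} - \phi_1 \Phi_1 y_{t-6} + \Phi_2 y_{t-10} - \phi_1 \Phi_2 y_{t-11} \\
    &\quad + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \Theta_1 \varepsilon_{t-5} + \theta_1 \Theta_1 \varepsilon_{t-6}
\end{aligned}
```

The values at lags 6 and 11, and the error at lag 6, are the extra lagged terms. Their coefficients are products of existing ones, so they add no new parameters. An exogenous regressor x_t would add a \beta x_t term on the right-hand side.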

If you want to learn why that is so, you can find a detailed explanation of the math behind the SARIMAX model here.

So, what can we see from the equation? 

The total number of AR and MA coefficients we are estimating equals the sum of the seasonal and non-seasonal AR and MA orders. In other words, we’re looking at a total of P + Q + p + q coefficients, plus one for each exogenous variable and one for a constant, if the model includes them.
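A small Python sketch that multiplies the lag polynomials for the SARIMAX(1,0,1) with seasonal order (2,0,1,5), using made-up coefficient values (not estimates), shows how the number of lags in the model grows while the number of coefficients stays at p + q + P + Q. Lag polynomials are stored as plain dicts mapping powers of L to coefficients; `poly_mul` is a hypothetical helper, and the sign convention (AR side 1 − φL, MA side 1 + θL) is an assumption that matches common textbook usage.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials stored as {power of L: coefficient}."""
    out = {}
    for i, ca in a.items():
        for j, cb in b.items():
            out[i + j] = out.get(i + j, 0.0) + ca * cb
    return out

# Orders of the example: (p, d, q) = (1, 0, 1), (P, D, Q, s) = (2, 0, 1, 5).
p, q, P, Q, s = 1, 1, 2, 1, 5

# Hypothetical coefficient values, chosen only for illustration.
phi1, theta1 = 0.5, 0.4
Phi1, Phi2, Theta1 = 0.3, 0.2, 0.1

# Left-hand (AR) side: (1 - phi1 L)(1 - Phi1 L^s - Phi2 L^2s)
ar = poly_mul({0: 1.0, 1: -phi1}, {0: 1.0, s: -Phi1, 2 * s: -Phi2})
# Right-hand (MA) side: (1 + theta1 L)(1 + Theta1 L^s)
ma = poly_mul({0: 1.0, 1: theta1}, {0: 1.0, s: Theta1})

n_params = p + q + P + Q
print("AR lags:", sorted(k for k in ar if k > 0))  # [1, 5, 6, 10, 11]
print("MA lags:", sorted(k for k in ma if k > 0))  # [1, 5, 6]
print("coefficients to estimate:", n_params)       # 5
```

The entry `ar[6]` equals phi1 * Phi1: it is the cross term at lag s + 1, and it enters the equation for y_t with a minus sign once moved to the right-hand side.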

Figure: explanation of the SARIMAX model equation

The non-seasonal coefficients are expressed with lower-case ϕ and θ, while their seasonal counterparts are expressed with upper-case Φ and Θ, respectively. Just like with the orders, capital letters denote the seasonal components and lower-case letters the non-seasonal ones.

So, this is the basic knowledge of seasonal models you need. However, if you want to learn more about time series and time-series data, make sure to check out our article on the topic and enroll in our Time Series Analysis with Python course.

If you’re new to Python, and you’re enthusiastic to learn more, this comprehensive article on learning Python programming will guide you all the way from the installation, through Python IDEs, Libraries, and frameworks, to the best Python career paths and job outlook.   
