{"id":47,"date":"2023-10-19T20:23:09","date_gmt":"2023-10-19T19:23:09","guid":{"rendered":"https:\/\/drdustinchambers.com\/?p=47"},"modified":"2023-10-19T20:23:10","modified_gmt":"2023-10-19T19:23:10","slug":"orthodox-time-series-estimation-arima","status":"publish","type":"post","link":"https:\/\/drdustinchambers.com\/index.php\/2023\/10\/19\/orthodox-time-series-estimation-arima\/","title":{"rendered":"Orthodox Time Series Estimation: ARIMA"},"content":{"rendered":"\n<p>To help data scientists learn some of the basic concepts behind time series forecasting, I&#8217;m planning to write a series of brief posts on the topic, starting with autoregressive integrated moving-average (ARIMA) models, a classic technique used by economists for over a quarter century. I will focus primarily on intuition and common pitfalls, and will avoid most of the econometric theory behind time series forecasting (which is an expansive field).<\/p>\n\n\n\n<p>Please note that all of the examples in this series will use the Python programming language and various libraries and packages. While most economists use closed-source, specialty stats packages like SAS, EViews, and Stata to do time series forecasting, I&#8217;m focusing on Python because it is quickly becoming the dominant tool in data science and is preferred by many analysts over R, another very good open-source platform.<\/p>\n\n\n\n<p>ARIMA models are an extension of autoregressive moving-average (ARMA) models, and can handle non-stationary time series, while ARMA models cannot. Briefly, stationary time series are well behaved, with a finite and constant mean and variance over time. Such processes are mean reverting. By contrast, the variance of a non-stationary time series is time varying and increases over time. Such processes tend to wander aimlessly (hence the name &#8216;drunkard&#8217;s walk&#8217; or &#8216;random walk&#8217;) and can travel an arbitrarily far distance from their unconditional mean. 
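This contrast is easy to see in a quick simulation (a sketch of my own, not from the post; the series and parameter values are purely illustrative): a stationary AR(1) keeps returning to its mean, while a random walk's variance grows without bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
eps = rng.normal(size=n)

# Stationary AR(1): y_t = 0.5 * y_{t-1} + eps_t -- shocks die out geometrically
stationary = np.zeros(n)
for t in range(1, n):
    stationary[t] = 0.5 * stationary[t - 1] + eps[t]

# Random walk (unit root): y_t = y_{t-1} + eps_t -- shocks persist forever
random_walk = np.cumsum(eps)

# The AR(1)'s sample variance settles near its theoretical value of
# 1 / (1 - 0.5**2) ~= 1.33, while the random walk's keeps growing with n
print(np.var(stationary), np.var(random_walk))
```

The dispersion of the simulated random walk dwarfs that of the AR(1), which is exactly the finite-versus-exploding-variance distinction described above.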
As a rule of thumb, if a time series is highly persistent but follows no discernible pattern over time (like a stock price), or if a series trends away from its initial value without limit, that is a strong indication of a non-stationary series.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">ARMA Models<\/h3>\n\n\n\n<p>Before jumping into ARIMA estimation, it&#8217;s helpful to define a basic ARMA(p,q) model, which combines an autoregressive model of order p with a moving-average model of order q:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\[ y_t = \\alpha + \\beta_1 y_{t-1} + \\ldots + \\beta_p y_{t-p} + \\epsilon_t + \\gamma_1 \\epsilon_{t-1} + \\ldots + \\gamma_q \\epsilon_{t-q}\\]<\/div>\n\n\n\n<p>In the above equation, <em>y<\/em> is the variable of interest (i.e., what we want to forecast) and \u03b5 is a period-specific random error term (i.e., noise). 
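The recursion above is straightforward to simulate directly; here is a minimal ARMA(1,1) sketch (my own illustration; the parameter values are arbitrary, not estimates from the post):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
alpha, beta1, gamma1 = 1.0, 0.6, 0.3  # illustrative ARMA(1,1) parameters
eps = rng.normal(size=n)

# y_t = alpha + beta1 * y_{t-1} + eps_t + gamma1 * eps_{t-1}
y = np.zeros(n)
for t in range(1, n):
    y[t] = alpha + beta1 * y[t - 1] + eps[t] + gamma1 * eps[t - 1]

# For a stationary ARMA(1,1), the unconditional mean is alpha / (1 - beta1),
# here 1.0 / (1 - 0.6) = 2.5; the sample mean should land close to it
print(y.mean())
```

Because |β₁| < 1 here, the simulated path hovers around its unconditional mean of 2.5 rather than wandering off, which previews the stationary case discussed next.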
If the time series is non-stationary, then typically the coefficient on the first lag of <em>y<\/em> &#8211; i.e., \u03b2<sub>1<\/sub> &#8211; is very close (or equal) to 1. In such a case, <em>y<\/em> is a long-memory process, since any shock to <em>y<\/em> propagates fully forward to the next value of <em>y<\/em>. In other words, shocks are not dampened over time but persist. This is in sharp contrast to a stationary AR process, where shocks to <em>y<\/em> rapidly diminish and, in the absence of any additional shocks, the series converges to the long-run unconditional mean (E(y)).<\/p>\n\n\n\n<p>If the variable of interest is non-stationary, then we say that it has a &#8220;unit root&#8221; and regular estimation techniques (OLS) will fail to produce unbiased estimates of our model parameters. To complicate matters, it is possible that our variable of interest behaves like a non-stationary process but contains an exogenous trend component &#8211; i.e., the level of y increases by a fixed increment (\u03b4) each time period:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\[ y_t = \\alpha + \\delta t + \\sum_{i=1}^{p} \\beta_i y_{t-i} + \\sum_{j=0}^{q} \\gamma_j \\epsilon_{t-j} \\]<\/div>\n\n\n\n<p>In this case, one can simply add a trend term to an ARMA model and proceed with conventional estimation techniques (i.e., the series is &#8220;trend stationary&#8221;). Thus, there is a need to properly test for the presence of a unit root. For this task, we will use the augmented Dickey-Fuller (ADF) test.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">ADF Test<\/h3>\n\n\n\n<p>An important result in time series statistics is that invertible moving-average processes (i.e., MA(q)) can be &#8220;inverted&#8221; and represented as infinite autoregressive processes (AR(\u221e)).  Therefore, an ARMA(p,q) can be practically represented by an AR(m), where m is finite but large enough to sufficiently capture the dynamics of the MA(q) process. 
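A quick numeric illustration of why that truncation works (my own sketch, with an arbitrary MA coefficient): for an invertible MA(1), y_t = ε_t + γε_{t-1} with |γ| < 1, repeated substitution gives y_t = Σ_j π_j y_{t-j} + ε_t with π_j = −(−γ)^j, so the implied AR weights decay geometrically and a modest lag cutoff m loses almost nothing.

```python
# AR(infinity) weights implied by inverting an MA(1) with coefficient gamma = 0.5
gamma = 0.5
pi = [-(-gamma) ** j for j in range(1, 21)]

print(pi[:3])  # [0.5, -0.25, 0.125] -- geometric decay in magnitude

# Total absolute weight lost by truncating the AR representation at m = 10 lags
tail = sum(abs(w) for w in pi[10:])
print(tail)
```

With γ = 0.5, everything beyond ten lags carries less than 0.1% of a unit weight, which is why a finite AR(m) is a practical stand-in for the ARMA(p,q).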
Therefore, the ADF test frames both the null and alternative hypotheses in terms of AR(m) processes, not an ARMA(p,q) process. Through some clever algebraic substitution, the null (H<sub>0<\/sub>) and alternative (H<sub>A<\/sub>) hypotheses can be expressed as follows:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\[ H_0: y_t = a + y_{t-1} + \\sum_{i=1}^{m} c_i \\Delta y_{t-i} + \\epsilon_{t} \\]\n\\[ H_A: y_t = b_0 + b_1 t + \\beta_1 y_{t-1} + \\sum_{i=1}^{m} c_i \\Delta y_{t-i} + \\epsilon_{t} \\]<\/div>\n\n\n\n<p>Subtracting <em>y<\/em><sub>t-1<\/sub> from both sides of the alternative specification, the ADF test estimates the following regression equation:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\[ \\Delta y_t = a_0 + a_1 t + (\\beta_1 - 1) y_{t-1} + \\sum_{i=1}^{m} c_i \\Delta y_{t-i} + \\epsilon_{t} \\]<\/div>\n\n\n\n<p>where under our null hypothesis, a<sub>1<\/sub> = 0 and \u03b2<sub>1<\/sub> = 1.  Since |\u03b2<sub>1<\/sub>| &lt; 1 under the alternative hypothesis, the coefficient on <em>y<\/em><sub>t-1<\/sub> in the above regression must be weakly negative, since (\u03b2<sub>1<\/sub> &#8211; 1) = 0 under the null and (\u03b2<sub>1<\/sub> &#8211; 1) &lt; 0 under the alternative hypothesis.  Therefore, the ADF test performs a <em>one-sided test<\/em> on the coefficient (\u03c1) on <em>y<\/em><sub>t-1<\/sub>:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\[ H_0: \\rho = (\\beta_1 - 1) = 0 \\]<\/div>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\[ H_A: \\rho = (\\beta_1 - 1) &lt; 0 \\]<\/div>\n\n\n\n<p>Under the null hypothesis of a unit root, the simple t-statistic on \u03c1 used to perform the one-sided test <strong>does not follow a typical t-distribution<\/strong>, so you cannot simply look up the critical values for this test in a t-distribution table; your stats package will produce and report the appropriate critical values for your test. Also, note that the null hypothesis is that there is a unit root. 
Thus, if we fail to reject the null hypothesis, we cannot rule out the presence of a unit root (failing to reject does not prove that one exists).  You must reject the null hypothesis to conclude that you have a trend stationary process.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">ARIMA Models<\/h3>\n\n\n\n<p>ARIMA models look very similar to ARMA models, with the proviso that the original series is non-stationary and requires differencing d times in order to remove any unit roots and make the series stationary.  For example, if d=1, then a single first differencing of the data (which transforms the data from levels (y<sub>t<\/sub>) to period-over-period changes (\u0394y<sub>t<\/sub>)) is required. In this example, the &#8220;order of integration&#8221; is one, also written I(1). With this detail in mind, ARIMA models are characterized by three parameter choices &#8211; ARIMA(p,d,q). After taking the required differences (d) to make the time series stationary, estimation proceeds as it would for an ARMA(p,q) model, except that the dependent variable and its lags are in differenced form:<\/p>\n\n\n\n<div class=\"wp-block-mathml-mathmlblock\">\\[ \\Delta^d y_t = a + \\sum_{i=1}^{p} \\beta_i \\Delta^d y_{t-i} + \\sum_{j=0}^{q} \\gamma_j \\epsilon_{t-j} \\]<\/div>\n\n\n\n<h3 class=\"wp-block-heading\">ARIMA Forecasting Example: US Housing Prices<\/h3>\n\n\n\n<p>For this example, I&#8217;m going to forecast the <a href=\"https:\/\/fred.stlouisfed.org\/series\/CSUSHPISA\" target=\"_blank\" rel=\"noopener\" title=\"\">S&amp;P Case-Shiller US National Home Price Index<\/a> through the end of 2023. To make my life easy, I will be using the seasonally adjusted series from the St. Louis Federal Reserve&#8217;s FRED II database. 
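Before turning to the data, a toy illustration (mine, not from the post) of the \u0394^d operator in pandas: a pure quadratic trend is I(2), since its first difference still trends while its second difference is constant.

```python
import pandas as pd

t = pd.Series(range(10), dtype=float)
y = t ** 2  # quadratic trend: y_t = t^2, an I(2) pattern

d1 = y.diff().dropna()          # first difference: 2t - 1, still trending
d2 = y.diff().diff().dropna()   # second difference: constant 2

print(d1.tolist())  # the odd numbers 1.0 through 17.0 -- still growing
print(d2.tolist())  # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```

Chaining `.diff()` d times is exactly the \u0394^d transformation an ARIMA(p,d,q) applies before estimating the ARMA part.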
This is a monthly time series that runs from January 1987 to July 2023 as of the time of writing.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"350\" src=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/fredgraph-case-shiller-index-1024x350.png\" alt=\"\" class=\"wp-image-79\" srcset=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/fredgraph-case-shiller-index-1024x350.png 1024w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/fredgraph-case-shiller-index-300x102.png 300w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/fredgraph-case-shiller-index-768x262.png 768w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/fredgraph-case-shiller-index.png 1318w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>To perform this forecasting exercise, I used Python and various libraries and packages.  I originally did this work in <strong>Jupyter-Notebook<\/strong>, and you can <a href=\"https:\/\/github.com\/drdlchambers\/timeseries\/tree\/master\/basic_arima\" target=\"_blank\" rel=\"noopener\" title=\"\">download<\/a> that notebook and the data used in this exercise.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Step 1 &#8211; Load Packages and Dataset<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Import Packages<\/span>\n<span style=\"color: green; font-weight: bold;\">import<\/span> warnings\n<span style=\"color: green; font-weight: bold;\">import<\/span> itertools\n<span style=\"color: green; font-weight: bold;\">import<\/span> pandas <span style=\"color: green; font-weight: bold;\">as<\/span> pd\n<span style=\"color: green; font-weight: bold;\">import<\/span> numpy <span style=\"color: green; font-weight: bold;\">as<\/span> np\n<span style=\"color: green; font-weight: bold;\">import<\/span> statsmodels.api <span 
style=\"color: green; font-weight: bold;\">as<\/span> sm\n<span style=\"color: green; font-weight: bold;\">import<\/span> matplotlib.pyplot <span style=\"color: green; font-weight: bold;\">as<\/span> plt\nplt.style.use(<span style=\"color: red;\">'fivethirtyeight'<\/span>)\n<\/span>\n<span style=\"font-family: monospace; font-size: 1.1em;\"><span style=\"color: #008cba;\"># Load SA Case-Shiller US National Home Price Index<\/span>\ndf_sa <span style=\"color: purple; font-weight: bold;\">=<\/span> pd.read_csv(<span style=\"color: red;\">\"CSUSHPISA.csv\"<\/span>, parse_dates<span style=\"color: purple; font-weight: bold;\">=<\/span>&#91;<span style=\"color: red;\">\"DATE\"<\/span>], index_col<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: red;\">\"DATE\"<\/span>)\ndf_sa.rename(columns <span style=\"color: purple; font-weight: bold;\">=<\/span> {<span style=\"color: red;\">'CSUSHPISA'<\/span>:<span style=\"color: red;\">'home_price'<\/span>}, inplace <span style=\"color: purple; font-weight: bold;\">=<\/span> <span style=\"color: green;\"><b>True<\/b><\/span>)\n<\/span>\n<span style=\"font-family: monospace; font-size: 1.1em;\"><span style=\"color: #008cba;\"># Check for Missing Values<\/span>\n<span style=\"color: green;\"><b>if<\/b><\/span> (df_sa.isnull().values.any()):\n    <span style=\"color: green;\">print<\/span>(<span style=\"color: red;\">'The df_sa data frame has missing observations'<\/span>)\n<span style=\"color: green;\"><b>else<\/b><\/span>:\n    <span style=\"color: green;\">print<\/span>(<span style=\"color: red;\">'The df_sa data frame has NO missing observations'<\/span>)\n<\/span><\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Step 2 &#8211; Perform ADF Test<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Create custom function to perform augmented Dickey-Fuller test<\/span>\n<span style=\"color: green; font-weight: 
bold;\">from<\/span> statsmodels.tsa.stattools <span style=\"color: green; font-weight: bold;\">import<\/span> adfuller\n\n<span style=\"color: green; font-weight: bold;\">def<\/span> <span style=\"color: blue;\">run_adftest<\/span>(ts):\n    \n    augmented_df_result = adfuller(ts)\n    adf_stat = augmented_df_result&#91;<span style=\"color: green;\">0<\/span>]\n    adf_pvalue = augmented_df_result&#91;<span style=\"color: green;\">1<\/span>]\n    \n    <span style=\"color: #008cba;\"># Report Results<\/span>\n    <span style=\"color: green;\">print<\/span>(<span style=\"color: red;\">'The Augmented Dickey Fuller Test Statistic is: %f'<\/span> <span style=\"color: purple; font-weight: bold;\">%<\/span> adf_stat)\n    <span style=\"color: green;\">print<\/span>(<span style=\"color: red;\">'with a p-value of: %f'<\/span> <span style=\"color: purple; font-weight: bold;\">%<\/span> adf_pvalue)\n    <span style=\"color: green; font-weight: bold;\">if<\/span> (adf_pvalue <span style=\"color: purple; font-weight: bold;\">&lt;=<\/span> 0.05):\n        <span style=\"color: green;\">print<\/span>(<span style=\"color: red;\">'therefore, we reject the null hypothesis of a unit root'<\/span>)\n    <span style=\"color: green; font-weight: bold;\">else<\/span>:\n        <span style=\"color: green;\">print<\/span>(<span style=\"color: red;\">'therefore, we CANNOT reject the null hypothesis of a unit root'<\/span>)\n    \n    <span style=\"color: green; font-weight: bold;\">return None<\/span>\n\nadf_result = run_adftest(df_sa.home_price)<\/span><\/code><\/pre>\n\n\n\n<p>The test returns the following results:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\">The Augmented Dickey Fuller Test Statistic is: 1.028806\nwith a p-value of: 0.994561\ntherefore, we CANNOT reject the null hypothesis of a unit root<\/span><\/code><\/pre>\n\n\n\n<p>Given the strong long-run upward trend in US home prices, it isn&#8217;t surprising that 
the home price series is non-stationary. Before proceeding, it is a good idea to look at a plot of the first-differenced series.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Compute and plot the first difference of the home price index<\/span>\nd_home_price <span style=\"color: purple; font-weight: bold;\">=<\/span> df_sa.home_price.diff().dropna()\nd_home_price.plot(figsize<span style=\"color: purple; font-weight: bold;\">=<\/span>(<span style=\"color: green;\">15<\/span>, <span style=\"color: green;\">7<\/span>))\nplt.show()<\/span><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"478\" src=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/d_home_price-1-1024x478.png\" alt=\"\" class=\"wp-image-135\" srcset=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/d_home_price-1-1024x478.png 1024w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/d_home_price-1-300x140.png 300w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/d_home_price-1-768x358.png 768w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/d_home_price-1.png 1500w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Interestingly, this series still doesn&#8217;t look anything like random noise, but instead is highly persistent. For example, from about 1991 to about 2005, the pace of month-over-month changes in home prices increased steadily. This reflects the growing bubble in real estate prices that eventually reversed ahead of the financial crisis and Great Recession of 2008-09. From 2009 to 2014, there is a very volatile but nonetheless upward trend in month-over-month price changes, with stable appreciation from 2014 to 2020, followed, arguably, by the emergence of another real estate bubble in response to COVID-related home buying. 
This pattern of hot and cold cycles in the housing market may be a sign that the first differenced series still contains a unit root.<\/p>\n\n\n\n<p>Short of another ADF test, we can plot the autocorrelation function (ACF) and the partial autocorrelation (PACF). The ACF displays the raw correlation between values of our series (\u0394<i>y<\/i><sub>t<\/sub>) and lagged values of itself &#8211; e.g., corr(\u0394<i>y<\/i><sub>t<\/sub>,\u0394<i>y<\/i><sub>t-5<\/sub>). For stationary time series, the lagged correlations should decay very quickly and only a few should be statistically significant (i.e., bars above the shaded region). For non-stationary series, the ACF tends to decay very slowly and a large number of lags are statistically significant.<\/p>\n\n\n\n<p>The PACF displays correlations between values of our series (\u0394<em>y<\/em><sub>t<\/sub>) and lagged values of itself while holding the contributions of the intermediary periods fixed.  For example, the PACF between our series (\u0394<i>y<\/i><sub>t<\/sub>) and itself lagged 5 periods (\u0394<i>y<\/i><sub>t-5<\/sub>) holds the resulting variation in \u0394<i>y<\/i><sub>t-4<\/sub>, \u0394<i>y<\/i><sub>t-3<\/sub>, \u0394<i>y<\/i><sub>t-2<\/sub>, and \u0394<i>y<\/i><sub>t-1<\/sub> fixed. Thus in this example, the PACF reveals the <i>direct<\/i> impact of shocks five periods in the past on the present value, while holding any transmission of the shock through the subsequent lagged values fixed. Past shocks typically have a ripple effect like a stone hitting a calm lake, causing a wave to propagate through time and space. In this analogy, the variation in the surface of the water five feet from the impact of the stone depends on the wave that travels outward from that point of impact, disturbing the surface one, two, three, four, and finally five feet outward. With the PACF, we are explicitly ruling this ripple effect out, and are trying to isolate the direct impact of a shock on a future value of the series. 
In a stationary AR(p) time series, the PACF contains (p) statistically significant bars, but they should each be well below 1.0 in value. For a series with a unit root, the value of the PACF corresponding to the first lag is typically very close to 1.0 and all of the subsequent bars are very low and statistically insignificant. <\/p>\n\n\n\n<p>To plot both the ACF and PACF, run the following Python code:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Plot the First difference and the ACF and PACF<\/span>\n\n<span style=\"color: green; font-weight: bold;\">from<\/span> statsmodels.graphics.tsaplots <span style=\"color: green; font-weight: bold;\">import<\/span> plot_acf, plot_pacf\n\nfig, axes <span style=\"color: purple; font-weight: bold;\">=<\/span> plt.subplots(nrows<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">2<\/span>, ncols<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">1<\/span>, figsize<span style=\"color: purple; font-weight: bold;\">=<\/span>(<span style=\"color: green;\">6<\/span>,<span style=\"color: green;\">8<\/span>))\nax1, ax2 <span style=\"color: purple; font-weight: bold;\">=<\/span> axes.flatten()\nplot_acf(d_home_price, ax<span style=\"color: purple; font-weight: bold;\">=<\/span>ax1)\nplot_pacf(d_home_price, ax<span style=\"color: purple; font-weight: bold;\">=<\/span>ax2)\nplt.savefig(<span style=\"color: red;\">'acf_and_pacf_d1.png'<\/span>)\nplt.show()<\/span><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"800\" data-id=\"131\" src=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/acf_and_pacf_d1.png\" alt=\"\" 
class=\"wp-image-131\" srcset=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/acf_and_pacf_d1.png 600w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/acf_and_pacf_d1-225x300.png 225w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/figure>\n<\/figure>\n\n\n\n<p>Given the ACF and PACF evidence that a unit root may still be present in the first differenced time series, we rerun the ADF test on this series.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Run the ADF Test on the 1st Differenced Data<\/span>\nadf_result <span style=\"color: purple; font-weight: bold;\">=<\/span> run_adftest(d_home_price)<\/span><\/code><\/pre>\n\n\n\n<p>The results confirm our suspicion:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\">The Augmented Dickey Fuller Test Statistic is: -2.703482\nwith a p-value of: 0.073438\ntherefore, we CANNOT reject the null hypothesis of a unit root<\/span><\/code><\/pre>\n\n\n\n<p>Now, let&#8217;s plot the second differenced series along with the corresponding ACF and PACF functions.  
The second differenced series looks a lot more like random noise, and the ACF and PACF plots look clean.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Now, we will second difference the SA Home Price Series<\/span>\nd2_home_price = d_home_price.diff().dropna()\n<\/span>\n<span style=\"font-family: monospace; font-size: 1.1em;\"><span style=\"color: #008cba;\"># Plot the Second difference and the ACF and PACF<\/span>\nfig, axes <span style=\"color: purple; font-weight: bold;\">=<\/span> plt.subplots(nrows<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">3<\/span>, ncols<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">1<\/span>, figsize<span style=\"color: purple; font-weight: bold;\">=<\/span>(<span style=\"color: green;\">10<\/span>,<span style=\"color: green;\">12<\/span>))\nax1, ax2, ax3 <span style=\"color: purple; font-weight: bold;\">=<\/span> axes.flatten()\nax1.plot(d2_home_price)\nplot_acf(d2_home_price, ax<span style=\"color: purple; font-weight: bold;\">=<\/span>ax2)\nplot_pacf(d2_home_price, ax<span style=\"color: purple; font-weight: bold;\">=<\/span>ax3)\nplt.savefig(<span style=\"color: red;\">'acf_and_pacf_d2.png'<\/span>)\nplt.show()<\/span><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"853\" height=\"1024\" src=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/acf_and_pacf_d2-853x1024.png\" alt=\"\" class=\"wp-image-158\" srcset=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/acf_and_pacf_d2-853x1024.png 853w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/acf_and_pacf_d2-250x300.png 250w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/acf_and_pacf_d2-768x922.png 768w, 
https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/acf_and_pacf_d2.png 1000w\" sizes=\"(max-width: 853px) 100vw, 853px\" \/><\/figure>\n\n\n\n<p>To formally verify that the second difference of the home price series doesn&#8217;t contain a unit root, we re-run the ADF test:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Run the ADF Test on the 2nd Differenced Data<\/span>\nadf_result <span style=\"color: purple; font-weight: bold;\">=<\/span> run_adftest(d2_home_price)<\/span><\/code><\/pre>\n\n\n\n<p>The test results verify our hypothesis:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\">The Augmented Dickey Fuller Test Statistic is: -7.251626\nwith a p-value of: 0.000000\ntherefore, we reject the null hypothesis of a unit root<\/span><\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Step 3 &#8211; ARIMA Forecasting<\/h4>\n\n\n\n<p>We will withhold the last five years of data (60 monthly observations) to evaluate the performance of our various estimated ARIMA models. 
This is conceptually the same as a testing dataset (from a train\/test split) in data science parlance, but it is commonly referred to as a &#8220;hold back&#8221; in econometrics.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Train\/Test Split (hold back the last 5 years = 60 monthly obs.)<\/span>\n\nhome_price <span style=\"color: purple; font-weight: bold;\">=<\/span> df_sa.home_price\n\nhp_train <span style=\"color: purple; font-weight: bold;\">=<\/span> home_price&#91;:(<span style=\"color: green;\">len<\/span>(home_price)<span style=\"color: purple; font-weight: bold;\">-<\/span><span style=\"color: green;\">60<\/span>)]\nhp_test <span style=\"color: purple; font-weight: bold;\">=<\/span> home_price&#91;(<span style=\"color: green;\">len<\/span>(home_price)<span style=\"color: purple; font-weight: bold;\">-<\/span><span style=\"color: green;\">60<\/span>):]<\/span><\/code><\/pre>\n\n\n\n<p>To compare the accuracy of our estimated\/trained models over the testing\/hold-back dataset, we will introduce two complementary metrics, mean absolute percentage error (MAPE) and root mean squared error (RMSE):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Evaluate Forecast Accuracy: MAPE and RMSE<\/span>\n\n<span style=\"color: green; font-weight: bold\">def<\/span> <span style=\"color: blue;\">forecast_mape<\/span>(actual,forecast):\n    abs_norm_resid <span style=\"color: purple; font-weight: bold;\">=<\/span> np.abs((forecast <span style=\"color: purple; font-weight: bold;\">-<\/span> actual)<span style=\"color: purple; font-weight: bold;\">\/<\/span>actual)\n    mape <span style=\"color: purple; font-weight: bold;\">=<\/span> np.mean(abs_norm_resid)\n    <span style=\"color: green; font-weight: bold\">return<\/span> <span style=\"color: green;\">100<\/span><span style=\"color: 
purple; font-weight: bold;\">*<\/span>mape\n\n<span style=\"color: green; font-weight: bold\">def<\/span> <span style=\"color: blue;\">forecast_rmse<\/span>(actual,forecast):\n    mean_sq_resid <span style=\"color: purple; font-weight: bold;\">=<\/span> np.mean((forecast <span style=\"color: purple; font-weight: bold;\">-<\/span> actual)<span style=\"color: purple; font-weight: bold;\">**<\/span><span style=\"color: green;\">2<\/span>)\n    <span style=\"color: green; font-weight: bold\">return<\/span> mean_sq_resid<span style=\"color: purple; font-weight: bold;\">**<\/span>(<span style=\"color: green;\">0.5<\/span>)<\/span><\/code><\/pre>\n\n\n\n<p>Now, we will perform a grid search over the space formed by our model parameters (p x d x q) for the best performing ARIMA model using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to guide model selection. There is strong evidence that the home price series is I(2), but there is a risk of over-differencing our model and degrading forecasting performance. 
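As a quick sanity check on the two accuracy metrics defined above (restated here so the snippet is self-contained; the numbers are toy values, not housing data):

```python
import numpy as np

# Same MAPE/RMSE logic as the helpers defined above, restated for a standalone check
def forecast_mape(actual, forecast):
    return 100 * np.mean(np.abs((forecast - actual) / actual))

def forecast_rmse(actual, forecast):
    return np.mean((forecast - actual) ** 2) ** 0.5

actual = np.array([100.0, 200.0])
forecast = np.array([110.0, 190.0])  # off by +10 and -10

print(forecast_mape(actual, forecast))  # mean of the 10% and 5% errors: 7.5 (up to float rounding)
print(forecast_rmse(actual, forecast))  # sqrt((10**2 + 10**2) / 2) = 10.0
```

Note the two metrics answer different questions: MAPE scales each error by the level of the series, while RMSE stays in the index's own units and penalizes large misses more heavily.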
Therefore, we will consider a wide range of both I(1) models (which may be under-differenced) and stationary I(2) forecasting models and see which has the best performance as measured by the AIC and BIC.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Grid Search for best performing ARIMA model<\/span>\n<span style=\"color: green; font-weight: bold\">from<\/span> statsmodels.tsa.arima.model <span style=\"color: green; font-weight: bold\">import<\/span> ARIMA\n\nwarnings.filterwarnings(<span style=\"color: red\">\"ignore\"<\/span>) <span style=\"color: #008cba;\"># specify to ignore warning messages<\/span>\n\n<span style=\"color: #008cba;\"># Define the p, d and q parameters to take non-negative integer values<\/span>\np <span style=\"color: purple; font-weight: bold;\">=<\/span> <span style=\"color: green\">range<\/span>(<span style=\"color: green\">1<\/span>, <span style=\"color: green\">4<\/span>) <span style=\"color: #008cba;\"># AR term goes back 1 to 3 months<\/span>\nd <span style=\"color: purple; font-weight: bold;\">=<\/span> <span style=\"color: green\">range<\/span>(<span style=\"color: green\">1<\/span>, <span style=\"color: green\">3<\/span>) <span style=\"color: #008cba;\"># Order of integration is either 1 or 2<\/span>\nq <span style=\"color: purple; font-weight: bold;\">=<\/span> <span style=\"color: green\">range<\/span>(<span style=\"color: green\">0<\/span>, <span style=\"color: green\">4<\/span>) <span style=\"color: #008cba;\"># MA term goes back 0 to 3 months<\/span>\n\n<span style=\"color: #008cba;\"># Generate all different combinations of p, d and q triplets (take Cartesian Product = p x d x q)<\/span>\npdq <span style=\"color: purple; font-weight: bold;\">=<\/span> <span style=\"color: green\">list<\/span>(itertools.product(p, d, q))\n\nmodel_parameters_and_performance <span style=\"color: purple; font-weight: bold;\">=<\/span> &#91;]\n\n<span 
style=\"color: green; font-weight: bold\">for<\/span> param <span style=\"color: green; font-weight: bold\">in<\/span> pdq:\n    \n    <span style=\"color: green; font-weight: bold\">try<\/span>:\n        model <span style=\"color: purple; font-weight: bold;\">=<\/span> ARIMA(hp_train, order=param)\n        results <span style=\"color: purple; font-weight: bold;\">=<\/span> model.fit()\n        results_aic <span style=\"color: purple; font-weight: bold;\">=<\/span> results.aic\n        results_bic <span style=\"color: purple; font-weight: bold;\">=<\/span> results.bic\n        \n        <span style=\"color: #008cba;\"># Forecast 60 months ahead<\/span>\n        forecast <span style=\"color: purple; font-weight: bold;\">=<\/span> results.forecast(<span style=\"color: green\">60<\/span>)\n        precision <span style=\"color: purple; font-weight: bold;\">=<\/span> forecast_mape(hp_test,forecast)\n        fcast_rmse <span style=\"color: purple; font-weight: bold;\">=<\/span> forecast_rmse(hp_test,forecast)\n        \n        next_row <span style=\"color: purple; font-weight: bold;\">=<\/span> (param&#91;<span style=\"color: green\">0<\/span>], param&#91;<span style=\"color: green\">1<\/span>], param&#91;<span style=\"color: green\">2<\/span>], results_aic, results_bic, precision, fcast_rmse)\n        \n        model_parameters_and_performance.append(next_row)\n \n    <span style=\"color: #008cba;\"># Skip parameter combinations where estimation fails<\/span>\n    <span style=\"color: green; font-weight: bold\">except<\/span> Exception:\n        <span style=\"color: green; font-weight: bold\">continue<\/span>\n<\/span><\/code><\/pre>\n\n\n\n<p>Before proceeding, let&#8217;s look at the RMSE of the various models over the forecast period. 
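Once the loop finishes, each row of `model_parameters_and_performance` holds `(p, d, q, aic, bic, mape, rmse)`, so ranking the candidates is a one-liner (the rows below are made-up numbers, purely to show the shape, not actual estimation results):

```python
# Hypothetical rows in the (p, d, q, aic, bic, forecast_mape, forecast_rmse) layout
rows = [
    (1, 1, 0, 1510.2, 1522.3, 8.1, 21.4),
    (2, 2, 1, 1402.7, 1419.0, 3.9, 11.2),
    (3, 2, 3, 1405.1, 1430.5, 4.2, 12.0),
]

# Lower AIC (tuple index 3) is better; use index 4 to rank by BIC instead
best_by_aic = min(rows, key=lambda r: r[3])
print(best_by_aic[:3])  # (2, 2, 1)
```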
Specifically, let&#8217;s do a side-by-side comparison of the I(1) and I(2) models with matching p and q parameters:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Comparing RMSE Forecasting Performance in the I(1) vs I(2) ARIMA Models<\/span>\nmodel_df <span style=\"color: purple; font-weight: bold;\">=<\/span> pd.DataFrame(model_parameters_and_performance, columns<span style=\"color: purple; font-weight: bold;\">=<\/span>&#91;<span style=\"color: red\">'p'<\/span>,<span style=\"color: red\">'d'<\/span>,<span style=\"color: red\">'q'<\/span>,<span style=\"color: red\">'aic'<\/span>,<span style=\"color: red\">'bic'<\/span>,<span style=\"color: red\">'forecast_mape'<\/span>,<span style=\"color: red\">'forecast_rmse'<\/span>])\nplt.figure(figsize<span style=\"color: purple; font-weight: bold;\">=<\/span>(<span style=\"color: green;\">12<\/span>,<span style=\"color: green;\">5<\/span>), dpi<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">100<\/span>)\n<span style=\"color: #008cba;\"># I(1) models in Blue and I(2) models in Red<\/span>\nplt.plot(model_df.loc&#91;(model_df.d <span style=\"color: purple; font-weight: bold;\">==<\/span> <span style=\"color: green;\">1<\/span>)]&#91;<span style=\"color: red\">'forecast_rmse'<\/span>].reset_index(drop<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green; font-weight: bold;\">True<\/span>), color<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: red\">'blue'<\/span>)\nplt.plot(model_df.loc&#91;(model_df.d <span style=\"color: purple; font-weight: bold;\">==<\/span> <span style=\"color: green;\">2<\/span>)]&#91;<span style=\"color: red\">'forecast_rmse'<\/span>].reset_index(drop<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green; font-weight: bold;\">True<\/span>), color<span style=\"color: purple; 
=<\/span><span style=\"color: red\">'red'">
font-weight: bold;\">=<\/span><span style=\"color: red\">'red'<\/span>)\nplt.title(<span style=\"color: red\">'RMSE for I(1) vs I(2) Models'<\/span>)\nplt.xlabel(<span style=\"color: red\">'Models'<\/span>)\nplt.ylabel(<span style=\"color: red\">'RMSE'<\/span>)\nplt.show()<\/span><\/code><\/pre>\n\n\n\n<p>Which produces the following plot:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"597\" src=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_rmse-3-1024x597.png\" alt=\"\" class=\"wp-image-214\" style=\"aspect-ratio:2;width:826px;height:auto\" srcset=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_rmse-3-1024x597.png 1024w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_rmse-3-300x175.png 300w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_rmse-3-768x448.png 768w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_rmse-3.png 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Clearly, the I(2) ARIMA models (red) have better forecasting performance across the board compared to I(1) models with the same (p,q) parameters. 
Therefore, I will focus my attention on just the I(2) models.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Filter dataframe where the parameter d = 2<\/span>\nmodel_df.loc&#91;(model_df.d <span style=\"color: purple; font-weight: bold;\">==<\/span> <span style=\"color: green;\">2<\/span>)]<\/span><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"428\" height=\"360\" src=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/Screen-Shot-2023-10-18-at-1.53.34-PM.png\" alt=\"\" class=\"wp-image-219\" srcset=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/Screen-Shot-2023-10-18-at-1.53.34-PM.png 428w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/Screen-Shot-2023-10-18-at-1.53.34-PM-300x252.png 300w\" sizes=\"(max-width: 428px) 100vw, 428px\" \/><\/figure>\n\n\n\n<p><br>The AIC model selection criterion selects the ARIMA(3,2,3) model while the BIC selects the ARIMA(1,2,1). Given that the BIC criterion tends to select more parsimonious models than the AIC, this isn&#8217;t surprising. Moreover, the out-of-sample performance of both of these models is also comparable, which reduces fears that the bigger model is overfitting the data.  While the conventional wisdom in this situation is to select the simplest model (ARIMA(1,2,1)), the downside is that the short lag length may be insufficient to capture more complex dynamics in the series, such as turning points\/inflections, which a multi-period model could potentially detect. 
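The parsimony gap between the two criteria comes straight from their penalty terms: AIC = 2k - 2*ln(L) while BIC = k*ln(n) - 2*ln(L), where k is the number of estimated parameters, n the sample size, and L the maximized likelihood. Once n exceeds e^2 (about 7.4 observations), BIC charges more per parameter than AIC. A toy comparison (the log-likelihoods below are made-up illustrative numbers, not taken from the fitted models):

```python
from math import log

def aic(log_lik, k):
    # Akaike information criterion: 2k - 2*ln(L)
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # Bayesian information criterion: k*ln(n) - 2*ln(L)
    return k * log(n) - 2 * log_lik

# Hypothetical fits on n = 540 monthly observations: a small model
# with 3 parameters versus a larger 7-parameter model whose extra
# terms buy only a modest likelihood improvement.
n = 540
print(aic(-1500.0, 3), aic(-1494.0, 7))       # AIC prefers the larger model
print(bic(-1500.0, 3, n), bic(-1494.0, 7, n)) # BIC prefers the smaller one
```

With only a modest likelihood gain, AIC rewards the extra terms while BIC's heavier penalty does not, which mirrors the split between ARIMA(3,2,3) and ARIMA(1,2,1) seen here.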
In this case, the predictions of both models are nearly identical over the 60-month holdback period:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># ARIMA(1,2,1) Forecast From July 2018 to June 2023<\/span> \nmodel1 <span style=\"color: purple; font-weight: bold;\">=<\/span> ARIMA(hp_train, order<span style=\"color: purple; font-weight: bold;\">=<\/span>(<span style=\"color: green;\">1<\/span>,<span style=\"color: green;\">2<\/span>,<span style=\"color: green;\">1<\/span>))\nresults1 <span style=\"color: purple; font-weight: bold;\">=<\/span> model1.fit()\nforecast1 <span style=\"color: purple; font-weight: bold;\">=<\/span> results1.forecast(<span style=\"color: green;\">60<\/span>)\n\n<span style=\"color: #008cba;\"># ARIMA(3,2,3) Forecast From July 2018 to June 2023<\/span> \nmodel2 <span style=\"color: purple; font-weight: bold;\">=<\/span> ARIMA(hp_train, order<span style=\"color: purple; font-weight: bold;\">=<\/span>(<span style=\"color: green;\">3<\/span>,<span style=\"color: green;\">2<\/span>,<span style=\"color: green;\">3<\/span>))\nresults2 <span style=\"color: purple; font-weight: bold;\">=<\/span> model2.fit()\nforecast2 <span style=\"color: purple; font-weight: bold;\">=<\/span> results2.forecast(<span style=\"color: green;\">60<\/span>)\n       \n<span style=\"color: #008cba;\"># Confidence Intervals Model 1 (alpha is an argument of conf_int, not get_forecast)<\/span>\nforecast_and_ci1 <span style=\"color: purple; font-weight: bold;\">=<\/span> results1.get_forecast(<span style=\"color: green;\">60<\/span>)\nconfidence_intervals1 <span style=\"color: purple; font-weight: bold;\">=<\/span> forecast_and_ci1.conf_int(alpha<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">0.05<\/span>)\n\n<span style=\"color: #008cba;\"># Confidence Intervals Model 2<\/span>\nforecast_and_ci2 <span style=\"color: purple; font-weight: bold;\">=<\/span> results2.get_forecast(<span style=\"color: green;\">60<\/span>)\nconfidence_intervals2 <span style=\"color: purple; font-weight: bold;\">=<\/span> forecast_and_ci2.conf_int(alpha<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">0.05<\/span>)\n\n<span style=\"color: #008cba;\"># Plot Forecast over Holdback (Test) Period<\/span>\nplt.figure(figsize<span style=\"color: purple; font-weight: bold;\">=<\/span>(<span style=\"color: green;\">12<\/span>,<span style=\"color: green;\">5<\/span>), dpi<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">100<\/span>)\nplt.plot(hp_test, color=<span style=\"color: red\">'black'<\/span>, linewidth<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">3<\/span>)\nplt.plot(forecast1, color=<span style=\"color: red\">'blue'<\/span>, linestyle=<span style=\"color: red\">'dashed'<\/span>, linewidth<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">2<\/span>)\nplt.plot(forecast2, color=<span style=\"color: red\">'green'<\/span>, linestyle=<span style=\"color: red\">'dashed'<\/span>, linewidth<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">2<\/span>)\nplt.fill_between(confidence_intervals1&#91;<span style=\"color: red\">'lower home_price'<\/span>].index, confidence_intervals1&#91;<span style=\"color: red\">'lower home_price'<\/span>], confidence_intervals1&#91;<span style=\"color: red\">'upper home_price'<\/span>], color<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: red\">'k'<\/span>, alpha<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">0.15<\/span>)\nplt.ylabel(<span style=\"color: red\">'Home Price Index'<\/span>)\nplt.title(<span style=\"color: red\">'Forecast from July 2018 to June 2023'<\/span>)\nplt.show()<\/span><\/code><\/pre>\n\n\n\n<p>The solid black line is the actual home price series, the 
dashed blue line is the ARIMA(1,2,1) forecast, and the dashed green line is the ARIMA(3,2,3) forecast:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"427\" src=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_2018to2023-1024x427.png\" alt=\"\" class=\"wp-image-237\" srcset=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_2018to2023-1024x427.png 1024w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_2018to2023-300x125.png 300w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_2018to2023-768x320.png 768w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_2018to2023.png 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><br>Finally, let&#8217;s forecast national home prices over the second half of 2023. I&#8217;ve plotted the forecasts of both models and their respective shaded confidence regions. 
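A note on the shaded confidence regions: for a non-stationary series, forecast uncertainty compounds with the horizon. In the simplest case, a driftless random walk with innovation variance sigma^2 has an h-step-ahead forecast variance of h*sigma^2, so a 95% interval's half-width grows like 1.96*sigma*sqrt(h). A quick illustration with an arbitrary sigma (not estimated from the home price data):

```python
from math import sqrt

sigma = 2.0  # illustrative innovation standard deviation
for h in (1, 6, 60):
    # h-step forecast standard error for a random walk is sigma * sqrt(h)
    half_width = 1.96 * sigma * sqrt(h)
    print(f"h={h:>2}: 95% interval half-width = {half_width:.2f}")
```

An I(2) model accumulates uncertainty even faster than this, which is why the shaded fans widen so quickly even over a six-month horizon.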
Given the growing consensus, as of this writing (October 2023), that the housing market is losing steam, the ARIMA(3,2,3) model appears to produce a more plausible price forecast (dashed green line):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><span style=\"font-family: monospace; font-size: 1.0em;\"><span style=\"color: #008cba;\"># Forecast Home Prices Over Q3 and Q4 2023<\/span>\n\n<span style=\"color: #008cba;\"># ARIMA(1,2,1)<\/span>\n<span style=\"color: #008cba;\"># Use entire sample to estimate\/train model<\/span>\nmodel1 <span style=\"color: purple; font-weight: bold;\">=<\/span> ARIMA(home_price, order<span style=\"color: purple; font-weight: bold;\">=<\/span>(<span style=\"color: green;\">1<\/span>,<span style=\"color: green;\">2<\/span>,<span style=\"color: green;\">1<\/span>)) \nresults1 <span style=\"color: purple; font-weight: bold;\">=<\/span> model1.fit()\nforecast_and_ci1 <span style=\"color: purple; font-weight: bold;\">=<\/span> results1.get_forecast(<span style=\"color: green;\">6<\/span>)\nconfidence_intervals1 <span style=\"color: purple; font-weight: bold;\">=<\/span> forecast_and_ci1.conf_int(alpha<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">0.05<\/span>)\nforecast1 <span style=\"color: purple; font-weight: bold;\">=<\/span> results1.forecast(<span style=\"color: green;\">6<\/span>)\n\n<span style=\"color: #008cba;\"># ARIMA(3,2,3)<\/span>\n<span style=\"color: #008cba;\"># Use entire sample to estimate\/train model<\/span>\nmodel2 <span style=\"color: purple; font-weight: bold;\">=<\/span> ARIMA(home_price, order<span style=\"color: purple; font-weight: bold;\">=<\/span>(<span style=\"color: green;\">3<\/span>,<span style=\"color: green;\">2<\/span>,<span style=\"color: green;\">3<\/span>))\nresults2 <span style=\"color: purple; font-weight: bold;\">=<\/span> model2.fit()\nforecast_and_ci2 <span style=\"color: purple; font-weight: bold;\">=<\/span> results2.get_forecast(<span style=\"color: green;\">6<\/span>)\nconfidence_intervals2 <span style=\"color: purple; font-weight: bold;\">=<\/span> forecast_and_ci2.conf_int(alpha<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">0.05<\/span>)\nforecast2 <span style=\"color: purple; font-weight: bold;\">=<\/span> results2.forecast(<span style=\"color: green;\">6<\/span>)\n\n<span style=\"color: #008cba;\"># Plot Forecasts<\/span>\nplt.figure(figsize<span style=\"color: purple; font-weight: bold;\">=<\/span>(<span style=\"color: green;\">12<\/span>,<span style=\"color: green;\">5<\/span>), dpi<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">100<\/span>)\nplt.plot(home_price&#91;(<span style=\"color: green;\">len<\/span>(home_price)<span style=\"color: purple; font-weight: bold;\">-<\/span><span style=\"color: green;\">12<\/span>):], color<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: red\">'black'<\/span>, linewidth<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">3<\/span>)\nplt.plot(forecast1, color<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: red\">'blue'<\/span>, linestyle<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: red\">'dashed'<\/span>, linewidth<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">2<\/span>)\nplt.plot(forecast2, color<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: red\">'green'<\/span>, linestyle<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: red\">'dashed'<\/span>, linewidth<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">2<\/span>)\nplt.fill_between(confidence_intervals1&#91;<span style=\"color: red\">'lower home_price'<\/span>].index, confidence_intervals1&#91;<span style=\"color: red\">'lower home_price'<\/span>], 
confidence_intervals1&#91;<span style=\"color: red\">'upper home_price'<\/span>], color<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: red\">'k'<\/span>, alpha<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">0.15<\/span>)\nplt.fill_between(confidence_intervals2&#91;<span style=\"color: red\">'lower home_price'<\/span>].index, confidence_intervals2&#91;<span style=\"color: red\">'lower home_price'<\/span>], confidence_intervals2&#91;<span style=\"color: red\">'upper home_price'<\/span>], color<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: red\">'k'<\/span>, alpha<span style=\"color: purple; font-weight: bold;\">=<\/span><span style=\"color: green;\">0.15<\/span>)\nplt.ylabel(<span style=\"color: red\">'Home Price Index'<\/span>)\nplt.title(<span style=\"color: red\">'Forecast from July to December 2023'<\/span>)\nplt.show()<\/span><\/code><\/pre>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-2 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"427\" data-id=\"241\" src=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_q3q42023-1024x427.png\" alt=\"\" class=\"wp-image-241\" srcset=\"https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_q3q42023-1024x427.png 1024w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_q3q42023-300x125.png 300w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_q3q42023-768x320.png 768w, https:\/\/drdustinchambers.com\/wp-content\/uploads\/2023\/10\/forecast_q3q42023.png 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/figure>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Closing Thoughts<\/h2>\n\n\n\n<p>Although most professional forecasters still prefer 
SAS, Stata, Eviews, and R (among other stats programs) to do forecasting, the growing popularity of Python in the data science community will likely change this over time. The purpose of this brief post is to show those interested in traditional univariate forecasting techniques how to perform a forecast in Python. In future posts, I plan to cover less orthodox forecasting methodologies that are growing in popularity among data scientists. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>To help data scientists learn some of the basic concepts behind time series forecasting, I&#8217;m planning to write a series of brief posts on the topic, starting with autoregressive integrated moving-average (ARIMA) models, a classic technique used by economists for over a quarter century. I will be primarily focusing on intuition and common pitfalls, and [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[1],"tags":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/drdustinchambers.com\/index.php\/wp-json\/wp\/v2\/posts\/47"}],"collection":[{"href":"https:\/\/drdustinchambers.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/drdustinchambers.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/drdustinchambers.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/drdustinchambers.com\/index.php\/wp-json\/wp\/v2\/comments?post=47"}],"version-history":[{"count":219,"href":"https:\/\/drdustinchambers.com\/index.php\/wp-json\/wp\/v2\/posts\/47\/revisions"}],"predecessor-version":[{"i
d":280,"href":"https:\/\/drdustinchambers.com\/index.php\/wp-json\/wp\/v2\/posts\/47\/revisions\/280"}],"wp:attachment":[{"href":"https:\/\/drdustinchambers.com\/index.php\/wp-json\/wp\/v2\/media?parent=47"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/drdustinchambers.com\/index.php\/wp-json\/wp\/v2\/categories?post=47"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/drdustinchambers.com\/index.php\/wp-json\/wp\/v2\/tags?post=47"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}