Examples of time-series forecasting with Python#

  • Building on the excellent (and freely available!) Forecasting: Principles and Practice (3rd edition) by Hyndman and Athanasopoulos, this notebook presents a number of examples using the following Python libraries:

    • statsmodels.tsa as the basis for time-series analysis

    • pmdarima, which wraps statsmodels in a convenient auto_arima function, similar to auto.arima in R

    • TO DO: sktime as the new, unified framework for machine learning with time series in Python

  • We use Altair for plotting, as an example of how to use this library throughout your workflow

import altair as alt
import numpy as np
import pandas as pd
import rdata
import requests


# https://altair-viz.github.io/user_guide/faq.html#local-filesystem
# alt.data_transformers.enable("json")


def read_rda(url):
    """Read an .rda file from a URL and return a dict with its values."""
    r = requests.get(url)
    parsed = rdata.parser.parse_data(r.content)
    return rdata.conversion.convert(parsed)

EDA with an interactive dashboard#

Bike sharing dataset#

As an example, we will explore the bike sharing dataset from the UCI Machine Learning Repository.

Data Set Information#

Bike sharing systems are a new generation of traditional bike rentals, in which the whole process from membership to rental and return has become automatic. Through these systems, a user can easily rent a bike at one position and return it at another. There are currently over 500 bike-sharing programs around the world, comprising more than 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic, environmental, and health issues.

Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data generated by these systems make them attractive for research. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into a virtual sensor network that can be used to sense mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.

Attribute Information#

Both hour.csv and day.csv have the following fields, except hr, which is not available in day.csv:

  • instant: record index

  • dteday : date

  • season : season (1:winter, 2:spring, 3:summer, 4:fall)

  • yr : year (0: 2011, 1:2012)

  • mnth : month ( 1 to 12)

  • hr : hour (0 to 23)

  • holiday : whether the day is a holiday or not (extracted from [Web Link])

  • weekday : day of the week

  • workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0

  • weathersit :

    • 1: Clear, Few clouds, Partly cloudy

    • 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

    • 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

    • 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

  • temp : Normalized temperature in Celsius. The values are derived via (t - t_min)/(t_max - t_min), with t_min = -8, t_max = +39 (only in the hourly scale); see the de-normalization sketch after this list

  • atemp : Normalized feeling temperature in Celsius. The values are derived via (t - t_min)/(t_max - t_min), with t_min = -16, t_max = +50 (only in the hourly scale)

  • hum : Normalized humidity. The values are divided by 100 (the maximum)

  • windspeed : Normalized wind speed. The values are divided by 67 (the maximum)

  • casual: count of casual users

  • registered: count of registered users

  • cnt: count of total rental bikes including both casual and registered
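
The normalized weather columns can be mapped back to physical units by inverting the scaling quoted above. A minimal sketch (denormalize is a hypothetical helper; the t_min/t_max bounds are the ones from the attribute list):

def denormalize(x, t_min, t_max):
    """Invert the (t - t_min) / (t_max - t_min) scaling."""
    return x * (t_max - t_min) + t_min


# e.g. a normalized temp of 0.24 maps back to roughly 3.3 degrees Celsius
denormalize(0.24, t_min=-8, t_max=39)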

def parse_date_hour(date, hour):
    """Construct datetime for index of hourly data."""
    return pd.to_datetime(" ".join([date, str(hour).zfill(2)]), format="%Y-%m-%d %H")


daily = pd.read_csv("./bike-sharing/bike-sharing-daily-processed.csv", parse_dates=["dteday"]).drop(
    columns=["instant", "Unnamed: 0"]
)
hourly = pd.read_csv("./bike-sharing/hour.csv").drop(columns=["instant"])
hourly.index = pd.DatetimeIndex(
    hourly.apply(lambda row: parse_date_hour(row.dteday, row.hr), axis=1),
    name="timestamp",
)
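As an aside, the row-wise apply above is slow on ~17k hourly rows. A vectorized sketch that builds the same index:

# Vectorized equivalent of the apply() above: combine date and hour
# directly instead of formatting strings row by row.
hourly.index = pd.DatetimeIndex(
    pd.to_datetime(hourly.dteday) + pd.to_timedelta(hourly.hr, unit="h"),
    name="timestamp",
)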
daily.head()
dteday season yr mnth holiday weekday workingday weathersit temp atemp hum windspeed casual registered cnt days_since_2011
0 2011-01-01 SPRING 2011 1 NO HOLIDAY SAT NO WORKING DAY MISTY 8.175849 28.363250 80.5833 10.749882 331 654 985 0
1 2011-01-02 SPRING 2011 1 NO HOLIDAY SUN NO WORKING DAY MISTY 9.083466 28.027126 69.6087 16.652113 131 670 801 1
2 2011-01-03 SPRING 2011 1 NO HOLIDAY MON WORKING DAY GOOD 1.229108 22.439770 43.7273 16.636703 120 1229 1349 2
3 2011-01-04 SPRING 2011 1 NO HOLIDAY TUE WORKING DAY GOOD 1.400000 23.212148 59.0435 10.739832 108 1454 1562 3
4 2011-01-05 SPRING 2011 1 NO HOLIDAY WED WORKING DAY GOOD 2.666979 23.795180 43.6957 12.522300 82 1518 1600 4
brush = alt.selection_interval(encodings=['x'])

base = (alt
 .Chart(daily)
 .mark_line()
 .encode(x='dteday', y='cnt')
 .properties(width=700, height=200)
 )

overview = base.properties(height=50).add_params(brush)
detail = base.encode(alt.X('dteday:T', scale=alt.Scale(domain=brush)))
detail & overview
monthly = daily.groupby(['yr', 'mnth'], as_index=False)['cnt'].sum()
monthly['yr_mnth'] = monthly.apply(lambda df: '-'.join([str(df.yr), str(df.mnth).zfill(2)]), axis=1)


def simple_ts_plot(df, x='yr_mnth', y='cnt', width=700, height=200):
    return alt.Chart(df).mark_line().encode(x=x, y=y).properties(width=width, height=height)
simple_ts_plot(monthly)
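
The same monthly totals can also be obtained by resampling on the date column. A sketch (monthly_alt is a hypothetical name; it carries a month-end DatetimeIndex rather than the yr_mnth string column used below):

# Sketch: monthly totals via resampling instead of groupby.
monthly_alt = daily.set_index('dteday')['cnt'].resample('M').sum()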

Decomposition#

Simple decomposition (statsmodels)#
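
Classical additive decomposition models the series as $y_t = T_t + S_t + R_t$, where $T_t$ is the trend-cycle, $S_t$ the seasonal component, and $R_t$ the remainder. seasonal_decompose estimates $T_t$ with a centered moving average and $S_t$ as the period-wise average of the detrended series, so the trend (and residual) are NaN for half a period at each end.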

from statsmodels.tsa.seasonal import seasonal_decompose, STL

simple_decomposition = seasonal_decompose(monthly.cnt, period=12)
simple_decomposition.plot();
[figure: classical additive decomposition (observed, trend, seasonal, residual)]

STL decomposition (statsmodels)#

stl = STL(monthly.cnt, period=12).fit()
stl.plot();
[figure: STL decomposition (observed, trend, seasonal, residual)]

Let’s make those plots better-looking with Altair.

_ts = {}
for result in ['observed', 'trend', 'seasonal', 'resid']:
    df = pd.DataFrame({'yr_mnth': monthly.yr_mnth, result: getattr(stl, result)})
    _ts[result] = simple_ts_plot(df, x='yr_mnth', y=result, height=50)


alt.vconcat(*_ts.values()).resolve_scale(x='shared')

Seasonal decomposition (pmdarima)#

Same result as statsmodels seasonal_decompose.

from pmdarima import arima, utils

# note pmdarima expects an np.ndarray as input
arima_decomp = arima.decompose(monthly.cnt.values, type_='additive', m=12)
utils.decomposed_plot(arima_decomp, figure_kwargs={'figsize': (12,12)})
[figure: pmdarima additive decomposition (data, trend, seasonal, random)]
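
To back the claim, a quick check (a sketch; pmdarima's decompose returns a namedtuple whose trend field we compare where statsmodels' moving-average trend is defined):

# Sketch: both additive decompositions should agree away from the
# series edges, where the centered moving-average trend is NaN.
sm_trend = simple_decomposition.trend.to_numpy()
pm_trend = np.asarray(arima_decomp.trend, dtype=float)
mask = ~np.isnan(sm_trend) & ~np.isnan(pm_trend)
assert np.allclose(sm_trend[mask], pm_trend[mask])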

Forecasting with ETS#

We will reproduce the analysis from FPP2 section 7.7 in Python, following the example from the statsmodels documentation.

Get data: international visitor nights in Australia#

# read data and set a proper quarterly PeriodIndex in pandas
austourists_url = "https://github.com/robjhyndman/fpp2-package/blob/master/data/austourists.rda?raw=true" 
austourists_data = read_rda(austourists_url)
austourists = pd.DataFrame(austourists_data).set_index(
    pd.period_range(start="1999Q1", end="2015Q4", freq="Q")
)
austourists.head()
austourists
1999Q1 30.052513
1999Q2 19.148496
1999Q3 25.317692
1999Q4 27.591437
2000Q1 32.076456

Forecast ETS(M, A, M) model#

Note: use exponential_smoothing.ETSModel, not holtwinters.ExponentialSmoothing.
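
For reference, the ETS(M,Ad,M) state space equations (FPP2, chapter 7), with damping parameter $\phi$ and seasonal period $m = 4$:

$$
\begin{aligned}
y_t &= (\ell_{t-1} + \phi b_{t-1})\, s_{t-m} (1 + \varepsilon_t) \\
\ell_t &= (\ell_{t-1} + \phi b_{t-1})(1 + \alpha \varepsilon_t) \\
b_t &= \phi b_{t-1} + \beta (\ell_{t-1} + \phi b_{t-1})\, \varepsilon_t \\
s_t &= s_{t-m} (1 + \gamma \varepsilon_t)
\end{aligned}
$$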

from statsmodels.tsa.exponential_smoothing.ets import ETSModel


aust_ets = ETSModel(
    austourists.austourists.rename('observed'),  # NB: accepts only a 1D array or pd.Series
    error="mul",
    trend="add",
    seasonal="mul",
    damped_trend=True,
    seasonal_periods=4
).fit(maxiter=10000)

print(aust_ets.summary())
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            9     M =           10

At X0         1 variables are exactly at the bounds

At iterate    0    f=  3.47328D+00    |proj g|=  1.98628D+00
...
At iterate   56    f=  2.19636D+00    |proj g|=  4.66294D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    9     56     68     59     0     3   4.663D-05   2.196D+00
  F =   2.1963648196307086     

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH             
                                 ETS Results                                  
==============================================================================
Dep. Variable:               observed   No. Observations:                   68
Model:                      ETS(MAdM)   Log Likelihood                -149.353
Date:                Sat, 11 Nov 2023   AIC                            320.706
Time:                        11:46:17   BIC                            345.120
Sample:                    03-31-1999   HQIC                           330.379
                         - 12-31-2015   Scale                            0.003
Covariance Type:               approx                                         
======================================================================================
                         coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------
smoothing_level        0.4969      0.112      4.452      0.000       0.278       0.716
smoothing_trend     4.969e-05      0.009      0.005      0.996      -0.018       0.018
smoothing_seasonal  5.031e-05        nan        nan        nan         nan         nan
damping_trend          0.9800      0.011     92.307      0.000       0.959       1.001
initial_level         28.9572    952.799      0.030      0.976   -1838.494    1896.408
initial_trend          1.0006     32.926      0.030      0.976     -63.534      65.535
initial_seasonal.0     0.8395     27.622      0.030      0.976     -53.300      54.979
initial_seasonal.1     0.7777     25.588      0.030      0.976     -49.374      50.929
initial_seasonal.2     0.6242     20.539      0.030      0.976     -39.632      40.881
initial_seasonal.3     1.0000     32.904      0.030      0.976     -63.490      65.490
===================================================================================
Ljung-Box (Q):                        4.38   Jarque-Bera (JB):                15.41
Prob(Q):                              0.82   Prob(JB):                         0.00
Heteroskedasticity (H):               0.16   Skew:                            -0.89
Prob(H) (two-sided):                  0.00   Kurtosis:                         4.50
===================================================================================

Warnings:
[1] Covariance matrix calculated using numerical (complex-step) differentiation.
aust_forecast = aust_ets.get_prediction(start="2014", end="2020")
_forecast = (
    aust_forecast.summary_frame()
    .reset_index()
    .rename(columns={"index": "quarter"})
    .assign(quarter=lambda df: df.quarter.apply(str))
)
_forecast.head()
quarter mean mean_numerical pi_lower pi_upper
0 2014Q1 67.905019 68.005701 61.157418 75.371850
1 2014Q2 41.743568 41.810152 37.031209 45.964023
2 2014Q3 53.038218 53.087422 47.111868 58.694715
3 2014Q4 58.410221 58.516103 52.269220 64.683609
4 2015Q1 70.690958 70.855760 62.741749 79.276381
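
The pi_lower and pi_upper columns are simulation-based prediction intervals. A sketch of approximating them by hand (nsimulations and repetitions are illustrative choices, not values from the original):

# Sketch: simulate future paths from the end of the sample and take
# empirical 2.5%/97.5% quantiles per forecast horizon.
sims = aust_ets.simulate(nsimulations=8, anchor="end", repetitions=1000)
pi_by_hand = sims.quantile([0.025, 0.975], axis=1).T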

Plot forecast results using Altair#

Because we love Altair, let’s draw these plots with it. There are a few details to consider.

First, Altair can’t plot pandas indexes, so we need to turn the PeriodIndex into a regular string column:

_aust = (
    austourists.reset_index()
    .assign(quarter=lambda df: df["index"].apply(str))
    .drop(columns="index")
)
_aust.head()
austourists quarter
0 30.052513 1999Q1
1 19.148496 1999Q2
2 25.317692 1999Q3
3 27.591437 1999Q4
4 32.076456 2000Q1

Now we can plot the data.

(
    alt.Chart(
        austourists.join(aust_ets.fittedvalues.rename("fitted"))
        .stack()
        .reset_index(name="values")
        .rename(columns={"level_0": "quarter", "level_1": "label"})
        .assign(quarter=lambda df: df.quarter.apply(str))
    )
    .mark_line()
    .encode(
        x=alt.X("quarter:O", axis=alt.Axis(values=_aust.quarter.values[::4])),
        y="values:Q",
        color="label",
    )
    .properties(width=800)
)
line = (
    alt.Chart(_forecast)
    .mark_line()
    .encode(
        x=alt.X("quarter:O", axis=alt.Axis(values=_forecast.quarter.values[::4])),
        y="mean:Q",
    )
)

band = line.mark_area(opacity=0.5).encode(x="quarter", y="pi_lower:Q", y2="pi_upper:Q")
_forecast
quarter mean mean_numerical pi_lower pi_upper
0 2014Q1 67.905019 68.005701 61.157418 75.371850
1 2014Q2 41.743568 41.810152 37.031209 45.964023
2 2014Q3 53.038218 53.087422 47.111868 58.694715
3 2014Q4 58.410221 58.516103 52.269220 64.683609
4 2015Q1 70.690958 70.855760 62.741749 79.276381
5 2015Q2 45.088088 45.072703 40.102652 50.070189
6 2015Q3 57.987363 57.758803 51.208943 64.228500
7 2015Q4 64.479834 64.590966 57.749161 71.409733
8 2016Q1 77.990539 77.775062 69.181335 86.170029
9 2016Q2 48.836287 48.593699 42.980661 54.306214
10 2016Q3 61.026694 60.969122 53.178136 69.217190
11 2016Q4 66.075834 65.793079 56.347986 75.342749
12 2017Q1 78.939466 78.526000 66.963969 90.707224
13 2017Q2 49.416785 49.283671 41.691282 57.980926
14 2017Q3 61.735416 61.650289 50.909787 72.616748
15 2017Q4 66.825603 66.489709 54.854743 79.323166
16 2018Q1 79.814727 79.485743 65.637092 94.838278
17 2018Q2 49.952218 49.743438 40.610386 60.120013
18 2018Q3 62.389119 62.151853 51.053077 74.955298
19 2018Q4 67.517166 67.157857 53.478997 81.741446
20 2019Q1 80.622039 80.386399 63.559875 99.109976
21 2019Q2 50.446084 50.304063 40.336828 61.713522
22 2019Q3 62.992073 62.667405 49.928769 78.634942
23 2019Q4 68.155042 67.725103 52.914080 84.358500
24 2020Q1 81.366678 81.180737 63.282757 101.026583
line + band

Forecasting with ARIMA#

Reproducing the example in FPP2 section 8.7 for a non-seasonal model of monthly electrical equipment orders.

Get data: monthly electrical equipment orders#

ee_url = "https://github.com/robjhyndman/fpp2-package/blob/master/data/elecequip.rda?raw=true" 
ee_data = read_rda(ee_url)
ee = pd.DataFrame(ee_data).set_index(
    pd.period_range(start="1996-01", end="2012-03", freq="M"))
ee.head()
elecequip
1996-01 79.35
1996-02 75.78
1996-03 86.32
1996-04 72.60
1996-05 74.86

Forecasting with auto.arima#

As in FPP2, we are going to build an ARIMA model on the seasonally adjusted data.

ee_stl = STL(ee.elecequip.values, period=12, robust=True).fit()
ee_stl.plot();
[figure: robust STL decomposition of monthly electrical equipment orders]
from pmdarima import auto_arima, model_selection


ee_adj = ee_stl.observed - ee_stl.seasonal
ee_train, ee_test = model_selection.train_test_split(ee_adj, train_size=150)
ee_arima = auto_arima(ee_train, error_action='ignore', trace=True,
                      suppress_warnings=True, maxiter=10,
                      seasonal=False)
Performing stepwise search to minimize aic
 ARIMA(2,1,2)(0,0,0)[0] intercept   : AIC=707.575, Time=0.07 sec
 ARIMA(0,1,0)(0,0,0)[0] intercept   : AIC=724.165, Time=0.01 sec
 ARIMA(1,1,0)(0,0,0)[0] intercept   : AIC=715.304, Time=0.02 sec
 ARIMA(0,1,1)(0,0,0)[0] intercept   : AIC=713.256, Time=0.02 sec
 ARIMA(0,1,0)(0,0,0)[0]             : AIC=722.821, Time=0.01 sec
 ARIMA(1,1,2)(0,0,0)[0] intercept   : AIC=712.176, Time=0.05 sec
 ARIMA(2,1,1)(0,0,0)[0] intercept   : AIC=707.187, Time=0.05 sec
 ARIMA(1,1,1)(0,0,0)[0] intercept   : AIC=715.144, Time=0.03 sec
 ARIMA(2,1,0)(0,0,0)[0] intercept   : AIC=710.610, Time=0.03 sec
 ARIMA(3,1,1)(0,0,0)[0] intercept   : AIC=703.640, Time=0.06 sec
 ARIMA(3,1,0)(0,0,0)[0] intercept   : AIC=703.060, Time=0.04 sec
 ARIMA(4,1,0)(0,0,0)[0] intercept   : AIC=704.363, Time=0.06 sec
 ARIMA(4,1,1)(0,0,0)[0] intercept   : AIC=705.507, Time=0.07 sec
 ARIMA(3,1,0)(0,0,0)[0]             : AIC=702.067, Time=0.03 sec
 ARIMA(2,1,0)(0,0,0)[0]             : AIC=710.534, Time=0.02 sec
 ARIMA(4,1,0)(0,0,0)[0]             : AIC=703.193, Time=0.03 sec
 ARIMA(3,1,1)(0,0,0)[0]             : AIC=702.190, Time=0.04 sec
 ARIMA(2,1,1)(0,0,0)[0]             : AIC=706.917, Time=0.03 sec
 ARIMA(4,1,1)(0,0,0)[0]             : AIC=704.063, Time=0.05 sec

Best model:  ARIMA(3,1,0)(0,0,0)[0]          
Total fit time: 0.732 seconds

pmdarima yields an ARIMA(3,1,0) model with an AIC of 702; the example in FPP2 manually selected ARIMA(3,1,1).
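
For comparison, FPP2’s hand-picked model can be fitted directly. A sketch using pmdarima’s ARIMA wrapper (ee_arima_fpp is a hypothetical name):

from pmdarima.arima import ARIMA

# Sketch: fit FPP2's manually selected ARIMA(3,1,1) on the same
# training data and compare information criteria.
ee_arima_fpp = ARIMA(order=(3, 1, 1), suppress_warnings=True).fit(ee_train)
print(ee_arima_fpp.aic(), ee_arima.aic())

Now let’s look at the predictions.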

ee_predict, ee_ci = ee_arima.predict(ee_test.shape[0], return_conf_int=True)
ee_forecast = (
    pd.DataFrame(
        {
            "predicted": ee_predict,
            "actual": ee_test.flatten(),
        }
    )
    .join(pd.DataFrame(ee_ci).rename(columns={0: "lower_ci", 1: "upper_ci"}))
    .reset_index()
)
line_predicted_elec = (
    alt.Chart(ee_forecast)
    .mark_line()
    .encode(x="index:O", y="predicted:Q")
    .properties(width=800)
)
line_actual_elec = line_predicted_elec.mark_line(color='orange').encode(x="index:O", y="actual")
band_elec = line_predicted_elec.mark_area(opacity=0.5).encode(x="index", y="lower_ci:Q", y2="upper_ci:Q")
#TODO: fix top x-axis
line_predicted_elec + line_actual_elec + band_elec
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
ee_residuals = ee_arima.predict_in_sample() - ee_train
plot_acf(ee_residuals);
[figure: ACF of the in-sample residuals]
plot_pacf(ee_residuals, method='ywm');
[figure: PACF of the in-sample residuals]
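
FPP2 complements the residual ACF with a portmanteau test. A sketch with statsmodels (lags=10 is an illustrative choice, not a value from the original):

from statsmodels.stats.diagnostic import acorr_ljungbox

# Sketch: Ljung-Box test for remaining autocorrelation in the residuals.
acorr_ljungbox(ee_residuals, lags=[10], return_df=True)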
#TODO: weird outlier at -80?
alt.Chart(pd.DataFrame({"residuals": ee_residuals})).mark_bar().encode(
    alt.X("residuals:Q", bin=alt.Bin(maxbins=30)), y="count():Q"
)
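
On the TODO above: with d=1, the first in-sample prediction has nothing to difference against and is often poorly defined, which is a plausible source of a single huge residual. A quick sketch to check (worst is a hypothetical name):

# Sketch: list the positions and values of the largest-magnitude residuals.
worst = np.argsort(np.abs(ee_residuals))[::-1][:3]
print(worst, ee_residuals[worst])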