temporal_test

Description

Tests the target variable for correlation over time.
The available options are:
  • Augmented Dickey-Fuller (ADF) test
  • Autocorrelation function (ACF) plot
  • Fit an autoregressive model with user-specified lags and report the coefficients

Usage

temporal_test(data, spatial_id=None, test_type='ADF', lags=1, column_identifier=None, saving_plot_path='./')

Parameters

#

Input Name

Input Description

1

data

type: data frame or str
default: -
details: A data frame containing the target variable, or the path to a
file containing it.

The data must include the following columns:

Spatial ids: The ids of the units in the finest spatial scale of the input
data must be included in a column named ‘spatial id level 1’.

Temporal ids: The ids of the time units recorded in the input data for each
temporal scale must be included as a separate column named
‘temporal id level x’, where ‘x’ is the temporal scale level, beginning
with level 1 for the smallest scale. The temporal units may have any
sortable format, such as year number or week number. The combination of
the ids across these temporal scale levels must form a unique identifier.
An integrated date and time format is also supported. In that case, only
the smallest temporal scale must be included in the data, in a column
named ‘temporal id’. The expected format of each scale is shown in Table 2.

example: ‘my_directory/my_data.csv’

2

spatial_id

type: list<any type> or None
default: None
details: The ids of the spatial units whose target variable
values will be used in the test. If None is passed, the test is
performed for all spatial units in the data.

Note that only one spatial unit should be specified for the ACF
test.

example: [01001], [1001], [‘Alabama’]

3

test_type

type: {‘ACF’, ‘ADF’, ‘autoreg’}
default: ‘ADF’
details: The type of test used to check the correlation of the
target variable in time.

‘ADF’:
The augmented Dickey-Fuller (ADF) test is performed to check whether
the target variable is a stationary time series.

‘ACF’:
The autocorrelation function is plotted for the specified lags. The
resulting plot depicts correlation (vertical axis) against lag
(horizontal axis).

‘autoreg’:
An autoregressive model is fitted to the target variable values
with the specified lags to obtain the coefficients.

4

lags

type: int
default: 1
details: The number of temporal lags considered in
the test.

example: 3

5

column_identifier

type: dict or None
default: None
details: If the column names of the input data do not match the
expected format for temporal and spatial ids (i.e. ‘temporal id’,
‘temporal id level x’, ‘spatial id level x’), a dictionary must be
passed to specify the content of each column.
Each key must be a string in one of the formats: {‘temporal
id’, ‘temporal id level x’, ‘spatial id level x’}
The value for ‘temporal id level x’ or ‘spatial id level x’ must be
the name of the column containing the temporal or spatial ids at
scale level x, respectively.
If the input data use the integrated format for temporal ids, the name of
the corresponding column must be specified with the key ‘temporal
id’.

example: {‘temporal id level 1’: ‘week’, ‘temporal id level 2’:
‘year’, ‘spatial id level 1’: ‘county_fips’, ‘spatial id level 2’:
‘state_fips’}

6

saving_plot_path

type: string or None
default: None
details: The path where the plot is saved.
If None is passed, the plot is not saved.

example: ‘./’
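
The column naming rules for data and column_identifier described above can be illustrated with a small sketch. It assumes pandas is available; the data frame, its column names (year, week, county_fips, cases) and its values are hypothetical and chosen only for illustration. The sketch does not call temporal_test; it only shows how a column_identifier dictionary relates the expected names to the actual columns, and how to check that the ids form a unique identifier.

```python
import pandas as pd

# Hypothetical input: weekly counts for two counties (all values are made up).
raw = pd.DataFrame({
    "year": [2020, 2021, 2021, 2020, 2021, 2021],
    "week": [1, 1, 2, 1, 1, 2],
    "county_fips": [1001, 1001, 1001, 1003, 1003, 1003],
    "cases": [4, 7, 5, 2, 3, 6],
})

# Same shape as the column_identifier argument: expected name -> actual column name.
column_identifier = {
    "temporal id level 1": "week",
    "temporal id level 2": "year",
    "spatial id level 1": "county_fips",
}

# Invert the mapping to rename the actual columns to the expected names.
renamed = raw.rename(
    columns={actual: expected for expected, actual in column_identifier.items()}
)

# Week numbers repeat across years, so only the combination of all
# temporal levels (together with the spatial id) identifies a single row.
id_columns = list(column_identifier)
has_unique_ids = not renamed.duplicated(subset=id_columns).any()
print(has_unique_ids)
```

In this sketch the week column alone is not unique within a county, which is why the year is supplied as a second temporal scale level.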

Returns

#

Output Name

Output Description

1

test result

type: dict or None
details: If test_type is ‘ADF’, the test statistics and critical
values are returned.

If test_type is ‘autoreg’, the coefficients of the fitted
autoregressive model are returned.

If test_type is ‘ACF’, nothing is returned and the plot is saved in
saving_plot_path.

Note

The implementation of the statsmodels package is used for all tests.

Example

from stpredict.preprocess import temporal_test
from stpredict import load_earthquake_data

data = load_earthquake_data()

# Map the expected id and target names to the column names in the data
column_identifier = {'temporal id level 1': 'month ID',
                     'spatial id level 1': 'sub-region ID',
                     'target': 'occurrence'}

# Fit an autoregressive model with 3 lags for the spatial unit with id 1
temporal_test(data=data, spatial_id=[1], test_type='autoreg', lags=3,
              column_identifier=column_identifier, saving_plot_path='./')