temporal_test

Description

Test for the correlation of the target variable in time.

Options are:

Augmented Dickey-Fuller (ADF) test

Autocorrelation function (ACF) plot

Fit the autoregressive model with user-specified lags and report the coefficients
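
As a rough illustration of what an ACF plot shows, the sketch below computes the sample autocorrelation of a series at a few lags in plain Python. The function name `sample_autocorrelation` and the example series are hypothetical, and the estimator is the common one that divides by the total squared deviation; it is not taken from this package's code.

```python
def sample_autocorrelation(values, lag):
    """Sample autocorrelation of `values` at a non-negative integer `lag`."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    # Denominator: total squared deviation from the mean (zero for a constant series)
    denominator = sum((v - mean) ** 2 for v in values)
    # Numerator: products of deviations that are `lag` time steps apart
    numerator = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# Hypothetical series that alternates between two values
series = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
acf_values = [sample_autocorrelation(series, k) for k in range(4)]
print(acf_values)  # → [1.0, -0.875, 0.75, -0.625]
```

The alternating sign reflects the period-2 pattern of the series: values one step apart move in opposite directions, values two steps apart move together.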

Usage

temporal_test(data, spatial_id=None, test_type='ADF', lags=1, column_identifier=None, saving_plot_path='./')

Parameters

#	Input Name	Input Description
1	data	type: data frame or str default: - details: A data frame containing the target variable, or the address (file path) of the data. The data must include the following columns: Spatial ids: The id of the units in the finest spatial scale of the input data must be included in a column named ‘spatial id level 1’. Temporal ids: The id of the time units recorded in the input data for each temporal scale must be included as a separate column named in the format ‘temporal id level x’, where ‘x’ is the related temporal scale level, beginning with level 1 for the smallest scale. The temporal units may have any sortable format, such as year number or week number. The combination of these temporal scale levels’ ids should form a unique identifier. An integrated date and time format is also supported. In that case, only the smallest temporal scale must be included in the data, in a column named ‘temporal id’. The expected format of each scale is shown in Table 2. example: ‘my_directory/my_data.csv’
2	spatial_id	type: list<any type> or None default: None details: The ids of the spatial units whose target variable values will be used in the test. If None is passed, the test is performed for all spatial units in the data. Note that only one spatial unit should be specified for the ACF test. example: [01001],[1001],[‘Alabama’]
3	test_type	type: {‘ACF’, ‘ADF’, ‘autoreg’} default: ‘ADF’ details: The type of test used to check the correlation of the target variable in time. ‘ADF’: The augmented Dickey-Fuller (ADF) test is performed to check whether the target variable is a stationary time series. ‘ACF’: The autocorrelation function is plotted for the specified lags. The resulting plot depicts correlation (vertical axis) against lag (horizontal axis). ‘autoreg’: An autoregressive model with the specified lags is fitted to the target variable values to obtain its coefficients.
4	lags	type: int default: 1 details: The number of temporal lags considered in the test. example: 3
5	column_identifier	type: dict or None default: None details: If the column names of the input data do not match the specific format of temporal and spatial ids (i.e. ‘temporal id’, ‘temporal id level x’, ‘spatial id level x’), a dictionary must be passed to specify the content of each column. Each key must be a string in one of the formats: {‘temporal id’, ‘temporal id level x’, ‘spatial id level x’}. The values of ‘temporal id level x’ and ‘spatial id level x’ must be the names of the columns containing the temporal or spatial ids in scale level x, respectively. If the input data use the integrated format for temporal ids, the name of the corresponding column must be specified with the key ‘temporal id’. example: {‘temporal id level 1’: ‘week’, ‘temporal id level 2’: ‘year’, ‘spatial id level 1’: ‘county_fips’, ‘spatial id level 2’: ‘state_fips’}
6	saving_plot_path	type: string or None default: None details: The path where the plot is saved. If None is passed, the plot will not be saved. example: ‘./’
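
To make the role of column_identifier concrete, the following sketch renames the columns of a small, made-up pandas data frame to the standard names described above. It only illustrates the key/value direction of the dictionary and is not the package's internal code; the data frame, the ‘cases’ column, and the use of a ‘target’ key (borrowed from the Example section) are assumptions.

```python
import pandas as pd

# Hypothetical input whose column names do not follow the expected format
raw = pd.DataFrame({
    "county_fips": [1001, 1001, 1003, 1003],
    "year": [2020, 2020, 2020, 2020],
    "week": [1, 2, 1, 2],
    "cases": [5, 7, 2, 4],
})

# Keys are the standard names; values are the actual column names in the data
column_identifier = {
    "temporal id level 1": "week",
    "temporal id level 2": "year",
    "spatial id level 1": "county_fips",
    "target": "cases",  # assumed key, as used in the Example section
}

# Invert the mapping so that actual names are renamed to standard names
rename_map = {actual: standard for standard, actual in column_identifier.items()}
standardized = raw.rename(columns=rename_map)

# Within each spatial unit, the temporal ids should identify a row uniquely
id_columns = ["spatial id level 1", "temporal id level 2", "temporal id level 1"]
has_duplicates = bool(standardized.duplicated(subset=id_columns).any())

print(list(standardized.columns))
print("duplicate ids:", has_duplicates)
```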

Returns

#	Output Name	Output Description
1	test result	type: dict or None details: If test_type is ‘ADF’, the test statistic and critical values are returned. If test_type is ‘autoreg’, the coefficients of the fitted autoregressive model are returned. If test_type is ‘ACF’, nothing is returned and the plot is saved in saving_plot_path.

Note

The implementation of the statsmodels package is used for all tests.

Example

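from stpredict.preprocess import temporal_test
from stpredict import load_earthquake_data

# Load the earthquake example data
data = load_earthquake_data()

# Map the standard column names (keys) to the column names used in the data (values)
column_identifier = {'temporal id level 1': 'month ID', 'spatial id level 1': 'sub-region ID',
                     'target': 'occurrence'}

# Fit an autoregressive model with 3 lags for spatial unit 1 and keep the returned coefficients
result = temporal_test(data=data, spatial_id=[1], test_type='autoreg', lags=3,
                       column_identifier=column_identifier, saving_plot_path='./')
print(result)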