temporal_scale_transform

Description

Transform the temporal scale of the data to the scale requested by the user, and return a data set containing the covariate values for the units of that higher-level temporal scale. The value of each covariate for a unit of the bigger scale is the average of its values over all the smaller-scale units that belong to that unit. If the user chooses to augment the data, a moving average is used instead, which yields data at the bigger temporal scale with almost the same volume as the input data.
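The two behaviours described above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the toy data, the `temperature` covariate, and the choice of a window equal to the number of small units per big unit are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data: one spatial unit, three weeks in each of two years.
df = pd.DataFrame({
    "spatial id level 1": [1] * 6,
    "temporal id level 1": [1, 2, 3, 1, 2, 3],                     # week
    "temporal id level 2": [2020, 2020, 2020, 2021, 2021, 2021],   # year
    "temperature": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],           # covariate
})

# Without augmentation: average the covariate over all small-scale units
# (weeks) that belong to each big-scale unit (year).
averaged = df.groupby(
    ["spatial id level 1", "temporal id level 2"], as_index=False
)["temperature"].mean()

# With augmentation (assumed reading): a moving average over the small-scale
# units, with a window as long as one big-scale unit (3 weeks here).
window = 3
ordered = df.sort_values(
    ["spatial id level 1", "temporal id level 2", "temporal id level 1"]
)
augmented = ordered.assign(
    temperature=ordered.groupby("spatial id level 1")["temperature"]
    .transform(lambda s: s.rolling(window).mean())
).dropna(subset=["temperature"])

print(averaged)
print(augmented)
```

In this sketch plain averaging reduces six rows to two, while the moving average keeps four of the six rows, which is the sense in which augmentation preserves "almost the same volume".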

Usage

preprocess.temporal_scale_transform(data, column_identifier=None, temporal_scale_level=2, augmentation=False, verbose=0)

Parameters

#	Input Name	Input Description
1	data	type: data frame or str default: - details: A data frame, or the path to a file containing a data frame, with temporal covariates. The data includes the following columns: Spatial ids: The ids of the units in the finest spatial scale of the input data must be included in a column named ‘spatial id level 1’. The ids of the units in the secondary spatial scales may be included in columns named ‘spatial id level x’, where x is the corresponding scale level. Temporal ids: The ids of the temporal units recorded in the input data must be included as a separate column for each temporal scale, named in the format ‘temporal id level x’, where x is the corresponding temporal scale level, starting with level 1 for the smallest scale. The temporal units may have any sortable format, such as year number or week number. An integrated date and time format is also supported; in that case only the smallest temporal scale must be included in the data, in a column named ‘temporal id’. The expected format of each scale is shown in Table 2. All the remaining columns are considered covariates. example: ‘./USA COVID-19 temporal data.csv’
2	column_identifier	type: dict or None default: None details: If the column names of the input data do not match the expected format of temporal and spatial ids (i.e., ‘temporal id’, ‘temporal id level x’, ‘spatial id level x’), a dictionary must be passed to specify the content of each column. Each key must be a string in one of the formats {‘temporal id’, ‘temporal id level x’, ‘spatial id level x’}. The values for ‘temporal id level x’ and ‘spatial id level x’ must be the names of the columns containing the temporal or spatial ids of scale level x, respectively. If the input data uses the integrated format for temporal ids, the name of the corresponding column must be given under the key ‘temporal id’. example: {‘temporal id level 1’: ‘week’, ‘temporal id level 2’: ‘year’, ‘spatial id level 1’: ‘county_fips’, ‘spatial id level 2’: ‘state_fips’}
3	temporal_scale_level	type: {2, 3, …} default: 2 details: The level of the desired temporal scale, which must be equal to one of the temporal id levels of the input data. If the temporal ids have the integrated format, the scale of the specified level is determined from the scale of the input data and the following sequence of temporal scales: Second, Minute, Hour, Day, Week, Month, Year
4	augmentation	type: bool default: False details: Specify whether to augment the data when moving to a bigger temporal scale, to avoid a decrease in data volume. If True, the moving average method is used to obtain data at the higher-level temporal scale with almost the same volume as the input data at the smaller temporal scale.
5	verbose	type: int default: 0 details: The level of detail of the logging output. Available options: 0: no logging 1: log only important information
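The role of column_identifier can be pictured as a rename from the user's column names to the standard id names, after which every other column is treated as a covariate. The sketch below reproduces that idea with plain pandas; it is not the library's code, and the toy frame and the `cases` covariate are assumptions.

```python
import pandas as pd

# Mapping in the documented direction: standard id name -> column in the data.
column_identifier = {
    "temporal id level 1": "week",
    "temporal id level 2": "year",
    "spatial id level 1": "county_fips",
    "spatial id level 2": "state_fips",
}

# Hypothetical input data with non-standard column names.
raw = pd.DataFrame({
    "week": [1, 2],
    "year": [2020, 2020],
    "county_fips": [1001, 1001],
    "state_fips": [1, 1],
    "cases": [3, 5],
})

# Invert the mapping so the data columns are renamed to the standard names.
renamed = raw.rename(
    columns={source: standard for standard, source in column_identifier.items()}
)

# Every column that is not a temporal or spatial id is a covariate.
covariates = [
    c for c in renamed.columns
    if not c.startswith(("temporal id", "spatial id"))
]
print(list(renamed.columns))
print(covariates)
```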

Returns

#	Output Name	Output Description
1	transformed_data	type: data frame details: A data frame containing the covariate values for the units of the desired temporal scale.

Example

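import pandas as pd
from stpredict.preprocess import temporal_scale_transform

# Load the temporal covariates (file name as in the 'data' parameter example)
df = pd.read_csv('USA COVID-19 temporal data.csv')

# Average the covariates up to temporal scale level 3
transformed_df = temporal_scale_transform(data=df, temporal_scale_level=3)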