temporal_scale_transform
Description
Change the data temporal scale to the desired temporal scale specified by the user and create a data set that contains covariate values for the units of the higher-level temporal scale. To obtain the values of each covariate for each unit of the bigger scale, the values of this covariate are averaged over all the units of the smaller scale which belong to that bigger scale unit, and if the user prefers to augment the data, the moving average method will be used to obtain data with a bigger temporal scale, but almost the same volume as input data.
Usage
- preprocess.temporal_scale_transform(data, column_identifier=None, temporal_scale_level=2, augmentation=False, verbose=0)
Parameters
# |
Input Name |
Input Description |
|---|---|---|
1
|
data
|
type: data frame or str
default: -
details:
a data frame or address of a data frame containing temporal
covariates. The data includes the following columns:
Spatial ids:
The id of the units in the finest spatial scale of input data must be
included in the data with the name ‘spatial id level 1’.
The id of units in the secondary spatial scales of input data could be
included in the data in columns named ‘spatial id level x’, where x
shows the related scale level.
Temporal ids:
The id of temporal units recorded in the input data for each temporal
scale must be included as a separate column in the data with a name
in a format ‘temporal id level x’, where ‘x’ is the related temporal
scale level beginning with level 1 for the smallest scale.
The temporal units could have a free but sortable format like year
number, week number and so on.
However, the integrated format of date and time is also supported. In
the case of using integrated format, only the smallest temporal scale
must be included in the data with the column name of ‘temporal id’.
The expected format of each scale is shown in Table 2.
All the remaining columns are considered as covariates.
example: ‘./USA COVID-19 temporal data.csv’
|
2
|
column_identifier
|
type: dict or None
default: None
details:If the input data column names do not match the specific
format of temporal and spatial ids (i.e., ‘temporal id’, ‘temporal id
level x’, ‘spatial id level x’), a dictionary must be passed to
specify the content of each column.
The keys must be a string in one of the formats: {‘temporal
id’,’temporal id level x’,’spatial id level x’}
The values of ‘temporal id level x’ and ‘spatial id level x’ must be
the name of the column containing the temporal or spatial ids in the
scale level x respectively.
If the input data has integrated format for temporal ids, the name
of the corresponding column must be specified with the key ‘temporal
id’.
example: {‘temporal id level 1’: ‘week’,’temporal id level 2’:
‘year’,’spatial id level 1’: ‘county_fips’, ‘spatial id level 2’:
‘state_fips’}
|
3
|
temporal_scale_level
|
type: {2, 3, …}
default: 2
details: The level of the desired temporal scale which must be
equal to one of the input data temporal id levels. if the temporal id
have a integrated format, the scale of the specified level will be
determined based on the input data scale and the following sequence of
temporal scales:
Second, Minute, Hour, Day, Week, Month, Year
|
4
|
augmentation
|
type: bool
default: False
details: Specify whether or not to augment data when using bigger
temporal scales to avoid data volume decrease. If true, the moving
average method will be used to obtain data with the higher level
temporal scale, but almost the same volume as the input data with
smaller temporal scale.
|
5
|
verbose
|
type: int
default: 0
details: The level of details in produced logging information
available options:
0: no logging
1: only important information logging
|
Returns
# |
Output Name |
Output Description |
|---|---|---|
1
|
transformed_data
|
type:data frame
details: a data frame containing the covariate values for desired
temporal scale units.
|
Example
import pandas as pd
from stpredict.preprocess import temporal_scale_transform
df = pd.read_csv('USA COVID-19 spatial data.csv')
transformed_df = temporal_scale_transform(data = df, temporal_scale_level = 3)