spatial_scale_transform

Description

Change the data spatial scale to the desired spatial scale specified by the user. To obtain a data frame that contains covariate values for the units of the higher level spatial scale, each covariate is aggregated over all the units of smaller level scale which belong to a unit of bigger level scale (desired spatial scale).The aggregation of each covariate is performed based on a specified aggregation mode (mean or sum) for that covariate.

Usage

preprocess.spatial_scale_transform(data, data_type, spatial_scale_table=None, spatial_scale_level=2, aggregation_mode='mean', column_identifier=None, verbose=0)

Parameters

#	Input Name	Input Description
1	data	type: data frame or str default: - details: a data frame or address of a data frame containing temporal or spatial covariates and the id of spatial (and temporal) units. Spatial ids: The id of the units in the finest spatial scale of input data must be included in the data with the name ‘spatial id level 1’. The id of units in the secondary spatial scales of input data could be included in the data in columns named ‘spatial id level x’, where x shows the related scale level or could be given in a spatial_scale_table. Temporal ids (only for temporal data_type): The id of temporal units recorded in the input data for each temporal scale must be included as a separate column in the data with a name in a format ‘temporal id level x’, where ‘x’ is the related temporal scale level beginning with level 1 for the smallest scale. The temporal units could have a free but sortable format like year number, week number and so on. The combination of these temporal scale levels’ ids should form a unique identifier. However, the integrated format of date and time is also supported. In the case of using integrated format, only the smallest temporal scale must be included in the data with the column name of ‘temporal id’. The expected format of each scale is shown in Table 2. All the remaining columns are considered as covariates.
2	data_type	type: {‘spatial’,’temporal’} default: - details: type of input data, If data in addition to spatial dimension has an temporal dimension the data_type is ‘temporal’, otherwise it is ‘spatial’.
3	spatial_scale_table	type: data frame or None default: None details: If the ids of secondary spatial scale units are not included in the input data, a data frame must be passed to the function containing different spatial scales information, with the first column named ‘spatial id level 1’, and including the id of the units in the smallest spatial scale and the rest of the columns including the id of bigger scale units for each unit of the smallest scale. If the column names do not match the format ‘spatial id level x’ the content of each column must be specified using the column_identifier argument. the address of the data frame could also be passed.
4	spatial_scale_level	type: {2, 3, …} default: 2 details: Level of spatial scale which data scale will be transformed to.
5	aggregation_mode	type: {‘sum’,’mean’} or dict default: ‘mean’ details: Aggregation operator which is used to derive covariate values for samples of bigger spatial scale from samples of smaller spatial scale. This operator could be different for each covariate, which in this case, a dictionary must be passed with the covariate names (or tuple of multiple covariate names) as its keys and ‘mean’ or ‘sum’ as its values. example: {‘temperature’:’mean’,’precipitation’:’sum’}
6	column_identifier	type: dict or None default: None details:If the input data column names do not match the specific format of temporal and spatial ids (i.e., ‘temporal id’, ‘temporal id level x’, ‘spatial id level x’), a dictionary must be passed to specify the content of each column. The keys must be a string in one of the formats: {‘temporal id’,’temporal id level x’,’spatial id level x’} The values of ‘temporal id level x’ and ‘spatial id level x’ must be the name of the column containing the temporal or spatial ids in the scale level x respectively. If the input data has integrated format for temporal ids, the name of the corresponding column must be specified with the key ‘temporal id’. example: {‘temporal id level 1’: ‘week’,’temporal id level 2’: ‘year’,’spatial id level 1’: ‘county_fips’, ‘spatial id level 2’: ‘state_fips’}
7	verbose	type: int default: 0 details: The level of details in produced logging information available options: 0: no logging 1: only important information logging 2: all details logging

Returns

#	Output Name	Output Description
1	transformed_data	type: data frame details: a data frame containing the covariate values for desired spatial scale units.

Example

import pandas as pd
from stpredict.preprocess import spatial_scale_transform

df = pd.read_csv('USA COVID-19 spatial data.csv')
scales_df = pd.read_csv('spatial scales data.csv')

transformed_df = spatial_scale_transform(data = df, data_type = 'spatial',
                                         spatial_scale_table = scales_df,
                                         spatial_scale_level = 3)