target_modification

Description

Modifies the target variable according to the specified target mode.

Usage

preprocess.target_modification(data, target_mode, column_identifier=None, verbose=0)

Parameters

#	Input Name	Input Description
1	data	type: data frame or str default: - details: The data frame, or the address (file path) of the data frame, containing the target variable values for different spatial and temporal units. The data must include the following columns: Spatial id: The ids of the units at the finest spatial scale of the input data must be included in a column named ‘spatial id level 1’. Temporal ids: The ids of the temporal units recorded in the input data must be included, for each temporal scale, as a separate column named in the format ‘temporal id level x’, where ‘x’ is the temporal scale level, starting at level 1 for the smallest scale. The temporal units may use any sortable format, such as year number or week number. An integrated date and time format is also supported; in that case only the smallest temporal scale must be included, in a column named ‘temporal id’. The expected format of each scale is shown in table ‘target tab 1’. Note: When using the integrated format with the week scale, the date of the first day of the week must be given as the temporal id. Target: The values of the target variable must be included in a column named ‘target’. Note: Extra columns are allowed. example: ‘./USA COVID-19 temporal data.csv’
2	target_mode	type: {‘normal’, ‘cumulative’, ‘differential’, ‘moving average’} default: ‘normal’ details: The desired mode of the target variable for modeling: ‘normal’: No modification. ‘cumulative’: The target variable is modified to show its cumulative value from the first temporal unit in the data. ‘differential’: The target variable is modified to show the difference between its value in the current temporal unit and its value in the previous one. ‘moving average’: The target variable is modified to represent its moving average over the units of the next higher-level temporal scale. More precisely, the value of the target variable for each temporal unit is the average of the variable values over the previous unit of the next larger scale (e.g. for an initial temporal scale of day, the target value for each day is the average of the variable values over the previous week). The next higher-level scale is determined from the temporal id levels in the input data; if the temporal ids have an integrated format, it is determined from the scale of the input data and the following sequence of temporal scales: Second, Minute, Hour, Day, Week, Month, Year.
3	column_identifier	type: dict or None default: None details: If the column names of the input data do not match the expected names for the temporal ids, spatial ids, and target variable (i.e., ‘temporal id’, ‘temporal id level x’, ‘spatial id level x’, ‘target’), a dictionary must be passed to specify the content of each column. Each key must be a string in one of the formats {‘temporal id’, ‘temporal id level x’, ‘spatial id level x’, ‘target’}. The values for ‘temporal id level x’ and ‘spatial id level x’ must be the names of the columns containing the temporal or spatial ids at scale level x, respectively. If the input data uses the integrated format for temporal ids, the name of the corresponding column must be given under the key ‘temporal id’. The value for ‘target’ is the name of the column containing the target variable. example: {'temporal id level 1': 'week', 'temporal id level 2': 'year', 'spatial id level 1': 'county_fips', 'spatial id level 2': 'state_fips', 'target': 'covid-19 deaths'}
4	verbose	type: int default: 0 details: The level of detail of the logging information produced. Available options: 0: no logging; 1: only important information is logged.
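
The three non-trivial target modes can be pictured with plain pandas operations. The sketch below is only an illustration under stated assumptions, not the stpredict implementation: the toy values are invented, applying each mode separately per spatial unit is an assumption, and the moving-average window of 2 units is a hypothetical stand-in for the number of smaller units contained in the next higher-level scale (whether that window includes the current unit is also assumed here).

```python
import pandas as pd

# Toy data in the documented column layout (values invented for illustration).
toy = pd.DataFrame({
    "spatial id level 1": [1, 1, 1, 1, 2, 2, 2, 2],
    "temporal id level 1": [1, 2, 3, 4, 1, 2, 3, 4],
    "target": [2.0, 4.0, 6.0, 8.0, 1.0, 1.0, 3.0, 5.0],
})

# Order the rows in time within each spatial unit before any running computation.
toy = toy.sort_values(["spatial id level 1", "temporal id level 1"])

# Assumption: every mode is computed independently for each spatial unit.
per_unit = toy.groupby("spatial id level 1")["target"]

# 'cumulative': running total starting at the first temporal unit.
cumulative = per_unit.cumsum()

# 'differential': change relative to the previous temporal unit
# (the first unit of each spatial unit has no predecessor, so it is NaN here).
differential = per_unit.diff()

# 'moving average': trailing mean over a window of smaller units.
# WINDOW = 2 is hypothetical; for daily data with a weekly higher scale it would be 7.
WINDOW = 2
moving_average = per_unit.transform(lambda s: s.rolling(window=WINDOW).mean())

print(toy.assign(cumulative=cumulative,
                 differential=differential,
                 moving_average=moving_average))
```

How the library treats the first temporal units, where no previous value or full window exists, is not stated above; the NaN values in this sketch are simply the pandas defaults.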

Returns

#	Output Name	Output Description
1	modified_data	type: data frame details: The input data frame with the target variable values converted to the user-specified target mode.

Example

import pandas as pd
from stpredict.preprocess import target_modification

# Load the input data
df = pd.read_csv('USA COVID-19 temporal data.csv')

# Replace the target values with their moving average
modified_df = target_modification(data=df, target_mode='moving average')
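
When the column names of the input data differ from the expected ones, a column_identifier dictionary is needed. The following sketch shows one way such a dictionary could be checked before it is passed on. The helper `invalid_keys` and the pattern `KEY_PATTERN` are hypothetical names introduced for illustration and are not part of stpredict; the sketch also assumes that the level ‘x’ is a positive integer.

```python
import re

# Key formats taken from the column_identifier description:
# 'temporal id', 'temporal id level x', 'spatial id level x', 'target'.
# Assumption: the level x is a positive integer.
KEY_PATTERN = re.compile(
    r"^(temporal id|temporal id level [1-9]\d*|spatial id level [1-9]\d*|target)$"
)


def invalid_keys(column_identifier):
    """Return the keys that do not follow the documented formats (hypothetical helper)."""
    return [key for key in column_identifier if not KEY_PATTERN.match(key)]


# The dictionary from the parameter description: keys are the expected names,
# values are the actual column names in the user's data.
column_identifier = {
    'temporal id level 1': 'week',
    'temporal id level 2': 'year',
    'spatial id level 1': 'county_fips',
    'spatial id level 2': 'state_fips',
    'target': 'covid-19 deaths',
}

bad = invalid_keys(column_identifier)

# A misspelled key (underscore instead of space) is reported by the check.
typo = invalid_keys({'temporal_id': 'date', 'target': 'deaths'})

print(bad, typo)
```

A dictionary that passes the check can then be supplied through the documented parameter, for example `target_modification(data=df, target_mode='cumulative', column_identifier=column_identifier)`, provided that `df` actually contains the columns named in the dictionary values.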