Data structure
Raw data
stpredict provides a flexible data structure for passing spatio-temporal data to its functions. Input data for the preprocess module functions can be supplied as two data frames:
Temporal data: Temporal data (also called time-dependent data) contains the variables whose values change over time. Temporal data must include the following columns:
Spatial ids: The ids of the units in the finest spatial scale of the input data must be included in the temporal data in a column named ‘spatial id level 1’. The ids of units in the secondary spatial scales can either be included in the temporal data in columns named ‘spatial id level x’, where ‘x’ is the related scale level, or be given in a spatial scale table. Note that the id of each spatial unit must be unique.
Temporal ids: The ids of the time units recorded in the input data must be included in the temporal data as a separate column for each temporal scale, named ‘temporal id level x’, where ‘x’ is the related temporal scale level, beginning with level 1 for the smallest scale. The temporal units can use any sortable format, such as year number or week number. The combination of the ids across these temporal scale levels should form a unique identifier. Alternatively, an integrated date-and-time format is supported. In that case, only the smallest temporal scale must be included in the temporal data, in a column named ‘temporal id’. The expected format for each scale is shown in Table 2.
Temporal covariates: The temporal covariates must be given in the temporal data in columns named ‘temporal covariate x’, where ‘x’ is the covariate number.
Target: The column of the target variable in the temporal data must be named ‘target’.
Spatial data: Spatial data (also called time-independent data) contains the variables whose values depend only on the spatial aspect of the problem. Spatial data must include the following columns:
Spatial ids: The ids of the units in the finest spatial scale of the input data must be included in the spatial data in a column named ‘spatial id level 1’. The ids of units in the secondary spatial scales can either be included in the spatial data in columns named ‘spatial id level x’, where ‘x’ is the related scale level, or be given in the spatial scale table.
Spatial covariates: The spatial covariates must be given in the spatial data in columns named ‘spatial covariate x’, where ‘x’ is the covariate number.
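The column naming rules above can be illustrated with a small pandas sketch. The values, the choice of week number and year as the two temporal scales, and the helper `missing_columns` are all made up for illustration; only the column names come from the description above, and no stpredict function is called.

```python
import pandas as pd

# Temporal data: one row per (spatial unit, temporal unit) pair.
# Here the temporal units use free-form ids on two scales
# (assumed for illustration: level 1 = week number, level 2 = year).
temporal_data = pd.DataFrame({
    "spatial id level 1": [1, 1, 2, 2],
    "temporal id level 1": [1, 2, 1, 2],
    "temporal id level 2": [2020, 2020, 2020, 2020],
    "temporal covariate 1": [0.3, 0.5, 0.1, 0.4],
    "target": [10, 12, 7, 9],
})

# Spatial data: one row per unit of the finest spatial scale.
spatial_data = pd.DataFrame({
    "spatial id level 1": [1, 2],
    "spatial id level 2": [10, 10],      # secondary spatial scale (optional)
    "spatial covariate 1": [1500.0, 820.0],
})


def missing_columns(df, required):
    """Hypothetical helper: list the required column names absent from df."""
    return [name for name in required if name not in df.columns]


temporal_required = ["spatial id level 1", "temporal id level 1", "target"]
spatial_required = ["spatial id level 1"]

# The spatial id together with all temporal id levels should identify a row.
key = ["spatial id level 1", "temporal id level 1", "temporal id level 2"]
temporal_keys_unique = not temporal_data.duplicated(subset=key).any()

# Each spatial unit should appear exactly once in the spatial data.
spatial_ids_unique = spatial_data["spatial id level 1"].is_unique
```

Checks of this kind are not part of the description above; they are simply a convenient way to catch naming mistakes before the tables are handed to the preprocess functions.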
| Scale | Id format |
|---|---|
| second | YYYY/MM/DD HH:MM:SS |
| minute | YYYY/MM/DD HH:MM |
| hour | YYYY/MM/DD HH |
| day | YYYY/MM/DD |
| week | YYYY/MM/DD |
| month | YYYY/MM |
| year | YYYY |

Note. For the week scale, the date of the week’s first day must be used.
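As a rough sketch of how ids in the integrated format could be produced from ordinary timestamps, the snippet below maps each scale in the table to a `strftime` pattern. The function `to_temporal_id` and the dictionary `ID_FORMATS` are hypothetical names, not part of stpredict, and the week is assumed to start on Monday, which the table does not specify.

```python
import pandas as pd

# strftime patterns matching the id formats listed in the table.
ID_FORMATS = {
    "second": "%Y/%m/%d %H:%M:%S",
    "minute": "%Y/%m/%d %H:%M",
    "hour": "%Y/%m/%d %H",
    "day": "%Y/%m/%d",
    "week": "%Y/%m/%d",
    "month": "%Y/%m",
    "year": "%Y",
}


def to_temporal_id(timestamps, scale):
    """Hypothetical helper: format timestamps as integrated temporal ids."""
    ts = pd.to_datetime(pd.Series(timestamps))
    if scale == "week":
        # Assumption: weeks start on Monday (weekday 0), so each timestamp
        # is moved back to the Monday of its week before formatting.
        ts = ts.dt.normalize() - pd.to_timedelta(ts.dt.weekday, unit="D")
    return ts.dt.strftime(ID_FORMATS[scale])


# 2021-03-17 is a Wednesday, so its week id is the preceding Monday.
week_ids = to_temporal_id(["2021-03-17 14:35:20"], "week")
month_ids = to_temporal_id(["2021-03-17 14:35:20"], "month")
```

Because every field is zero-padded and ordered from largest to smallest unit, ids in these formats sort chronologically even when stored as strings.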
Fig. 3 shows sample input data tables. The ids of the secondary spatial scale can be included in the spatial and temporal data tables or be supplied as a separate input (i.e. the spatial scale table), and the temporal units can be specified either using multiple temporal scales with free-form ids or using an integrated date-and-time format.
Fig. 3 Sample spatial and temporal data tables
If the user wants to use the values of some covariates (called futuristic covariates) in future temporal units for prediction, the values of these covariates in the future temporal units (i.e. the temporal units after the last temporal unit in the input temporal data) can be passed to the related functions to be taken into account when constructing the historical data. The expected format is a data frame with exactly the same temporal and spatial id columns as the input data. Fig. 4 shows a sample data frame in the expected format for the future data table.
Fig. 4 Sample future data table
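A minimal pandas sketch of a future data table follows. It assumes the integrated ‘temporal id’ format at the day scale and assumes that the futuristic covariate keeps the same column name as in the temporal data; the values are invented, and no stpredict function is called.

```python
import pandas as pd

# Observed temporal data, using the integrated date format (day scale).
temporal_data = pd.DataFrame({
    "spatial id level 1": [1, 1, 2, 2],
    "temporal id": ["2021/03/01", "2021/03/02", "2021/03/01", "2021/03/02"],
    "temporal covariate 1": [0.3, 0.5, 0.1, 0.4],
    "target": [10, 12, 7, 9],
})

# Future data: same id columns, values of the futuristic covariate for
# temporal units after the last observed one, and no target column.
future_data = pd.DataFrame({
    "spatial id level 1": [1, 1, 2, 2],
    "temporal id": ["2021/03/03", "2021/03/04", "2021/03/03", "2021/03/04"],
    "temporal covariate 1": [0.6, 0.7, 0.2, 0.3],
})


def id_columns(df):
    """Hypothetical helper: the spatial and temporal id columns of df."""
    return [c for c in df.columns if c.startswith(("spatial id", "temporal id"))]


# The future table should carry exactly the same id columns as the input.
same_id_columns = id_columns(temporal_data) == id_columns(future_data)

# Zero-padded YYYY/MM/DD strings compare chronologically, so a plain string
# comparison is enough to confirm every future unit follows the last observed one.
last_observed = temporal_data["temporal id"].max()
all_in_future = bool((future_data["temporal id"] > last_observed).all())
```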
Historical data
Fig. 5 Sample historical data