Title: | Online Prediction by Expert Aggregation |
Description: | Misc methods to form online predictions, for regression-oriented time-series, by combining a finite set of forecasts provided by the user. See Cesa-Bianchi and Lugosi (2006) <doi:10.1017/CBO9780511546921> for an overview. |
Authors: | Pierre Gaillard [cre, aut], Yannig Goude [aut], Laurent Plagne [ctb], Thibaut Dubois [ctb], Benoit Thieurmel [ctb] |
Maintainer: | Pierre Gaillard <[email protected]> |
License: | LGPL |
Version: | 1.2.2 |
Built: | 2025-03-07 05:36:20 UTC |
Source: | https://github.com/dralliag/opera |
Function to check the validity of the provided loss function.
check_loss(loss.type, loss.gradient, Y = NULL, model = NULL)
Arguments: loss.type, loss.gradient, Y, model
loss.type
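A minimal call sketch (not from the package manual; it assumes this helper simply returns the validated loss specification):
loss.type <- check_loss(loss.type = list(name = "square"), loss.gradient = TRUE)
str(loss.type)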
Function to check and modify the input class and type
check_matrix(mat, name)
Arguments: mat, name
a 3d array if a 3d array is provided, else a matrix.
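A short sketch of how this helper might be used (the coercion of a plain vector to a one-column matrix is an assumption based on the return value described above):
Y <- check_matrix(rnorm(10), name = "Y")                           # expected: a 10 x 1 matrix
A <- check_matrix(array(rnorm(24), c(4, 2, 3)), name = "experts")  # a 3d array is returned unchanged
dim(Y)
dim(A)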
Electricity forecasting data set provided by EDF R&D. It contains weekly measurements of the total electricity consumption in France from 1996 to 2009, together with several covariates, including temperature, industrial production indices (source: INSEE) and calendar information.
data(electric_load)
An object of class data.frame
with 731 rows and 11 columns.
data(electric_load)

# a few graphs to display the data
attach(electric_load)
plot(Load, type = 'l')
plot(Temp, Load, pch = 16, cex = 0.5)
plot(NumWeek, Load, pch = 16, cex = 0.5)
plot(Load, Load1, pch = 16, cex = 0.5)
acf(Load, lag.max = 20)
detach(electric_load)
FTRL (Follow The Regularized Leader; Shalev-Shwartz and Singer 2007; see also Chap. 5 of Hazan 2019) is the online counterpart of empirical risk minimization. It is a family of aggregation rules (including OGD) that uses, at each time step, the empirical risk minimizer so far with an additional regularization. The online optimization can be performed on any bounded convex set that can be expressed with equality or inequality constraints. Note that this method is still under development and provided as a beta version.
FTRL( y, experts, eta = NULL, fun_reg = NULL, fun_reg_grad = NULL, constr_eq = NULL, constr_eq_jac = NULL, constr_ineq = NULL, constr_ineq_jac = NULL, loss.type = list(name = "square"), loss.gradient = TRUE, w0 = NULL, max_iter = 50, obj_tol = 0.01, training = NULL, default = FALSE, quiet = TRUE )
Arguments: y, experts, eta, fun_reg, fun_reg_grad, constr_eq, constr_eq_jac, constr_ineq, constr_ineq_jac, loss.type, loss.gradient, w0, max_iter, obj_tol, training, default, quiet
object of class mixture.
Hazan E (2019).
“Introduction to online convex optimization.”
arXiv preprint arXiv:1909.05207.
Shalev-Shwartz S, Singer Y (2007).
“A primal-dual perspective of online learning algorithms.”
Machine Learning, 69(2), 115–142.
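A brief sketch on simulated data (illustrative only; setting default = TRUE is assumed to select the package's built-in regularization and simplex constraints):
library(opera)
set.seed(1)
n <- 100
y <- rnorm(n)                                     # observations to be predicted
experts <- cbind(good = y + rnorm(n, sd = 0.3),   # an informative expert
                 noise = rnorm(n))                # an uninformative expert
fit <- FTRL(y = y, experts = experts, default = TRUE)
head(fit$weights)   # weights assigned to the two experts over time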
The function loss computes the sequence of instantaneous losses suffered by the predictions in x to predict the observations in y.
loss( x, y, pred = NULL, loss.type = list(name = "square"), loss.gradient = FALSE )
Arguments: x, y, pred, loss.type, loss.gradient
A vector of length T
containing the sequence of
instantaneous losses suffered by the expert predictions (x) or the gradient computed on the aggregated predictions (pred).
Pierre Gaillard <[email protected]>
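A small sketch, assuming that for the square loss the instantaneous loss is the element-wise squared error:
x <- c(1.2, 0.8, 1.5)   # predictions
y <- c(1.0, 1.0, 1.0)   # observations
loss(x = x, y = y, loss.type = list(name = "square"))
# expected: approximately 0.04 0.04 0.25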
The function mixture builds an aggregation rule chosen by the user. It can then be used to predict new observations Y sequentially. If observations Y and expert advice experts are provided, mixture is trained by predicting the observations in Y sequentially with the help of the expert advice in experts. At each time instance t, the mixture forms a prediction of Y[t,] by assigning a weight to each expert and by combining the expert advice.
mixture( Y = NULL, experts = NULL, model = "MLpol", loss.type = "square", loss.gradient = TRUE, coefficients = "Uniform", awake = NULL, parameters = list(), quiet = TRUE, ... )

## S3 method for class 'mixture'
print(x, ...)

## S3 method for class 'mixture'
summary(object, ...)
Y |
A matrix with T rows and d columns. Each row Y[t,] contains the d observations to be predicted at step t. |
experts |
An array of dimension c(T, d, K), where K is the number of experts, containing the expert forecasts. Each entry experts[t, , k] is the prediction proposed by expert k at time t. |
model |
A character string specifying the aggregation rule to use, such as 'MLpol' (the default), 'BOA', 'OGD', or 'FTRL'. |
loss.type |
A string or a list with a component 'name' specifying the loss function used to evaluate the performance ('square' by default). |
loss.gradient |
A boolean. If TRUE (default), the aggregation rule is applied to a gradient (linearized) version of the losses instead of the losses themselves. |
coefficients |
A probability vector of length K containing the prior weights of the experts (not possible for 'MLpol'). The weights must be non-negative and sum to 1. |
awake |
A matrix specifying the activation coefficients of the experts. Its entries lie in [0, 1]; an entry of 0 means that the corresponding expert is inactive (sleeping) at that time. |
parameters |
A list that contains optional parameters for the aggregation rule. If no parameters are provided, the aggregation rule is fully calibrated online. |
quiet |
A boolean. If TRUE (default), no progress information is displayed. |
... |
Additional parameters |
x |
An object of class mixture |
object |
An object of class mixture |
An object of class mixture that can be used to perform new predictions. It contains the parameters model, loss.type, loss.gradient, experts, Y, awake, and the fields
coefficients |
A vector of coefficients assigned to each expert to perform the next prediction. |
weights |
A matrix of dimension c(T, K), where T is the number of time steps and K the number of experts, containing the weights assigned to each expert over time. |
prediction |
A matrix with T rows and d columns containing the sequential predictions formed by the aggregation rule. |
loss |
The average loss (as specified by the parameter loss.type) suffered by the aggregation rule so far. |
parameters |
The learning parameters chosen by the aggregation rule or by the user. |
training |
A list that contains useful temporary information of the aggregation rule to be updated and to perform predictions. |
Pierre Gaillard <[email protected]> Yannig Goude <[email protected]>
Cesa-Bianchi N, Lugosi G (2006).
Prediction, learning, and games.
Cambridge university press.
Gaillard P, Stoltz G, van Erven T (2014).
“A Second-order Bound with Excess Losses.”
In Proceedings of COLT'14, volume 35, 176–196.
Hazan E (2019).
“Introduction to online convex optimization.”
arXiv preprint arXiv:1909.05207.
Shalev-Shwartz S, Singer Y (2007).
“A primal-dual perspective of online learning algorithms.”
Machine Learning, 69(2), 115–142.
Wintenberger O (2017).
“Optimal learning with Bernstein online aggregation.”
Machine Learning, 106(1), 119–141.
Zinkevich M (2003).
“Online convex programming and generalized infinitesimal gradient ascent.”
In Proceedings of the 20th international conference on machine learning (icml-03), 928–936.
See opera-package
and opera-vignette for a brief example about how to use the package.
library('opera')  # load the package
set.seed(1)

# Example: find the best one week ahead forecasting strategy (weekly data)
# packages
library(mgcv)

# import data
data(electric_load)
idx_data_test <- 620:nrow(electric_load)
data_train <- electric_load[-idx_data_test, ]
data_test <- electric_load[idx_data_test, ]

# Build the expert forecasts
# ##########################

# 1) A generalized additive model
gam.fit <- gam(Load ~ s(IPI) + s(Temp) + s(Time, k = 3) +
                 s(Load1) + as.factor(NumWeek), data = data_train)
gam.forecast <- predict(gam.fit, newdata = data_test)

# 2) An online autoregressive model on the residuals of a medium term model

# Medium term model to remove trend and seasonality (using generalized additive model)
detrend.fit <- gam(Load ~ s(Time, k = 3) + s(NumWeek) + s(Temp) + s(IPI),
                   data = data_train)
electric_load$Trend <- c(predict(detrend.fit), predict(detrend.fit, newdata = data_test))
electric_load$Load.detrend <- electric_load$Load - electric_load$Trend

# Residual analysis
ar.forecast <- numeric(length(idx_data_test))
for (i in seq(idx_data_test)) {
  ar.fit <- ar(electric_load$Load.detrend[1:(idx_data_test[i] - 1)])
  ar.forecast[i] <- as.numeric(predict(ar.fit)$pred) + electric_load$Trend[idx_data_test[i]]
}

# Aggregation of experts
###########################

X <- cbind(gam.forecast, ar.forecast)
colnames(X) <- c('gam', 'ar')
Y <- data_test$Load

matplot(cbind(Y, X), type = 'l', col = 1:6, ylab = 'Weekly load', xlab = 'Week')

# How good are the experts? Look at the oracles
oracle.convex <- oracle(Y = Y, experts = X, loss.type = 'square', model = 'convex')
if (interactive()) {
  plot(oracle.convex)
}
oracle.convex

# Is a single expert the best over time? Are there breaks?
oracle.shift <- oracle(Y = Y, experts = X, loss.type = 'percentage', model = 'shifting')
if (interactive()) {
  plot(oracle.shift)
}
oracle.shift

# Online aggregation of the experts with BOA
#############################################

# Initialize the aggregation rule
m0.BOA <- mixture(model = 'BOA', loss.type = 'square')

# Perform online prediction using BOA. There are 3 equivalent possibilities:
# 1) start with an empty model and update the model sequentially
m1.BOA <- m0.BOA
for (i in 1:length(Y)) {
  m1.BOA <- predict(m1.BOA, newexperts = X[i, ], newY = Y[i], quiet = TRUE)
}

# 2) perform online prediction directly from the empty model
m2.BOA <- predict(m0.BOA, newexperts = X, newY = Y, online = TRUE, quiet = TRUE)

# 3) perform the online aggregation directly
m3.BOA <- mixture(Y = Y, experts = X, model = 'BOA', loss.type = 'square', quiet = TRUE)

# These predictions are equivalent:
identical(m1.BOA, m2.BOA)  # TRUE
identical(m1.BOA, m3.BOA)  # TRUE

# Display the results
summary(m3.BOA)
if (interactive()) {
  plot(m1.BOA)
}

# Plot options
##################################
# ?plot.mixture

# static or dynamic: dynamic = FALSE/TRUE
plot(m1.BOA, dynamic = FALSE)

# just one plot with custom labels?
# 'plot_weight', 'boxplot_weight', 'dyn_avg_loss',
# 'cumul_res', 'avg_loss', 'contrib'
if (interactive()) {
  plot(m1.BOA, type = "plot_weight",
       main = "Weights", ylab = "Weights", xlab = "Time")
}

# subset rows / time
plot(m1.BOA, dynamic = FALSE, subset = 1:10)

# plot best n experts
plot(m1.BOA, dynamic = FALSE, max_experts = 1)

# Using d-dimensional time-series
##################################

# Consider the above example of electricity consumption
# to be predicted every four weeks
YBlock <- seriesToBlock(X = Y, d = 4)
XBlock <- seriesToBlock(X = X, d = 4)

# The four-week-by-four-week predictions can then be obtained
# by directly using the `mixture` function as we did earlier.
MLpolBlock <- mixture(Y = YBlock, experts = XBlock, model = "MLpol",
                      loss.type = "square", quiet = TRUE)

# The predictions can finally be transformed back to a
# regular one dimensional time-series by using the function `blockToSeries`.
prediction <- blockToSeries(MLpolBlock$prediction)

#### Using the `online = FALSE` option

# An equivalent solution is to use the `online = FALSE` option in the predict function.
# The latter ensures that the model coefficients are not
# updated between the next four weeks to forecast.
MLpolBlock <- mixture(model = "BOA", loss.type = "square")
d <- 4
n <- length(Y) / d
for (i in 0:(n - 1)) {
  idx <- 4 * i + 1:4  # next four weeks to be predicted
  MLpolBlock <- predict(MLpolBlock, newexperts = X[idx, ], newY = Y[idx],
                        online = FALSE, quiet = TRUE)
}
print(head(MLpolBlock$weights))
The function oracle performs a strategy that cannot be defined online (in contrast to mixture). It requires the whole data set Y and the expert advice to be known in advance. Examples of oracles are the best fixed expert, the best fixed convex combination rule, the best linear combination rule, or the best expert that can shift a few times.
oracle( Y, experts, model = "convex", loss.type = "square", awake = NULL, lambda = NULL, niter = NULL, ... )
Y |
A vector containing the observations to be predicted. |
experts |
A matrix containing the experts' forecasts. Each column corresponds to the predictions proposed by an expert to predict Y. |
model |
A character string specifying the oracle to use (such as 'expert', 'convex', 'linear', or 'shifting'), or a list with a component 'name' specifying the oracle together with additional parameters. |
loss.type |
A string specifying the loss function used to evaluate the performance ('square' by default). |
awake |
A matrix specifying the activation coefficients of the experts. Its entries lie in [0, 1]; an entry of 0 means that the corresponding expert is inactive (sleeping) at that time. |
lambda |
A positive number used by the 'linear' oracle only. An optional L2 regularization parameter for computing the linear oracle (if the design matrix is not identifiable). |
niter |
A positive integer for 'convex' and 'linear' oracles if direct computation of the oracle is not implemented. It defines the number of optimization steps to perform in order to approximate the oracle (default value is 3). |
... |
Additional parameters that are passed to the underlying optimization routine. |
An object of class 'oracle' that contains:
loss |
The average loss suffered by the oracle. For the 'shifting' oracle, it is a vector giving the average loss as a function of the number of allowed shifts. |
coefficients |
Not for the 'shifting' oracle. A vector containing the best weight vector corresponding to the oracle. |
prediction |
Not for the 'shifting' oracle. A vector containing the predictions of the oracle. |
rmse |
Only if loss.type is the square loss (default). The root mean square error (i.e., the square root of loss). |
Pierre Gaillard <[email protected]>
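A brief sketch on simulated data (the model name 'expert' for the best fixed expert is an assumption; 'convex' is used in the package example above):
set.seed(42)
n <- 50
Y <- rnorm(n)
X <- cbind(exp1 = Y + rnorm(n, sd = 0.3), exp2 = Y + rnorm(n, sd = 1))
o.expert <- oracle(Y = Y, experts = X, model = "expert", loss.type = "square")
o.convex <- oracle(Y = Y, experts = X, model = "convex", loss.type = "square")
o.expert$loss   # average loss of the best fixed expert
o.convex$loss   # average loss of the best convex combination (never larger)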
Functions to render dynamic mixture graphs using rAmCharts
plot_ridge_weights( data, colors = NULL, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
plot_weights( data, colors = NULL, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
boxplot_weights( data, colors = NULL, max_experts = 50, xlab = NULL, ylab = NULL, main = NULL )
plot_dyn_avg_loss( data, colors = NULL, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
plot_cumul_res( data, colors = NULL, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
plot_avg_loss( data, colors = NULL, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
plot_contrib( data, colors = NULL, alpha = 0.1, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
Arguments: data, colors, alpha, max_experts, round, xlab, ylab, main
a rAmCharts plot
plot provides different diagnostic plots for an aggregation procedure.
## S3 method for class 'mixture'
plot( x, pause = FALSE, col = NULL, alpha = 0.01, dynamic = FALSE, type = "all", max_experts = 50, col_by_weight = TRUE, xlab = NULL, ylab = NULL, main = NULL, subset = NULL, ... )
x |
an object of class mixture. If awake is provided (i.e., some experts are inactive), their residuals and cumulative losses are computed by using the predictions of the mixture. |
pause |
if set to TRUE, displays the plots separately; otherwise (default) they are drawn on a single page |
col |
the color to use to represent each expert; if set to NULL (default), a default palette is used |
alpha |
|
dynamic |
a boolean. If TRUE, interactive graphs are generated with rAmCharts; otherwise (default) static plots are drawn. |
type |
the type of plot to draw: 'all' (default) or one of 'plot_weight', 'boxplot_weight', 'dyn_avg_loss', 'cumul_res', 'avg_loss', 'contrib'. |
max_experts |
the maximum number of experts to be displayed (default 50). |
col_by_weight |
a boolean. If TRUE (default), expert colors are assigned according to the experts' average weights. |
xlab |
x-axis label. |
ylab |
y-axis label. |
main |
main title of the plot(s). |
subset |
an optional vector of time indices (rows) to restrict the plots to. |
... |
additional plotting parameters |
plots representing: the weights of each expert as a function of time, boxplots of these weights, the cumulative loss of each expert as a function of time, the cumulative residuals of each expert's forecast as a function of time, the average loss suffered by the experts, and the contribution of each expert to the aggregation as a function of time.
Pierre Gaillard <[email protected]>
Yannig Goude <[email protected]>
See opera-package
and opera-vignette for a brief example about how to use the package.
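A short sketch with simulated data (the data and the choice of the 'BOA' rule are illustrative):
set.seed(3)
n <- 40
Y <- rnorm(n)
X <- cbind(e1 = Y + rnorm(n, sd = 0.2), e2 = rnorm(n))
m <- mixture(Y = Y, experts = X, model = "BOA", loss.type = "square", quiet = TRUE)
plot(m, pause = FALSE)                        # all static diagnostic plots on one page
plot(m, type = "avg_loss", dynamic = FALSE)   # a single static plot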
oracle plot. It has a few optional arguments.
## S3 method for class 'oracle' plot(x, sort = TRUE, col = NULL, dynamic = FALSE, ...)
x |
An object of class 'oracle'. |
sort |
if set to TRUE (default), it sorts the experts by performance before the plots. |
col |
colors |
dynamic |
If TRUE, graphs are generated with rAmCharts; otherwise static plots are drawn. |
... |
additional arguments to function plot. |
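A short sketch reusing the simulated Y and X from the plot.mixture sketch above:
o <- oracle(Y = Y, experts = X, model = "convex", loss.type = "square")
plot(o, sort = TRUE, dynamic = FALSE)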
Functions to render dynamic oracle graphs using rAmCharts
plt_oracle_convex(data, colors, round = 2)
Arguments: data, colors, round
a rAmCharts plot
Performs sequential predictions and updates of a mixture object based on new observations and expert advice.
## S3 method for class 'mixture'
predict( object, newexperts = NULL, newY = NULL, awake = NULL, online = TRUE, type = c("model", "response", "weights", "all"), quiet = TRUE, ... )
object |
Object of class inheriting from 'mixture' |
newexperts |
An optional matrix in which to look for the expert advice with which to predict. If omitted, the past predictions of the object are returned and the object is not updated. |
newY |
An optional matrix with d columns (or a vector if d = 1) containing the observations to be predicted. |
awake |
An optional array specifying the activation coefficients of the experts. It must have the same dimension as experts. Its entries lie in [0, 1]; an entry of 0 means that the corresponding expert is inactive (sleeping) at that time. |
online |
A boolean determining if the observations in newY are predicted sequentially (by updating the object step by step) or not. If FALSE, the observations are predicted using the object as it is (without using any past information in newY). If TRUE, newY and newexperts should not be null. |
type |
Type of prediction. It can be 'model' (default; the updated object), 'response' (the matrix of predictions), 'weights' (the matrix of weights), or 'all' (a list containing the three of them). |
quiet |
A boolean. If TRUE (default), no progress information is displayed. |
... |
further arguments are ignored |
predict.mixture
produces a matrix of predictions
(type = 'response'), an updated object (type = 'model'), or a matrix of
weights (type = 'weights').
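A minimal sketch on simulated data, showing how to obtain the next forecast and the weights that would be used without updating the model (the data and expert names are illustrative):
set.seed(1)
n <- 30
Y <- rnorm(n)
X <- cbind(gam = Y + rnorm(n, sd = 0.2), ar = Y + rnorm(n, sd = 0.5))
m <- mixture(Y = Y, experts = X, model = "MLpol", loss.type = "square", quiet = TRUE)
newX <- matrix(c(0.1, 0.2), nrow = 1, dimnames = list(NULL, c("gam", "ar")))
predict(m, newexperts = newX, online = FALSE, type = "response")   # next forecast only
predict(m, newexperts = newX, online = FALSE, type = "weights")    # weights used for that forecast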
The functions seriesToBlock and blockToSeries convert 1-dimensional series into series of higher dimension. For instance, suppose you have a time-series that consists of T = 100 days of d = 24 hours. The function seriesToBlock converts the time-series X of T*d = 2400 observations into a matrix of size c(T = 100, d = 24), where each row corresponds to a specific day. This function is useful if you need to perform the prediction day by day, instead of hour by hour. The function can also be used to convert a matrix of expert predictions of dimension c(dT, K), where K is the number of experts, into an array of dimension c(T, d, K). The new arrays of observations and of expert predictions can then be given to the aggregation rule procedure to perform d-dimensional predictions (i.e., day predictions).
seriesToBlock(X, d)

blockToSeries(X)
X |
An array or a vector to be converted. |
d |
A positive integer defining the block size. |
The function blockToSeries performs the inverse operation.
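A short round-trip sketch on toy data:
x <- rnorm(48)                        # e.g., 2 days of 24 hourly observations
blocks <- seriesToBlock(x, d = 24)    # a matrix of size c(2, 24): one row per day
dim(blocks)
y <- blockToSeries(blocks)            # back to a series of length 48
length(y)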