Title: | Online Prediction by Expert Aggregation |
Description: | Misc methods to form online predictions, for regression-oriented time-series, by combining a finite set of forecasts provided by the user. See Cesa-Bianchi and Lugosi (2006) <doi:10.1017/CBO9780511546921> for an overview. |
Authors: | Pierre Gaillard [cre, aut], Yannig Goude [aut], Laurent Plagne [ctb], Thibaut Dubois [ctb], Benoit Thieurmel [ctb] |
Maintainer: | Pierre Gaillard <[email protected]> |
License: | LGPL |
Version: | 1.2.2 |
Built: | 2025-03-07 05:36:20 UTC |
Source: | https://github.com/dralliag/opera |
Function to check the validity of the provided loss function.
check_loss(loss.type, loss.gradient, Y = NULL, model = NULL)
Arguments: loss.type, loss.gradient, Y, model
loss.type
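A minimal call sketch (not from the package manual; it assumes this helper simply returns the validated loss specification):
loss.type <- check_loss(loss.type = list(name = "square"), loss.gradient = TRUE)
str(loss.type)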
Function to check and modify the input class and type
check_matrix(mat, name)
Arguments: mat, name
a 3d array if a 3d array is provided, else a matrix.
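A short sketch of how this helper might be used (the coercion of a plain vector to a one-column matrix is an assumption based on the return value described above):
Y <- check_matrix(rnorm(10), name = "Y")                           # expected: a 10 x 1 matrix
A <- check_matrix(array(rnorm(24), c(4, 2, 3)), name = "experts")  # a 3d array is returned unchanged
dim(Y)
dim(A)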
Electricity forecasting data set provided by EDF R&D. It contains weekly measurements of the total electricity consumption in France from 1996 to 2009, together with several covariates, including temperature, industrial production indices (source: INSEE) and calendar information.
data(electric_load)
An object of class data.frame
with 731 rows and 11 columns.
data(electric_load)

# a few graphs to display the data
attach(electric_load)
plot(Load, type = 'l')
plot(Temp, Load, pch = 16, cex = 0.5)
plot(NumWeek, Load, pch = 16, cex = 0.5)
plot(Load, Load1, pch = 16, cex = 0.5)
acf(Load, lag.max = 20)
detach(electric_load)
FTRL (Follow The Regularized Leader; Shalev-Shwartz and Singer 2007; see also Chap. 5 of Hazan 2019) is the online counterpart of empirical risk minimization. It is a family of aggregation rules (including OGD) that uses, at each time step, the empirical risk minimizer so far with an additional regularization. The online optimization can be performed on any bounded convex set that can be expressed with equality or inequality constraints. Note that this method is still under development and provided as a beta version.
FTRL( y, experts, eta = NULL, fun_reg = NULL, fun_reg_grad = NULL, constr_eq = NULL, constr_eq_jac = NULL, constr_ineq = NULL, constr_ineq_jac = NULL, loss.type = list(name = "square"), loss.gradient = TRUE, w0 = NULL, max_iter = 50, obj_tol = 0.01, training = NULL, default = FALSE, quiet = TRUE )
Arguments: y, experts, eta, fun_reg, fun_reg_grad, constr_eq, constr_eq_jac, constr_ineq, constr_ineq_jac, loss.type, loss.gradient, w0, max_iter, obj_tol, training, default, quiet
object of class mixture.
Hazan E (2019).
“Introduction to online convex optimization.”
arXiv preprint arXiv:1909.05207.
Shalev-Shwartz S, Singer Y (2007).
“A primal-dual perspective of online learning algorithms.”
Machine Learning, 69(2), 115–142.
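A brief sketch on simulated data (illustrative only; setting default = TRUE is assumed to select the package's built-in regularization and simplex constraints):
library(opera)
set.seed(1)
n <- 100
y <- rnorm(n)                                     # observations to be predicted
experts <- cbind(good = y + rnorm(n, sd = 0.3),   # an informative expert
                 noise = rnorm(n))                # an uninformative expert
fit <- FTRL(y = y, experts = experts, default = TRUE)
head(fit$weights)   # weights assigned to the two experts over time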
The function loss computes the sequence of instantaneous losses suffered by the predictions in x to predict the observations in y.
loss( x, y, pred = NULL, loss.type = list(name = "square"), loss.gradient = FALSE )
Arguments: x, y, pred, loss.type, loss.gradient
A vector of length T
containing the sequence of
instantaneous losses suffered by the expert predictions (x) or the gradient computed on the aggregated predictions (pred).
Pierre Gaillard <[email protected]>
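A small sketch, assuming that for the square loss the instantaneous loss is the element-wise squared error:
x <- c(1.2, 0.8, 1.5)   # predictions
y <- c(1.0, 1.0, 1.0)   # observations
loss(x = x, y = y, loss.type = list(name = "square"))
# expected: approximately 0.04 0.04 0.25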
The function mixture builds an aggregation rule chosen by the user. It can then be used to predict new observations Y sequentially. If observations Y and expert advice experts are provided, mixture is trained by predicting the observations in Y sequentially with the help of the expert advice in experts. At each time instance t, the mixture forms a prediction of Y[t,] by assigning a weight to each expert and by combining the expert advice.
mixture( Y = NULL, experts = NULL, model = "MLpol", loss.type = "square", loss.gradient = TRUE, coefficients = "Uniform", awake = NULL, parameters = list(), quiet = TRUE, ... )

## S3 method for class 'mixture'
print(x, ...)

## S3 method for class 'mixture'
summary(object, ...)
Y |
A matrix with T rows and d columns. Each row Y[t,] contains the d observations to be predicted at step t. |
experts |
An array of dimension c(T, d, K), where K is the number of experts, containing the expert forecasts. Each entry experts[t, , k] is the prediction proposed by expert k at time t. |
model |
A character string specifying the aggregation rule to use, such as 'MLpol' (the default), 'BOA', 'OGD', or 'FTRL'. |
loss.type |
A string or a list with a component 'name' specifying the loss function used to evaluate the performance ('square' by default). |
loss.gradient |
A boolean. If TRUE (default), the aggregation rule is applied to a gradient (linearized) version of the losses instead of the losses themselves. |
coefficients |
A probability vector of length K containing the prior weights of the experts (not possible for 'MLpol'). The weights must be non-negative and sum to 1. |
awake |
A matrix specifying the activation coefficients of the experts. Its entries lie in [0, 1]; an entry of 0 means that the corresponding expert is inactive (sleeping) at that time. |
parameters |
A list that contains optional parameters for the aggregation rule. If no parameters are provided, the aggregation rule is fully calibrated online. |
quiet |
A boolean. If TRUE (default), no progress information is displayed. |
... |
Additional parameters |
x |
An object of class mixture |
object |
An object of class mixture |
An object of class mixture that can be used to perform new predictions. It contains the parameters model, loss.type, loss.gradient, experts, Y, awake, and the fields
coefficients |
A vector of coefficients assigned to each expert to perform the next prediction. |
weights |
A matrix of dimension c(T, K), where T is the number of time steps and K the number of experts, containing the weights assigned to each expert over time. |
prediction |
A matrix with T rows and d columns containing the sequential predictions formed by the aggregation rule. |
loss |
The average loss (as specified by the parameter loss.type) suffered by the aggregation rule so far. |
parameters |
The learning parameters chosen by the aggregation rule or by the user. |
training |
A list that contains useful temporary information of the aggregation rule to be updated and to perform predictions. |
Pierre Gaillard <[email protected]> Yannig Goude <[email protected]>
Cesa-Bianchi N, Lugosi G (2006).
Prediction, learning, and games.
Cambridge university press.
Gaillard P, Stoltz G, van Erven T (2014).
“A Second-order Bound with Excess Losses.”
In Proceedings of COLT'14, volume 35, 176–196.
Hazan E (2019).
“Introduction to online convex optimization.”
arXiv preprint arXiv:1909.05207.
Shalev-Shwartz S, Singer Y (2007).
“A primal-dual perspective of online learning algorithms.”
Machine Learning, 69(2), 115–142.
Wintenberger O (2017).
“Optimal learning with Bernstein online aggregation.”
Machine Learning, 106(1), 119–141.
Zinkevich M (2003).
“Online convex programming and generalized infinitesimal gradient ascent.”
In Proceedings of the 20th international conference on machine learning (icml-03), 928–936.
See opera-package
and opera-vignette for a brief example about how to use the package.
library('opera')  # load the package
set.seed(1)

# Example: find the best one week ahead forecasting strategy (weekly data)
# packages
library(mgcv)

# import data
data(electric_load)
idx_data_test <- 620:nrow(electric_load)
data_train <- electric_load[-idx_data_test, ]
data_test <- electric_load[idx_data_test, ]

# Build the expert forecasts
# ##########################

# 1) A generalized additive model
gam.fit <- gam(Load ~ s(IPI) + s(Temp) + s(Time, k = 3) +
                 s(Load1) + as.factor(NumWeek), data = data_train)
gam.forecast <- predict(gam.fit, newdata = data_test)

# 2) An online autoregressive model on the residuals of a medium term model

# Medium term model to remove trend and seasonality (using generalized additive model)
detrend.fit <- gam(Load ~ s(Time, k = 3) + s(NumWeek) + s(Temp) + s(IPI),
                   data = data_train)
electric_load$Trend <- c(predict(detrend.fit), predict(detrend.fit, newdata = data_test))
electric_load$Load.detrend <- electric_load$Load - electric_load$Trend

# Residual analysis
ar.forecast <- numeric(length(idx_data_test))
for (i in seq(idx_data_test)) {
  ar.fit <- ar(electric_load$Load.detrend[1:(idx_data_test[i] - 1)])
  ar.forecast[i] <- as.numeric(predict(ar.fit)$pred) + electric_load$Trend[idx_data_test[i]]
}

# Aggregation of experts
###########################

X <- cbind(gam.forecast, ar.forecast)
colnames(X) <- c('gam', 'ar')
Y <- data_test$Load

matplot(cbind(Y, X), type = 'l', col = 1:6, ylab = 'Weekly load', xlab = 'Week')

# How good are the experts? Look at the oracles
oracle.convex <- oracle(Y = Y, experts = X, loss.type = 'square', model = 'convex')
if (interactive()) {
  plot(oracle.convex)
}
oracle.convex

# Is a single expert the best over time? Are there breaks?
oracle.shift <- oracle(Y = Y, experts = X, loss.type = 'percentage', model = 'shifting')
if (interactive()) {
  plot(oracle.shift)
}
oracle.shift

# Online aggregation of the experts with BOA
#############################################

# Initialize the aggregation rule
m0.BOA <- mixture(model = 'BOA', loss.type = 'square')

# Perform online prediction using BOA. There are 3 equivalent possibilities:
# 1) start with an empty model and update the model sequentially
m1.BOA <- m0.BOA
for (i in 1:length(Y)) {
  m1.BOA <- predict(m1.BOA, newexperts = X[i, ], newY = Y[i], quiet = TRUE)
}

# 2) perform online prediction directly from the empty model
m2.BOA <- predict(m0.BOA, newexperts = X, newY = Y, online = TRUE, quiet = TRUE)

# 3) perform the online aggregation directly
m3.BOA <- mixture(Y = Y, experts = X, model = 'BOA', loss.type = 'square', quiet = TRUE)

# These predictions are equivalent:
identical(m1.BOA, m2.BOA)  # TRUE
identical(m1.BOA, m3.BOA)  # TRUE

# Display the results
summary(m3.BOA)
if (interactive()) {
  plot(m1.BOA)
}

# Plot options
##################################
# ?plot.mixture

# static or dynamic: dynamic = FALSE/TRUE
plot(m1.BOA, dynamic = FALSE)

# just one plot with custom labels?
# 'plot_weight', 'boxplot_weight', 'dyn_avg_loss',
# 'cumul_res', 'avg_loss', 'contrib'
if (interactive()) {
  plot(m1.BOA, type = "plot_weight",
       main = "Weights", ylab = "Weights", xlab = "Time")
}

# subset rows / time
plot(m1.BOA, dynamic = FALSE, subset = 1:10)

# plot best n experts
plot(m1.BOA, dynamic = FALSE, max_experts = 1)

# Using d-dimensional time-series
##################################

# Consider the above example of electricity consumption
# to be predicted every four weeks
YBlock <- seriesToBlock(X = Y, d = 4)
XBlock <- seriesToBlock(X = X, d = 4)

# The four-week-by-four-week predictions can then be obtained
# by directly using the `mixture` function as we did earlier.
MLpolBlock <- mixture(Y = YBlock, experts = XBlock, model = "MLpol",
                      loss.type = "square", quiet = TRUE)

# The predictions can finally be transformed back to a
# regular one dimensional time-series by using the function `blockToSeries`.
prediction <- blockToSeries(MLpolBlock$prediction)

#### Using the `online = FALSE` option

# An equivalent solution is to use the `online = FALSE` option in the predict function.
# The latter ensures that the model coefficients are not
# updated between the next four weeks to forecast.
MLpolBlock <- mixture(model = "BOA", loss.type = "square")
d <- 4
n <- length(Y) / d
for (i in 0:(n - 1)) {
  idx <- 4 * i + 1:4  # next four weeks to be predicted
  MLpolBlock <- predict(MLpolBlock, newexperts = X[idx, ], newY = Y[idx],
                        online = FALSE, quiet = TRUE)
}
print(head(MLpolBlock$weights))
The function oracle performs a strategy that cannot be defined online (in contrast to mixture). It requires the whole data set Y and the expert advice to be known in advance. Examples of oracles are the best fixed expert, the best fixed convex combination rule, the best linear combination rule, or the best expert that can shift a few times.
oracle( Y, experts, model = "convex", loss.type = "square", awake = NULL, lambda = NULL, niter = NULL, ... )
Y |
A vector containing the observations to be predicted. |
experts |
A matrix containing the experts' forecasts. Each column corresponds to the predictions proposed by an expert to predict Y. |
model |
A character string specifying the oracle to use (such as 'expert', 'convex', 'linear', or 'shifting'), or a list with a component 'name' specifying the oracle together with additional parameters. |
loss.type |
A string specifying the loss function used to evaluate the performance ('square' by default). |
awake |
A matrix specifying the activation coefficients of the experts. Its entries lie in [0, 1]; an entry of 0 means that the corresponding expert is inactive (sleeping) at that time. |
lambda |
A positive number used by the 'linear' oracle only. An optional L2 regularization parameter for computing the linear oracle (if the design matrix is not identifiable). |
niter |
A positive integer for 'convex' and 'linear' oracles if direct computation of the oracle is not implemented. It defines the number of optimization steps to perform in order to approximate the oracle (default value is 3). |
... |
Additional parameters that are passed to the underlying optimization routine. |
An object of class 'oracle' that contains:
loss |
The average loss suffered by the oracle. For the 'shifting' oracle, it is a vector giving the average loss as a function of the number of allowed shifts. |
coefficients |
Not for the 'shifting' oracle. A vector containing the best weight vector corresponding to the oracle. |
prediction |
Not for the 'shifting' oracle. A vector containing the predictions of the oracle. |
rmse |
Only if loss.type is the square loss (default). The root mean square error (i.e., the square root of loss). |
Pierre Gaillard <[email protected]>
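A brief sketch on simulated data (the model name 'expert' for the best fixed expert is an assumption; 'convex' is used in the package example above):
set.seed(42)
n <- 50
Y <- rnorm(n)
X <- cbind(exp1 = Y + rnorm(n, sd = 0.3), exp2 = Y + rnorm(n, sd = 1))
o.expert <- oracle(Y = Y, experts = X, model = "expert", loss.type = "square")
o.convex <- oracle(Y = Y, experts = X, model = "convex", loss.type = "square")
o.expert$loss   # average loss of the best fixed expert
o.convex$loss   # average loss of the best convex combination (never larger)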
Functions to render dynamic mixture graphs using rAmCharts
plot_ridge_weights( data, colors = NULL, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
plot_weights( data, colors = NULL, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
boxplot_weights( data, colors = NULL, max_experts = 50, xlab = NULL, ylab = NULL, main = NULL )
plot_dyn_avg_loss( data, colors = NULL, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
plot_cumul_res( data, colors = NULL, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
plot_avg_loss( data, colors = NULL, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
plot_contrib( data, colors = NULL, alpha = 0.1, max_experts = 50, round = 3, xlab = NULL, ylab = NULL, main = NULL )
Arguments: data, colors, alpha, max_experts, round, xlab, ylab, main
a rAmCharts plot
plot provides different diagnostic plots for an aggregation procedure.
## S3 method for class 'mixture'
plot( x, pause = FALSE, col = NULL, alpha = 0.01, dynamic = FALSE, type = "all", max_experts = 50, col_by_weight = TRUE, xlab = NULL, ylab = NULL, main = NULL, subset = NULL, ... )
x |
an object of class mixture. If awake is provided (i.e., some experts are inactive), their residuals and cumulative losses are computed by using the predictions of the mixture. |
pause |
if set to TRUE, displays the plots separately; otherwise (default) they are drawn on a single page |
col |
the color to use to represent each expert; if set to NULL (default), a default palette is used |
alpha |
|
dynamic |
a boolean. If TRUE, interactive graphs are generated with rAmCharts; otherwise (default) static plots are drawn. |
type |
the type of plot to draw: 'all' (default) or one of 'plot_weight', 'boxplot_weight', 'dyn_avg_loss', 'cumul_res', 'avg_loss', 'contrib'. |
max_experts |
the maximum number of experts to be displayed (default 50). |
col_by_weight |
a boolean. If TRUE (default), expert colors are assigned according to the experts' average weights. |
xlab |
x-axis label. |
ylab |
y-axis label. |
main |
main title of the plot(s). |
subset |
an optional vector of time indices (rows) to restrict the plots to. |
... |
additional plotting parameters |
plots representing: the weights of each expert as a function of time, boxplots of these weights, the cumulative loss of each expert as a function of time, the cumulative residuals of each expert's forecast as a function of time, the average loss suffered by the experts, and the contribution of each expert to the aggregation as a function of time.
Pierre Gaillard <[email protected]>
Yannig Goude <[email protected]>
See opera-package
and opera-vignette for a brief example about how to use the package.
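A short sketch with simulated data (the data and the choice of the 'BOA' rule are illustrative):
set.seed(3)
n <- 40
Y <- rnorm(n)
X <- cbind(e1 = Y + rnorm(n, sd = 0.2), e2 = rnorm(n))
m <- mixture(Y = Y, experts = X, model = "BOA", loss.type = "square", quiet = TRUE)
plot(m, pause = FALSE)                        # all static diagnostic plots on one page
plot(m, type = "avg_loss", dynamic = FALSE)   # a single static plot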
oracle plot. It has a few optional arguments.
## S3 method for class 'oracle' plot(x, sort = TRUE, col = NULL, dynamic = FALSE, ...)
x |
An object of class 'oracle'. |
sort |
if set to TRUE (default), it sorts the experts by performance before the plots. |
col |
colors |
dynamic |
If TRUE, graphs are generated with rAmCharts; otherwise static plots are drawn. |
... |
additional arguments to function plot. |
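A short sketch reusing the simulated Y and X from the plot.mixture sketch above:
o <- oracle(Y = Y, experts = X, model = "convex", loss.type = "square")
plot(o, sort = TRUE, dynamic = FALSE)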
Functions to render dynamic oracle graphs using rAmCharts
plt_oracle_convex(data, colors, round = 2)
Arguments: data, colors, round
a rAmCharts plot
Performs sequential predictions and updates of a mixture object based on new observations and expert advice.
## S3 method for class 'mixture'
predict( object, newexperts = NULL, newY = NULL, awake = NULL, online = TRUE, type = c("model", "response", "weights", "all"), quiet = TRUE, ... )
object |
Object of class inheriting from 'mixture' |
newexperts |
An optional matrix in which to look for the expert advice with which to predict. If omitted, the past predictions of the object are returned and the object is not updated. |
newY |
An optional matrix with d columns (or a vector if d = 1) containing the observations to be predicted. |
awake |
An optional array specifying the activation coefficients of the experts. It must have the same dimension as experts. Its entries lie in [0, 1]; an entry of 0 means that the corresponding expert is inactive (sleeping) at that time. |
online |
A boolean determining if the observations in newY are predicted sequentially (by updating the object step by step) or not. If FALSE, the observations are predicted using the object as it is (without using any past information in newY). If TRUE, newY and newexperts should not be null. |
type |
Type of prediction. It can be 'model' (default; the updated object), 'response' (the matrix of predictions), 'weights' (the matrix of weights), or 'all' (a list containing the three of them). |
quiet |
A boolean. If TRUE (default), no progress information is displayed. |
... |
further arguments are ignored |
predict.mixture
produces a matrix of predictions
(type = 'response'), an updated object (type = 'model'), or a matrix of
weights (type = 'weights').
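A minimal sketch on simulated data, showing how to obtain the next forecast and the weights that would be used without updating the model (the data and expert names are illustrative):
set.seed(1)
n <- 30
Y <- rnorm(n)
X <- cbind(gam = Y + rnorm(n, sd = 0.2), ar = Y + rnorm(n, sd = 0.5))
m <- mixture(Y = Y, experts = X, model = "MLpol", loss.type = "square", quiet = TRUE)
newX <- matrix(c(0.1, 0.2), nrow = 1, dimnames = list(NULL, c("gam", "ar")))
predict(m, newexperts = newX, online = FALSE, type = "response")   # next forecast only
predict(m, newexperts = newX, online = FALSE, type = "weights")    # weights used for that forecast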
The functions seriesToBlock and blockToSeries convert 1-dimensional series into series of higher dimension. For instance, suppose you have a time-series that consists of T = 100 days of d = 24 hours. The function seriesToBlock converts the time-series X of T*d = 2400 observations into a matrix of size c(T = 100, d = 24), where each row corresponds to a specific day. This function is useful if you need to perform the prediction day by day, instead of hour by hour. The function can also be used to convert a matrix of expert predictions of dimension c(dT, K), where K is the number of experts, into an array of dimension c(T, d, K). The new arrays of observations and of expert predictions can then be given to the aggregation rule procedure to perform d-dimensional predictions (i.e., day predictions).
seriesToBlock(X, d)

blockToSeries(X)
X |
An array or a vector to be converted. |
d |
A positive integer defining the block size. |
The function blockToSeries performs the inverse operation.
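A short round-trip sketch on toy data:
x <- rnorm(48)                        # e.g., 2 days of 24 hourly observations
blocks <- seriesToBlock(x, d = 24)    # a matrix of size c(2, 24): one row per day
dim(blocks)
y <- blockToSeries(blocks)            # back to a series of length 48
length(y)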