Train a BayesClassifier object


BayesClassifier(
  formula,
  data,
  naive = FALSE,
  prior = "proportional",
  var_eps = 0.01
)



´formula´: a formula object describing the relation between the input variables and the target

´data´: a data.frame containing the dataset

´naive´: logical; if TRUE, a Naive Bayes classifier is trained, if FALSE (the default), a full Bayes classifier

´prior´: type of class prior; either "uniform" (all classes are weighted equally) or "proportional" (each class is weighted in proportion to its number of samples in the training data)

´var_eps´: scalar added to the main diagonal of each covariance matrix to ensure numerical stability. If 0, nothing is added.
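The effect of a diagonal offset such as ´var_eps´ can be illustrated in base R. The following is a minimal sketch, not the package's own code: the toy matrix `X` and the names `Sigma_reg` and `Sigma_inv` are made up for illustration, and the sample covariance from `cov()` is an assumption about the estimator.

```r
# Two perfectly collinear features yield a singular sample covariance matrix.
X <- cbind(x1 = c(1, 2, 3, 4), x2 = 2 * c(1, 2, 3, 4))
Sigma <- cov(X)
det(Sigma)  # zero up to floating-point error

# Adding a small constant to the main diagonal (assumed to mirror var_eps)
# makes the matrix positive definite, so it can be inverted.
var_eps <- 0.01
Sigma_reg <- Sigma + var_eps * diag(ncol(X))
Sigma_inv <- solve(Sigma_reg)
```

With `var_eps = 0` the matrix is left unchanged, so a degenerate class covariance would remain singular.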


an S3 object of class ´BayesClassifier´, which has the following internal structure:

  • ´param´: a list of model parameters; each list element represents one class, and contains a vector mu (class mean), a matrix Sigma (class covariance), and a scalar prior (prior probability)

  • ´prior´: the type of prior ("uniform" or "proportional") the BayesClassifier was called with

  • ´formula´: the formula used to call the BayesClassifier

  • ´naive´: whether the model contains a Naive Bayes or a full Bayes classifier

  • ´n´: the number of samples used for training

  • ´all.features´: the vector of all feature names in the training data

  • ´logLik´: the value of the log-likelihood


The function trains a Bayes classifier on a classification dataset given as a data.frame ´data´, with the model specified by a formula ´formula´. The target variable must be a factor. For each class, the function estimates a mean vector (´mu´), a covariance matrix (´Sigma´), and a prior probability (´prior´). If a Naive Bayes classifier is selected, only the diagonal of each covariance matrix is estimated, i.e. the features are treated as independent within each class. The ´prior´ argument selects either a uniform prior (all classes are weighted equally) or a proportional prior (each class is weighted by its share of the training samples).
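The estimation step described above can be sketched in base R. This is an illustrative approximation under stated assumptions, not the package's implementation: `fit_gaussian_classes` is a hypothetical helper, it takes a feature table and a factor instead of a formula, and it uses the sample covariance from `cov()` (denominator n - 1), which may differ from what BayesClassifier does internally.

```r
# Hypothetical helper: estimates mu, Sigma and prior for each class.
fit_gaussian_classes <- function(X, y, naive = FALSE,
                                 prior = c("proportional", "uniform"),
                                 var_eps = 0.01) {
  prior <- match.arg(prior)
  stopifnot(is.factor(y), nrow(X) == length(y))
  X <- as.matrix(X)
  classes <- levels(y)
  param <- lapply(classes, function(cl) {
    Xc <- X[y == cl, , drop = FALSE]           # samples of this class
    Sigma <- stats::cov(Xc)                    # assumed estimator
    if (naive) {
      # Naive Bayes: keep only the per-feature variances
      Sigma <- diag(diag(Sigma), nrow = ncol(X))
    }
    Sigma <- Sigma + var_eps * diag(ncol(X))   # diagonal offset
    list(
      mu = colMeans(Xc),
      Sigma = Sigma,
      prior = if (prior == "uniform") 1 / length(classes) else nrow(Xc) / nrow(X)
    )
  })
  names(param) <- classes
  param
}

# Usage on a built-in dataset
fit <- fit_gaussian_classes(iris[, 1:4], iris$Species, naive = TRUE)
fit$setosa$mu
```

The returned list mirrors the documented ´param´ layout (one element per class with `mu`, `Sigma`, and `prior`); naming the elements by class level is a choice made for this sketch.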

See also

SSC for semi-supervised classification with known and unknown classes; EM for semi-supervised classification with known classes only


 dat <- data.frame(y = rep(c(1,2,3), each = 5),
    x1 = as.vector(sapply(c(5,0,0), rnorm, n = 5)),
    x2 = as.vector(sapply(c(0,0,5), rnorm, n = 5)))
 mod <- BayesClassifier(y ~ ., dat, naive = TRUE)
#> BayesClassifier model with 3 classes and 2 non-constant features
#> Note: the total number of features is 3 
#> ==============================
#> formula:  y ~ . 
#> used features:  x1, x2 
#> parameters: 
#> |mu        |Sigma         | prior|
#> |:---------|:-------------|-----:|
#> |5.13,0.46 |0.93,0,0,0.23 |  0.33|
#> |0.14,0.08 |0.46,0,0,1.45 |  0.33|
#> |0.04,4.65 |2.26,0,0,0.51 |  0.33|
 predict(mod, newdata = expand.grid(x1 = c(0,5), x2 = c(0,5), y = NA), type = "class")
#> [1] "2" "1" "3" "3"
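How class labels such as those returned by the `predict` call above can follow from `mu`, `Sigma`, and `prior` may be sketched as below. This is an assumption about the usual Gaussian Bayes decision rule (pick the class with the largest log prior plus log density), not a description of the package's `predict` method; `log_posterior_scores` and the hand-written `param` list are hypothetical.

```r
# Hypothetical helper: unnormalised log posterior of one sample x per class.
log_posterior_scores <- function(x, param) {
  sapply(param, function(p) {
    d <- x - p$mu
    log_det <- as.numeric(determinant(p$Sigma, logarithm = TRUE)$modulus)
    maha <- drop(crossprod(d, solve(p$Sigma, d)))  # squared Mahalanobis distance
    # log N(x | mu, Sigma) + log prior
    -0.5 * (length(x) * log(2 * pi) + log_det + maha) + log(p$prior)
  })
}

# Hand-written parameters in the documented layout (values are illustrative)
param <- list(
  "1" = list(mu = c(5, 0), Sigma = diag(2), prior = 1 / 3),
  "2" = list(mu = c(0, 0), Sigma = diag(2), prior = 1 / 3),
  "3" = list(mu = c(0, 5), Sigma = diag(2), prior = 1 / 3)
)

scores <- log_posterior_scores(c(5, 0), param)
predicted <- names(which.max(scores))
```

Working with log scores avoids underflow when densities are very small.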