Train a BayesClassifier object

Usage

BayesClassifier(
  formula,
  data,
  naive = FALSE,
  prior = "proportional",
  var_eps = 0.01
)

Arguments

formula

a formula object specifying the target variable (left-hand side) and the input variables (right-hand side)

data

a data.frame containing the dataset

naive

logical; if TRUE, a Naive Bayes classifier (diagonal covariance matrices) is trained, otherwise a full Bayes classifier

prior

type of class prior; either "uniform" (all classes are equally weighted) or "proportional" (all classes are weighted proportionally to their numbers of samples in the training data)

var_eps

scalar added to the main diagonal of each class covariance matrix to ensure numerical stability. If 0, nothing is added.
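
The effect of var_eps can be illustrated with a small base R sketch. It is not the package's implementation; the helper name regularise_cov is hypothetical, and the example assumes the sample covariance returned by cov().

```r
# Hypothetical helper (not part of the package): add a small constant to the
# main diagonal of a covariance matrix.
regularise_cov <- function(Sigma, var_eps = 0.01) {
  if (var_eps == 0) return(Sigma)        # nothing is added when var_eps is 0
  Sigma + diag(var_eps, nrow(Sigma))
}

# Two perfectly collinear features give a singular covariance matrix.
x <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8))
S <- cov(x)
det(S)                                   # zero up to floating-point error

# After regularisation the determinant is strictly positive,
# so the matrix can be inverted.
S_reg <- regularise_cov(S, var_eps = 0.01)
det(S_reg) > 0
```

A larger var_eps makes the matrix better conditioned at the cost of inflating every estimated variance by the same amount.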

Value

an S3 object of class ´BayesClassifier´, which has the following internal structure:

  • ´param´: a list of model parameters; each list element represents one class, and contains a vector mu (class mean), a matrix Sigma (class covariance), and a scalar prior (prior probability)

  • ´prior´: the type of prior model used to call the BayesClassifier

  • ´formula´: the formula used to call the BayesClassifier

  • ´naive´: whether the model contains a Naive Bayes or a full Bayes classifier

  • ´n´: the number of samples used for training

  • ´all.features´: the vector of all feature names in the training data

  • ´logLik´: the value of the log-likelihood

Details

The function trains a Bayes classifier on a classification dataset, specified by a data.frame ´data´ and a formula ´formula´. The target variable named in the formula must be a factor. For each class, the function estimates a mean vector (´mu´), a covariance matrix (´Sigma´) and a prior probability (´prior´). If a Naive Bayes classifier is selected, only the diagonal of each covariance matrix is estimated. The ´prior´ argument determines whether a uniform prior (all classes are equally weighted) or a proportional prior (classes are weighted by their proportions in the training data) is used.
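
The estimation step described above can be sketched in base R roughly as follows. This is a minimal illustration, not the package's code: estimate_class_params is a hypothetical helper, it uses the sample covariance from cov(), and it leaves out the var_eps adjustment.

```r
# Hypothetical helper: estimate per-class Gaussian parameters from a
# data.frame of numeric features X and a factor of class labels y.
estimate_class_params <- function(X, y, naive = FALSE, prior = "proportional") {
  stopifnot(is.factor(y), nrow(X) == length(y))
  lapply(split(as.data.frame(X), y), function(Xk) {
    Sigma <- cov(Xk)
    # Naive Bayes: keep the variances, set all covariances to zero
    if (naive) Sigma <- diag(diag(Sigma), ncol(Xk))
    list(
      mu    = colMeans(Xk),
      Sigma = Sigma,
      prior = if (prior == "uniform") 1 / nlevels(y) else nrow(Xk) / length(y)
    )
  })
}

# Usage with the built-in iris data: one list element per class
params <- estimate_class_params(iris[, 1:4], iris$Species, naive = TRUE)
names(params)
params$setosa$mu
params$setosa$prior
```

With balanced classes such as those in iris, the uniform and the proportional prior coincide; they differ only when class sizes are unequal.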

See also

SSC for semi-supervised classification with known and unknown classes, EM for semi-supervised classification with known classes only

Examples

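 set.seed(1)
 # Toy data: three classes with five samples each; x1 is shifted for class 1, x2 for class 3
 dat <- data.frame(y = rep(c(1,2,3), each = 5),
    x1 = as.vector(sapply(c(5,0,0), rnorm, n = 5)),
    x2 = as.vector(sapply(c(0,0,5), rnorm, n = 5)))
 # Fit a Naive Bayes classifier (diagonal covariance matrices)
 mod <- BayesClassifier(y ~ ., dat, naive = TRUE)
 summary(mod)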
#> BayesClassifier model with 3 classes and 2 non-constant features
#> Note: the total number of features is 3 
#> ==============================
#> formula:  y ~ . 
#> used features:  x1, x2 
#> parameters: 
#> 
#> 
#> |mu        |Sigma         | prior|
#> |:---------|:-------------|-----:|
#> |5.13,0.46 |0.93,0,0,0.23 |  0.33|
#> |0.14,0.08 |0.46,0,0,1.45 |  0.33|
#> |0.04,4.65 |2.26,0,0,0.51 |  0.33|
 predict(mod, newdata = expand.grid(x1 = c(0,5), x2 = c(0,5), y = NA), type = "class")
#> [1] "2" "1" "3" "3"
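
To see how class labels can follow from the estimated parameters, the sketch below scores each grid point by log prior plus Gaussian log-density and picks the highest-scoring class. It is an illustration under assumptions, not the package's predict method: log_dmvnorm and classify are hypothetical helpers, and the parameters are the rounded values from the summary output above.

```r
# Hypothetical helper: log-density of a multivariate normal at point x.
log_dmvnorm <- function(x, mu, Sigma) {
  d <- x - mu
  log_det <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  -0.5 * (length(mu) * log(2 * pi) + log_det + sum(d * solve(Sigma, d)))
}

# Hypothetical helper: return the class with the largest log posterior score.
classify <- function(x, param) {
  scores <- vapply(
    param,
    function(p) log(p$prior) + log_dmvnorm(x, p$mu, p$Sigma),
    numeric(1)
  )
  names(which.max(scores))
}

# Parameters copied (rounded) from the summary table above
param <- list(
  "1" = list(mu = c(5.13, 0.46), Sigma = diag(c(0.93, 0.23)), prior = 1/3),
  "2" = list(mu = c(0.14, 0.08), Sigma = diag(c(0.46, 1.45)), prior = 1/3),
  "3" = list(mu = c(0.04, 4.65), Sigma = diag(c(2.26, 0.51)), prior = 1/3)
)

grid <- expand.grid(x1 = c(0, 5), x2 = c(0, 5))
pred <- apply(grid, 1, classify, param = param)
pred
```

With these rounded parameters the sketch assigns the four grid points to classes 2, 1, 3 and 3, in line with the predict output shown above.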