Builds and trains a semi-supervised classifier using the Expectation-Maximization (EM) algorithm.

Usage

EM(
  formula,
  data,
  naive = FALSE,
  prior = "proportional",
  var_eps = 0.01,
  fixed_labels = NULL,
  maxiter = 20,
  verbose = TRUE
)
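
As a rough illustration of how the arguments above fit together, the following sketch prepares a partially labeled copy of the built-in iris data and then calls EM() with the signature shown. The data preparation uses base R only. The names d and labeled are made up for this example, and the call is guarded so that the snippet still runs when the package providing EM() is not attached.

```r
# Keep 10 labels per class and hide the rest (toy setup for illustration only).
set.seed(1)
d <- iris
labeled <- unlist(
  lapply(split(seq_len(nrow(d)), d$Species), function(i) sample(i, 10)),
  use.names = FALSE
)
d$Species[-labeled] <- NA   # unlabeled rows carry NA as class label

# Assumption: EM() from this package is available in the session.
# The call is skipped otherwise.
if (exists("EM", mode = "function")) {
  model <- EM(
    Species ~ .,
    data = d,
    fixed_labels = labeled,   # row indices whose labels must stay unchanged
    maxiter = 20,
    verbose = FALSE
  )
}
```

Passing the row indices of the labeled samples as fixed_labels mirrors the setup described under Details, where labeled samples stay fixed and unlabeled samples are marked with NA.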

Arguments

formula

a formula object describing the functional relation between the input variables and the target

data

a data.frame containing the dataset

naive

logical; if TRUE, a Naive Bayes classifier is used, otherwise a Bayes classifier

prior

type of class prior; either "uniform" (all classes are equally weighted) or "proportional" (all classes are weighted proportionally to their numbers of samples in the training data)

var_eps

scalar added to the main diagonal of the covariance matrix to ensure numerical stability. If 0, nothing is added.

fixed_labels

indices of the rows in data whose labels are fixed and must not be changed during training

maxiter

maximum number of iterations of the EM algorithm

verbose

whether outputs should be shown or suppressed
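
The meaning of prior and var_eps can be made concrete with a small base R sketch. It is not taken from the package source: the toy labels y and the matrix X are invented, and the way class counts are obtained here (from the non-NA labels) is an assumption about what "proportional" refers to.

```r
# Toy class labels; NA marks unlabeled samples (invented data).
y <- factor(c("a", "a", "a", "b", NA, NA))
classes <- levels(y)

# prior = "uniform": every class gets the same weight.
prior_uniform <- setNames(rep(1 / length(classes), length(classes)), classes)

# prior = "proportional": weight each class by its share of the labeled samples.
counts <- table(y)                        # NA entries are dropped by default
prior_proportional <- counts / sum(counts)

# var_eps: add a small constant to the main diagonal of a covariance matrix.
X <- cbind(x1 = c(1, 2, 3), x2 = c(2, 4, 6))   # perfectly collinear features
S <- cov(X)                                      # singular, cannot be inverted
S_reg <- S + diag(0.01, ncol(X))                 # regularized with var_eps = 0.01
S_inv <- solve(S_reg)                            # inversion now succeeds
```

With collinear or near-constant features the unregularized covariance matrix cannot be inverted, which is the situation the small diagonal term is meant to guard against.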

Value

a 'BayesClassifier' object; see BayesClassifier

Details

Given a semi-supervised setup with labeled and unlabeled training samples, the function returns a BayesClassifier comprising all classes covered by the labeled training data. All classes are assumed to be known and represented in the labeled training data. Labeled training samples are fixed (specified via 'fixed_labels') and do not change during the algorithm. Unlabeled samples are given with NA as class label. Training is performed using the Expectation-Maximization (EM) algorithm, which alternates between two steps: (a) estimating the parameters of a BayesClassifier model from the current labeling, and (b) updating the labels by predicting with the current BayesClassifier model. The algorithm terminates when either the maximum number of iterations 'maxiter' is reached or the same labels are predicted in two consecutive iterations.
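
The alternation of steps (a) and (b) and the two stopping rules can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: hard_em is a hypothetical helper, it fits one Gaussian with full covariance per class, uses a proportional prior, and assumes that every class keeps at least two samples in each iteration (which holds when at least two fixed labels per class are given).

```r
# Hypothetical helper for illustration; not the package's EM().
hard_em <- function(X, y, fixed, var_eps = 0.01, maxiter = 20) {
  X <- as.matrix(X)
  classes <- levels(y)
  labels <- y                                  # NA marks unlabeled rows
  for (iter in seq_len(maxiter)) {
    known <- which(!is.na(labels))
    # (a) estimate mean, regularized covariance and prior of each class,
    #     then score every row under that class
    scores <- sapply(classes, function(k) {
      Xk <- X[known[labels[known] == k], , drop = FALSE]
      mu <- colMeans(Xk)
      S <- cov(Xk) + diag(var_eps, ncol(X))
      log_prior <- log(nrow(Xk) / length(known))
      log_det <- as.numeric(determinant(S)$modulus)   # log-determinant of S
      # Gaussian log-density up to a constant shared by all classes
      log_prior - 0.5 * (mahalanobis(X, mu, S) + log_det)
    })
    # (b) relabel every row with its best-scoring class, keeping fixed labels
    new_labels <- factor(classes[max.col(scores, ties.method = "first")],
                         levels = classes)
    new_labels[fixed] <- y[fixed]
    # stop when two consecutive iterations predict the same labels
    converged <- identical(as.integer(new_labels), as.integer(labels))
    labels <- new_labels
    if (converged) break
  }
  list(labels = labels, iterations = iter)
}

# Usage on iris with 10 fixed labels per class (toy setup)
set.seed(1)
y <- iris$Species
fixed <- unlist(lapply(split(seq_along(y), y), function(i) sample(i, 10)),
                use.names = FALSE)
y[-fixed] <- NA
fit <- hard_em(iris[, 1:4], y, fixed)
table(predicted = fit$labels, truth = iris$Species)
```

In the first iteration only the fixed rows carry labels, so the model is fitted on them alone. From the second iteration on, all rows contribute to the parameter estimates while the fixed rows keep their original labels.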

See also

SSC for semi-supervised classification with known and unknown classes, BayesClassifier for supervised classification