
Trains a semi-supervised classifier with awareness of unknown classes, as described in (Schrunner et al. 2020).

Usage

SSC(
  formula,
  data,
  perc_spies = 0.05,
  naive = FALSE,
  prior = "proportional",
  var_eps = 0.01,
  max_unknown = 6,
  fixed_unknown = NULL,
  runs = 10
)

Arguments

formula

a formula object describing the functional relation between the input variables and the target

data

a data.frame containing the dataset

perc_spies

proportion of unlabeled points sampled as spies (the default 0.05 corresponds to 5 percent)

naive

logical; whether a Naive Bayes classifier (TRUE) or a Bayes classifier (FALSE) should be used

prior

type of class prior; either "uniform" (all classes are equally weighted) or "proportional" (all classes are weighted proportionally to their numbers of samples in the training data)

var_eps

scalar added to the main diagonal of the covariance matrix to ensure numerical stability. If 0, no scalar is added.

max_unknown

maximum number of unknown classes (when number of unknown classes is determined automatically)

fixed_unknown

fixed number of unknown classes (when number of unknown classes is specified manually)

runs

number of bootstrap runs for selecting "likely unknowns"
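
The meaning of the prior and var_eps arguments can be illustrated in base R. This is a minimal sketch, not the package's internal code: the helper class_prior, the toy labels y, and the toy matrix X are hypothetical names introduced only for illustration.

```r
# Hypothetical helper (not part of the package): class priors from training labels.
class_prior <- function(y, prior = c("proportional", "uniform")) {
  prior <- match.arg(prior)
  counts <- table(y)
  if (prior == "uniform") {
    # every class gets the same weight
    p <- rep(1 / length(counts), length(counts))
  } else {
    # weights follow the class frequencies in the training labels
    p <- as.numeric(counts) / sum(counts)
  }
  names(p) <- names(counts)
  p
}

y <- c("a", "a", "a", "b")
class_prior(y, "uniform")       # a = 0.5,  b = 0.5
class_prior(y, "proportional")  # a = 0.75, b = 0.25

# Effect of a diagonal offset such as var_eps on a degenerate covariance matrix.
# x2 is exactly 2 * x1, so the sample covariance is singular.
X <- cbind(x1 = c(1, 2, 3, 4), x2 = c(2, 4, 6, 8))
var_eps <- 0.01
S <- cov(X)
S_reg <- S + var_eps * diag(ncol(X))

det(S)      # numerically zero: S cannot be inverted reliably
det(S_reg)  # strictly positive: S_reg is invertible
```

With var_eps = 0 the offset vanishes and S_reg equals S, which matches the documented behaviour that no scalar is added in that case.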

Value

a BayesClassifier object, see BayesClassifier

Details

The function trains a Bayes classifier in a semi-supervised setup with labeled and unlabeled training data. Known classes are represented in both the labeled and the unlabeled training data, whereas unknown classes are represented in the unlabeled training data only. The algorithm is described in (Schrunner et al. 2020).
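
The distinction between known and unknown classes can be made concrete with a toy dataset. The sketch below uses the built-in iris data and only base R. Marking unlabeled rows with NA in the target column is an assumption made for this illustration; the encoding actually expected by SSC is not stated here and should be checked before passing such a data.frame as data.

```r
# Toy semi-supervised setup: one species is never labeled (an "unknown" class).
d <- iris
true_class <- as.character(d$Species)

# Assumption: unlabeled rows carry NA in the target column.
label <- true_class
label[true_class == "virginica"] <- NA   # unknown class: appears only unlabeled

# Hide the label of every second row of the remaining (known) classes.
known_rows <- which(true_class != "virginica")
hide <- known_rows[seq(1, length(known_rows), by = 2)]
label[hide] <- NA
d$Species <- factor(label)

# Known classes occur among the labeled rows; unknown classes do not.
known_classes   <- levels(d$Species)
unknown_classes <- setdiff(unique(true_class), known_classes)

table(d$Species, useNA = "ifany")  # 25 setosa, 25 versicolor, 100 NA
known_classes                      # "setosa" "versicolor"
unknown_classes                    # "virginica"
```

In this toy setup, a value of 1 for fixed_unknown would correspond to the single hidden species, while max_unknown would bound the search when the number of unknown classes is determined automatically.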

References

Schrunner S, Geiger BC, Zernig A, Kern R (2020). “A generative semi-supervised classifier for datasets with unknown classes.” In Proceedings of the 35th Annual ACM Symposium on Applied Computing. doi:10.1145/3341105.3373890.

See also

EM for semi-supervised classification with known classes only, BayesClassifier for supervised classification