Train an SSC-UC model
Trains a semi-supervised classifier with awareness of unknown classes, as described in Schrunner et al. (2020).
Usage
SSC(
  formula,
  data,
  perc_spies = 0.05,
  naive = FALSE,
  prior = "proportional",
  var_eps = 0.01,
  max_unknown = 6,
  fixed_unknown = NULL,
  runs = 10
)
Arguments
- formula
a formula object explaining the functional relation between input variables and target
- data
a data.frame containing the dataset
- perc_spies
percentage of unlabeled points sampled as spies, given as a fraction (the default 0.05 corresponds to 5%)
- naive
logical; if TRUE, a Naive Bayes classifier is used, otherwise a Bayes classifier
- prior
type of class prior; either "uniform" (all classes are equally weighted) or "proportional" (all classes are weighted proportionally to their numbers of samples in the training data)
- var_eps
scalar added to the main diagonal of the covariance matrix to ensure numerical stability. If 0, nothing is added.
- max_unknown
maximum number of unknown classes (when number of unknown classes is determined automatically)
- fixed_unknown
fixed number of unknown classes (when number of unknown classes is specified manually)
- runs
number of bootstrap runs for selecting "likely unknowns"
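The prior and var_eps arguments can be illustrated with a minimal base R sketch. It follows the argument descriptions above only; it is not the package's internal code, and all variable names (y, X, S, S_reg) are hypothetical.

```r
# Illustrative only: base R, not the package's internal implementation.
y <- factor(c("a", "a", "a", "b"))   # toy labels: three "a", one "b"

# prior = "uniform": every class receives the same weight
prior_uniform <- rep(1 / nlevels(y), nlevels(y))
names(prior_uniform) <- levels(y)

# prior = "proportional": weights follow the class frequencies in the data
prior_proportional <- table(y) / length(y)

# var_eps: add a scalar to the main diagonal of a covariance matrix
X <- cbind(x1 = c(1, 2, 3, 4), x2 = c(2, 4, 6, 8))  # perfectly collinear columns
S <- cov(X)                                         # singular covariance matrix
var_eps <- 0.01
S_reg <- S + var_eps * diag(ncol(X))                # regularized, invertible
S_inv <- solve(S_reg)                               # inversion now succeeds
```

With var_eps = 0 the matrix S would stay singular in this toy case, which is the situation the argument is meant to guard against.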
Value
a BayesClassifier object; see BayesClassifier
Details
Given a semi-supervised setup with labeled and unlabeled training data, this function trains a Bayes classifier that accounts for both known and unknown classes. Known classes are represented in both the labeled and the unlabeled training data; unknown classes appear in the unlabeled training data only. The algorithm is described in Schrunner et al. (2020).
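A data set matching this setup might be prepared as in the sketch below, which uses the built-in iris data. It rests on one unverified assumption: that unlabeled rows are encoded as NA in the target column. Check the package documentation for the actual convention. The call to SSC() is guarded so that the snippet also runs when the package is not attached.

```r
# Toy semi-supervised data set built from iris (ships with base R).
set.seed(1)
d <- iris

# Known classes: setosa and versicolor.
# Unknown class: virginica, which never appears among the labeled rows.
labeled <- d$Species != "virginica" & runif(nrow(d)) < 0.5

# ASSUMPTION: unlabeled rows are marked with NA in the target column.
d$Species[!labeled] <- NA
d$Species <- droplevels(d$Species)   # virginica is no longer a known level

# Train only if SSC() is available in the current session.
if (exists("SSC", mode = "function")) {
  # fixed_unknown = 1: specify one unknown class manually
  model <- SSC(Species ~ ., data = d, fixed_unknown = 1)
}
```

Leaving fixed_unknown at NULL would instead let the number of unknown classes be determined automatically, up to max_unknown.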
References
Schrunner S, Geiger BC, Zernig A, Kern R (2020). “A generative semi-supervised classifier for datasets with unknown classes.” In Proceedings of the 35th Annual ACM Symposium on Applied Computing. doi:10.1145/3341105.3373890.
See also
EM for semi-supervised classification with known classes only; BayesClassifier for supervised classification.