Class NaiveBayesMultinomial

  • All Implemented Interfaces:
    Configurable, Serializable, CapabilitiesHandler, Classifier, MultiClassClassifier, AWTRenderable, Learner<Example<Instance>>, MOAObject, OptionHandler

    public class NaiveBayesMultinomial
    extends AbstractClassifier
    implements MultiClassClassifier
    Class for building and using a multinomial Naive Bayes classifier. Performs classic Bayesian text prediction under the naive assumption that all inputs are independent. For more information, see:

    Andrew McCallum, Kamal Nigam: A Comparison of Event Models for Naive Bayes Text Classification. In: AAAI-98 Workshop on 'Learning for Text Categorization', 1998.

    The core equation for this classifier:

    P[Ci|D] = (P[D|Ci] x P[Ci]) / P[D] (Bayes rule)

    where Ci is class i and D is a document.

    This is the incremental version of the algorithm: the model is updated one training instance at a time.
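
    The following is a minimal, self-contained sketch of how an incremental multinomial Naive Bayes learner can apply the Bayes rule above. It is not the MOA source code. The class name MultinomialNbSketch, its methods train and votes, and the choice to smooth the class prior are assumptions made for illustration; the laplace parameter only loosely mirrors laplaceCorrectionOption and is assumed to be greater than zero. Documents are assumed to be given as word-count arrays whose length equals the vocabulary size.

```java
/** Hypothetical illustration only; not the MOA implementation. */
class MultinomialNbSketch {
    private final double[] classWeight;   // per class: sum of instance weights
    private final double[] classTotals;   // per class: sum of weight * word count
    private final double[][] wordTotals;  // [class][word]: weighted word counts
    private final double laplace;         // smoothing constant, assumed > 0

    MultinomialNbSketch(int numClasses, int vocabSize, double laplace) {
        this.classWeight = new double[numClasses];
        this.classTotals = new double[numClasses];
        this.wordTotals = new double[numClasses][vocabSize];
        this.laplace = laplace;
    }

    /** Incremental update: folds one labelled document into the counts. */
    void train(int[] wordCounts, int classIndex, double weight) {
        classWeight[classIndex] += weight;
        for (int w = 0; w < wordCounts.length; w++) {
            double c = weight * wordCounts[w];
            wordTotals[classIndex][w] += c;
            classTotals[classIndex] += c;
        }
    }

    /** Returns P[Ci|D] for every class i, computed in log space and normalised. */
    double[] votes(int[] wordCounts) {
        int k = classWeight.length;
        int vocab = wordTotals[0].length;
        double totalWeight = 0.0;
        for (double cw : classWeight) {
            totalWeight += cw;
        }
        double[] logScore = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < k; c++) {
            // log P[Ci], smoothed so that unseen classes keep a non-zero prior
            logScore[c] = Math.log((classWeight[c] + laplace) / (totalWeight + laplace * k));
            for (int w = 0; w < wordCounts.length; w++) {
                if (wordCounts[w] == 0) {
                    continue; // absent words contribute nothing to log P[D|Ci]
                }
                double pWord = (wordTotals[c][w] + laplace) / (classTotals[c] + laplace * vocab);
                logScore[c] += wordCounts[w] * Math.log(pWord);
            }
            max = Math.max(max, logScore[c]);
        }
        // Dividing by the sum plays the role of P[D] in the Bayes rule.
        double[] p = new double[k];
        double sum = 0.0;
        for (int c = 0; c < k; c++) {
            p[c] = Math.exp(logScore[c] - max); // subtract max for numerical stability
            sum += p[c];
        }
        for (int c = 0; c < k; c++) {
            p[c] /= sum;
        }
        return p;
    }
}
```

    Working in log space avoids underflow when a document contains many words, and the final normalisation stands in for the division by P[D], which is the same for every class.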

    * BibTeX:

     @inproceedings{Mccallum1998,
        author = {Andrew McCallum and Kamal Nigam},
        booktitle = {AAAI-98 Workshop on 'Learning for Text Categorization'},
        title = {A Comparison of Event Models for Naive Bayes Text Classification},
        year = {1998}
     }
     

    See Also:
    Serialized Form
    • Field Detail

      • laplaceCorrectionOption

        public FloatOption laplaceCorrectionOption
      • m_classTotals

        protected double[] m_classTotals
        sum of weight_of_instance * word_count_of_instance for each class
      • m_headerInfo

        protected Instances m_headerInfo
        copy of header information for use in toString method
      • m_numClasses

        protected int m_numClasses
        number of class values
      • m_probOfClass

        protected double[] m_probOfClass
        the probability of a class (i.e. Pr[H])
      • m_wordTotalForClass

        protected DoubleVector[] m_wordTotalForClass
        probability that a word (w) exists in a class (H) (i.e. Pr[w|H]). The matrix is in this format: m_wordTotalForClass[wordAttribute][class]
      • reset

        protected boolean reset
    • Constructor Detail

      • NaiveBayesMultinomial

        public NaiveBayesMultinomial()
    • Method Detail

      • resetLearningImpl

        public void resetLearningImpl()
        Description copied from class: AbstractClassifier
        Resets this classifier. It must be similar to starting a new classifier from scratch.

        The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.
        Specified by:
        resetLearningImpl in class AbstractClassifier
      • trainOnInstanceImpl

        public void trainOnInstanceImpl(Instance inst)
        Trains the classifier with the given instance.
        Specified by:
        trainOnInstanceImpl in class AbstractClassifier
        Parameters:
        inst - the new training instance to include in the model
      • getVotesForInstance

        public double[] getVotesForInstance(Instance instance)
        Calculates the class membership probabilities for the given test instance.
        Specified by:
        getVotesForInstance in interface Classifier
        Specified by:
        getVotesForInstance in class AbstractClassifier
        Parameters:
        instance - the instance to be classified
        Returns:
        predicted class probability distribution
      • totalSize

        public double totalSize(Instance instance)
      • getModelMeasurementsImpl

        protected Measurement[] getModelMeasurementsImpl()
        Description copied from class: AbstractClassifier
        Gets the current measurements of this classifier.

        The reason for ...Impl methods: ease programmer burden by not requiring them to remember calls to super in overridden methods. Note that this will produce compiler errors if not overridden.
        Specified by:
        getModelMeasurementsImpl in class AbstractClassifier
        Returns:
        an array of measurements to be used in evaluation tasks
      • getModelDescription

        public void getModelDescription(StringBuilder result,
                                        int indent)
        Description copied from class: AbstractClassifier
        Returns a string representation of the model.
        Specified by:
        getModelDescription in class AbstractClassifier
        Parameters:
        result - the StringBuilder to which the description is appended
        indent - the number of characters to indent
      • isRandomizable

        public boolean isRandomizable()
        Description copied from interface: Learner
        Gets whether this learner needs a random seed. Examples of methods that need a random seed are bagging and boosting.
        Specified by:
        isRandomizable in interface Learner<Example<Instance>>
        Returns:
        true if the learner needs a random seed.