Class LinearRegressionJ

  • All Implemented Interfaces:
    Stoppable, StoppableWithFeedback, Serializable, Cloneable, weka.classifiers.Classifier, weka.core.BatchPredictor, weka.core.CapabilitiesHandler, weka.core.CapabilitiesIgnorer, weka.core.CommandlineRunnable, weka.core.OptionHandler, weka.core.RevisionHandler, weka.core.WeightedInstancesHandler

    public class LinearRegressionJ
    extends StoppableClassifier
    implements weka.core.OptionHandler, weka.core.WeightedInstancesHandler
    Class for using linear regression for prediction. Uses the Akaike criterion for model selection, and is able to deal with weighted instances.

    Valid options are:

     -S <number of selection method>
      Set the attribute selection method to use. 1 = None, 2 = Greedy.
      (default 0 = M5' method)
     -C
      Do not try to eliminate collinear attributes.
     -R <double>
      Set ridge parameter (default 1.0e-8).
     
     -minimal
      Conserve memory, don't keep dataset header and means/stdevs.
      Model cannot be printed out if this option is enabled. (default: keep data)
     -additional-stats
      Output additional statistics.
     -disable-preprocessing
      Disable preprocessing.
     -disable-input-scaling
      Disable input scaling.
     -output-debug-info
      If set, classifier is run in debug mode and
      may output additional info to the console
     -do-not-check-capabilities
      If set, classifier capabilities are not checked before classifier is built
      (use with caution).
     -num-decimal-places
      The number of decimal places for the output of numbers in the model (default 4).
     -batch-size
      The desired batch size for batch prediction (default 100).
    LinearRegression version based on r12246 before switch to MTJ.
    Author:
    Eibe Frank, Len Trigg
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected int m_AttributeSelection
      The current attribute selection method
      protected int m_ClassIndex
      The index of the class attribute
      protected double m_ClassMean
      The mean of the class attribute
      protected double m_ClassStdDev
      The standard deviation of the class attribute
      protected double[] m_Coefficients
      Array for storing coefficients of linear regression.
      protected int m_DF
      The degrees of freedom of the regression model
      protected boolean m_EliminateColinearAttributes
      Try to eliminate correlated attributes?
      protected boolean m_EnableInputScaling
      Whether to scale the input.
      protected boolean m_EnablePreprocessing
      Whether to perform preprocessing.
      protected double m_FStat
      The F-statistic of the regression model
      protected double[] m_Means
      The attribute means
      protected boolean m_Minimal
      Conserve memory?
      protected weka.filters.unsupervised.attribute.ReplaceMissingValues m_MissingFilter
      The filter for replacing missing values.
      protected boolean m_ModelBuilt
      Model already built?
      protected boolean m_OutputAdditionalStats
      Whether to output additional statistics such as std. dev. of coefficients and t-stats
      protected double m_Ridge
      The ridge parameter
      protected double m_RSquared
      The R-squared value of the regression model
      protected double m_RSquaredAdj
      The adjusted R-squared value of the regression model
      protected boolean[] m_SelectedAttributes
      Which attributes are relevant?
      protected double[] m_StdDevs
      The attribute standard deviations
      protected double[] m_StdErrorOfCoef
      Array for storing the standard error of each coefficient
      protected weka.core.Instances m_TransformedData
      Variable for storing transformed training data.
      protected weka.filters.supervised.attribute.NominalToBinary m_TransformFilter
      The filter storing the transformation from nominal to binary attributes.
      protected double[] m_TStats
      Array for storing the t-statistic of each coefficient
      static int SELECTION_GREEDY
      Attribute selection method: Greedy method
      static int SELECTION_M5
      Attribute selection method: M5 method
      static int SELECTION_NONE
      Attribute selection method: No attribute selection
      static weka.core.Tag[] TAGS_SELECTION
      Attribute selection methods
      • Fields inherited from class weka.classifiers.AbstractClassifier

        BATCH_SIZE_DEFAULT, m_BatchSize, m_Debug, m_DoNotCheckCapabilities, m_numDecimalPlaces, NUM_DECIMAL_PLACES_DEFAULT
    • Constructor Summary

      Constructors 
      Constructor Description
      LinearRegressionJ()
      Default constructor.
    • Field Detail

      • SELECTION_M5

        public static final int SELECTION_M5
        Attribute selection method: M5 method
        See Also:
        Constant Field Values
      • SELECTION_NONE

        public static final int SELECTION_NONE
        Attribute selection method: No attribute selection
        See Also:
        Constant Field Values
      • SELECTION_GREEDY

        public static final int SELECTION_GREEDY
        Attribute selection method: Greedy method
        See Also:
        Constant Field Values
      • TAGS_SELECTION

        public static final weka.core.Tag[] TAGS_SELECTION
        Attribute selection methods
      • m_Coefficients

        protected double[] m_Coefficients
        Array for storing coefficients of linear regression.
      • m_SelectedAttributes

        protected boolean[] m_SelectedAttributes
        Which attributes are relevant?
      • m_TransformedData

        protected weka.core.Instances m_TransformedData
        Variable for storing transformed training data.
      • m_MissingFilter

        protected weka.filters.unsupervised.attribute.ReplaceMissingValues m_MissingFilter
        The filter for replacing missing values.
      • m_TransformFilter

        protected weka.filters.supervised.attribute.NominalToBinary m_TransformFilter
        The filter storing the transformation from nominal to binary attributes.
      • m_ClassStdDev

        protected double m_ClassStdDev
        The standard deviation of the class attribute
      • m_ClassMean

        protected double m_ClassMean
        The mean of the class attribute
      • m_ClassIndex

        protected int m_ClassIndex
        The index of the class attribute
      • m_Means

        protected double[] m_Means
        The attribute means
      • m_StdDevs

        protected double[] m_StdDevs
        The attribute standard deviations
      • m_OutputAdditionalStats

        protected boolean m_OutputAdditionalStats
        Whether to output additional statistics such as std. dev. of coefficients and t-stats
      • m_AttributeSelection

        protected int m_AttributeSelection
        The current attribute selection method
      • m_EliminateColinearAttributes

        protected boolean m_EliminateColinearAttributes
        Try to eliminate correlated attributes?
      • m_EnablePreprocessing

        protected boolean m_EnablePreprocessing
        Whether to perform preprocessing.
      • m_EnableInputScaling

        protected boolean m_EnableInputScaling
        Whether to scale the input.
      • m_Ridge

        protected double m_Ridge
        The ridge parameter
      • m_Minimal

        protected boolean m_Minimal
        Conserve memory?
      • m_ModelBuilt

        protected boolean m_ModelBuilt
        Model already built?
      • m_DF

        protected int m_DF
        The degrees of freedom of the regression model
      • m_RSquared

        protected double m_RSquared
        The R-squared value of the regression model
      • m_RSquaredAdj

        protected double m_RSquaredAdj
        The adjusted R-squared value of the regression model
      • m_FStat

        protected double m_FStat
        The F-statistic of the regression model
      • m_StdErrorOfCoef

        protected double[] m_StdErrorOfCoef
        Array for storing the standard error of each coefficient
      • m_TStats

        protected double[] m_TStats
        Array for storing the t-statistic of each coefficient
    • Constructor Detail

      • LinearRegressionJ

        public LinearRegressionJ()
        Default constructor.
    • Method Detail

      • globalInfo

        public String globalInfo()
        Returns a string describing this classifier
        Returns:
        a description of the classifier suitable for displaying in the explorer/experimenter gui
      • getCapabilities

        public weka.core.Capabilities getCapabilities()
        Returns default capabilities of the classifier.
        Specified by:
        getCapabilities in interface weka.core.CapabilitiesHandler
        Specified by:
        getCapabilities in interface weka.classifiers.Classifier
        Overrides:
        getCapabilities in class weka.classifiers.AbstractClassifier
        Returns:
        the capabilities of this classifier
      • buildClassifier

        public void buildClassifier(weka.core.Instances data)
                             throws Exception
        Builds a regression model for the given data.
        Specified by:
        buildClassifier in interface weka.classifiers.Classifier
        Parameters:
        data - the training data to be used for generating the linear regression function
        Throws:
        Exception - if the classifier could not be built successfully
      • classifyInstance

        public double classifyInstance(weka.core.Instance instance)
                                throws Exception
        Classifies the given instance using the linear regression function.
        Specified by:
        classifyInstance in interface weka.classifiers.Classifier
        Overrides:
        classifyInstance in class weka.classifiers.AbstractClassifier
        Parameters:
        instance - the test instance
        Returns:
        the classification
        Throws:
        Exception - if classification can't be done successfully
      • listOptions

        public Enumeration<weka.core.Option> listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface weka.core.OptionHandler
        Overrides:
        listOptions in class weka.classifiers.AbstractClassifier
        Returns:
        an enumeration of all the available options.
      • coefficients

        public double[] coefficients()
        Returns the coefficients for this linear model.
        Returns:
        the coefficients for this linear model
      • getOptions

        public String[] getOptions()
        Gets the current settings of the classifier.
        Specified by:
        getOptions in interface weka.core.OptionHandler
        Overrides:
        getOptions in class weka.classifiers.AbstractClassifier
        Returns:
        an array of strings suitable for passing to setOptions
      • setOptions

        public void setOptions(String[] options)
                        throws Exception
        Parses a given list of options.
        Specified by:
        setOptions in interface weka.core.OptionHandler
        Overrides:
        setOptions in class weka.classifiers.AbstractClassifier
        Parameters:
        options - the list of options as an array of strings
        Throws:
        Exception - if an option is not supported
      • ridgeTipText

        public String ridgeTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getRidge

        public double getRidge()
        Get the value of Ridge.
        Returns:
        Value of Ridge.
      • setRidge

        public void setRidge(double value)
        Set the value of Ridge.
        Parameters:
        value - Value to assign to Ridge.
      • eliminateColinearAttributesTipText

        public String eliminateColinearAttributesTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getEliminateColinearAttributes

        public boolean getEliminateColinearAttributes()
        Get the value of EliminateColinearAttributes.
        Returns:
        Value of EliminateColinearAttributes.
      • setEliminateColinearAttributes

        public void setEliminateColinearAttributes(boolean value)
        Set the value of EliminateColinearAttributes.
        Parameters:
        value - Value to assign to EliminateColinearAttributes.
      • numParameters

        public int numParameters()
        Get the number of coefficients used in the model
        Returns:
        the number of coefficients
      • attributeSelectionMethodTipText

        public String attributeSelectionMethodTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getAttributeSelectionMethod

        public weka.core.SelectedTag getAttributeSelectionMethod()
        Gets the method used to select attributes for use in the linear regression.
        Returns:
        the method to use.
      • setAttributeSelectionMethod

        public void setAttributeSelectionMethod(weka.core.SelectedTag value)
        Sets the method used to select attributes for use in the linear regression.
        Parameters:
        value - the attribute selection method to use.
      • minimalTipText

        public String minimalTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getMinimal

        public boolean getMinimal()
        Returns whether memory is conserved at the cost of not being able to output the model as a string.
        Returns:
        true if memory conservation is preferred over outputting the model description
      • setMinimal

        public void setMinimal(boolean value)
        Sets whether to conserve memory at the cost of not being able to output the model as a string.
        Parameters:
        value - if true memory will be conserved
      • outputAdditionalStatsTipText

        public String outputAdditionalStatsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getOutputAdditionalStats

        public boolean getOutputAdditionalStats()
        Get whether to output additional statistics (such as std. deviation of coefficients and t-statistics).
        Returns:
        true if additional stats are to be output
      • setOutputAdditionalStats

        public void setOutputAdditionalStats(boolean value)
        Set whether to output additional statistics (such as std. deviation of coefficients and t-statistics).
        Parameters:
        value - true if additional stats are to be output
      • enablePreprocessingTipText

        public String enablePreprocessingTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getEnablePreprocessing

        public boolean getEnablePreprocessing()
        Get whether to enable preprocessing.
        Returns:
        true if enabled
        See Also:
        m_TransformFilter, m_MissingFilter
      • setEnablePreprocessing

        public void setEnablePreprocessing(boolean value)
        Set whether to enable preprocessing.
        Parameters:
        value - true to enable preprocessing
        See Also:
        m_TransformFilter, m_MissingFilter
      • enableInputScalingTipText

        public String enableInputScalingTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getEnableInputScaling

        public boolean getEnableInputScaling()
        Get whether to scale the input.
        Returns:
        true if the input is scaled
      • setEnableInputScaling

        public void setEnableInputScaling(boolean value)
        Set whether to scale the input.
        Parameters:
        value - true to scale the input
      • deselectColinearAttributes

        protected boolean deselectColinearAttributes(boolean[] selectedAttributes,
                                                     double[] coefficients)
        Removes the attribute with the highest standardised coefficient greater than 1.5 from the selected attributes.
        Parameters:
        selectedAttributes - an array of flags indicating which attributes are included in the regression model
        coefficients - an array of coefficients for the regression model
        Returns:
        true if an attribute was removed
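The removal rule described above can be illustrated with a small standalone sketch. It is not the class's actual code: the class name CollinearSketch, the method deselectLargest, and the extra stdDevs and classStdDev parameters are hypothetical, and the standardisation formula |b * sd(x) / sd(y)| is an assumption. Coefficients are assumed to be stored in the order of the selected attributes.

```java
import java.util.Arrays;

final class CollinearSketch {
    /**
     * Deselects the attribute with the largest absolute standardised
     * coefficient, provided that value exceeds 1.5.
     * coefficients[j] is assumed to belong to the j-th selected attribute.
     */
    static boolean deselectLargest(boolean[] selected, double[] coefficients,
                                   double[] stdDevs, double classStdDev) {
        double maxCoef = 1.5; // threshold taken from the method description
        int worst = -1;
        int j = 0;            // index into the coefficient array
        for (int i = 0; i < selected.length; i++) {
            if (!selected[i]) {
                continue;
            }
            // Assumed standardisation: |b_j * sd(x_i) / sd(y)|
            double standardised = Math.abs(coefficients[j] * stdDevs[i] / classStdDev);
            if (standardised > maxCoef) {
                maxCoef = standardised;
                worst = i;
            }
            j++;
        }
        if (worst >= 0) {
            selected[worst] = false;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        boolean[] selected = {true, false, true};
        double[] coefficients = {4.0, 0.5}; // for attributes 0 and 2
        double[] stdDevs = {1.0, 3.0, 2.0};
        boolean removed = deselectLargest(selected, coefficients, stdDevs, 2.0);
        System.out.println(removed + " " + Arrays.toString(selected));
    }
}
```

In the example, attribute 0 has a standardised coefficient of 2.0 and is deselected, while attribute 2 (0.5) stays in the model.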
      • findBestModel

        protected void findBestModel()
                              throws Exception
        Performs a greedy search for the best regression model using Akaike's criterion.
        Throws:
        Exception - if regression can't be done
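As a rough illustration of a greedy search driven by an Akaike-style score, the sketch below performs backward elimination over a set of attributes. Everything in it is an assumption made for illustration: the class AkaikeGreedySketch, the scoring formula (error relative to the full model, scaled by the residual degrees of freedom, plus 2 per parameter), and the sseOf callback that stands in for refitting the regression and measuring its squared error.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

final class AkaikeGreedySketch {
    /** Akaike-style score: scaled error term plus a penalty of 2 per parameter. */
    static double akaike(double sse, double fullSse, int n, int fullParams, int params) {
        return sse / fullSse * (n - fullParams) + 2.0 * params;
    }

    /** Backward elimination: in each round, drop the attribute whose removal lowers the score most. */
    static boolean[] greedySearch(int numAttributes, int n, ToDoubleFunction<boolean[]> sseOf) {
        boolean[] best = new boolean[numAttributes];
        Arrays.fill(best, true);
        int fullParams = numAttributes + 1; // attributes plus intercept
        double fullSse = sseOf.applyAsDouble(best);
        double bestScore = akaike(fullSse, fullSse, n, fullParams, fullParams);
        int params = fullParams;
        boolean improved;
        do {
            improved = false;
            params--; // candidate models in this round have one parameter fewer
            boolean[] current = best.clone();
            boolean[] roundBest = null;
            for (int i = 0; i < numAttributes; i++) {
                if (!current[i]) {
                    continue;
                }
                current[i] = false; // try the model without attribute i
                double score = akaike(sseOf.applyAsDouble(current), fullSse, n, fullParams, params);
                if (score < bestScore) {
                    bestScore = score;
                    roundBest = current.clone();
                    improved = true;
                }
                current[i] = true;  // restore before trying the next attribute
            }
            if (improved) {
                best = roundBest;
            }
        } while (improved);
        return best;
    }

    public static void main(String[] args) {
        // Toy error function: attribute 0 matters a lot, attribute 1 barely helps.
        ToDoubleFunction<boolean[]> sse = s -> (s[0] ? 10.0 : 100.0) - (s[1] ? 0.1 : 0.0);
        System.out.println(Arrays.toString(greedySearch(2, 20, sse)));
    }
}
```

With the toy error function, dropping attribute 1 lowers the score while dropping attribute 0 raises it sharply, so the search keeps only attribute 0.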
      • calculateSE

        protected double calculateSE(boolean[] selectedAttributes,
                                     double[] coefficients)
                              throws Exception
        Calculate the squared error of a regression model on the training data
        Parameters:
        selectedAttributes - an array of flags indicating which attributes are included in the regression model
        coefficients - an array of coefficients for the regression model
        Returns:
        the mean squared error on the training data
        Throws:
        Exception - if there is a missing class value in the training data
      • regressionPrediction

        protected double regressionPrediction(weka.core.Instance transformedInstance,
                                              boolean[] selectedAttributes,
                                              double[] coefficients)
                                       throws Exception
        Calculate the dependent value for a given instance for a given regression model.
        Parameters:
        transformedInstance - the input instance
        selectedAttributes - an array of flags indicating which attributes are included in the regression model
        coefficients - an array of coefficients for the regression model
        Returns:
        the regression value for the instance.
        Throws:
        Exception - if the class attribute of the input instance is not assigned
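The sketch below shows how a prediction of this kind, and the squared error that calculateSE derives from such predictions, might be computed on plain arrays. The class PredictionSketch and its methods are hypothetical. It assumes that the coefficients follow the selected attributes in order and that the last coefficient is the intercept; instance weights and the class index are left out for brevity.

```java
final class PredictionSketch {
    /**
     * Linear prediction for one row of attribute values.
     * Assumed layout: one coefficient per selected attribute, then the intercept.
     */
    static double predict(double[] row, boolean[] selected, double[] coefficients) {
        double result = 0.0;
        int column = 0; // index into the coefficient array
        for (int i = 0; i < row.length; i++) {
            if (selected[i]) {
                result += coefficients[column] * row[i];
                column++;
            }
        }
        return result + coefficients[column]; // add the intercept
    }

    /** Sum of squared residuals over all rows. */
    static double squaredError(double[][] rows, double[] targets,
                               boolean[] selected, double[] coefficients) {
        double sum = 0.0;
        for (int r = 0; r < rows.length; r++) {
            double error = predict(rows[r], selected, coefficients) - targets[r];
            sum += error * error;
        }
        return sum;
    }

    public static void main(String[] args) {
        boolean[] selected = {true, false, true};
        double[] coefficients = {1.5, -1.0, 0.5}; // attribute 0, attribute 2, intercept
        System.out.println(predict(new double[]{2, 5, 3}, selected, coefficients));
    }
}
```

For the row {2, 5, 3} the unselected middle attribute is ignored, giving 1.5 * 2 - 1.0 * 3 + 0.5 = 0.5.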
      • doRegression

        protected double[] doRegression(boolean[] selectedAttributes)
                                 throws Exception
        Calculate a linear regression using the selected attributes
        Parameters:
        selectedAttributes - an array of booleans where each element is true if the corresponding attribute should be included in the regression.
        Returns:
        an array of coefficients for the linear regression model.
        Throws:
        Exception - if an error occurred during the regression.
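To show what a ridge-regularised least-squares fit involves, here is a minimal standalone sketch that solves (X^T X + ridge * I) b = X^T y by Gaussian elimination. It is not the solver this class uses: the class RidgeSketch and its method are hypothetical, and instance weights, the intercept, and input scaling are left out.

```java
import java.util.Arrays;

final class RidgeSketch {
    /** Solves (X^T X + ridge * I) b = X^T y using Gaussian elimination with partial pivoting. */
    static double[] ridgeRegression(double[][] x, double[] y, double ridge) {
        int n = x.length;
        int p = x[0].length;
        // Augmented system [X^T X + ridge * I | X^T y]
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                for (int r = 0; r < n; r++) {
                    a[i][j] += x[r][i] * x[r][j];
                }
            }
            a[i][i] += ridge;
            for (int r = 0; r < n; r++) {
                a[i][p] += x[r][i] * y[r];
            }
        }
        // Forward elimination
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        // Back substitution
        double[] b = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = a[i][p];
            for (int j = i + 1; j < p; j++) {
                s -= a[i][j] * b[j];
            }
            b[i] = s / a[i][i];
        }
        return b;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 0}, {0, 1}, {1, 1}};
        double[] y = {2, 3, 5};
        // With the default ridge of 1.0e-8 the result is very close to [2, 3].
        System.out.println(Arrays.toString(ridgeRegression(x, y, 1.0e-8)));
    }
}
```

A positive ridge value keeps the system solvable even when attributes are perfectly correlated, at the cost of a small bias in the coefficients.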
      • toString

        public String toString()
        Outputs the linear regression model as a string.
        Overrides:
        toString in class Object
        Returns:
        the model as a string
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface weka.core.RevisionHandler
        Overrides:
        getRevision in class weka.classifiers.AbstractClassifier
        Returns:
        the revision
      • main

        public static void main(String[] args)
        Generates a linear regression function predictor.
        Parameters:
        args - the options