Class NormalizableDistance

  • All Implemented Interfaces:
    DistanceFunction
    Direct Known Subclasses:
    EuclideanDistance

    public abstract class NormalizableDistance
    extends Object
    implements DistanceFunction
    Represents the abstract ancestor for normalizable distance functions, like Euclidean or Manhattan distance.
    Version:
    $Revision: 8034 $
    Author:
    Fracpete (fracpete at waikato dot ac dot nz), Gabi Schmidberger (gabi@cs.waikato.ac.nz) -- original code from weka.core.EuclideanDistance, Ashraf M. Kibriya (amk14@cs.waikato.ac.nz) -- original code from weka.core.EuclideanDistance
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected boolean[] m_ActiveIndices
      Boolean flags indicating whether each attribute is used.
      protected Instances m_Data
      The instances used internally.
      protected boolean m_DontNormalize
      True if normalization is turned off (default false).
      protected double[][] m_Ranges
      The range of the attributes.
      protected boolean m_Validated
      Whether all the necessary preparations have been done.
      static int R_MAX
      Index in ranges for MAX.
      static int R_MIN
      Index in ranges for MIN.
      static int R_WIDTH
      Index in ranges for WIDTH.
    • Constructor Summary

      Constructors 
      Constructor Description
      NormalizableDistance()
      Invalidates the distance function; the Instances must still be set.
      NormalizableDistance​(Instances data)
      Initializes the distance function and automatically initializes the ranges.
    • Field Detail

      • m_Data

        protected Instances m_Data
        The instances used internally.
      • m_DontNormalize

        protected boolean m_DontNormalize
        True if normalization is turned off (default false).
      • m_Ranges

        protected double[][] m_Ranges
        The range of the attributes.
      • m_ActiveIndices

        protected boolean[] m_ActiveIndices
        Boolean flags indicating whether each attribute is used.
      • m_Validated

        protected boolean m_Validated
        Whether all the necessary preparations have been done.
    • Constructor Detail

      • NormalizableDistance

        public NormalizableDistance()
        Invalidates the distance function; the Instances must still be set.
      • NormalizableDistance

        public NormalizableDistance​(Instances data)
        Initializes the distance function and automatically initializes the ranges.
        Parameters:
        data - the instances the distance function should work on
    • Method Detail

      • globalInfo

        public abstract String globalInfo()
        Returns a string describing this object.
        Returns:
        a description of this object suitable for displaying in the explorer/experimenter gui
      • dontNormalizeTipText

        public String dontNormalizeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDontNormalize

        public void setDontNormalize​(boolean dontNormalize)
        Sets whether the attribute values are normalized in the distance calculation.
        Parameters:
        dontNormalize - if true the values are not normalized
      • getDontNormalize

        public boolean getDontNormalize()
        Gets whether the attribute values are normalized in the distance calculation (default false, i.e. attribute values are normalized).
        Returns:
        false if values get normalized
      • attributeIndicesTipText

        public String attributeIndicesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setAttributeIndices

        public void setAttributeIndices​(String value)
        Sets the range of attributes to use in the calculation of the distance. The indices start from 1; 'first' and 'last' are valid as well, e.g. first-3,5,6-last
        Specified by:
        setAttributeIndices in interface DistanceFunction
        Parameters:
        value - the new attribute index range
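        The range syntax described above (1-based indices, 'first', 'last', comma-separated items and hyphenated spans) can be illustrated with a small standalone parser. This is only a sketch under stated assumptions: the class AttributeRangeSketch and its methods are hypothetical and are not part of this API, and the invert parameter merely mimics the effect described for setInvertSelection.

```java
import java.util.Arrays;

public class AttributeRangeSketch {

    /** Converts one 1-based token ("first", "last" or a number) to a 0-based index. */
    static int toIndex(String token, int numAttributes) {
        String t = token.trim();
        if (t.equalsIgnoreCase("first")) return 0;
        if (t.equalsIgnoreCase("last")) return numAttributes - 1;
        int idx = Integer.parseInt(t) - 1;
        if (idx < 0 || idx >= numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return idx;
    }

    /** Returns one flag per attribute; true means the attribute is used. */
    static boolean[] parse(String range, int numAttributes, boolean invert) {
        boolean[] active = new boolean[numAttributes];
        for (String part : range.split(",")) {
            // A single item such as "5" yields one token, used as both ends.
            String[] ends = part.split("-");
            int from = toIndex(ends[0], numAttributes);
            int to = toIndex(ends[ends.length - 1], numAttributes);
            for (int i = from; i <= to; i++) {
                active[i] = true;
            }
        }
        if (invert) {
            // Inverted matching sense: use everything that was NOT listed.
            for (int i = 0; i < active.length; i++) {
                active[i] = !active[i];
            }
        }
        return active;
    }

    public static void main(String[] args) {
        // With 8 attributes, only the 4th attribute is left out.
        System.out.println(Arrays.toString(parse("first-3,5,6-last", 8, false)));
    }
}
```

        The resulting flags play the role that m_ActiveIndices is described as having: one boolean per attribute.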
      • getAttributeIndices

        public String getAttributeIndices()
        Gets the range of attributes used in the calculation of the distance.
        Specified by:
        getAttributeIndices in interface DistanceFunction
        Returns:
        the attribute index range
      • invertSelectionTipText

        public String invertSelectionTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setInvertSelection

        public void setInvertSelection​(boolean value)
        Sets whether the matching sense of attribute indices is inverted or not.
        Specified by:
        setInvertSelection in interface DistanceFunction
        Parameters:
        value - if true the matching sense is inverted
      • getInvertSelection

        public boolean getInvertSelection()
        Gets whether the matching sense of attribute indices is inverted or not.
        Specified by:
        getInvertSelection in interface DistanceFunction
        Returns:
        true if the matching sense is inverted
      • invalidate

        protected void invalidate()
        Invalidates all initializations.
      • validate

        protected void validate()
        Performs the initializations if necessary.
      • initialize

        protected void initialize()
        Initializes the ranges and the attributes being used.
      • initializeAttributeIndices

        protected void initializeAttributeIndices()
        Initializes the attribute indices.
      • postProcessDistances

        public void postProcessDistances​(double[] distances)
        Does nothing, derived classes may override it though.
        Specified by:
        postProcessDistances in interface DistanceFunction
        Parameters:
        distances - the distances to post-process
      • update

        public void update​(Instance ins)
        Updates the distance function (if necessary) for the newly added instance.
        Specified by:
        update in interface DistanceFunction
        Parameters:
        ins - the instance to add
      • distance

        public double distance​(Instance first,
                               Instance second)
        Calculates the distance between two instances.
        Specified by:
        distance in interface DistanceFunction
        Parameters:
        first - the first instance
        second - the second instance
        Returns:
        the distance between the two given instances
      • distance

        public double distance​(Instance first,
                               Instance second,
                               double cutOffValue)
        Calculates the distance between two instances. Offers speed up (if the distance function class in use supports it) in nearest neighbour search by taking into account the cutOff or maximum distance. Depending on the distance function class, post processing of the distances by postProcessDistances(double []) may be required if this function is used.
        Specified by:
        distance in interface DistanceFunction
        Parameters:
        first - the first instance
        second - the second instance
        cutOffValue - If the distance being calculated becomes larger than cutOffValue then the rest of the calculation is discarded.
        Returns:
        the distance between the two given instances or Double.POSITIVE_INFINITY if the distance being calculated becomes larger than cutOffValue.
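        The cut-off behaviour can be sketched on plain double arrays. The example assumes a Euclidean-style function that accumulates squared differences, compares the running sum with the cut-off, and leaves the square root to a post-processing step; none of this is confirmed by the description above, and the class CutOffDistanceSketch is hypothetical.

```java
public class CutOffDistanceSketch {

    /**
     * Accumulates squared differences and gives up as soon as the
     * running total exceeds cutOffValue (assumed to be in squared units).
     */
    static double distance(double[] first, double[] second, double cutOffValue) {
        double dist = 0.0;
        for (int i = 0; i < first.length; i++) {
            double diff = first[i] - second[i];
            dist += diff * diff;
            if (dist > cutOffValue) {
                // The remaining attributes are not looked at.
                return Double.POSITIVE_INFINITY;
            }
        }
        return dist;
    }

    /** Assumed post-processing: turns squared distances into distances, in place. */
    static void postProcessDistances(double[] distances) {
        for (int i = 0; i < distances.length; i++) {
            distances[i] = Math.sqrt(distances[i]);
        }
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.0};
        double[] b = {3.0, 4.0};
        double[] results = {distance(a, b, 100.0)};
        postProcessDistances(results);
        System.out.println(results[0]);
        System.out.println(distance(a, b, 10.0));
    }
}
```

        In a nearest-neighbour search the cut-off would typically be the best distance found so far, so that candidates which cannot improve on it are abandoned early.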
      • updateDistance

        protected abstract double updateDistance​(double currDist,
                                                 double diff)
        Updates the current distance calculated so far with the new difference between two attributes. The difference between the attributes was calculated with the difference(int,double,double) method.
        Parameters:
        currDist - the current distance calculated so far
        diff - the difference between two new attributes
        Returns:
        the updated distance
        See Also:
        difference(int, double, double)
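        Because updateDistance is abstract, each concrete distance function decides how a per-attribute difference is folded into the running total. The sketch below shows two plausible choices, a squared sum (Euclidean style) and an absolute sum (Manhattan style). The classes Accumulator, SquaredSum and AbsoluteSum are hypothetical stand-ins and do not reproduce the actual subclasses.

```java
public class UpdateDistanceSketch {

    /** Hypothetical stand-in for the abstract base class. */
    abstract static class Accumulator {

        /** Folds one per-attribute difference into the running distance. */
        abstract double updateDistance(double currDist, double diff);

        /** Applies updateDistance to every difference in turn. */
        double total(double[] diffs) {
            double dist = 0.0;
            for (double diff : diffs) {
                dist = updateDistance(dist, diff);
            }
            return dist;
        }
    }

    /** Euclidean style: adds the squared difference. */
    static class SquaredSum extends Accumulator {
        @Override
        double updateDistance(double currDist, double diff) {
            return currDist + diff * diff;
        }
    }

    /** Manhattan style: adds the absolute difference. */
    static class AbsoluteSum extends Accumulator {
        @Override
        double updateDistance(double currDist, double diff) {
            return currDist + Math.abs(diff);
        }
    }

    public static void main(String[] args) {
        double[] diffs = {3.0, -4.0};
        System.out.println(new SquaredSum().total(diffs));
        System.out.println(new AbsoluteSum().total(diffs));
    }
}
```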
      • norm

        protected double norm​(double x,
                              int i)
        Normalizes a given value of a numeric attribute.
        Parameters:
        x - the value to be normalized
        i - the attribute's index
        Returns:
        the normalized value
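        A minimal sketch of what such a normalization could look like, assuming min-max scaling over the per-attribute range and assuming the index constants take the values 0, 1 and 2. Both are assumptions; the description above only states that the value is normalized and that a range stores MIN, MAX and WIDTH. The class NormSketch is hypothetical.

```java
public class NormSketch {

    // Assumed positions of the entries within one attribute's range.
    static final int R_MIN = 0;
    static final int R_MAX = 1;
    static final int R_WIDTH = 2;

    /** Scales x into [0, 1] relative to the given range (min-max scaling). */
    static double norm(double x, double[] range) {
        // An unset or zero-width range carries no scale information.
        if (Double.isNaN(range[R_MIN]) || range[R_MAX] == range[R_MIN]) {
            return 0.0;
        }
        return (x - range[R_MIN]) / range[R_WIDTH];
    }

    public static void main(String[] args) {
        double[] range = {10.0, 20.0, 10.0}; // min, max, width
        System.out.println(norm(15.0, range));
    }
}
```

        Scaling every attribute to a common interval keeps attributes with large numeric ranges from dominating the distance, which is the usual reason for leaving normalization switched on.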
      • difference

        protected double difference​(int index,
                                    double val1,
                                    double val2)
        Computes the difference between two given attribute values.
        Parameters:
        index - the attribute index
        val1 - the first value
        val2 - the second value
        Returns:
        the difference
      • initializeRanges

        public double[][] initializeRanges()
        Initializes the ranges using all instances of the dataset. Sets m_Ranges.
        Returns:
        the ranges
      • updateRangesFirst

        public void updateRangesFirst​(Instance instance,
                                      int numAtt,
                                      double[][] ranges)
        Used to initialize the ranges. To save time, the values of the first instance are used: low and high are set to the values of the first instance and width is set to zero.
        Parameters:
        instance - the new instance
        numAtt - number of attributes in the model
        ranges - low, high and width values for all attributes
      • updateRanges

        public void updateRanges​(Instance instance,
                                 int numAtt,
                                 double[][] ranges)
        Updates the minimum and maximum and width values for all the attributes based on a new instance.
        Parameters:
        instance - the new instance
        numAtt - number of attributes in the model
        ranges - low, high and width values for all attributes
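        The two steps described above (seed the ranges from the first instance, then widen them with each further instance) can be sketched on plain arrays. The class RangeUpdateSketch and the methods seedFromFirst and widen are hypothetical, the index constants are assumed to be 0, 1 and 2, and missing values are not handled here.

```java
import java.util.Arrays;

public class RangeUpdateSketch {

    // Assumed positions of the entries within one attribute's range.
    static final int R_MIN = 0;
    static final int R_MAX = 1;
    static final int R_WIDTH = 2;

    /** Seeds the ranges: low and high equal the first values, width is zero. */
    static void seedFromFirst(double[] values, double[][] ranges) {
        for (int j = 0; j < values.length; j++) {
            ranges[j][R_MIN] = values[j];
            ranges[j][R_MAX] = values[j];
            ranges[j][R_WIDTH] = 0.0;
        }
    }

    /** Widens the ranges so that they also cover the new values. */
    static void widen(double[] values, double[][] ranges) {
        for (int j = 0; j < values.length; j++) {
            if (values[j] < ranges[j][R_MIN]) ranges[j][R_MIN] = values[j];
            if (values[j] > ranges[j][R_MAX]) ranges[j][R_MAX] = values[j];
            ranges[j][R_WIDTH] = ranges[j][R_MAX] - ranges[j][R_MIN];
        }
    }

    public static void main(String[] args) {
        double[][] ranges = new double[2][3];
        seedFromFirst(new double[]{1.0, 10.0}, ranges);
        widen(new double[]{3.0, 4.0}, ranges);
        System.out.println(Arrays.deepToString(ranges));
    }
}
```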
      • initializeRangesEmpty

        public void initializeRangesEmpty​(int numAtt,
                                          double[][] ranges)
        Used to initialize the ranges.
        Parameters:
        numAtt - number of attributes in the model
        ranges - low, high and width values for all attributes
      • updateRanges

        public double[][] updateRanges​(Instance instance,
                                       double[][] ranges)
        Updates the ranges given a new instance.
        Parameters:
        instance - the new instance
        ranges - low, high and width values for all attributes
        Returns:
        the updated ranges
      • initializeRanges

        public double[][] initializeRanges​(int[] instList)
                                    throws Exception
        Initializes the ranges of a subset of the instances of this dataset. Because only a subset is used, m_Ranges is not set.
        Parameters:
        instList - list of indexes of the subset
        Returns:
        the ranges
        Throws:
        Exception - if something goes wrong
      • initializeRanges

        public double[][] initializeRanges​(int[] instList,
                                           int startIdx,
                                           int endIdx)
                                    throws Exception
        Initializes the ranges of a subset of the instances of this dataset. Because only a subset is used, m_Ranges is not set. The caller of this method should ensure that the supplied start and end indices are valid (start <= end, end < instList.length, etc.).
        Parameters:
        instList - list of indexes of the instances
        startIdx - start index of the subset of instances in the indices array
        endIdx - end index of the subset of instances in the indices array
        Returns:
        the ranges
        Throws:
        Exception - if something goes wrong
      • updateRanges

        public void updateRanges​(Instance instance)
        Updates the ranges when a new instance arrives.
        Parameters:
        instance - the new instance
      • inRanges

        public boolean inRanges​(Instance instance,
                                double[][] ranges)
        Tests whether an instance is within the given ranges.
        Parameters:
        instance - the instance
        ranges - the ranges the instance is tested to be in
        Returns:
        true if instance is within the ranges
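        A range test of this kind can be sketched as a per-attribute bounds check. The example assumes that both bounds are inclusive and that the index constants are 0 and 1; neither is stated above, and the class InRangesSketch is hypothetical.

```java
public class InRangesSketch {

    // Assumed positions of the entries within one attribute's range.
    static final int R_MIN = 0;
    static final int R_MAX = 1;

    /** Returns true only if every value lies within its attribute's [min, max]. */
    static boolean inRanges(double[] values, double[][] ranges) {
        for (int j = 0; j < values.length; j++) {
            if (values[j] < ranges[j][R_MIN] || values[j] > ranges[j][R_MAX]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] ranges = {{0.0, 10.0, 10.0}, {-1.0, 1.0, 2.0}};
        System.out.println(inRanges(new double[]{5.0, 0.0}, ranges));
        System.out.println(inRanges(new double[]{5.0, 2.0}, ranges));
    }
}
```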
      • rangesSet

        public boolean rangesSet()
        Checks whether the ranges are set.
        Returns:
        true if ranges are set
      • getRanges

        public double[][] getRanges()
                             throws Exception
        Returns the ranges.
        Returns:
        the ranges
        Throws:
        Exception - if no ranges are set yet
      • toString

        public String toString()
        Returns an empty string.
        Overrides:
        toString in class Object
        Returns:
        an empty string
      • isMissingValue

        public static boolean isMissingValue​(double val)
        Tests if the given value codes "missing".
        Parameters:
        val - the value to be tested
        Returns:
        true if val codes "missing"