Class AbstractMerge

  • All Implemented Interfaces:
    adams.core.Destroyable, adams.core.GlobalInfoSupporter, adams.core.logging.LoggingLevelHandler, adams.core.logging.LoggingSupporter, adams.core.option.OptionHandler, adams.core.QuickInfoSupporter, adams.core.SizeOfHandler, Serializable
    Direct Known Subclasses:
    JoinOnID, Simple

    public abstract class AbstractMerge
    extends adams.core.option.AbstractOptionHandler
    implements adams.core.QuickInfoSupporter

    Ancestor for merge schemes.

    Author:
    Corey Sterling (csterlin at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • DATASET_KEYWORD

        protected static final String DATASET_KEYWORD
        The keyword to replace with the dataset name in attribute renaming.
        See Also:
        Constant Field Values
      • ROW_MISSING

        protected static final int ROW_MISSING
        The constant marking a dataset that has no input row for a given output row.
        See Also:
        Constant Field Values
      • m_ClassFinder

        protected ColumnFinder m_ClassFinder
        The column finder for selecting class attributes.
      • m_DatasetNames

        protected adams.core.base.BaseString[] m_DatasetNames
        The name of each dataset to use in attribute renaming.
      • m_AttributeRenameFindRegexs

        protected adams.core.base.BaseRegExp[] m_AttributeRenameFindRegexs
        The regexes used to find attributes that require renaming.
      • m_AttributeRenameFormatStrings

        protected adams.core.base.BaseString[] m_AttributeRenameFormatStrings
        The format strings specifying how to rename attributes.
      • m_MergedDatasetName

        protected String m_MergedDatasetName
        The name to give the resulting dataset.
      • m_EnsureEqualValues

        protected boolean m_EnsureEqualValues
        Whether to check attributes with multiple sources for equal values among those sources.
      • m_Datasets

        protected weka.core.Instances[] m_Datasets
        The source datasets we are merging.
      • m_ClassAttributes

        protected int[][] m_ClassAttributes
        The set of class attributes for the given datasets.
    • Constructor Detail

      • AbstractMerge

        public AbstractMerge()
    • Method Detail

      • defineOptions

        public void defineOptions()
        Adds options to the internal list of options.
        Specified by:
        defineOptions in interface adams.core.option.OptionHandler
        Overrides:
        defineOptions in class adams.core.option.AbstractOptionHandler
      • getClassFinder

        public ColumnFinder getClassFinder()
        Gets the finder to use for finding class attributes in the source datasets.
        Returns:
        The class-attribute finder.
      • setClassFinder

        public void setClassFinder​(ColumnFinder value)
        Sets the finder to use for finding class attributes in the source datasets.
        Parameters:
        value - The class-attribute finder.
      • classFinderTipText

        public String classFinderTipText()
        Gets the tip-text for the classFinder option.
        Returns:
        The tip-text as a String.
      • getDatasetNames

        public adams.core.base.BaseString[] getDatasetNames()
        Gets the list of names to use in attribute renaming in place of the {DATASET} keyword.
        Returns:
        The list of dataset names.
      • setDatasetNames

        public void setDatasetNames​(adams.core.base.BaseString[] value)
        Sets the list of names to use in attribute renaming in place of the {DATASET} keyword.
        Parameters:
        value - The list of dataset names.
      • datasetNamesTipText

        public String datasetNamesTipText()
        Gets the tip-text for the dataset names option.
        Returns:
        The tip-text as a String.
      • getAttributeRenamesExp

        public adams.core.base.BaseRegExp[] getAttributeRenamesExp()
        Gets the array of attribute rename expressions.
        Returns:
        The array of regexes.
      • setAttributeRenamesExp

        public void setAttributeRenamesExp​(adams.core.base.BaseRegExp[] value)
        Sets the array of attribute rename expressions.
        Parameters:
        value - The array of regexes.
      • attributeRenamesExpTipText

        public String attributeRenamesExpTipText()
        Gets the tip-text for the attribute-renaming regexes option.
        Returns:
        The tip-text as a String.
      • getAttributeRenamesFormat

        public adams.core.base.BaseString[] getAttributeRenamesFormat()
        Gets the array of format strings used for attribute renaming.
        Returns:
        The array of format strings.
      • setAttributeRenamesFormat

        public void setAttributeRenamesFormat​(adams.core.base.BaseString[] value)
        Sets the array of format strings used for attribute renaming.
        Parameters:
        value - The array of format strings.
      • attributeRenamesFormatTipText

        public String attributeRenamesFormatTipText()
        Gets the tip-text for the attribute renaming format strings option.
        Returns:
        The tip-text as a String.
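The interplay of the find regexes, the format strings and the {DATASET} keyword can be sketched in plain Java. This is a minimal illustration and not the ADAMS implementation: the class name AttributeRenameSketch and its rename method are hypothetical, and it assumes that regexes and format strings are paired by index, that the first regex matching the whole attribute name wins, and that the format string is a java.util.regex replacement string (so $1 refers to a capture group).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeRenameSketch {

    // Assumed literal form of DATASET_KEYWORD, based on the "{DATASET}" wording in the docs.
    static final String DATASET_KEYWORD = "{DATASET}";

    /**
     * Returns the renamed attribute, or the original name if no find-regex matches.
     * findRegexes[i] is paired with formats[i]; the first full match wins (an assumption).
     */
    static String rename(String attributeName, String datasetName,
                         String[] findRegexes, String[] formats) {
        for (int i = 0; i < findRegexes.length; i++) {
            Matcher matcher = Pattern.compile(findRegexes[i]).matcher(attributeName);
            if (matcher.matches()) {
                // Quote the dataset name so '$' or '\' in it are not read as group references.
                String replacement = formats[i].replace(
                    DATASET_KEYWORD, Matcher.quoteReplacement(datasetName));
                // Expand group references against the full match found by matches().
                StringBuffer renamed = new StringBuffer();
                matcher.appendReplacement(renamed, replacement);
                return renamed.toString();
            }
        }
        return attributeName;
    }

    public static void main(String[] args) {
        String[] find = {"(.*)_raw", "id"};
        String[] format = {"{DATASET}-$1", "$0"};
        System.out.println(rename("height_raw", "survey", find, format));
        System.out.println(rename("id", "survey", find, format));
        System.out.println(rename("weight", "survey", find, format));
    }
}
```

Under these assumptions, attributes that match no regex keep their name, so identically named attributes from different datasets end up mapped to the same merged attribute.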
      • getOutputName

        public String getOutputName()
        Gets the name to use for the merged dataset.
        Returns:
        The name to use.
      • setOutputName

        public void setOutputName​(String value)
        Sets the name to use for the merged dataset.
        Parameters:
        value - The name to use.
      • outputNameTipText

        public String outputNameTipText()
        Gets the tip-text for the output name option.
        Returns:
        The tip-text as a String.
      • getEnsureEqualValues

        public boolean getEnsureEqualValues()
        Gets whether to check that all data sources for a merged attribute have the same value.
        Returns:
        True if value equality should be checked, false if not.
      • setEnsureEqualValues

        public void setEnsureEqualValues​(boolean value)
        Sets whether to check that all data sources for a merged attribute have the same value.
        Parameters:
        value - True if value equality should be checked, false if not.
      • ensureEqualValuesTipText

        public String ensureEqualValuesTipText()
        Gets the tip-text for the ensure-equal-values option.
        Returns:
        The tip-text as a String.
      • getQuickInfo

        public String getQuickInfo()
        Returns quick info about the object, which can be displayed in the GUI.
        The default implementation simply returns null.
        Specified by:
        getQuickInfo in interface adams.core.QuickInfoSupporter
        Returns:
        null if no info available, otherwise short string
      • setValue

        protected void setValue​(weka.core.Instance toSet,
                                int attributeIndex,
                                Object value)
        Sets the value of the given attribute in the given instance to the given value (handles object conversion).
        Parameters:
        toSet - The instance against which the value should be set.
        attributeIndex - The index of the attribute against which to set the value.
        value - The value to set the attribute to.
      • getValue

        protected Object getValue​(weka.core.Instance toGetFrom,
                                  int attributeIndex)
        Gets the value of the specified attribute from the given Instance.
        Parameters:
        toGetFrom - The instance to get a value from.
        attributeIndex - The index of the value's attribute.
        Returns:
        The value of the instance at the given index.
      • newInstanceForDataset

        protected weka.core.Instance newInstanceForDataset​(weka.core.Instances dataset)
        Creates a new dense instance of the size expected by the given dataset.
        Parameters:
        dataset - The dataset to create a new instance for.
        Returns:
        The created instance.
      • check

        protected String check​(weka.core.Instances[] datasets)
        Hook method for performing checks before attempting the merge.
        Parameters:
        datasets - the datasets to merge
        Returns:
        null if successfully checked, otherwise error message
      • checkAttributeMapping

        protected String checkAttributeMapping​(Map<String,​List<AbstractMerge.SourceAttribute>> attributeMapping)
        Makes sure the source data for each mapped attribute is the same type.
        Parameters:
        attributeMapping - The attribute mapping.
        Returns:
        Null if all mappings are okay, or an error message if not.
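The null-or-error-message contract of this check can be illustrated with a small stand-alone sketch. It is not the ADAMS code: AttributeTypeCheckSketch is a hypothetical class, and each source attribute is reduced to a plain type label (such as "numeric" or "nominal") instead of an AbstractMerge.SourceAttribute.

```java
import java.util.List;
import java.util.Map;

class AttributeTypeCheckSketch {

    /**
     * sourceTypes maps each merged attribute name to the type label of every
     * source attribute feeding it. Returns null if all sources of every merged
     * attribute share one type, otherwise a message naming the first conflict.
     */
    static String check(Map<String, List<String>> sourceTypes) {
        for (Map.Entry<String, List<String>> entry : sourceTypes.entrySet()) {
            List<String> types = entry.getValue();
            for (String type : types) {
                if (!type.equals(types.get(0))) {
                    return "Attribute '" + entry.getKey() + "' mixes types "
                        + types.get(0) + " and " + type;
                }
            }
        }
        return null;
    }
}
```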
      • merge

        public weka.core.Instances merge​(weka.core.Instances[] datasets)
        Merges the datasets.
        Parameters:
        datasets - the datasets to merge
        Returns:
        the merged dataset
      • getValueFirstAvailable

        protected Object getValueFirstAvailable​(int[] rowSet,
                                                List<AbstractMerge.SourceAttribute> sourceAttributes)
        Gets the first encountered source value for a merged attribute.
        Parameters:
        rowSet - The row-set of source data.
        sourceAttributes - The source attribute mapping elements.
        Returns:
        The value of the merged attribute.
      • getValueEnsureEqual

        protected Object getValueEnsureEqual​(int[] rowSet,
                                             List<AbstractMerge.SourceAttribute> sourceAttributeElements)
        Gets the value of the mapped attribute, ensuring that all possible sources either provide a missing value or the same value as each other.
        Parameters:
        rowSet - The row-set of source data.
        sourceAttributeElements - The source attribute mapping elements.
        Returns:
        The value of the merged attribute.
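The difference between the first-available and ensure-equal strategies can be sketched as follows. This is a simplified illustration under stated assumptions, not the ADAMS implementation: MergedValueSketch is a hypothetical class, each dataset contributes a single column (data[d][r]) instead of a list of AbstractMerge.SourceAttribute, a missing value is modelled as null, ROW_MISSING is assumed to be -1 (its real value is not shown here), and a mismatch is reported by throwing an exception, which may differ from what the real method does.

```java
class MergedValueSketch {

    // Assumed sentinel; the real value of ROW_MISSING is not shown in this documentation.
    static final int ROW_MISSING = -1;

    /**
     * data[d][r] is the value of one attribute in row r of dataset d (null = missing).
     * rowSet[d] is the row to read from dataset d, or ROW_MISSING.
     * Returns the first non-missing value, or null if every source is missing.
     */
    static Object firstAvailable(int[] rowSet, Object[][] data) {
        for (int d = 0; d < rowSet.length; d++) {
            if (rowSet[d] == ROW_MISSING) continue;
            Object value = data[d][rowSet[d]];
            if (value != null) return value;
        }
        return null;
    }

    /**
     * Like firstAvailable, but keeps scanning and rejects row-sets whose
     * non-missing source values disagree.
     */
    static Object ensureEqual(int[] rowSet, Object[][] data) {
        Object result = null;
        for (int d = 0; d < rowSet.length; d++) {
            if (rowSet[d] == ROW_MISSING) continue;
            Object value = data[d][rowSet[d]];
            if (value == null) continue;
            if (result == null) {
                result = value;
            } else if (!result.equals(value)) {
                throw new IllegalStateException(
                    "Conflicting source values: " + result + " vs " + value);
            }
        }
        return result;
    }
}
```

The first strategy stops at the first usable value and is cheaper. The second visits every source, which is the price of detecting conflicting data.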
      • createAttributeMapping

        protected Map<String,​List<AbstractMerge.SourceAttribute>> createAttributeMapping()
        Creates a mapping from the attributes in each input dataset to the corresponding attribute in the merged dataset.
        Returns:
        The mapping from each merged attribute name to the source attributes that map to it.
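The shape of the returned map, one merged attribute name keyed to every source attribute that feeds it, can be sketched like this. The nested SourceAttribute class below is a hypothetical stand-in, since the fields of AbstractMerge.SourceAttribute are not documented here, and renaming is left out, so the merged name is simply assumed to equal the source name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AttributeMappingSketch {

    /** Hypothetical stand-in for AbstractMerge.SourceAttribute. */
    static final class SourceAttribute {
        final int datasetIndex;
        final String attributeName;

        SourceAttribute(int datasetIndex, String attributeName) {
            this.datasetIndex = datasetIndex;
            this.attributeName = attributeName;
        }
    }

    /**
     * attributeNames[d] lists the attribute names of dataset d.
     * Attributes with the same (merged) name are grouped under one key;
     * insertion order is kept so the result is deterministic.
     */
    static Map<String, List<SourceAttribute>> createMapping(String[][] attributeNames) {
        Map<String, List<SourceAttribute>> mapping = new LinkedHashMap<>();
        for (int d = 0; d < attributeNames.length; d++) {
            for (String name : attributeNames[d]) {
                // Renaming omitted: the merged name is taken to be the source name.
                mapping.computeIfAbsent(name, k -> new ArrayList<>())
                       .add(new SourceAttribute(d, name));
            }
        }
        return mapping;
    }
}
```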
      • isAnyClassAttribute

        protected boolean isAnyClassAttribute​(List<AbstractMerge.SourceAttribute> sources)
        Checks if any of the source attributes in the given list is a class attribute.
        Parameters:
        sources - The source attributes to check.
        Returns:
        True if a source attribute is a class, false if none are.
      • isClassAttribute

        protected boolean isClassAttribute​(AbstractMerge.SourceAttribute source)
        Checks if the given source attribute is a class attribute.
        Parameters:
        source - The source attribute to check.
        Returns:
        True if the source is a class attribute, false if not.
      • isClassAttribute

        protected boolean isClassAttribute​(int datasetIndex,
                                           int attributeIndex)
        Whether the given attribute is a class attribute.
        Parameters:
        datasetIndex - The dataset the attribute is in.
        attributeIndex - The index of the attribute in the dataset.
        Returns:
        True if the given attribute is a class attribute, false if not.
      • recordClassAttributes

        protected void recordClassAttributes()
        Scans the datasets for attributes that should be considered classes, and keeps a record of them.
      • createEmptyResultantDataset

        protected weka.core.Instances createEmptyResultantDataset​(Map<String,​List<AbstractMerge.SourceAttribute>> attributeMapping)
        Creates the resultant dataset, ready to be filled with data.
        Parameters:
        attributeMapping - The mapping from merged attribute names to their source attributes.
        Returns:
        The empty Instances object for the merged dataset.
      • createMappedAttribute

        protected weka.core.Attribute createMappedAttribute​(String name,
                                                            List<AbstractMerge.SourceAttribute> sources)
        Creates the attribute for the output merged dataset for the given attribute mapping.
        Parameters:
        name - The name of the mapped attribute.
        sources - The list of mappings that the attribute maps to.
        Returns:
        The attribute for the merged dataset.
      • compare

        protected int compare​(List<AbstractMerge.SourceAttribute> sources1,
                              List<AbstractMerge.SourceAttribute> sources2)
        Compares two lists of source attributes to determine the order in which their mapped attributes should appear in the merged dataset.
        Parameters:
        sources1 - The source attributes of the first mapped attribute.
        sources2 - The source attributes of the second mapped attribute.
        Returns:
        -1 if sources1 orders before sources2, 1 if sources1 orders after sources2, otherwise 0.
      • getMappedAttributeName

        protected String getMappedAttributeName​(AbstractMerge.SourceAttribute source)
        Gets the name of the attribute in the merged dataset that the given source attribute maps to.
        Parameters:
        source - The source attribute.
        Returns:
        The name of the mapped attribute in the merged dataset.
      • resetInternalState

        protected void resetInternalState​(weka.core.Instances[] datasets)
        Resets the internal state of the merge method when new datasets are supplied.
        Parameters:
        datasets - The datasets being merged.
      • getRowSetEnumeration

        protected abstract Enumeration<int[]> getRowSetEnumeration()
        Allows specific merge methods to specify the order in which rows are placed into the merged dataset, and which rows from the source datasets are used for the source data.
        Returns:
        An enumeration of row-sets, each holding one source row for each dataset.
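A subclass supplies the row pairing through this enumeration. The sketch below shows one possible pairing, row i of every dataset combined into row i of the output, purely as an illustration. It is not necessarily how Simple or JoinOnID behave. RowSetEnumerationSketch and rowByRow are hypothetical names, datasets are reduced to their row counts, and ROW_MISSING is assumed to be -1.

```java
import java.util.Enumeration;
import java.util.NoSuchElementException;

class RowSetEnumerationSketch {

    // Assumed sentinel; the real value of ROW_MISSING is not shown in this documentation.
    static final int ROW_MISSING = -1;

    /**
     * Pairs row i of every dataset. rowCounts[d] is the number of rows in
     * dataset d; datasets that run out of rows contribute ROW_MISSING.
     */
    static Enumeration<int[]> rowByRow(final int[] rowCounts) {
        int max = 0;
        for (int count : rowCounts) {
            max = Math.max(max, count);
        }
        final int total = max;

        return new Enumeration<int[]>() {
            private int next = 0;

            @Override
            public boolean hasMoreElements() {
                return next < total;
            }

            @Override
            public int[] nextElement() {
                if (next >= total) {
                    throw new NoSuchElementException();
                }
                int[] rowSet = new int[rowCounts.length];
                for (int d = 0; d < rowCounts.length; d++) {
                    rowSet[d] = next < rowCounts[d] ? next : ROW_MISSING;
                }
                next++;
                return rowSet;
            }
        };
    }
}
```

Each int[] has one entry per dataset, in dataset order, so it can be passed directly as the rowSet argument of methods such as getValueFirstAvailable.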