Class JoinOnID

  • All Implemented Interfaces:
    Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, QuickInfoSupporter, SizeOfHandler, Serializable

    public class JoinOnID
    extends AbstractMerge
    Joins the datasets by concatenating rows that share a unique ID.

    -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
        The logging level for outputting errors and debugging output.
        default: WARNING
     
    -class-finder <adams.data.weka.columnfinder.ColumnFinder> (property: classFinder)
        The column finder to use to find class attributes in the datasets.
        default: adams.data.weka.columnfinder.Class
     
    -dataset-names <adams.core.base.BaseString> [-dataset-names ...] (property: datasetNames)
        The list of dataset names to use in attribute renaming.
        default:
     
    -attr-renames-exp <adams.core.base.BaseRegExp> [-attr-renames-exp ...] (property: attributeRenamesExp)
        The expressions to use to select attribute names for renaming (one per dataset).
        default:
        more: https://docs.oracle.com/javase/tutorial/essential/regex/
        https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
     
    -attr-renames-format <adams.core.base.BaseString> [-attr-renames-format ...] (property: attributeRenamesFormat)
        One format string for each renaming expression to specify how to rename
        the attribute. Can contain the {DATASET} keyword which will be replaced
        by the dataset name, and also group identifiers which will be replaced by
        groups from the renaming regex.
        default:
     
    -output-name <java.lang.String> (property: outputName)
        The name to use for the merged dataset.
        default: output
     
    -ensure-equal-values <boolean> (property: ensureEqualValues)
        Whether multiple attributes being merged into a single attribute require
        equal values from all sources.
        default: false
     
    -unique-id <java.lang.String> (property: uniqueID)
        The name of the attribute to use as the joining key for the merge.
        default:
     
    -complete-rows-only <boolean> (property: completeRowsOnly)
        Whether only those IDs that have source data in all datasets should be merged.
        default: false
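    Each renaming expression is paired with a format string. The Java sketch below shows one plausible way to apply such a rename to a single attribute name. This page does not confirm two of its assumptions: that group identifiers use java.util.regex $1-style references, and that an expression must match the whole name. renameAttribute is a hypothetical helper and is not part of JoinOnID.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RenameSketch {

  /**
   * Hypothetical helper: if the whole attribute name matches the regex, builds
   * the new name from the format string, otherwise returns the name unchanged.
   * Assumes $1, $2, ... refer to regex groups and {DATASET} is a literal keyword.
   */
  static String renameAttribute(String name, String regex, String format, String datasetName) {
    Matcher m = Pattern.compile(regex).matcher(name);
    if (!m.matches()) {
      return name; // names not selected by the expression are left alone
    }
    StringBuffer sb = new StringBuffer();
    m.appendReplacement(sb, format); // expands $n references from the full match
    m.appendTail(sb);
    // Substitute the keyword last, so '$' or '\' in a dataset name is not
    // treated as a group reference.
    return sb.toString().replace("{DATASET}", datasetName);
  }

  public static void main(String[] args) {
    System.out.println(renameAttribute("value", "(.*)", "{DATASET}-$1", "train"));
    System.out.println(renameAttribute("id", "val(.*)", "x$1", "train"));
  }
}
```

    Running main prints "train-value" followed by "id". The second name does not match its expression, so it keeps its original name.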
     

    Performs a merge by using a unique ID attribute for each source dataset to concatenate rows with the same ID.
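    The following is a minimal, self-contained Java sketch of a join on a unique ID. It is not the JoinOnID implementation and rests on these assumptions:
    - Datasets are modelled as plain String[][] tables (the hypothetical Table class) rather than weka.core.Instances.
    - IDs are unique within each dataset.
    - Output rows follow the order in which each ID is first seen.
    - A dataset lacking an ID contributes nulls for its columns.
    The completeRowsOnly flag plays the role of the -complete-rows-only option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class JoinSketch {

  /** Hypothetical in-memory dataset: rows of values, with the ID in column idIndex. */
  static final class Table {
    final int idIndex;
    final int width;
    final String[][] rows;

    Table(int idIndex, int width, String[][] rows) {
      this.idIndex = idIndex;
      this.width = width;
      this.rows = rows;
    }
  }

  /**
   * Joins tables on their ID column. Each output row is the ID followed by the
   * non-ID values of every table, in table order. A table without a given ID
   * contributes nulls (an assumed stand-in for missing values).
   */
  static List<String[]> join(List<Table> tables, boolean completeRowsOnly) {
    List<Map<String, String[]>> byIdPerTable = new ArrayList<>();
    Set<String> ids = new LinkedHashSet<>(); // keeps first-seen ID order (assumed)
    for (Table t : tables) {
      Map<String, String[]> byId = new HashMap<>();
      for (String[] row : t.rows) {
        String id = row[t.idIndex];
        // The join key must be unique within each dataset.
        if (byId.put(id, row) != null) {
          throw new IllegalArgumentException("Duplicate ID: " + id);
        }
        ids.add(id);
      }
      byIdPerTable.add(byId);
    }

    List<String[]> merged = new ArrayList<>();
    for (String id : ids) {
      List<String> out = new ArrayList<>();
      out.add(id);
      boolean complete = true;
      for (int i = 0; i < tables.size(); i++) {
        Table t = tables.get(i);
        String[] row = byIdPerTable.get(i).get(id);
        if (row == null) {
          complete = false;
        }
        for (int c = 0; c < t.width; c++) {
          if (c != t.idIndex) {
            out.add(row == null ? null : row[c]);
          }
        }
      }
      // Mirrors -complete-rows-only: skip IDs absent from any dataset.
      if (complete || !completeRowsOnly) {
        merged.add(out.toArray(new String[0]));
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Table a = new Table(0, 2, new String[][] {{"1", "x"}, {"2", "y"}});
    Table b = new Table(1, 2, new String[][] {{"10", "1"}});
    for (String[] row : join(Arrays.asList(a, b), false)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

    With completeRowsOnly set to false, main prints [1, x, 10] and [2, y, null]. With it set to true, only the row for ID 1 is kept, because ID 2 has no source data in the second dataset.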

    Author:
    Corey Sterling (csterlin at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • m_UniqueID

        protected String m_UniqueID
        The name of the attribute to use as the merge key.
      • m_CompleteRowsOnly

        protected boolean m_CompleteRowsOnly
        Whether or not to skip IDs that don't exist in all source datasets.
    • Constructor Detail

      • JoinOnID

        public JoinOnID()
    • Method Detail

      • getUniqueID

        public String getUniqueID()
        Gets the name of the unique ID attribute that the merge is joining on.
        Returns:
        The name of the unique ID attribute.
      • setUniqueID

        public void setUniqueID(String value)
        Sets the name of the unique ID attribute that the merge is joining on.
        Parameters:
        value - The name of the unique ID attribute.
      • uniqueIDTipText

        public String uniqueIDTipText()
        Gets the tip-text for the unique ID option.
        Returns:
        The tip-text as a String.
      • getCompleteRowsOnly

        public boolean getCompleteRowsOnly()
        Gets whether incomplete rows should be skipped.
        Returns:
        Whether incomplete rows should be skipped.
      • setCompleteRowsOnly

        public void setCompleteRowsOnly(boolean value)
        Sets whether incomplete rows should be skipped.
        Parameters:
        value - Whether incomplete rows should be skipped.
      • completeRowsOnlyTipText

        public String completeRowsOnlyTipText()
        Gets the tip-text for the complete-rows-only option.
        Returns:
        The tip-text as a String.
      • checkAllDatasetsHaveIDAttribute

        protected String checkAllDatasetsHaveIDAttribute(weka.core.Instances[] datasets)
        Checks that each of the given datasets has the unique ID attribute. Also checks that the unique ID attribute is the same data type for all datasets.
        Parameters:
        datasets - The datasets that are to be merged.
        Returns:
        Null if all datasets have the unique ID attribute with the same data type, otherwise an error message.
      • isUniqueIDName

        protected boolean isUniqueIDName(String attributeName)
        Whether the given attribute name is the name of the unique ID attribute.
        Parameters:
        attributeName - The attribute name to check.
        Returns:
        True if the given attribute name is the unique ID name, false otherwise.
      • findAttributeIndexOfUniqueID

        protected int findAttributeIndexOfUniqueID(weka.core.Instances dataset)
        Finds the index of the unique ID attribute in the given dataset.
        Parameters:
        dataset - The dataset to search.
        Returns:
        The index of the unique ID attribute, or -1 if not found.
      • compare

        protected int compare(List<AbstractMerge.SourceAttribute> sources1,
                              List<AbstractMerge.SourceAttribute> sources2)
        Compares two lists of source attributes to determine the order in which their mapped attributes should appear in the merged dataset.
        Overrides:
        compare in class AbstractMerge
        Parameters:
        sources1 - The source attributes of the first mapped attribute.
        sources2 - The source attributes of the second mapped attribute.
        Returns:
        -1 if sources1 < sources2, 1 if sources1 > sources2, otherwise 0.
      • getMappedAttributeName

        protected String getMappedAttributeName(AbstractMerge.SourceAttribute source)
        Gets the name of the attribute in the merged dataset that the given source attribute maps to.
        Overrides:
        getMappedAttributeName in class AbstractMerge
        Parameters:
        source - The source attribute.
        Returns:
        The name of the mapped attribute in the merged dataset.
      • getRowSetEnumeration

        protected Enumeration<int[]> getRowSetEnumeration()
        Allows specific merge methods to specify the order in which rows are placed into the merged dataset, and which rows from the source datasets are used for the source data.
        Specified by:
        getRowSetEnumeration in class AbstractMerge
        Returns:
        An enumeration of row sets, each an int[] containing one source row per dataset.
      • check

        protected String check(weka.core.Instances[] datasets)
        Hook method for performing checks before attempting the merge.
        Overrides:
        check in class AbstractMerge
        Parameters:
        datasets - the datasets to merge
        Returns:
        null if the checks pass, otherwise an error message
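    Together, check, checkAllDatasetsHaveIDAttribute and findAttributeIndexOfUniqueID describe a pre-merge validation. The ID attribute must exist in every dataset and have the same data type everywhere, and null signals success. The hypothetical Java sketch below illustrates that contract over simplified headers. The Attr class and its string type labels are assumptions standing in for weka.core.Attribute. Exact, case-sensitive name matching is an assumed stand-in for isUniqueIDName.

```java
import java.util.Arrays;
import java.util.List;

class IdCheckSketch {

  /** Hypothetical attribute description: a name plus a type label such as "numeric". */
  static final class Attr {
    final String name;
    final String type;

    Attr(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  /** Like findAttributeIndexOfUniqueID: index of the ID attribute, or -1 if absent. */
  static int findIdIndex(List<Attr> header, String uniqueId) {
    for (int i = 0; i < header.size(); i++) {
      if (header.get(i).name.equals(uniqueId)) { // assumed exact-match rule
        return i;
      }
    }
    return -1;
  }

  /** Follows the check() contract: null on success, otherwise an error message. */
  static String checkIds(List<List<Attr>> headers, String uniqueId) {
    String expectedType = null;
    for (int d = 0; d < headers.size(); d++) {
      int idx = findIdIndex(headers.get(d), uniqueId);
      if (idx == -1) {
        return "Dataset " + d + " has no attribute named '" + uniqueId + "'";
      }
      String type = headers.get(d).get(idx).type;
      if (expectedType == null) {
        expectedType = type; // the first dataset fixes the expected ID type
      } else if (!expectedType.equals(type)) {
        return "Dataset " + d + ": ID attribute is " + type + ", expected " + expectedType;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<Attr> h1 = Arrays.asList(new Attr("id", "numeric"), new Attr("x", "numeric"));
    List<Attr> h2 = Arrays.asList(new Attr("y", "nominal"), new Attr("id", "numeric"));
    List<Attr> h3 = Arrays.asList(new Attr("id", "string"));
    System.out.println(checkIds(Arrays.asList(h1, h2), "id"));
    System.out.println(checkIds(Arrays.asList(h1, h3), "id"));
  }
}
```

    Running main prints "null" for the first pair. For the second pair it prints "Dataset 1: ID attribute is string, expected numeric", because the ID attribute's type differs between the datasets.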