Package adams.tools
Class CompareDatasets
-
- All Implemented Interfaces:
CleanUpHandler, Destroyable, GlobalInfoSupporter, FileWriter, LoggingLevelHandler, LoggingSupporter, OptionHandler, SizeOfHandler, Stoppable, StoppableWithFeedback, OutputFileGenerator, Serializable, Comparable
public class CompareDatasets extends AbstractTool implements OutputFileGenerator
Compares two datasets, either row-by-row or using a row attribute listing a unique ID for matching the rows, outputting the correlation coefficient of the numeric attributes found in the ranges defined by the user.
In order to trim down the number of generated rows, a threshold can be specified. Only rows whose correlation coefficient is below that threshold are output.
Valid options are:
-D <int> (property: debugLevel) The greater the number the more additional info the scheme may output to the console (0 = off). default: 0 minimum: 0
-dataset1 <adams.core.io.PlaceholderFile> (property: dataset1) The first dataset in the comparison. default: .
-range1 <java.lang.String> (property: range1) The range of attributes of the first dataset. default: first-last
-row1 <java.lang.String> (property: rowAttribute1) The index for the attribute used for identifying rows to compare; if not provided, then the comparison is performed row-by-row (first dataset). default:
-dataset2 <adams.core.io.PlaceholderFile> (property: dataset2) The second dataset in the comparison. default: .
-range2 <java.lang.String> (property: range2) The range of attributes of the second dataset. default: first-last
-row2 <java.lang.String> (property: rowAttribute2) The index for the attribute used for identifying rows to compare; if not provided, then the comparison is performed row-by-row (second dataset). default:
-output <adams.core.io.PlaceholderFile> (property: outputFile) The file to save the comparison result in (CSV format). default: output.csv
-missing <adams.core.io.PlaceholderFile> (property: missing) The file to save the information about missing rows to (CSV format). default: missing.csv
-threshold <double> (property: threshold) The threshold for the correlation coefficient; only if the coefficient is below that threshold, it will get output; 0.0 turns the threshold off. default: 0.0 minimum: 0.0 maximum: 1.0
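The threshold semantics described above can be illustrated with a small, self-contained sketch. It does not use the ADAMS classes; the class name ThresholdFilterSketch and its methods are hypothetical and only mirror the documented behaviour: a row is output if its coefficient is below the threshold, and a threshold of 0.0 turns the filtering off.

```java
import java.util.ArrayList;
import java.util.List;

public class ThresholdFilterSketch {

  /** Returns true if a row with this coefficient would be written to the output. */
  public static boolean shouldOutput(double coefficient, double threshold) {
    // a threshold of 0.0 disables filtering, so every row is kept
    if (threshold == 0.0)
      return true;
    // otherwise only rows below the threshold are kept
    return coefficient < threshold;
  }

  /** Returns the coefficients that pass the threshold, in their original order. */
  public static List<Double> filter(double[] coefficients, double threshold) {
    List<Double> kept = new ArrayList<>();
    for (double c : coefficients) {
      if (shouldOutput(c, threshold))
        kept.add(c);
    }
    return kept;
  }

  public static void main(String[] args) {
    double[] coefficients = {0.99, 0.42, 0.87};
    System.out.println(filter(coefficients, 0.9));  // rows below 0.9 only
    System.out.println(filter(coefficients, 0.0));  // threshold off: all rows
  }
}
```

With a threshold of 0.9, well-correlated rows (here 0.99) are dropped, which is how the option reduces the output to the rows that differ most between the two datasets.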
- Version:
- $Revision$
- Author:
- fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected weka.core.Instances | m_Data1 | the current dataset 1.
protected weka.core.Instances | m_Data2 | the current dataset 2.
protected PlaceholderFile | m_Dataset1 | the first dataset.
protected PlaceholderFile | m_Dataset2 | the second dataset.
protected int[] | m_Indices1 | the indices for the first dataset.
protected int[] | m_Indices2 | the indices for the second dataset.
protected Hashtable<String,Integer> | m_Lookup2 | the lookup table of indices for the second dataset.
protected PlaceholderFile | m_Missing | the output file for missing tests (CSV format).
protected PlaceholderFile | m_OutputFile | the output file (CSV format).
protected Range | m_Range1 | the first range of attributes.
protected Range | m_Range2 | the second range of attributes.
protected Index | m_RowAttribute1 | the optional attribute for matching up rows (dataset 1).
protected Index | m_RowAttribute2 | the optional attribute for matching up rows (dataset 2).
protected boolean | m_RowAttributeIsString | whether the row attribute is a string/nominal attribute or not.
protected double | m_Threshold | the threshold for listing correlations.
protected Boolean | m_UseRowAttribute | whether to use the row attribute or not.
-
Fields inherited from class adams.tools.AbstractTool
m_Stopped
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
-
Constructor Summary
Constructors
Constructor | Description
CompareDatasets() |
-
Method Summary
Modifier and Type | Method | Description
void | cleanUp() | Cleans up data structures, frees up memory.
String | dataset1TipText() | Returns the tip text for this property.
String | dataset2TipText() | Returns the tip text for this property.
void | defineOptions() | Adds options to the internal list of options.
protected void | doRun() | Performs the comparison.
protected double | getCorrelation(weka.core.Instance first, weka.core.Instance second) | Returns the correlation between the two rows.
PlaceholderFile | getDataset1() | Returns the first dataset for the comparison.
PlaceholderFile | getDataset2() | Returns the second dataset for the comparison.
PlaceholderFile | getMissing() | Returns the file to save the information about missing rows to (CSV format).
PlaceholderFile | getOutputFile() | Returns the file to save the comparison result in (CSV format).
Range | getRange1() | Returns the range of attributes of the first dataset.
Range | getRange2() | Returns the range of attributes of the second dataset.
String | getRowAttribute1() | Returns the index of the attribute used for identifying rows to compare against each other (first dataset).
String | getRowAttribute2() | Returns the index of the attribute used for identifying rows to compare against each other (second dataset).
protected String | getRowID(int index) | Returns the ID for the row: either the row index or the actual row attribute ID for that position.
double | getThreshold() | Returns the threshold for the correlation coefficient.
protected boolean | getUseRowAttribute() | Returns whether to use the row attribute or the order in the datasets for matching up the rows.
String | globalInfo() | Returns a string describing the object.
protected void | initialize() | Initializes the members.
protected void | initLookup() | Initializes the lookup table of indices for the second dataset, if necessary.
String | missingTipText() | Returns the tip text for this property.
protected weka.core.Instance[] | next(int index) | Returns the next row pair to compare.
protected weka.core.Instance[] | nextByIndex(int index) | Returns the next pair by simple index.
protected weka.core.Instance[] | nextByRowAttribute(int index) | Returns the next pair by using the value of the row attribute.
String | outputFileTipText() | Returns the tip text for this property.
protected void | preRun() | Before the actual run is executed.
String | range1TipText() | Returns the tip text for this property.
String | range2TipText() | Returns the tip text for this property.
String | rowAttribute1TipText() | Returns the tip text for this property.
String | rowAttribute2TipText() | Returns the tip text for this property.
void | setDataset1(PlaceholderFile value) | Sets the first dataset for the comparison.
void | setDataset2(PlaceholderFile value) | Sets the second dataset for the comparison.
void | setMissing(PlaceholderFile value) | Sets the file to save the information about missing rows to (CSV format).
void | setOutputFile(PlaceholderFile value) | Sets the file to save the comparison result in (CSV format).
void | setRange1(Range value) | Sets the range of attributes of the first dataset.
void | setRange2(Range value) | Sets the range of attributes of the second dataset.
void | setRowAttribute1(String value) | Sets the index of the attribute used for identifying rows to compare against each other (first dataset).
void | setRowAttribute2(String value) | Sets the index of the attribute used for identifying rows to compare against each other (second dataset).
void | setThreshold(double value) | Sets the threshold for the correlation coefficient.
String | thresholdTipText() | Returns the tip text for this property.
-
Methods inherited from class adams.tools.AbstractTool
compareTo, destroy, equals, forCommandLine, forName, getTools, isStopped, postRun, run, runTool, stopExecution
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel
-
-
-
-
Field Detail
-
m_Dataset1
protected PlaceholderFile m_Dataset1
the first dataset.
-
m_Range1
protected Range m_Range1
the first range of attributes.
-
m_RowAttribute1
protected Index m_RowAttribute1
the optional attribute for matching up rows (dataset 1).
-
m_Dataset2
protected PlaceholderFile m_Dataset2
the second dataset.
-
m_Range2
protected Range m_Range2
the second range of attributes.
-
m_RowAttribute2
protected Index m_RowAttribute2
the optional attribute for matching up rows (dataset 2).
-
m_OutputFile
protected PlaceholderFile m_OutputFile
the output file (CSV format).
-
m_Missing
protected PlaceholderFile m_Missing
the output file for missing tests (CSV format).
-
m_Data1
protected weka.core.Instances m_Data1
the current dataset 1.
-
m_Data2
protected weka.core.Instances m_Data2
the current dataset 2.
-
m_UseRowAttribute
protected Boolean m_UseRowAttribute
whether to use the row attribute or not.
-
m_RowAttributeIsString
protected boolean m_RowAttributeIsString
whether the row attribute is a string/nominal attribute or not.
-
m_Indices1
protected int[] m_Indices1
the indices for the first dataset.
-
m_Indices2
protected int[] m_Indices2
the indices for the second dataset.
-
m_Lookup2
protected Hashtable<String,Integer> m_Lookup2
the lookup table of indices for the second dataset.
-
m_Threshold
protected double m_Threshold
the threshold for listing correlations.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Specified by:
globalInfo in class AbstractOptionHandler
- Returns:
- a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface OptionHandler
- Overrides:
defineOptions in class AbstractOptionHandler
-
initialize
protected void initialize()
Initializes the members.
- Overrides:
initialize in class AbstractOptionHandler
-
setDataset1
public void setDataset1(PlaceholderFile value)
Sets the first dataset for the comparison.
- Parameters:
value - the dataset
-
getDataset1
public PlaceholderFile getDataset1()
Returns the first dataset for the comparison.
- Returns:
- the dataset
-
dataset1TipText
public String dataset1TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setDataset2
public void setDataset2(PlaceholderFile value)
Sets the second dataset for the comparison.
- Parameters:
value - the dataset
-
getDataset2
public PlaceholderFile getDataset2()
Returns the second dataset for the comparison.
- Returns:
- the dataset
-
dataset2TipText
public String dataset2TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setRange1
public void setRange1(Range value)
Sets the range of attributes of the first dataset.
- Parameters:
value - the range
-
getRange1
public Range getRange1()
Returns the range of attributes of the first dataset.
- Returns:
- the range
-
range1TipText
public String range1TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setRange2
public void setRange2(Range value)
Sets the range of attributes of the second dataset.
- Parameters:
value - the range
-
getRange2
public Range getRange2()
Returns the range of attributes of the second dataset.
- Returns:
- the range
-
range2TipText
public String range2TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setRowAttribute1
public void setRowAttribute1(String value)
Sets the index of the attribute used for identifying rows to compare against each other (first dataset).
- Parameters:
value - the index
-
getRowAttribute1
public String getRowAttribute1()
Returns the index of the attribute used for identifying rows to compare against each other (first dataset).
- Returns:
- the index
-
rowAttribute1TipText
public String rowAttribute1TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setRowAttribute2
public void setRowAttribute2(String value)
Sets the index of the attribute used for identifying rows to compare against each other (second dataset).
- Parameters:
value - the index
-
getRowAttribute2
public String getRowAttribute2()
Returns the index of the attribute used for identifying rows to compare against each other (second dataset).
- Returns:
- the index
-
rowAttribute2TipText
public String rowAttribute2TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setOutputFile
public void setOutputFile(PlaceholderFile value)
Sets the file to save the comparison result in (CSV format).
- Specified by:
setOutputFile in interface FileWriter
- Parameters:
value - the output file
-
getOutputFile
public PlaceholderFile getOutputFile()
Returns the file to save the comparison result in (CSV format).
- Specified by:
getOutputFile in interface FileWriter
- Returns:
- the output file
-
outputFileTipText
public String outputFileTipText()
Returns the tip text for this property.
- Specified by:
outputFileTipText in interface FileWriter
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setMissing
public void setMissing(PlaceholderFile value)
Sets the file to save the information about missing rows to (CSV format).
- Parameters:
value - the file for the missing rows
-
getMissing
public PlaceholderFile getMissing()
Returns the file to save the information about missing rows to (CSV format).
- Returns:
- the file for the missing rows
-
missingTipText
public String missingTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setThreshold
public void setThreshold(double value)
Sets the threshold for the correlation coefficient.
- Parameters:
value - the threshold (0.0 turns it off)
-
getThreshold
public double getThreshold()
Returns the threshold for the correlation coefficient.
- Returns:
- the threshold (0.0 means it is turned off)
-
thresholdTipText
public String thresholdTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
preRun
protected void preRun()
Before the actual run is executed.
- Overrides:
preRun in class AbstractTool
-
getUseRowAttribute
protected boolean getUseRowAttribute()
Returns whether to use the row attribute or the order in the datasets for matching up the rows.
- Returns:
- true if the row attribute is used for matching
-
getRowID
protected String getRowID(int index)
Returns the ID for the row: either the row index or the actual row attribute ID for that position.
- Parameters:
index - the index to get the ID for
- Returns:
- the ID
-
nextByIndex
protected weka.core.Instance[] nextByIndex(int index)
Returns the next pair by simple index.
- Parameters:
index - the index of the pair to retrieve
- Returns:
- the row pair or null if not available
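Row-by-row pairing can be pictured with the following minimal sketch. It uses plain double[][] arrays instead of weka.core.Instances, and the class IndexPairingSketch is hypothetical. The assumption that null is returned as soon as either dataset has no row at the given index is a guess based on the "null if not available" contract, not something this page confirms.

```java
public class IndexPairingSketch {

  /**
   * Returns {first[index], second[index]}, or null if either dataset
   * has no row at that index (assumed behaviour).
   */
  public static double[][] nextByIndex(double[][] first, double[][] second, int index) {
    if (index < 0 || index >= first.length || index >= second.length)
      return null;
    return new double[][]{first[index], second[index]};
  }

  public static void main(String[] args) {
    double[][] data1 = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
    double[][] data2 = {{1.1, 2.1}, {3.1, 4.1}};

    // iterate until no further pair is available
    int index = 0;
    double[][] pair;
    while ((pair = nextByIndex(data1, data2, index)) != null) {
      System.out.println("pair " + index + ": " + pair[0][0] + " vs " + pair[1][0]);
      index++;
    }
  }
}
```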
-
initLookup
protected void initLookup()
Initializes the lookup table of indices for the second dataset, if necessary.
-
nextByRowAttribute
protected weka.core.Instance[] nextByRowAttribute(int index)
Returns the next pair by using the value of the row attribute.
- Parameters:
index - the index of the pair to retrieve
- Returns:
- the row pair or null if not available
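Matching rows by a unique ID, as initLookup and nextByRowAttribute describe, might look like the sketch below. It is not the ADAMS implementation: the class RowLookupSketch is hypothetical, row IDs are plain strings, a HashMap stands in for the Hashtable<String,Integer> held in m_Lookup2, and the pair is returned as two row indices rather than weka.core.Instance objects. If an ID occurs more than once in the second dataset, this sketch keeps the last occurrence.

```java
import java.util.HashMap;
import java.util.Map;

public class RowLookupSketch {

  /** Maps each row ID of the second dataset to its row index. */
  public static Map<String, Integer> buildLookup(String[] ids2) {
    Map<String, Integer> lookup = new HashMap<>();
    for (int i = 0; i < ids2.length; i++)
      lookup.put(ids2[i], i);
    return lookup;
  }

  /**
   * Returns {index in first dataset, matching index in second dataset},
   * or null if the index is out of range or the ID has no match.
   */
  public static int[] nextByRowAttribute(String[] ids1, Map<String, Integer> lookup, int index) {
    if (index < 0 || index >= ids1.length)
      return null;
    Integer match = lookup.get(ids1[index]);
    if (match == null)
      return null;  // row is missing in the second dataset
    return new int[]{index, match};
  }

  public static void main(String[] args) {
    String[] ids1 = {"a", "b", "x"};
    String[] ids2 = {"b", "c", "a"};
    Map<String, Integer> lookup = buildLookup(ids2);
    for (int i = 0; i < ids1.length; i++) {
      int[] pair = nextByRowAttribute(ids1, lookup, i);
      System.out.println(ids1[i] + " -> " + (pair == null ? "missing" : "row " + pair[1]));
    }
  }
}
```

Building the lookup once keeps each match at constant cost, so the datasets do not need to be sorted in the same order.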
-
next
protected weka.core.Instance[] next(int index)
Returns the next row pair to compare.
- Parameters:
index - the index of the pair to retrieve
- Returns:
- the row pair or null if not available
-
getCorrelation
protected double getCorrelation(weka.core.Instance first, weka.core.Instance second)
Returns the correlation between the two rows.
- Parameters:
first - the first row
second - the second row
- Returns:
- the correlation
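This page does not state which correlation coefficient is computed. As an illustration only, the sketch below assumes the Pearson correlation coefficient over the numeric values selected by the two ranges, with each row already extracted into a double[] of equal length. The class RowCorrelationSketch and its method are hypothetical and do not use weka.core.Instance.

```java
public class RowCorrelationSketch {

  /**
   * Pearson correlation coefficient of two rows of equal length.
   * Returns NaN if either row has zero variance.
   */
  public static double correlation(double[] first, double[] second) {
    if (first.length != second.length || first.length == 0)
      throw new IllegalArgumentException("Rows must be non-empty and of equal length");

    int n = first.length;

    // means of both rows
    double meanFirst = 0.0;
    double meanSecond = 0.0;
    for (int i = 0; i < n; i++) {
      meanFirst += first[i];
      meanSecond += second[i];
    }
    meanFirst /= n;
    meanSecond /= n;

    // sums of cross products and squared deviations
    double sumCross = 0.0;
    double sumSqFirst = 0.0;
    double sumSqSecond = 0.0;
    for (int i = 0; i < n; i++) {
      double dFirst = first[i] - meanFirst;
      double dSecond = second[i] - meanSecond;
      sumCross += dFirst * dSecond;
      sumSqFirst += dFirst * dFirst;
      sumSqSecond += dSecond * dSecond;
    }

    return sumCross / Math.sqrt(sumSqFirst * sumSqSecond);
  }

  public static void main(String[] args) {
    double[] row1 = {1.0, 2.0, 3.0, 4.0};
    double[] row2 = {2.0, 4.0, 6.0, 8.0};
    System.out.println(correlation(row1, row2));
  }
}
```

Two rows that agree up to a linear scaling yield a coefficient close to 1, so with a threshold set they would be omitted from the output, while rows that diverge between the datasets score lower and are kept.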
-
doRun
protected void doRun()
Performs the comparison.
- Specified by:
doRun in class AbstractTool
-
cleanUp
public void cleanUp()
Cleans up data structures, frees up memory.
- Specified by:
cleanUp in interface CleanUpHandler
- Overrides:
cleanUp in class AbstractTool
-
-