Package adams.tools
Class CompareDatasets
-
- All Implemented Interfaces:
CleanUpHandler, Destroyable, GlobalInfoSupporter, FileWriter, LoggingLevelHandler, LoggingSupporter, OptionHandler, SizeOfHandler, Stoppable, StoppableWithFeedback, OutputFileGenerator, Serializable, Comparable
public class CompareDatasets extends AbstractTool implements OutputFileGenerator
Compares two datasets, either row-by-row or using a row attribute listing a unique ID for matching the rows, outputting the correlation coefficient of the numeric attributes found in the ranges defined by the user.
To reduce the number of generated rows, a threshold can be specified: only rows whose correlation coefficient is below that threshold are output.
Valid options are:
-D <int> (property: debugLevel) The greater the number the more additional info the scheme may output to the console (0 = off). default: 0 minimum: 0
-dataset1 <adams.core.io.PlaceholderFile> (property: dataset1) The first dataset in the comparison. default: .
-range1 <java.lang.String> (property: range1) The range of attributes of the first dataset. default: first-last
-row1 <java.lang.String> (property: rowAttribute1) The index for the attribute used for identifying rows to compare; if not provided, then the comparison is performed row-by-row (first dataset). default:
-dataset2 <adams.core.io.PlaceholderFile> (property: dataset2) The second dataset in the comparison. default: .
-range2 <java.lang.String> (property: range2) The range of attributes of the second dataset. default: first-last
-row2 <java.lang.String> (property: rowAttribute2) The index for the attribute used for identifying rows to compare; if not provided, then the comparison is performed row-by-row (second dataset). default:
-output <adams.core.io.PlaceholderFile> (property: outputFile) The file to save the comparison result in (CSV format). default: output.csv
-missing <adams.core.io.PlaceholderFile> (property: missing) The file to save the information about missing rows to (CSV format). default: missing.csv
-threshold <double> (property: threshold) The threshold for the correlation coefficient; only if the coefficient is below that threshold, it will get output; 0.0 turns the threshold off. default: 0.0 minimum: 0.0 maximum: 1.0
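The threshold rule described for -threshold can be sketched in plain Java as follows. This is a minimal illustration and not the tool's actual code: the class ThresholdSketch and its methods shouldOutput and filter are hypothetical names, and the only behaviour taken from the option description is that a coefficient is output when it is below the threshold and that 0.0 turns the threshold off.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of ADAMS.
final class ThresholdSketch {

  /** Returns true if a row pair with this coefficient should be written to the output. */
  static boolean shouldOutput(double correlation, double threshold) {
    // a threshold of 0.0 turns filtering off, so every pair is written
    if (threshold == 0.0)
      return true;
    // otherwise only coefficients below the threshold are written
    return correlation < threshold;
  }

  /** Keeps only the coefficients that pass the threshold rule. */
  static List<Double> filter(double[] correlations, double threshold) {
    List<Double> kept = new ArrayList<>();
    for (double c : correlations) {
      if (shouldOutput(c, threshold))
        kept.add(c);
    }
    return kept;
  }
}
```

With a threshold of 0.9, for instance, a pair with coefficient 0.95 would be skipped while a pair with 0.8 would be written.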
- Author:
- fracpete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields (modifier and type, field, description):
protected weka.core.Instances m_Data1: the current dataset 1.
protected weka.core.Instances m_Data2: the current dataset 2.
protected PlaceholderFile m_Dataset1: the first dataset.
protected PlaceholderFile m_Dataset2: the second dataset.
protected int[] m_Indices1: the indices for the first dataset.
protected int[] m_Indices2: the indices for the second dataset.
protected Hashtable<String,Integer> m_Lookup2: the lookup table of indices for the second dataset.
protected PlaceholderFile m_Missing: the output file for missing tests (CSV format).
protected PlaceholderFile m_OutputFile: the output file (CSV format).
protected Range m_Range1: the first range of attributes.
protected Range m_Range2: the second range of attributes.
protected Index m_RowAttribute1: the optional attribute for matching up rows (dataset 1).
protected Index m_RowAttribute2: the optional attribute for matching up rows (dataset 2).
protected boolean m_RowAttributeIsString: whether the row attribute is a string/nominal attribute or not.
protected double m_Threshold: the threshold for listing correlations.
protected Boolean m_UseRowAttribute: whether to use the row attribute or not.
-
Fields inherited from class adams.tools.AbstractTool
m_Stopped
-
Fields inherited from class adams.core.option.AbstractOptionHandler
m_OptionManager
-
Fields inherited from class adams.core.logging.LoggingObject
m_Logger, m_LoggingIsEnabled, m_LoggingLevel
-
-
Constructor Summary
Constructors
CompareDatasets()
-
Method Summary
Methods (modifier and type, method, description):
void cleanUp(): Cleans up data structures, frees up memory.
String dataset1TipText(): Returns the tip text for this property.
String dataset2TipText(): Returns the tip text for this property.
void defineOptions(): Adds options to the internal list of options.
protected void doRun(): Performs the comparison.
protected double getCorrelation(weka.core.Instance first, weka.core.Instance second): Returns the correlation between the two rows.
PlaceholderFile getDataset1(): Returns the first dataset for the comparison.
PlaceholderFile getDataset2(): Returns the second dataset for the comparison.
PlaceholderFile getMissing(): Returns the file to save the information about missing rows to (CSV format).
PlaceholderFile getOutputFile(): Returns the file to save the comparison result in (CSV format).
Range getRange1(): Returns the range of attributes of the first dataset.
Range getRange2(): Returns the range of attributes of the second dataset.
String getRowAttribute1(): Returns the index of the attribute used for identifying rows to compare against each other (first dataset).
String getRowAttribute2(): Returns the index of the attribute used for identifying rows to compare against each other (second dataset).
protected String getRowID(int index): Returns the ID for the row: either the row index or the actual row attribute ID for that position.
double getThreshold(): Returns the threshold for the correlation coefficient.
protected boolean getUseRowAttribute(): Returns whether to use the row attribute or the order in the datasets for matching up the rows.
String globalInfo(): Returns a string describing the object.
protected void initialize(): Initializes the members.
protected void initLookup(): Initializes the lookup table of indices for the second dataset, if necessary.
String missingTipText(): Returns the tip text for this property.
protected weka.core.Instance[] next(int index): Returns the next row pair to compare.
protected weka.core.Instance[] nextByIndex(int index): Returns the next pair by simple index.
protected weka.core.Instance[] nextByRowAttribute(int index): Returns the next pair by using the value of the row attribute.
String outputFileTipText(): Returns the tip text for this property.
protected void preRun(): Before the actual run is executed.
String range1TipText(): Returns the tip text for this property.
String range2TipText(): Returns the tip text for this property.
String rowAttribute1TipText(): Returns the tip text for this property.
String rowAttribute2TipText(): Returns the tip text for this property.
void setDataset1(PlaceholderFile value): Sets the first dataset for the comparison.
void setDataset2(PlaceholderFile value): Sets the second dataset for the comparison.
void setMissing(PlaceholderFile value): Sets the file to save the information about missing rows to (CSV format).
void setOutputFile(PlaceholderFile value): Sets the file to save the comparison result in (CSV format).
void setRange1(Range value): Sets the range of attributes of the first dataset.
void setRange2(Range value): Sets the range of attributes of the second dataset.
void setRowAttribute1(String value): Sets the index of the attribute used for identifying rows to compare against each other (first dataset).
void setRowAttribute2(String value): Sets the index of the attribute used for identifying rows to compare against each other (second dataset).
void setThreshold(double value): Sets the threshold for the correlation coefficient.
String thresholdTipText(): Returns the tip text for this property.
-
Methods inherited from class adams.tools.AbstractTool
compareTo, destroy, equals, forCommandLine, forName, getTools, isStopped, postRun, run, runTool, stopExecution
-
Methods inherited from class adams.core.option.AbstractOptionHandler
cleanUpOptions, finishInit, getDefaultLoggingLevel, getOptionManager, loggingLevelTipText, newOptionManager, reset, setLoggingLevel, toCommandLine, toString
-
Methods inherited from class adams.core.logging.LoggingObject
configureLogger, getLogger, getLoggingLevel, initializeLogging, isLoggingEnabled, sizeOf
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface adams.core.logging.LoggingLevelHandler
getLoggingLevel
-
Field Detail
-
m_Dataset1
protected PlaceholderFile m_Dataset1
the first dataset.
-
m_Range1
protected Range m_Range1
the first range of attributes.
-
m_RowAttribute1
protected Index m_RowAttribute1
the optional attribute for matching up rows (dataset 1).
-
m_Dataset2
protected PlaceholderFile m_Dataset2
the second dataset.
-
m_Range2
protected Range m_Range2
the second range of attributes.
-
m_RowAttribute2
protected Index m_RowAttribute2
the optional attribute for matching up rows (dataset 2).
-
m_OutputFile
protected PlaceholderFile m_OutputFile
the output file (CSV format).
-
m_Missing
protected PlaceholderFile m_Missing
the output file for missing tests (CSV format).
-
m_Data1
protected weka.core.Instances m_Data1
the current dataset 1.
-
m_Data2
protected weka.core.Instances m_Data2
the current dataset 2.
-
m_UseRowAttribute
protected Boolean m_UseRowAttribute
whether to use the row attribute or not.
-
m_RowAttributeIsString
protected boolean m_RowAttributeIsString
whether the row attribute is a string/nominal attribute or not.
-
m_Indices1
protected int[] m_Indices1
the indices for the first dataset.
-
m_Indices2
protected int[] m_Indices2
the indices for the second dataset.
-
m_Lookup2
protected Hashtable<String,Integer> m_Lookup2
the lookup table of indices for the second dataset.
-
m_Threshold
protected double m_Threshold
the threshold for listing correlations.
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the object.
- Specified by:
globalInfo in interface GlobalInfoSupporter
- Specified by:
globalInfo in class AbstractOptionHandler
- Returns:
- a description suitable for displaying in the GUI
-
defineOptions
public void defineOptions()
Adds options to the internal list of options.
- Specified by:
defineOptions in interface OptionHandler
- Overrides:
defineOptions in class AbstractOptionHandler
-
initialize
protected void initialize()
Initializes the members.
- Overrides:
initialize in class AbstractOptionHandler
-
setDataset1
public void setDataset1(PlaceholderFile value)
Sets the first dataset for the comparison.
- Parameters:
value - the dataset
-
getDataset1
public PlaceholderFile getDataset1()
Returns the first dataset for the comparison.
- Returns:
- the dataset
-
dataset1TipText
public String dataset1TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setDataset2
public void setDataset2(PlaceholderFile value)
Sets the second dataset for the comparison.
- Parameters:
value - the dataset
-
getDataset2
public PlaceholderFile getDataset2()
Returns the second dataset for the comparison.
- Returns:
- the dataset
-
dataset2TipText
public String dataset2TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setRange1
public void setRange1(Range value)
Sets the range of attributes of the first dataset.
- Parameters:
value - the range
-
getRange1
public Range getRange1()
Returns the range of attributes of the first dataset.
- Returns:
- the range
-
range1TipText
public String range1TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setRange2
public void setRange2(Range value)
Sets the range of attributes of the second dataset.
- Parameters:
value - the range
-
getRange2
public Range getRange2()
Returns the range of attributes of the second dataset.
- Returns:
- the range
-
range2TipText
public String range2TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
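To illustrate how a range string such as the default first-last could be turned into attribute indices, here is a small self-contained sketch. It does not use the ADAMS Range class; the syntax it accepts (1-based numbers, the placeholders first and last, from-to subranges, comma-separated parts) is an assumption made for this example, and RangeSketch, resolve and toIndices are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for range handling; the accepted syntax is an assumption.
final class RangeSketch {

  /** Resolves a single 1-based token ("first", "last" or a number) to a 0-based index. */
  static int resolve(String token, int numAttributes) {
    String t = token.trim();
    if (t.equals("first"))
      return 0;
    if (t.equals("last"))
      return numAttributes - 1;
    return Integer.parseInt(t) - 1;
  }

  /** Expands a range string into 0-based attribute indices (no validation of bad input). */
  static int[] toIndices(String range, int numAttributes) {
    List<Integer> result = new ArrayList<>();
    for (String part : range.split(",")) {
      // "3" yields one bound, "1-3" yields two
      String[] bounds = part.split("-");
      int from = resolve(bounds[0], numAttributes);
      int to = resolve(bounds[bounds.length - 1], numAttributes);
      for (int i = from; i <= to && i < numAttributes; i++)
        result.add(i);
    }
    int[] indices = new int[result.size()];
    for (int i = 0; i < indices.length; i++)
      indices[i] = result.get(i);
    return indices;
  }
}
```

Under these assumptions, first-last on a dataset with four attributes selects all four, which matches the intent of the documented default.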
-
setRowAttribute1
public void setRowAttribute1(String value)
Sets the index of the attribute used for identifying rows to compare against each other (first dataset).
- Parameters:
value - the index
-
getRowAttribute1
public String getRowAttribute1()
Returns the index of the attribute used for identifying rows to compare against each other (first dataset).
- Returns:
- the index
-
rowAttribute1TipText
public String rowAttribute1TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setRowAttribute2
public void setRowAttribute2(String value)
Sets the index of the attribute used for identifying rows to compare against each other (second dataset).
- Parameters:
value - the index
-
getRowAttribute2
public String getRowAttribute2()
Returns the index of the attribute used for identifying rows to compare against each other (second dataset).
- Returns:
- the index
-
rowAttribute2TipText
public String rowAttribute2TipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setOutputFile
public void setOutputFile(PlaceholderFile value)
Sets the file to save the comparison result in (CSV format).
- Specified by:
setOutputFile in interface FileWriter
- Parameters:
value - the output file
-
getOutputFile
public PlaceholderFile getOutputFile()
Returns the file to save the comparison result in (CSV format).
- Specified by:
getOutputFile in interface FileWriter
- Returns:
- the output file
-
outputFileTipText
public String outputFileTipText()
Returns the tip text for this property.
- Specified by:
outputFileTipText in interface FileWriter
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setMissing
public void setMissing(PlaceholderFile value)
Sets the file to save the information about missing rows to (CSV format).
- Parameters:
value - the file for the missing rows
-
getMissing
public PlaceholderFile getMissing()
Returns the file to save the information about missing rows to (CSV format).
- Returns:
- the file for the missing rows
-
missingTipText
public String missingTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
setThreshold
public void setThreshold(double value)
Sets the threshold for the correlation coefficient.
- Parameters:
value - the threshold (0.0 turns it off)
-
getThreshold
public double getThreshold()
Returns the threshold for the correlation coefficient.
- Returns:
- the threshold (0.0 means it is turned off)
-
thresholdTipText
public String thresholdTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the GUI or for listing the options.
-
preRun
protected void preRun()
Before the actual run is executed.
- Overrides:
preRun in class AbstractTool
-
getUseRowAttribute
protected boolean getUseRowAttribute()
Returns whether to use the row attribute or the order in the datasets for matching up the rows.
- Returns:
- true if the row attribute is used for matching
-
getRowID
protected String getRowID(int index)
Returns the ID for the row: either the row index or the actual row attribute ID for that position.
- Parameters:
index - the index to get the ID for
- Returns:
- the ID
-
nextByIndex
protected weka.core.Instance[] nextByIndex(int index)
Returns the next pair by simple index.
- Parameters:
index - the index of the pair to retrieve
- Returns:
- the row pair or null if not available
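Row-by-row pairing as described for nextByIndex might look roughly like the following sketch. It uses plain double[][] arrays instead of weka.core.Instances so that it stays self-contained; IndexPairSketch is a hypothetical class, and returning null once either dataset has no row at the given index is an assumption about what "not available" means.

```java
// Hypothetical illustration of pairing rows by position, not the ADAMS code.
final class IndexPairSketch {

  /**
   * Returns the rows at the given position from both datasets,
   * or null if either dataset has no row at that position.
   */
  static double[][] nextByIndex(double[][] data1, double[][] data2, int index) {
    if (index < 0 || index >= data1.length || index >= data2.length)
      return null;
    // element 0 is the row from dataset 1, element 1 the row from dataset 2
    return new double[][]{data1[index], data2[index]};
  }
}
```

A caller would typically increase the index until null is returned.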
-
initLookup
protected void initLookup()
Initializes the lookup table of indices for the second dataset, if necessary.
-
nextByRowAttribute
protected weka.core.Instance[] nextByRowAttribute(int index)
Returns the next pair by using the value of the row attribute.
- Parameters:
index - the index of the pair to retrieve
- Returns:
- the row pair or null if not available
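The combination of initLookup and nextByRowAttribute can be pictured with the sketch below: a Hashtable<String,Integer> (the same type as m_Lookup2) maps each row ID of the second dataset to its row index, and each row of the first dataset is matched through that table. The datasets are reduced to arrays of ID strings here, and LookupSketch, buildLookup and match are hypothetical names; how the real tool handles duplicate or unmatched IDs is not stated in this documentation.

```java
import java.util.Hashtable;

// Hypothetical illustration of ID-based row matching, not the ADAMS code.
final class LookupSketch {

  /** Maps each (non-null) row ID of the second dataset to its row index. */
  static Hashtable<String, Integer> buildLookup(String[] ids2) {
    Hashtable<String, Integer> lookup = new Hashtable<>();
    for (int i = 0; i < ids2.length; i++)
      lookup.put(ids2[i], i);  // a later duplicate ID overwrites an earlier one
    return lookup;
  }

  /**
   * Returns {index1, index2} for the row at index1 of the first dataset,
   * or null if the index is out of range or the second dataset has no row with that ID.
   */
  static int[] match(String[] ids1, int index1, Hashtable<String, Integer> lookup) {
    if (index1 < 0 || index1 >= ids1.length)
      return null;
    Integer index2 = lookup.get(ids1[index1]);
    if (index2 == null)
      return null;
    return new int[]{index1, index2};
  }
}
```

Building the table once keeps each match a constant-time lookup instead of a scan over the second dataset; rows for which match returns null are the natural candidates for the file configured via -missing.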
-
next
protected weka.core.Instance[] next(int index)
Returns the next row pair to compare.
- Parameters:
index - the index of the pair to retrieve
- Returns:
- the row pair or null if not available
-
getCorrelation
protected double getCorrelation(weka.core.Instance first, weka.core.Instance second)
Returns the correlation between the two rows.
- Parameters:
first - the first row
second - the second row
- Returns:
- the correlation
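The documentation does not say which correlation coefficient getCorrelation computes. As one plausible reading, the sketch below computes the Pearson correlation coefficient between the numeric values of two rows, using plain double[] arrays in place of weka.core.Instance. CorrelationSketch is a hypothetical class, and returning NaN for a row without variance is a choice made for this example only.

```java
// Hypothetical illustration: Pearson correlation between two rows of numeric values.
final class CorrelationSketch {

  static double correlation(double[] x, double[] y) {
    if (x.length != y.length || x.length == 0)
      throw new IllegalArgumentException("rows must have the same, non-zero number of values");
    int n = x.length;

    // means of both rows
    double meanX = 0.0;
    double meanY = 0.0;
    for (int i = 0; i < n; i++) {
      meanX += x[i];
      meanY += y[i];
    }
    meanX /= n;
    meanY /= n;

    // sums of products of deviations from the means
    double cov = 0.0;
    double varX = 0.0;
    double varY = 0.0;
    for (int i = 0; i < n; i++) {
      double dx = x[i] - meanX;
      double dy = y[i] - meanY;
      cov += dx * dy;
      varX += dx * dx;
      varY += dy * dy;
    }

    // the coefficient is undefined if either row is constant
    if (varX == 0.0 || varY == 0.0)
      return Double.NaN;
    return cov / Math.sqrt(varX * varY);
  }
}
```

Only the values selected by the two attribute ranges would be passed in, so both arrays need the same length.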
-
doRun
protected void doRun()
Performs the comparison.
- Specified by:
doRun in class AbstractTool
-
cleanUp
public void cleanUp()
Cleans up data structures, frees up memory.
- Specified by:
cleanUp in interface CleanUpHandler
- Overrides:
cleanUp in class AbstractTool
-
-