java.lang.Object
adams.core.ConsoleObject
adams.core.option.AbstractOptionHandler
adams.tools.AbstractTool
adams.tools.CompareDatasets
public class CompareDatasets
Compares two datasets, either row-by-row or using a row attribute listing a unique ID for matching the rows, outputting the correlation coefficient of the numeric attributes found in the ranges defined by the user.
To reduce the number of generated rows, a threshold can be specified. Only rows whose correlation coefficient is below that threshold are output.
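The comparison described above can be sketched as follows. This is a minimal illustration, assuming the coefficient is the Pearson correlation computed over the selected numeric attribute values of two matched rows; the class and method names (`RowCorrelationSketch`, `correlation`, `isOutput`) are hypothetical and not part of ADAMS, and returning NaN for a constant row is an assumption, not documented behavior.

```java
// Hypothetical sketch, not the ADAMS implementation.
class RowCorrelationSketch {

  /** Pearson correlation coefficient of two equally long numeric rows. */
  static double correlation(double[] first, double[] second) {
    if (first.length != second.length || first.length == 0)
      throw new IllegalArgumentException("Rows must be non-empty and of equal length");
    int n = first.length;
    double mean1 = 0.0;
    double mean2 = 0.0;
    for (int i = 0; i < n; i++) {
      mean1 += first[i];
      mean2 += second[i];
    }
    mean1 /= n;
    mean2 /= n;
    double cov = 0.0;
    double var1 = 0.0;
    double var2 = 0.0;
    for (int i = 0; i < n; i++) {
      double d1 = first[i] - mean1;
      double d2 = second[i] - mean2;
      cov += d1 * d2;
      var1 += d1 * d1;
      var2 += d2 * d2;
    }
    // Assumption: a constant row has no defined correlation, so return NaN.
    if (var1 == 0.0 || var2 == 0.0)
      return Double.NaN;
    return cov / Math.sqrt(var1 * var2);
  }

  /** A threshold of 0.0 turns filtering off; otherwise only coefficients below it are output. */
  static boolean isOutput(double coefficient, double threshold) {
    return threshold == 0.0 || coefficient < threshold;
  }

  public static void main(String[] args) {
    double r = correlation(new double[]{1, 2, 3}, new double[]{2, 4, 6});
    System.out.println("correlation = " + r + ", output = " + isOutput(r, 0.9));
  }
}
```

With a threshold of 0.9, for example, a row pair that correlates perfectly would be suppressed, which keeps the result file focused on rows that differ.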
-D <int> (property: debugLevel) The greater the number the more additional info the scheme may output to the console (0 = off). default: 0 minimum: 0
-dataset1 <adams.core.io.PlaceholderFile> (property: dataset1) The first dataset in the comparison. default: .
-range1 <java.lang.String> (property: range1) The range of attributes of the first dataset. default: first-last
-row1 <java.lang.String> (property: rowAttribute1) The index for the attribute used for identifying rows to compare; if not provided, then the comparison is performed row-by-row (first dataset). default:
-dataset2 <adams.core.io.PlaceholderFile> (property: dataset2) The second dataset in the comparison. default: .
-range2 <java.lang.String> (property: range2) The range of attributes of the second dataset. default: first-last
-row2 <java.lang.String> (property: rowAttribute2) The index for the attribute used for identifying rows to compare; if not provided, then the comparison is performed row-by-row (second dataset). default:
-output <adams.core.io.PlaceholderFile> (property: outputFile) The file to save the comparison result in (CSV format). default: output.csv
-missing <adams.core.io.PlaceholderFile> (property: missing) The file to save the information about missing rows to (CSV format). default: missing.csv
-threshold <double> (property: threshold) The threshold for the correlation coefficient; only if the coefficient is below that threshold, it will get output; 0.0 turns the threshold off. default: 0.0 minimum: 0.0 maximum: 1.0
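As an illustration of how the options above fit together, the following sketch assembles an argument list using only the flags listed in this section. The helper class `CompareDatasetsArgsSketch`, its method, and the file names are hypothetical; how the arguments are handed to the tool is not shown here and would depend on the ADAMS launcher in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ADAMS: builds an option list for adams.tools.CompareDatasets.
class CompareDatasetsArgsSketch {

  /**
   * Builds the options for comparing two datasets.
   * If rowIndex is null or empty, -row1/-row2 are omitted, i.e. rows are compared in order.
   */
  static List<String> buildArgs(String dataset1, String dataset2, String rowIndex, double threshold) {
    List<String> args = new ArrayList<>();
    args.add("-dataset1");
    args.add(dataset1);
    args.add("-dataset2");
    args.add(dataset2);
    if (rowIndex != null && !rowIndex.isEmpty()) {
      // Same ID attribute index in both datasets (an assumption made for brevity).
      args.add("-row1");
      args.add(rowIndex);
      args.add("-row2");
      args.add(rowIndex);
    }
    args.add("-threshold");
    args.add(Double.toString(threshold));
    return args;
  }

  public static void main(String[] args) {
    // Placeholder file names; substitute real dataset paths.
    System.out.println(String.join(" ", buildArgs("first.arff", "second.arff", "1", 0.9)));
  }
}
```

Options that are left out, such as -range1, -range2, -output and -missing, fall back to the defaults listed above.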
| Field Summary | |
|---|---|
| protected weka.core.Instances | m_Data1: the current dataset 1. |
| protected weka.core.Instances | m_Data2: the current dataset 2. |
| protected PlaceholderFile | m_Dataset1: the first dataset. |
| protected PlaceholderFile | m_Dataset2: the second dataset. |
| protected int[] | m_Indices1: the indices for the first dataset. |
| protected int[] | m_Indices2: the indices for the second dataset. |
| protected Hashtable<String,Integer> | m_Lookup2: the lookup table of indices for the second dataset. |
| protected PlaceholderFile | m_Missing: the output file for missing tests (CSV format). |
| protected PlaceholderFile | m_OutputFile: the output file (CSV format). |
| protected Range | m_Range1: the first range of attributes. |
| protected Range | m_Range2: the second range of attributes. |
| protected Index | m_RowAttribute1: the optional attribute for matching up rows (dataset 1). |
| protected Index | m_RowAttribute2: the optional attribute for matching up rows (dataset 2). |
| protected boolean | m_RowAttributeIsString: whether the row attribute is a string/nominal attribute or not. |
| protected double | m_Threshold: the threshold for listing correlations. |
| protected Boolean | m_UseRowAttribute: whether to use the row attribute or not. |
| Fields inherited from class adams.core.option.AbstractOptionHandler |
|---|
| m_DebugLevel, m_OptionManager |
| Constructor Summary | |
|---|---|
| CompareDatasets() | |
| Method Summary | |
|---|---|
| void | cleanUp(): Cleans up data structures, frees up memory. |
| String | dataset1TipText(): Returns the tip text for this property. |
| String | dataset2TipText(): Returns the tip text for this property. |
| void | defineOptions(): Adds options to the internal list of options. |
| protected void | doRun(): Performs the comparison. |
| protected double | getCorrelation(weka.core.Instance first, weka.core.Instance second): Returns the correlation between the two rows. |
| PlaceholderFile | getDataset1(): Returns the first dataset for the comparison. |
| PlaceholderFile | getDataset2(): Returns the second dataset for the comparison. |
| PlaceholderFile | getMissing(): Returns the file to save the information about missing rows to (CSV format). |
| PlaceholderFile | getOutputFile(): Returns the file to save the comparison result in (CSV format). |
| Range | getRange1(): Returns the range of attributes of the first dataset. |
| Range | getRange2(): Returns the range of attributes of the second dataset. |
| String | getRowAttribute1(): Returns the index of the attribute used for identifying rows to compare against each other (first dataset). |
| String | getRowAttribute2(): Returns the index of the attribute used for identifying rows to compare against each other (second dataset). |
| protected String | getRowID(int index): Returns the ID for the row: either the row index or the actual row attribute ID for that position. |
| double | getThreshold(): Returns the threshold for the correlation coefficient. |
| protected boolean | getUseRowAttribute(): Returns whether to use the row attribute or the order in the datasets for matching up the rows. |
| String | globalInfo(): Returns a string describing the object. |
| protected void | initialize(): Initializes the members. |
| protected void | initLookup(): Initializes the lookup table of indices for the second dataset, if necessary. |
| String | missingTipText(): Returns the tip text for this property. |
| protected weka.core.Instance[] | next(int index): Returns the next row pair to compare. |
| protected weka.core.Instance[] | nextByIndex(int index): Returns the next pair by simple index. |
| protected weka.core.Instance[] | nextByRowAttribute(int index): Returns the next pair by using the value of the row attribute. |
| String | outputFileTipText(): Returns the tip text for this property. |
| protected void | preRun(): Before the actual run is executed. |
| String | range1TipText(): Returns the tip text for this property. |
| String | range2TipText(): Returns the tip text for this property. |
| String | rowAttribute1TipText(): Returns the tip text for this property. |
| String | rowAttribute2TipText(): Returns the tip text for this property. |
| void | setDataset1(PlaceholderFile value): Sets the first dataset for the comparison. |
| void | setDataset2(PlaceholderFile value): Sets the second dataset for the comparison. |
| void | setMissing(PlaceholderFile value): Sets the file to save the information about missing rows to (CSV format). |
| void | setOutputFile(PlaceholderFile value): Sets the file to save the comparison result in (CSV format). |
| void | setRange1(Range value): Sets the range of attributes of the first dataset. |
| void | setRange2(Range value): Sets the range of attributes of the second dataset. |
| void | setRowAttribute1(String value): Sets the index of the attribute used for identifying rows to compare against each other (first dataset). |
| void | setRowAttribute2(String value): Sets the index of the attribute used for identifying rows to compare against each other (second dataset). |
| void | setThreshold(double value): Sets the threshold for the correlation coefficient. |
| String | thresholdTipText(): Returns the tip text for this property. |
| Methods inherited from class adams.tools.AbstractTool |
|---|
| compareTo, destroy, equals, forCommandLine, forName, getTools, postRun, run |

| Methods inherited from class adams.core.option.AbstractOptionHandler |
|---|
| cleanUpOptions, debug, debug, debugLevelTipText, finishInit, getDebugLevel, getOptionManager, isDebugOn, newOptionManager, reset, setDebugLevel, toCommandLine, toString |

| Methods inherited from class adams.core.ConsoleObject |
|---|
| getDebugging, getSystemErr, getSystemOut, sizeOf |

| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
protected PlaceholderFile m_Dataset1
protected Range m_Range1
protected Index m_RowAttribute1
protected PlaceholderFile m_Dataset2
protected Range m_Range2
protected Index m_RowAttribute2
protected PlaceholderFile m_OutputFile
protected PlaceholderFile m_Missing
protected weka.core.Instances m_Data1
protected weka.core.Instances m_Data2
protected Boolean m_UseRowAttribute
protected boolean m_RowAttributeIsString
protected int[] m_Indices1
protected int[] m_Indices2
protected Hashtable<String,Integer> m_Lookup2
protected double m_Threshold
| Constructor Detail |
|---|
public CompareDatasets()
| Method Detail |
|---|
public String globalInfo()
globalInfo in class AbstractOptionHandler
public void defineOptions()
defineOptions in interface OptionHandler
defineOptions in class AbstractOptionHandler
protected void initialize()
initialize in class AbstractOptionHandler
public void setDataset1(PlaceholderFile value)
value - the dataset
public PlaceholderFile getDataset1()
public String dataset1TipText()
public void setDataset2(PlaceholderFile value)
value - the dataset
public PlaceholderFile getDataset2()
public String dataset2TipText()
public void setRange1(Range value)
value - the range
public Range getRange1()
public String range1TipText()
public void setRange2(Range value)
value - the range
public Range getRange2()
public String range2TipText()
public void setRowAttribute1(String value)
value - the index
public String getRowAttribute1()
public String rowAttribute1TipText()
public void setRowAttribute2(String value)
value - the index
public String getRowAttribute2()
public String rowAttribute2TipText()
public void setOutputFile(PlaceholderFile value)
setOutputFile in interface OutputFileGenerator
value - the output file
public PlaceholderFile getOutputFile()
getOutputFile in interface OutputFileGenerator
public String outputFileTipText()
outputFileTipText in interface OutputFileGenerator
public void setMissing(PlaceholderFile value)
value - the file for the missing rows
public PlaceholderFile getMissing()
public String missingTipText()
public void setThreshold(double value)
value - the threshold (0.0 turns it off)
public double getThreshold()
public String thresholdTipText()
protected void preRun()
preRun in class AbstractTool
protected boolean getUseRowAttribute()
protected String getRowID(int index)
index - the index to get the ID for
protected weka.core.Instance[] nextByIndex(int index)
index - the index of the pair to retrieve
protected void initLookup()
protected weka.core.Instance[] nextByRowAttribute(int index)
index - the index of the pair to retrieve
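Matching rows through a row attribute, as initLookup() and nextByRowAttribute(int) are described as doing, can be sketched with a Hashtable<String,Integer> like m_Lookup2. This is a simplified illustration under stated assumptions: rows are represented only by their ID strings, the class and method names (`RowLookupSketch`, `buildLookup`, `nextByRowId`) are hypothetical, and returning null for a missing row is an assumption rather than documented behavior.

```java
import java.util.Hashtable;

// Hypothetical sketch, not the ADAMS implementation.
class RowLookupSketch {

  /** Maps each row ID of the second dataset to its row index. */
  static Hashtable<String, Integer> buildLookup(String[] ids2) {
    Hashtable<String, Integer> lookup = new Hashtable<>();
    for (int i = 0; i < ids2.length; i++)
      lookup.put(ids2[i], i);
    return lookup;
  }

  /**
   * Returns {index in dataset 1, index in dataset 2} for the row at the given
   * position of the first dataset, or null if the second dataset has no row with that ID.
   */
  static int[] nextByRowId(int index, String[] ids1, Hashtable<String, Integer> lookup) {
    Integer match = lookup.get(ids1[index]);
    return (match == null) ? null : new int[]{index, match};
  }

  public static void main(String[] args) {
    String[] ids1 = {"a", "b", "c"};
    Hashtable<String, Integer> lookup = buildLookup(new String[]{"c", "a"});
    for (int i = 0; i < ids1.length; i++) {
      int[] pair = nextByRowId(i, ids1, lookup);
      System.out.println(ids1[i] + " -> " + (pair == null ? "missing" : String.valueOf(pair[1])));
    }
  }
}
```

Building the table once makes each lookup a constant-time operation on average, instead of scanning the second dataset for every row of the first. IDs without a match are the kind of information the file given by -missing is meant to record.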
protected weka.core.Instance[] next(int index)
index - the index of the pair to retrieve
protected double getCorrelation(weka.core.Instance first, weka.core.Instance second)
first - the first row
second - the second row
protected void doRun()
doRun in class AbstractTool
public void cleanUp()
cleanUp in interface CleanUpHandler
cleanUp in class AbstractTool