weka.hadoop
Class HadoopExperiment

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by weka.hadoop.HadoopExperiment
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class HadoopExperiment
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool


Nested Class Summary
static class HadoopExperiment.Map
          Mapper class that reads a whole file as input if the file is small, or reads N lines per input split if the input file is large.
static class HadoopExperiment.Reduce
          Collects text data from the Mapper and writes it to the output file.
 
Field Summary
protected  int folderCount
          Records the total input size: datasets * algorithms * repetitions * folds
protected  int inputSize
          Records the total input size: datasets * algorithms * repetitions * folds
protected  String[] m_AdditionalMeasures
           
protected  int m_attID
          Attribute index of instance identifier (default -1)
protected  Classifier m_Classifier
          The classifier used for evaluation
static HadoopExperiment m_Exp
          HadoopExperiment instance used by the Map and Reduce classes
protected  int m_IRclass
          Class index for information retrieval statistics (default 0)
protected  int m_NumFolds
          The number of folds in the cross-validation
protected  boolean m_predTargetColumn
          Flag for prediction and target columns output.
protected  int m_Repetition
          Repetition number
protected  InstancesResultListener m_ResultListener
           
protected  CrossValidationResultProducer m_RP
          Default ResultProducer
protected  ClassifierSplitEvaluator m_SplitEvaluator
          Split evaluator for classifiers; one of two split evaluators (see m_SplitEvaluator2)
protected  RegressionSplitEvaluator m_SplitEvaluator2
           
protected  ArrayList<String> measures
          Stores the values of the additional measures
protected  String num
          Number of input lines per split for Hadoop, held as a String
protected static int NUM_IR_STATISTICS
          The number of IR statistics
protected static int NUM_UNWEIGHTED_IR_STATISTICS
          The number of unweighted averaged IR statistics
protected static int NUM_WEIGHTED_IR_STATISTICS
          The number of weighted averaged IR statistics
protected static int Regression_RESULT_SIZE
           
protected static int RESULT_SIZE
          The length of a result
protected  String uniqueFile
           
protected  String uniqueFolder
           
 
Constructor Summary
HadoopExperiment()
           
 
Method Summary
static void determineLinesPerMap(int number)
          Determines how many lines to read per input split.
static Double getTimestamp()
          Gets the current time information.
static void main(String[] args)
          Main method to run the Hadoop experiment.
 int run(String[] args)
          Sets up the Hadoop job and runs it.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

RESULT_SIZE

protected static final int RESULT_SIZE
The length of a result

See Also:
Constant Field Values

Regression_RESULT_SIZE

protected static final int Regression_RESULT_SIZE
See Also:
Constant Field Values

NUM_IR_STATISTICS

protected static final int NUM_IR_STATISTICS
The number of IR statistics

See Also:
Constant Field Values

NUM_WEIGHTED_IR_STATISTICS

protected static final int NUM_WEIGHTED_IR_STATISTICS
The number of weighted averaged IR statistics

See Also:
Constant Field Values

NUM_UNWEIGHTED_IR_STATISTICS

protected static final int NUM_UNWEIGHTED_IR_STATISTICS
The number of unweighted averaged IR statistics

See Also:
Constant Field Values

m_NumFolds

protected int m_NumFolds
The number of folds in the cross-validation


m_Repetition

protected int m_Repetition
Repetition number


m_attID

protected int m_attID
Attribute index of instance identifier (default -1)


m_IRclass

protected int m_IRclass
Class index for information retrieval statistics (default 0)


m_RP

protected CrossValidationResultProducer m_RP
Default ResultProducer


m_ResultListener

protected InstancesResultListener m_ResultListener

m_predTargetColumn

protected boolean m_predTargetColumn
Flag for prediction and target columns output.


m_Classifier

protected Classifier m_Classifier
The classifier used for evaluation


m_SplitEvaluator

protected ClassifierSplitEvaluator m_SplitEvaluator
Split evaluator for classifiers; one of two split evaluators (see m_SplitEvaluator2)


m_SplitEvaluator2

protected RegressionSplitEvaluator m_SplitEvaluator2

measures

protected ArrayList<String> measures
Stores the values of the additional measures


m_AdditionalMeasures

protected String[] m_AdditionalMeasures

num

protected String num
Number of input lines per split for Hadoop, held as a String


uniqueFile

protected String uniqueFile

uniqueFolder

protected String uniqueFolder

inputSize

protected int inputSize
Records the total input size: datasets * algorithms * repetitions * folds
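
The product described above can be worked through with a small example. This is only an illustrative sketch: the class InputSizeExample and its method are hypothetical and are not part of HadoopExperiment, and the overflow check is an addition made here for safety.

```java
/** Hypothetical helper illustrating the inputSize arithmetic; not part of HadoopExperiment. */
final class InputSizeExample {

    /** Total number of evaluation tasks: datasets * algorithms * repetitions * folds. */
    static int inputSize(int datasets, int algorithms, int repetitions, int folds) {
        // Math.multiplyExact throws ArithmeticException on int overflow
        // instead of silently wrapping around.
        int perDataset = Math.multiplyExact(algorithms, Math.multiplyExact(repetitions, folds));
        return Math.multiplyExact(datasets, perDataset);
    }

    public static void main(String[] args) {
        // 3 datasets, 2 algorithms, 10 repetitions, 10-fold cross-validation.
        System.out.println(inputSize(3, 2, 10, 10)); // prints 600
    }
}
```

With these example values each dataset contributes 2 * 10 * 10 = 200 tasks, so three datasets give 600.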


folderCount

protected int folderCount
Records the total input size: datasets * algorithms * repetitions * folds


m_Exp

public static HadoopExperiment m_Exp
HadoopExperiment instance used by the Map and Reduce classes

Constructor Detail

HadoopExperiment

public HadoopExperiment()

Method Detail

getTimestamp

public static Double getTimestamp()
Gets the current time information.

Returns:
the current time information as a Double
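
This page does not say how the time is packed into a Double. The sketch below shows one plausible encoding, yyyyMMdd.HHmm, purely as an assumption; the class TimestampExample and its methods are hypothetical, and the actual format used by getTimestamp may differ.

```java
import java.util.Calendar;
import java.util.TimeZone;

/** Hypothetical sketch of a yyyyMMdd.HHmm timestamp; the real format is not documented here. */
final class TimestampExample {

    /** Packs a date and time into a double, e.g. 2013-06-15 14:30 becomes 20130615.1430. */
    static double encode(int year, int month, int day, int hour, int minute) {
        // The date goes in the integer part, the time of day in the fraction.
        return year * 10000 + month * 100 + day
                + hour / 100.0
                + minute / 10000.0;
    }

    /** Encodes the current UTC time (the choice of UTC is an assumption). */
    static Double now() {
        Calendar c = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        // Calendar.MONTH is zero-based, hence the + 1.
        return encode(c.get(Calendar.YEAR), c.get(Calendar.MONTH) + 1,
                c.get(Calendar.DAY_OF_MONTH), c.get(Calendar.HOUR_OF_DAY),
                c.get(Calendar.MINUTE));
    }

    public static void main(String[] args) {
        System.out.println(now());
    }
}
```

Values encoded this way sort chronologically, which is convenient when comparing results from different runs.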

run

public int run(String[] args)
        throws Exception
Sets up the Hadoop job and runs it.

Specified by:
run in interface org.apache.hadoop.util.Tool
Parameters:
args - the command-line arguments
Returns:
0 if the job runs successfully
Throws:
Exception

determineLinesPerMap

public static void determineLinesPerMap(int number)
Determines how many lines to read per input split.

Parameters:
number - the total number of lines that will occur in the final output file, given the numbers of datasets, algorithms, folds, and runs.
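
The rule this method uses to choose the split size is not described on this page. One common approach, shown here only as an assumed sketch, is to spread the expected lines evenly over a capped number of map tasks. The class LinesPerMapExample and the parameter maxMapTasks are hypothetical and do not appear in HadoopExperiment.

```java
/** Hypothetical heuristic for choosing lines per input split; not the documented algorithm. */
final class LinesPerMapExample {

    /** Returns the number of lines per split so that at most maxMapTasks splits are needed. */
    static int linesPerMap(int totalLines, int maxMapTasks) {
        if (totalLines < 1 || maxMapTasks < 1) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        // Ceiling division, done in long arithmetic to avoid int overflow.
        return (int) (((long) totalLines + maxMapTasks - 1) / maxMapTasks);
    }

    public static void main(String[] args) {
        // The num field is a String, so the result is converted before printing.
        String num = String.valueOf(linesPerMap(600, 8));
        System.out.println(num); // prints 75
    }
}
```

Rounding up guarantees that every line is assigned to some split; rounding down would leave the last few lines without a mapper or force an extra split.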

main

public static void main(String[] args)
                 throws Exception
Main method to run the Hadoop experiment. When it finishes, it produces the required CSV file and an ARFF file with the same name.

Parameters:
args - the command-line arguments
Throws:
Exception


Copyright © 2013 University of Waikato, Hamilton, NZ. All Rights Reserved.