Class TesseractConfiguration

  • All Implemented Interfaces:
    AdditionalInformationHandler, CleanUpHandler, Destroyable, GlobalInfoSupporter, LoggingLevelHandler, LoggingSupporter, OptionHandler, QuickInfoSupporter, ShallowCopySupporter<Actor>, SizeOfHandler, Stoppable, StoppableWithFeedback, VariablesInspectionHandler, VariableChangeListener, Actor, ErrorHandler, Serializable, Comparable

    public class TesseractConfiguration
    extends AbstractStandalone
    Setup parameters for tesseract.
    For more information see:
    https://github.com/tesseract-ocr/tesseract

    Valid options are:

    -logging-level <OFF|SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST> (property: loggingLevel)
        The logging level for outputting errors and debugging output.
        default: WARNING
     
    -name <java.lang.String> (property: name)
        The name of the actor.
        default: TesseractConfiguration
     
    -annotation <adams.core.base.BaseText> (property: annotations)
        The annotations to attach to this actor.
        default: 
     
    -skip <boolean> (property: skip)
        If set to true, transformation is skipped and the input token is just forwarded 
        as it is.
        default: false
     
    -stop-flow-on-error <boolean> (property: stopFlowOnError)
        If set to true, the flow gets stopped in case this actor encounters an error;
         useful for critical actors.
        default: false
     
    -executable <adams.core.io.PlaceholderFile> (property: executable)
        The tesseract executable to use.
        default: /usr/bin/tesseract
     
    -config-file <adams.core.io.PlaceholderFile> (property: configFile)
        The (optional) config file for tesseract; ignored if pointing to a directory.
        default: ${CWD}
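
The "ignored if pointing to a directory" rule for -config-file can be sketched as follows. This is a minimal, self-contained illustration and not the ADAMS implementation: the class ConfigFileRuleSketch and the method effectiveConfigFile are hypothetical names, and plain java.io.File stands in for PlaceholderFile. Assuming the ${CWD} placeholder expands to the current working directory, the default value is itself a directory, so presumably no config file is passed to tesseract unless one is set explicitly.

```java
import java.io.File;
import java.util.Optional;

/** Hypothetical helper illustrating the directory rule; not part of ADAMS. */
final class ConfigFileRuleSketch {

  private ConfigFileRuleSketch() {
  }

  /**
   * Returns the config file only if it does not point to a directory.
   *
   * @param configFile the configured file, may be null
   * @return the file to hand to tesseract, or empty if it should be ignored
   */
  static Optional<File> effectiveConfigFile(File configFile) {
    // a directory (such as the assumed default, the working directory)
    // means "no config file"
    if (configFile == null || configFile.isDirectory())
      return Optional.empty();
    return Optional.of(configFile);
  }
}
```

Note that the sketch only tests for a directory; whether a non-existent file should also be rejected is left open here.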
     
    Author:
    fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    Serialized Form
    • Field Detail

      • m_Executable

        protected PlaceholderFile m_Executable
        the executable to use.
      • m_ConfigFile

        protected PlaceholderFile m_ConfigFile
        the (optional) config file to use.
    • Constructor Detail

      • TesseractConfiguration

        public TesseractConfiguration()
    • Method Detail

      • getDefaultExecutable

        protected PlaceholderFile getDefaultExecutable()
        Returns the default executable to use.
        Returns:
        the executable
      • setExecutable

        public void setExecutable(PlaceholderFile value)
        Sets the tesseract executable to use.
        Parameters:
        value - the executable
      • getExecutable

        public PlaceholderFile getExecutable()
        Returns the tesseract executable in use.
        Returns:
        the executable
      • executableTipText

        public String executableTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • setConfigFile

        public void setConfigFile(PlaceholderFile value)
        Sets the config file; it is ignored if it points to a directory.
        Parameters:
        value - the config file
      • getConfigFile

        public PlaceholderFile getConfigFile()
        Returns the config file; it is ignored if it points to a directory.
        Returns:
        the config file
      • configFileTipText

        public String configFileTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the GUI or for listing the options.
      • getCommand

        public String[] getCommand(String input,
                                   String outputbase,
                                   TesseractLanguage lang,
                                   TesseractPageSegmentation seg,
                                   boolean hocr)
        Assembles the tesseract command for the given input/output.
        Parameters:
        input - the input file to process
        outputbase - the output base to use
        lang - the language to use
        seg - the segmentation to use
        hocr - whether to output in hOCR format instead of plain text
        Returns:
        the command
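
As an illustration of what such a command array might look like, the sketch below assembles an argument list following the tesseract command-line convention (image, output base, options, then config names). It is an assumption-laden example, not the code behind getCommand: the class TesseractCommandSketch and the method buildCommand are hypothetical, the language and segmentation are passed as a plain code string and integer instead of TesseractLanguage and TesseractPageSegmentation, and the option order is a guess.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of assembling a tesseract command line; not part of ADAMS. */
final class TesseractCommandSketch {

  private TesseractCommandSketch() {
  }

  /**
   * Assembles a tesseract invocation.
   *
   * @param executable the tesseract binary, e.g. /usr/bin/tesseract
   * @param input      the image file to process
   * @param outputBase the output base name (tesseract appends the extension)
   * @param langCode   the language code, e.g. "eng"
   * @param psm        the page segmentation mode number
   * @param hocr       whether to request hOCR output
   * @param configFile an optional config file path, null to omit
   * @return the command as an argument array
   */
  static String[] buildCommand(String executable, String input, String outputBase,
                               String langCode, int psm, boolean hocr, String configFile) {
    List<String> cmd = new ArrayList<>();
    cmd.add(executable);
    cmd.add(input);
    cmd.add(outputBase);
    cmd.add("-l");
    cmd.add(langCode);
    // "--psm" is the spelling in tesseract 4 and later; 3.x releases use "-psm"
    cmd.add("--psm");
    cmd.add(Integer.toString(psm));
    // "hocr" is the name of a config shipped with tesseract that switches to hOCR output
    if (hocr)
      cmd.add("hocr");
    // assumption: the caller has already dropped a config file that points to a directory
    if (configFile != null)
      cmd.add(configFile);
    return cmd.toArray(new String[0]);
  }
}
```

An array of this shape can be handed to java.lang.ProcessBuilder, which takes the program and its arguments as separate strings and therefore avoids shell quoting problems with file names containing spaces.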