Name

adams.flow.transformer.TesseractOCR


Synopsis

Applies OCR to the incoming image file using Tesseract.
If OCR succeeds, the actor forwards either the names of the generated files or the combined text of those files.
NB: The actor deletes all files that share the prefix of the specified output base. Keep this in mind when running OCR in parallel or when generating other files with the same prefix.
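
The prefix clean-up noted above is easier to reason about with a small sketch. The Java code below is not part of ADAMS: the class OutputBaseSketch and its two helper methods are hypothetical names, and matching by a plain "file name starts with the output base name" test is an assumption about how the prefix check works. It shows one way to derive a distinct output base per job, so that the clean-up for one job cannot match files written by another.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical helper, not an ADAMS class.
public class OutputBaseSketch {

  /** Builds a per-job output base such as <dir>/<name>-<random UUID>. */
  public static Path uniqueOutputBase(Path dir, String name) {
    return dir.resolve(name + "-" + UUID.randomUUID());
  }

  /**
   * Lists the files next to the output base whose names start with the
   * base's file name. Assumption: this mirrors the prefix match used for
   * the clean-up; the actual matching rule may differ.
   */
  public static List<Path> filesSharingPrefix(Path outputBase) throws IOException {
    List<Path> result = new ArrayList<>();
    Path dir = outputBase.toAbsolutePath().getParent();
    String prefix = outputBase.getFileName().toString();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path p : stream) {
        if (p.getFileName().toString().startsWith(prefix))
          result.add(p);
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("ocr");
    Path baseA = uniqueOutputBase(dir, "page");
    Path baseB = uniqueOutputBase(dir, "page");
    // Simulate the text files two parallel OCR jobs would leave behind.
    Files.createFile(dir.resolve(baseA.getFileName() + ".txt"));
    Files.createFile(dir.resolve(baseB.getFileName() + ".txt"));

    // A unique base only matches its own job's output.
    System.out.println(filesSharingPrefix(baseA).size());
    // A shared base such as "page" would match the output of both jobs.
    System.out.println(filesSharingPrefix(dir.resolve("page")).size());
  }
}
```

With a shared base such as "page", the clean-up for one job would also cover the other job's files; a unique suffix per job avoids that overlap.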

For more information on Tesseract see:
https://github.com/tesseract-ocr/tesseract


For more information on hOCR see:
https://en.wikipedia.org/wiki/HOCR


Additional information

Flow input/output:
- input: java.lang.String, java.io.File, adams.data.image.AbstractImageContainer
- output: java.lang.String[]


Options