Available in PaperCut MF only.

Text-searchable scanning (OCR)

Optical Character Recognition (OCR) is the process of taking an image, such as a scanned document, and reconstructing its text. This allows scanned documents to become searchable and/or editable.

Text-searchable documents have two major benefits over other scan outputs:

  • You can search for specific content within the document.

  • If the document has been added to a document management system, you can find the document by searching for its content.

Performing OCR is a resource-intensive process that can add seconds, or even tens of seconds, per page to the time it takes to deliver a document. For this reason, enable OCR only on scan actions where text-searchable output is most useful, and leave it off where fast delivery matters more.
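
To get a feel for this trade-off, the added delivery time can be roughed out as pages multiplied by the OCR cost per page. The sketch below is illustrative only: the function name and the per-page figures (2 and 20 seconds) are assumptions chosen to match "seconds or tens of seconds", not values measured on or published for PaperCut MF.

```python
def estimate_ocr_delay(pages: int, seconds_per_page: float) -> float:
    """Return the estimated extra delivery time, in seconds, that OCR adds."""
    if pages < 0 or seconds_per_page < 0:
        raise ValueError("pages and seconds_per_page must be non-negative")
    return pages * seconds_per_page


# Hypothetical per-page costs: 2 s for a light page, 20 s for a dense page.
for label, cost in (("light", 2.0), ("dense", 20.0)):
    delay = estimate_ocr_delay(pages=30, seconds_per_page=cost)
    print(f"{label}: about {delay / 60:.1f} minutes added for 30 pages")
```

Under these assumed figures, the same 30-page scan could be delayed by anywhere from about one minute to about ten, which is why OCR is better suited to archival scan actions than to time-critical ones.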

Currently, PaperCut MF supports the following text-searchable file types:

  • PDF (text-searchable): PDF v1.5, compliant with PDF/A-1B as defined by the PDF/A standard.

  • DOCX

OCR processing in the cloud or on-premise

PaperCut MF can run the OCR process either in the PaperCut MF Cloud OCR service (one of PaperCut's Cloud Services) or on your on-premise infrastructure.

Supported languages

OCR can extract text in approximately 100 languages. You can select up to 10 of those languages; for the best performance, however, limit your selection to a maximum of four.
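
The two limits above (a hard cap of 10 languages and a recommended ceiling of four) can be expressed as a simple check. This is a minimal sketch for planning a configuration, not part of PaperCut MF: the function, the constant names, and the language codes in the usage line are all hypothetical.

```python
MAX_LANGUAGES = 10         # hard limit stated in the text
RECOMMENDED_LANGUAGES = 4  # recommended ceiling for best performance


def check_language_selection(languages: list[str]) -> list[str]:
    """Return advisory warnings; raise ValueError above the hard limit."""
    # Drop duplicates while preserving the order in which they were given.
    unique = list(dict.fromkeys(languages))
    if len(unique) > MAX_LANGUAGES:
        raise ValueError(
            f"{len(unique)} languages selected; the maximum is {MAX_LANGUAGES}"
        )
    warnings = []
    if len(unique) > RECOMMENDED_LANGUAGES:
        warnings.append(
            f"{len(unique)} languages selected; "
            f"{RECOMMENDED_LANGUAGES} or fewer is recommended for performance"
        )
    return warnings


# Hypothetical language codes, used only for illustration.
print(check_language_selection(["en", "fr", "de", "es", "it"]))
```

A selection of five languages is accepted but produces one advisory warning, while a selection of eleven distinct languages is rejected outright.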