Text-searchable scanning (OCR)

Overview

Optical Character Recognition (OCR) is the process of taking an image, such as a scanned document, and reconstructing its text. This allows scanned documents to become searchable and/or editable.

Text-searchable documents have two major benefits over other scan outputs:

  • You can search for specific content within the document.
  • If the document has been added to a document management system, you can find the document by searching for its content.

Performing OCR is a resource-intensive process that can add seconds or tens of seconds per page to the time it takes to deliver a document. For this reason, enable OCR only on scan actions where searchable text is most useful, and leave it off where fast delivery matters more.
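
As a rough illustration of this trade-off, the sketch below estimates how much time OCR might add to a scan job. The function name and the per-page figures are assumptions chosen for illustration; they are not measured PaperCut MF values.

```python
def estimate_ocr_delay(pages: int, seconds_per_page: float) -> float:
    """Return the estimated extra delivery time, in seconds, added by OCR."""
    if pages < 0 or seconds_per_page < 0:
        raise ValueError("pages and seconds_per_page must be non-negative")
    return pages * seconds_per_page


# Hypothetical figures: a 20-page scan at 3 s/page versus 30 s/page.
fast = estimate_ocr_delay(20, 3)
slow = estimate_ocr_delay(20, 30)
print(f"Extra delivery time: {fast:.0f} s to {slow:.0f} s")
```

Under these assumed figures, the same 20-page job could be delayed by anywhere from one to ten minutes, which is why OCR is best reserved for scan actions that need searchable output.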

Currently, PaperCut MF supports the following text-searchable file types:

  • PDF (text-searchable): PDF v1.5 with PDF/A-1B compliance according to the requirements defined by the PDF/A standard.
  • DOCX

OCR processing in the cloud or on-premise

PaperCut MF can run the OCR process either in the PaperCut MF Cloud OCR service (one of PaperCut's Cloud Services) or on your on-premise infrastructure.

Supported languages

OCR supports extracting text in approximately 100 languages. You can select up to 10 of those languages; however, for the best performance, limit your selection to a maximum of four.
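
The two limits above (a maximum of 10 languages, with four or fewer recommended) can be expressed as a simple check. This is a minimal sketch: the function name, the messages, and the rule that at least one language must be selected are assumptions for illustration, not part of PaperCut MF.

```python
MAX_LANGUAGES = 10         # upper limit stated above
RECOMMENDED_LANGUAGES = 4  # recommended maximum for best performance


def check_language_selection(languages):
    """Return (ok, message) for a proposed list of OCR languages."""
    unique = list(dict.fromkeys(languages))  # drop duplicates, keep order
    if not unique:
        # Assumption: an empty selection is treated as invalid.
        return False, "Select at least one language."
    if len(unique) > MAX_LANGUAGES:
        return False, (
            f"{len(unique)} languages selected; the maximum is {MAX_LANGUAGES}."
        )
    if len(unique) > RECOMMENDED_LANGUAGES:
        return True, (
            f"{len(unique)} languages selected; OCR may be slower than "
            f"with {RECOMMENDED_LANGUAGES} or fewer."
        )
    return True, "Selection is within the recommended limit."


ok, message = check_language_selection(["English", "French", "German"])
print(ok, message)
```

Selections of five to ten languages pass the check but return a performance warning, which mirrors the distinction between the hard limit and the recommended limit.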
