Set up self-hosted Document Processing

To set up self-hosted Document Processing, you need to:

  1. Determine where to install Document Processing

  2. Install Document Processing

  3. Configure the host location and available languages

  4. Tune Document Processing server performance

Document flow when using our self-hosted document processing

The diagram below shows how a scanned document travels through the PaperCut MF system when using self-hosted Document Processing.

Note that documents delivered to a cloud storage endpoint (Dropbox, OneDrive, Google Drive, etc.) continue to be delivered via PaperCut Cloud Services.

The flow of a scan document when using one or more self-hosted document processing servers

Step 1: Determine where to install Document Processing

For smaller environments, it makes sense to install Document Processing alongside the Application Server. In medium to large environments, though, you can protect overall system and Application Server performance by setting up one or more dedicated Document Processing servers that the Application Server can contact.

See the table below for recommendations.

Environment size | Approx. scan jobs per day | Recommended processors* | Recommended installation location | Benefits
Small | 0 – 50 | 2 | Application Server | Less infrastructure cost. Great for smaller businesses with occasional Document Processing load.
Medium | 50 – 200 | 3 | Start on a well-resourced Application Server. Monitor and plan for a separate server on an as-needed basis. | Balances resource use, system performance, and Document Processing performance.
Large | 200+ | 4+ | One or more separate high-performing Document Processing servers | Dedicated resources mean better handling of high scanning load, spikes, and multiple jobs, for example in larger Enterprise or Education environments. Document Processing’s heavy resource requirements don’t interfere with the normal operation of the Application Server.

*Recommended available processors to use (to support parallel jobs).
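
As a rough illustration, the sizing table can be written as a small lookup that maps an estimated daily scan volume to the matching row. The names recommend_sizing and Sizing are made up for this sketch, and the handling of the boundary values (50 counts as Small, 200 as Medium) is an assumption, because the table lists overlapping ranges.

```python
from typing import NamedTuple


class Sizing(NamedTuple):
    environment: str
    processors: str
    location: str


def recommend_sizing(scan_jobs_per_day: int) -> Sizing:
    """Return the sizing-table row for an estimated number of scan jobs per day."""
    if scan_jobs_per_day < 0:
        raise ValueError("scan_jobs_per_day must not be negative")
    # Assumption: a boundary value (50 or 200) falls into the smaller tier.
    if scan_jobs_per_day <= 50:
        return Sizing("Small", "2", "Application Server")
    if scan_jobs_per_day <= 200:
        return Sizing("Medium", "3", "Well-resourced Application Server; plan for a separate server")
    return Sizing("Large", "4+", "One or more separate Document Processing servers")
```

For example, recommend_sizing(120) returns the Medium row. Treat the result as a starting point only; the scan volume is one of several factors.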

Keep in mind that the more storage and processing power available, the better Document Processing performs, so make as much available as you can. For any environment size, we recommend:

  • at least 10 GB available disk space

  • 512 MB available memory

  • running a 64-bit edition of Microsoft Windows.
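
Some of these recommendations can be checked on a candidate server before installing. The sketch below is not part of PaperCut MF. It uses only the Python standard library, treats 10 GB as 10 × 1024³ bytes (an assumption), and uses a hypothetical function name, check_host. It does not check available memory, because the standard library has no portable call for that.

```python
import platform
import shutil

MIN_FREE_DISK_BYTES = 10 * 1024**3  # 10 GB, per the recommendation above


def check_host(install_path: str = ".") -> dict:
    """Report free disk space and platform details for the given path."""
    free_bytes = shutil.disk_usage(install_path).free
    return {
        "free_disk_gb": round(free_bytes / 1024**3, 1),
        "disk_ok": free_bytes >= MIN_FREE_DISK_BYTES,
        # Heuristic: 64-bit machine names such as AMD64 or x86_64 end in "64".
        "is_64bit": platform.machine().endswith("64"),
        "is_windows": platform.system() == "Windows",
    }
```

Pass the drive or folder where you plan to install, for example check_host("C:\\"), and review the returned values by hand.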

Step 2: Install Document Processing

  1. Download and install both of the following:

  2. Download the Document Processing (OCR) installer.

  3. On the Document Processing server, run the file. The Setup Wizard is displayed.

    Self-hosted document processing setup wizard
  4. Follow the prompts during the install.

    • If you intend to scan documents to PDF, ensure that the GhostTrap component is selected for installation.

    • If you intend to scan to DOCX, ensure that the Pandoc component is selected for installation.

    On Windows servers, the installer configures the Windows Firewall.

  5. If you are using a firewall other than Windows Firewall, open port 9181 (inbound) to allow connections from the PaperCut MF Application Server.

  6. Repeat the process for each Document Processing server you wish to add.
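
To confirm the firewall rule from step 5, you can test from the Application Server whether the Document Processing server accepts TCP connections on port 9181. The following is a minimal sketch using Python's standard socket module. The function name is hypothetical, and a successful connection only shows that the port is open, not that Document Processing is working correctly.

```python
import socket


def port_is_reachable(host: str, port: int = 9181, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call (the hostname is a placeholder, not a real server):
# port_is_reachable("ocr-server.example.local")
```

Run the check once for each Document Processing server you installed.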

Step 3: Configure the host location and available languages

  1. In the PaperCut MF Admin web interface, do one of the following:

    • If you’re already on the Capture page, refresh the page.

    • Click Options > Capture. The Capture page is displayed.

    Setting to switch to self-hosted document processing in the PaperCut MF admin interface
  2. In the Hosting area, select Use self-hosted Document Processing (requires additional setup).

  3. In the Add Document Processing Server area, in Hostname, type the hostname or the IP address of the server where you installed Document Processing.

  4. Click Add.

  5. If you want to set up multiple Document Processing servers, click Add new Document Processing Server; then repeat steps 3 and 4.
    Each Document Processing server is listed on the Capture tab.

    PaperCut MF admin interface with a self-hosted document processing server successfully connected and online
  6. Click Apply.

  7. Ensure that your scan actions have been configured with the desired Document Processing options enabled.

  8. Run a test job for each configured Document Processing option and check the output files.

Step 4: Tune Document Processing server performance

The approach to tuning a Document Processing server’s performance depends on whether it’s on a standalone system or co-located with other services.

By default, a Document Processing server processes two jobs in parallel, and they are processed with a normal CPU priority. As described below, you can change the default number of parallel jobs by modifying the configuration file at:

[ocr-server-path]/data/config/config.toml

After making changes to the config file, you’ll need to restart the Windows service: PaperCut OCR Server.
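
If you script your configuration changes, the restart can be scripted too. The sketch below assumes that the Windows net stop and net start commands accept the name PaperCut OCR Server as shown above and that the script runs from an elevated (administrator) prompt. The function name and the run parameter are made up for this example; run exists only so the commands can be inspected without being executed.

```python
import subprocess

SERVICE_NAME = "PaperCut OCR Server"


def restart_service(name: str = SERVICE_NAME, run=subprocess.run) -> None:
    """Stop and then start a Windows service using the net command."""
    for action in ("stop", "start"):
        # check=True raises CalledProcessError if the command fails.
        run(["net", action, name], check=True)
```

You can also restart the service from the Windows Services console, which needs no script at all.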

Tuning for installation on a standalone system

For best performance when installing the Document Processing server on a standalone system, it’s a good idea to maximize the number of jobs that can be processed in parallel.

The ideal number to use depends on many factors, such as the type and size of the documents being processed and the system architecture. A reasonable starting point is to use the total number of virtual CPUs (or cores times threads on a “bare metal” system) minus two.

Put another way, if you want to process four jobs in parallel and you’re installing Document Processing on a virtual machine, give it six virtual CPUs and set the configuration key below accordingly.
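
The starting point described above is simple arithmetic, sketched here in Python. The function names are hypothetical, and the floor of one job on very small systems is an assumption added for this example. Note that os.cpu_count() reports logical CPUs, which corresponds to virtual CPUs on a virtual machine.

```python
import os


def suggested_max_jobs(logical_cpus: int) -> int:
    """Starting point from the text: logical CPUs minus two, never below one."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return max(1, logical_cpus - 2)


def suggested_for_this_host() -> int:
    """Apply the same rule to the machine this script runs on."""
    # os.cpu_count() can return None when the count cannot be determined.
    return suggested_max_jobs(os.cpu_count() or 1)
```

With six virtual CPUs, suggested_max_jobs(6) gives 4, which matches the example above. Adjust the value after observing real workloads.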

To make this change:

  1. In the config.toml file, remove the # at the start of the MaxJobsInParallel line to uncomment the option and make it active.

  2. Set the MaxJobsInParallel line to MaxJobsInParallel = 4

  3. Restart the Windows service: PaperCut OCR Server
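
Steps 1 and 2 can also be scripted. The sketch below works on the text of config.toml and rewrites the first MaxJobsInParallel line, whether or not it is commented out. It assumes the commented-out line looks like # MaxJobsInParallel = 2, with or without a space after the #. This page doesn't show the exact line, so check your own file first. The function name set_max_jobs is hypothetical.

```python
import re

# Matches an active or commented-out MaxJobsInParallel line up to the line ending.
_MAX_JOBS_LINE = re.compile(
    r"^[ \t]*#?[ \t]*MaxJobsInParallel[ \t]*=[^\r\n]*", re.MULTILINE
)


def set_max_jobs(config_text: str, jobs: int) -> str:
    """Return config_text with the first MaxJobsInParallel line uncommented and set."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    updated, count = _MAX_JOBS_LINE.subn(
        f"MaxJobsInParallel = {jobs}", config_text, count=1
    )
    if count == 0:
        raise ValueError("no MaxJobsInParallel line found")
    return updated
```

Read the file, pass its contents through set_max_jobs, and write the result back. Keep a backup copy of the original file, and restart the service afterwards as described in step 3.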

Tuning for co-location with the Application Server

If your system has additional available processors (beyond what the Application Server is using), you might want to consider increasing the number of jobs that are processed in parallel from the default of two.

To make this change:

  1. In the config.toml file, remove the # at the start of the MaxJobsInParallel line to uncomment the option and make it active.

  2. Set the MaxJobsInParallel line to MaxJobsInParallel = 3

  3. Restart the Windows service: PaperCut OCR Server
