Scan PDF compression
When is scan compression useful?
Use scan compression to reduce the file size of scanned PDFs. Smaller files are easier to send by email and take up less storage. Many organizations limit the size of email attachments, so scanned PDFs need to stay within those limits.
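To make "email-friendly" concrete, here is a minimal sketch of a size check. It relies on the fact that email attachments are normally Base64-encoded, which grows a file by about one third. The function names and the 10 MB limit are hypothetical, and the sketch assumes the limit applies to the encoded attachment; check how your own mail system measures it.

```python
def encoded_attachment_size(file_size_bytes: int) -> int:
    """Estimate a file's size after Base64 encoding for email.

    Base64 turns every 3 bytes into 4 characters. Line breaks added by
    mail software are ignored, so the real figure is slightly larger.
    """
    return 4 * ((file_size_bytes + 2) // 3)


def fits_attachment_limit(file_size_bytes: int, limit_bytes: int) -> bool:
    """Return True if the encoded attachment stays within the limit."""
    return encoded_attachment_size(file_size_bytes) <= limit_bytes


MB = 1024 * 1024
limit = 10 * MB  # hypothetical limit; use your mail server's actual value

print(fits_attachment_limit(7 * MB, limit))  # a 7 MB scan still fits
print(fits_attachment_limit(8 * MB, limit))  # an 8 MB scan no longer fits
```

Under these assumptions, a scan needs to be noticeably smaller than the stated limit before it can be attached.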
Enabling any of these document processing features causes the PDF file sizes to increase:
- OCR (optical character recognition) to create searchable PDFs
- Blank page removal
- Batch splitting
When PaperCut Hive processes a document with any of the features above, it converts the PDF pages into PNG image files to analyze them. For example, it converts a page to PNG to recognize the text and create the searchable OCR text layer, or to check how blank the page is and decide whether it can be removed (if blank page removal is enabled).
After analysis, the PNG pages are reassembled into a PDF. Converting between file types in this way increases the final PDF file size.
Lossy compression technology
To prioritize file size reduction, PaperCut Hive uses lossy compression. Lossy compression suits a variety of situations where file size is a primary concern, and the quality of the document can tolerate some loss of detail or information.
Lossy image compression reduces an image’s file size by selectively discarding some of the image data. It removes information that is less important or perceptually less significant, such as fine detail or subtle color variation that viewers are unlikely to notice.
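The idea can be illustrated with a small, self-contained sketch. It is not PaperCut Hive's actual algorithm: it builds a synthetic grayscale image, discards the low-order bits of every pixel (a simple stand-in for the lossy step), and then runs both versions through the same lossless compressor. The function names and image dimensions are assumptions made for illustration.

```python
import random
import zlib


def quantize(pixels: bytes, bits_to_drop: int) -> bytes:
    """Discard the low-order bits of each 8-bit sample (the lossy step)."""
    mask = (0xFF << bits_to_drop) & 0xFF
    return bytes(p & mask for p in pixels)


def make_test_image(width: int = 256, height: int = 64, seed: int = 0) -> bytes:
    """Build a synthetic grayscale gradient with slight random noise."""
    rng = random.Random(seed)
    return bytes(
        min(255, x + rng.randrange(8))
        for _ in range(height)
        for x in range(width)
    )


original = make_test_image()
lossy = quantize(original, bits_to_drop=4)

# Both versions go through the same lossless compressor (zlib); only the
# lossy version had detail discarded first.
original_size = len(zlib.compress(original, 9))
lossy_size = len(zlib.compress(lossy, 9))

print(f"lossless only:    {original_size} bytes")
print(f"lossy + lossless: {lossy_size} bytes")
print(f"identical to original: {lossy == original}")
```

The quantized version compresses to fewer bytes because the discarded bits carried most of the pixel-to-pixel variation, and those bits cannot be reconstructed afterwards.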
Advantages of lossy image compression
- Smaller file size - Lossy compression can result in significantly smaller file sizes, which can be beneficial when storing large numbers of files.
- Faster transmission - Smaller file sizes also mean faster transmission over networks or the internet, which can improve the user experience.
- Reasonable quality - Lossy compression can achieve good image quality with a reasonable degree of compression, making it suitable for most applications where the highest-quality images are not essential.
Disadvantages of lossy image compression
- Loss of quality - The main disadvantage of lossy compression is that it can result in a loss of image quality. Depending on the degree of compression and the characteristics of the image, this loss of quality can be noticeable and may reduce the usefulness of the image.
- Irreversibility - After an image is compressed using a lossy method, it is not possible to recover the original image. This can be a problem if you later need the original, for example because the compressed version is no longer suitable for its intended use.
- Limited editing - Lossy compression can make it more difficult to edit the image, especially if it involves resizing or cropping. This is because the compressed image may contain artifacts or other irregularities that can affect the editing process.
- Poor performance on certain types of images - Lossy compression might not perform well on certain types of images, such as those with high contrast or fine details.
Compression level recommendations
Choose a compression level based on the needs of your use case. If image quality is essential, or if the image needs to be edited or manipulated, no compression or low compression might be the better choice. However, if file size and transmission speed are the main considerations, a higher compression level might be more suitable.
PaperCut Hive’s OCR Add-on offers these 3 compression levels for scan output PDFs:
- Low compression - minimizes the downsides of lossy compression (detailed above)
- Medium compression - a trade-off between file size and image quality
- High compression - smallest file size, poorer quality text and images
All of these compression levels use lossy compression.
In general, choose Low compression to minimize the downsides of lossy compression detailed above, and Medium or High compression if file size is your greatest concern.
Here are some use cases:
- When the document contains images or graphics with fine details, select Low compression to preserve the visual clarity of the document.
- When the document needs to be printed, select Low compression so that it prints at high quality, especially if it contains graphics, images, or text in small fonts. Printing a PDF that uses High compression may produce poorer output.
- When the document needs to be edited, select Low compression. Lightly compressed PDFs are often easier to edit than highly compressed ones, especially if the document contains images or graphics, because High compression can lead to artifacts or errors when editing the document.
- When the document needs to be sent by email, select High compression. Email attachments can have size limits, and smaller PDFs are easier to send as attachments.
Overall, deciding on which PDF compression level to use for scanned documents depends on your specific requirements and intended use of the PDF document. Always consider factors such as image quality, print quality, editability, security, and archival value when making this decision.
Compression is not recommended if you need to preserve the quality of sensitive documents for future reference, for example, legal documents or patient records.
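The guidance in this section can be summarized as a small decision helper. This is only a sketch: the function and parameter names are hypothetical and are not part of PaperCut Hive. Two choices are assumptions rather than statements from this article: quality-related needs take precedence over file size, and Medium is the default when nothing else applies.

```python
from typing import Optional


def recommend_compression_level(
    *,
    must_preserve_original: bool = False,
    has_fine_detail: bool = False,
    will_be_printed: bool = False,
    will_be_edited: bool = False,
    size_is_main_concern: bool = False,
) -> Optional[str]:
    """Suggest a compression level, or None when compression is best avoided."""
    if must_preserve_original:
        # For example, legal documents or patient records.
        return None
    if has_fine_detail or will_be_printed or will_be_edited:
        # Assumption: quality needs win over file size when both apply.
        return "Low"
    if size_is_main_concern:
        # For example, scans sent as email attachments.
        return "High"
    # Assumption: fall back to the middle setting.
    return "Medium"


print(recommend_compression_level(will_be_printed=True))
print(recommend_compression_level(size_is_main_concern=True))
print(recommend_compression_level(must_preserve_original=True))
```

Adjust the precedence to match your organization's priorities. For example, if attachment limits are strict, you might prefer Medium over Low for documents that will be both printed and emailed.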
How to set the compression level
All document processing features belong to the OCR Add-on, which requires an additional paid subscription for your PaperCut Hive organization.
To subscribe to and enable the OCR Add-on, go to Add-ons > OCR Add-on.
To access and enable the OCR Add-on’s features:
- In the PaperCut Hive admin console, click Easy Print & Scan > Integrated Scanning.
- Either click Add Quick Scan or edit an existing one that shows the OCR ADD-ON IN USE label.
- Go to the PDF Post Process tab.
- Under OCR Add-on Settings, click PDF Compression.