Common Scalability Questions
This is a difficult question to answer because it depends on a variety of factors, such as the speed of the hardware and the print volume generated by each user. Hardware is also always improving, so the answer is always changing.
PaperCut’s largest customer currently has approximately 350,000 users. They have been running PaperCut since 2008 on a well-resourced application server, with the Microsoft SQL Server database on a separate machine, and have had no performance issues since the original deployment.
Hardware has continually improved over this time, which would allow even more users and printers to be supported.
For sites deploying with more than 100,000 users, we recommend booking some time with the PaperCut development team to discuss the system architecture. We can pass on lessons learnt from other similarly sized deployments, covering topics like:
- System architecture and design.
- Suggested hardware and software configurations.
- General system and print management tips for managing large user bases and print fleets.
PaperCut is developed to best-practice modern database principles. When backed by a leading database server like Microsoft SQL Server or Oracle, PaperCut will continue to operate without loss of performance for standard operations. Even when the database contains millions of print/transaction records, operations like print job processing and access to the administration user interface will not slow down.
Even standard reports over fixed date ranges like day/week/month are not impacted, because all reports are optimized to make efficient use of database indexes. This means that when running a report for recent activity, the number of historical records will not impact performance.
One area impacted by the size of the database is running reports over the entire database. In this case, the time taken to run a report is proportional to the amount of data being reported. This is not a significant issue, because it is not common to run reports over the entire system history. Most reports are run over recent periods like the last month or year and can take advantage of DB indexes.
In summary, we aim for O(1) performance for standard operations like print job processing, user administration and other day-to-day tasks. Reports across the entire database history will have O(N) performance.
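The difference between an indexed date-range report and a full-history report can be illustrated with a small sketch. It uses SQLite from the Python standard library purely for demonstration; the `print_job` table, its columns and the index name are hypothetical and are not PaperCut's actual schema. The query plan shows that the date-range query searches the index, so its cost depends on the rows in the range rather than on the total history, while the unrestricted query scans every row.

```python
import sqlite3

# Hypothetical schema for illustration only; not PaperCut's actual tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE print_job (id INTEGER PRIMARY KEY, user_name TEXT, "
    "pages INTEGER, printed_on TEXT)"
)
conn.execute("CREATE INDEX idx_print_job_date ON print_job (printed_on)")


def query_plan(sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Report over a recent fixed period: the date index limits the rows visited.
recent_plan = query_plan(
    "SELECT SUM(pages) FROM print_job WHERE printed_on BETWEEN ? AND ?",
    ("2024-05-01", "2024-05-31"),
)

# Report over the entire history: every row has to be read.
full_plan = query_plan("SELECT SUM(pages) FROM print_job")

print(recent_plan)
print(full_plan)
```

The exact wording of the plan text varies between SQLite versions, and other database servers expose query plans through their own tools.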
How long does PaperCut take to analyze print jobs to determine the number of pages, color, paper size, etc.?
The time taken to analyze a print job is a function of the size of its spool file. For a typical print job, the analysis takes 2–3 seconds.
PaperCut is used in engineering firms and design colleges where the spool files of engineering diagrams or graphic posters can be hundreds of MB, or even a GB, in size. Jobs of this size can take up to 30 seconds to analyze.
The biggest factor in determining the report generation speed is the period over which reports are generated. The longer the period, the more data must be processed to produce the report.
Running reports over common period ranges (e.g. a month of print activity) will perform quickly. Running reports over longer periods will take proportionally longer based on the amount of data processed.
Another important factor is using a well-resourced database server running a leading database (like Microsoft SQL Server or Oracle). We’ve seen problems in the past with customers running PaperCut on an overloaded or under-resourced DB server, typically where the DB server is shared amongst multiple applications (e.g. shared with a heavily used finance application) without adequate resources.
How long would it take to create a report of prints and copies made over 1 year where 4 million prints/copies were performed?
This depends a lot on what type of report is being run. There is a big difference between a report counting the total number of pages printed in a single year and a report that lists each individual print job. The performance can also be greatly impacted by the database server hardware.
PaperCut’s reports are all optimized to take advantage of DB indexes to improve report performance.
On good database hardware, a summary report over this data would typically take 1–2 minutes, whilst a more detailed report may take several minutes to complete.
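The two kinds of report differ mainly in how much data the database has to return. The sketch below is an illustration only: it uses SQLite from the Python standard library with a hypothetical `print_job` table and made-up sample rows, not PaperCut's schema or report queries. The summary query returns a single aggregated row, while the detailed query returns one row per print job in the period.

```python
import sqlite3

# Hypothetical table and sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE print_job (id INTEGER PRIMARY KEY, user_name TEXT, "
    "pages INTEGER, printed_on TEXT)"
)
conn.execute("CREATE INDEX idx_print_job_date ON print_job (printed_on)")
conn.executemany(
    "INSERT INTO print_job (user_name, pages, printed_on) VALUES (?, ?, ?)",
    [
        ("alice", 3, "2023-02-10"),
        ("bob", 12, "2023-06-01"),
        ("alice", 5, "2023-11-20"),
        ("bob", 7, "2024-01-05"),  # falls outside the reporting year
    ],
)
conn.commit()

year = ("2023-01-01", "2023-12-31")

# Summary report: the database aggregates and returns a single row.
total_pages = conn.execute(
    "SELECT SUM(pages) FROM print_job WHERE printed_on BETWEEN ? AND ?", year
).fetchone()[0]

# Detailed report: one result row per print job in the period.
job_rows = conn.execute(
    "SELECT printed_on, user_name, pages FROM print_job "
    "WHERE printed_on BETWEEN ? AND ? ORDER BY printed_on",
    year,
).fetchall()

print(total_pages)
print(len(job_rows))
```

With millions of jobs in the period, the detailed report has to transfer and format every matching row, which is one reason it can take longer than the summary.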
PaperCut does not recommend archiving print jobs or transaction logs. Print job data does not consume much disk space when compared to modern hard disk sizes. Many of our large customers have been running PaperCut for many years without archiving data. When running on good database hardware, the system will continue to run optimally even with a large amount of print log data (see above).
With all the historical print records kept in the PaperCut system, they can be easily viewed and reported without needing to access an alternative system.
Modern hard drives can store decades of print log data. For information on database sizing and planning your capacity requirements, please see the capacity planning chapter in the manual.
There is no technical reason to perform any archiving of print records.
By not archiving print records all data can be reported from a single location. This is much more convenient than accessing the data in an alternative system.
If archiving is a must, it can be achieved by:
- using your database server’s backup procedure to save DB snapshots before deleting the old logs, and
- leveraging PaperCut’s option to delete old log files.
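The important point in this procedure is the order: the snapshot is taken first, and old records are removed only once the snapshot exists. The sketch below illustrates that order using SQLite from the Python standard library. It is not how PaperCut or your database server performs these steps; in practice the snapshot comes from your database server's own backup tooling and the deletion from PaperCut's option to delete old logs. The `print_job` table and the `archive_then_prune` helper are hypothetical.

```python
import sqlite3


def archive_then_prune(live, archive, cutoff):
    """Snapshot the whole database, then delete rows older than cutoff."""
    # Step 1: take the snapshot first so that no history is lost.
    live.backup(archive)
    # Step 2: only after the snapshot has completed, remove the old rows.
    with live:  # commits on success, rolls back on error
        deleted = live.execute(
            "DELETE FROM print_job WHERE printed_on < ?", (cutoff,)
        ).rowcount
    return deleted


# Hypothetical table and sample data for illustration only.
live = sqlite3.connect(":memory:")
live.execute(
    "CREATE TABLE print_job (id INTEGER PRIMARY KEY, pages INTEGER, printed_on TEXT)"
)
live.executemany(
    "INSERT INTO print_job (pages, printed_on) VALUES (?, ?)",
    [(4, "2015-03-01"), (9, "2018-07-15"), (2, "2024-02-01")],
)
live.commit()

archive = sqlite3.connect(":memory:")
deleted = archive_then_prune(live, archive, "2020-01-01")

live_count = live.execute("SELECT COUNT(*) FROM print_job").fetchone()[0]
archive_count = archive.execute("SELECT COUNT(*) FROM print_job").fetchone()[0]
print(deleted, live_count, archive_count)
```

After the call, the archive copy still holds every record while the live database keeps only the rows on or after the cutoff date.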
Yes. PaperCut supports and will work within multiple clustering environments/systems:
- Microsoft Cluster Services
- Microsoft Failover Cluster
- Veritas Cluster
- Novell Cluster Services
- Linux HA
The main features are:
- Ability to spread load across multiple services undertaking different tasks working together as one:
  - Database server
  - Application server
  - Multiple print servers
- Support for high performance external databases such as MS SQL Server and Oracle.
- Support for clustering at the application, database and print server layers.
- A modern service-oriented architecture (SOA) and code base designed from day one with scalability in mind.
- Native support for 64-bit operating systems.
- Fault tolerance (transaction replay) in selected areas in the event of outages, e.g. connection issues between servers.
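The idea behind transaction replay can be sketched generically: events that cannot be delivered during an outage are held in order and sent again once the connection returns. This article does not describe how PaperCut implements the feature, so the following is only a minimal illustration of the general pattern. The `ReplayBuffer` class, the `send` function and the `job-N` events are all hypothetical.

```python
from collections import deque


class ReplayBuffer:
    """Hold events that could not be delivered and replay them in order."""

    def __init__(self, send):
        self._send = send  # callable that raises ConnectionError on failure
        self._pending = deque()

    def submit(self, event):
        """Queue the event, then try to deliver everything that is pending."""
        self._pending.append(event)
        return self.flush()

    def flush(self):
        """Deliver pending events oldest-first; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False  # keep the event queued for the next attempt
            self._pending.popleft()
        return True

    @property
    def pending(self):
        return len(self._pending)


# Usage: a fake server link that can be switched off to simulate an outage.
received = []
link = {"up": True}


def send(event):
    if not link["up"]:
        raise ConnectionError("server unreachable")
    received.append(event)


buffer = ReplayBuffer(send)
buffer.submit("job-1")          # delivered immediately
link["up"] = False
buffer.submit("job-2")          # held during the outage
buffer.submit("job-3")          # held during the outage
pending_during_outage = buffer.pending
link["up"] = True
buffer.flush()                  # replayed in the original order
print(received, buffer.pending)
```

A production design would also need to persist the queue to disk and guard against delivering the same event twice, which this sketch leaves out.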