The Document Scanning Blog

Document scanning (or document imaging, as some call it) isn’t as simple as it seems, and a lot of thought needs to go into getting it right.

This blog is an attempt at sharing some of these insights for discussion.

Scanning Cost Estimation

How many Pages of Paper do I have?

In our previous article, we addressed some issues over whether there is a business case for considering document imaging / document scanning.

One of the first questions to answer when calculating the return on investment for your backfile scanning is how many pages need scanning.

This is the crux of your business case and is crucial for determining whether a return on investment is likely. Fortunately, calculating quantities of paper is not that difficult and just needs some key estimates and some simple arithmetic.

The basic formula for the calculation is to multiply the number of linear feet of documents you have by the average number of pages in a linear foot. Of course, if you don’t like using feet, you can swap it for any other unit of measurement you want, as the principle remains the same.

So, it seems fairly straightforward – you take your tape measure to your pile of papers to measure how many feet of documents you have, then get an average number of pages per foot from a few samples measuring a foot each, and then multiply the two numbers together. The secret here is to take samples from different positions in the pile, since the number of pages could easily vary between 500 and 2,000 per foot depending on how compressed the pages have become from the weight of the other documents on top of them.
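The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name, the 40 feet of paper and the three sample counts are made-up figures, not measurements from a real site.

```python
from statistics import mean

def estimate_pages(linear_feet, sample_pages_per_foot):
    """Estimate total pages as linear feet x average pages per foot.

    sample_pages_per_foot holds page counts from one-foot samples
    taken at different positions in the pile.
    """
    if not sample_pages_per_foot:
        raise ValueError("at least one sample is needed")
    return round(linear_feet * mean(sample_pages_per_foot))

# Hypothetical example: 40 linear feet of paper, with samples
# taken from the top, middle and bottom of the pile.
samples = [1500, 1800, 2100]
total_pages = estimate_pages(40, samples)
print(total_pages)  # 72000
```

Taking the average of several samples, rather than relying on a single one, is what keeps the compression differences within the pile from skewing the estimate.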

The problem of course is that it is highly unlikely you are going to have piles of paper lying around to be measured using a tape measure (at least I hope not as they aren’t likely to survive for long!), so it is useful to have a few estimates available for the typical storage methods used in order to help speed up the process.

The following should help to get you started, although, as per the example above, the numbers can vary significantly from case to case, so it is important to test each estimate before committing it to your business case.

 – One inch of loose papers typically contains around 150 pages when reasonably compressed
 – A standard 4-drawer vertical filing cabinet usually contains around 11,000 pages
 – A horizontal filing cabinet stores around 5,000 pages per drawer
 – A normal 3-ring binder file holds around 100 pages
 – A lever-arch file has approximately 300 pages
 – A typical storage box stores between 1,800 and 2,500 pages
 – While the larger “banker’s box” stores around 4,000 pages

It is always worthwhile noting during the exercise which of your documents are single-sided and which are duplex, since it will make a difference in your cost calculations later.
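As a rough sketch, the estimates above can be turned into a small inventory calculator. The dictionary keys, the use of the midpoint for the storage box, and the example inventory are my own assumptions for illustration. The sketch also assumes the page estimates count sheets of paper, so that a duplex sheet produces two images; check this against how you counted your own samples.

```python
# Rough pages-per-unit estimates taken from the list above.
PAGES_PER_UNIT = {
    "inch_of_loose_paper": 150,
    "vertical_cabinet_4_drawer": 11_000,
    "horizontal_cabinet_drawer": 5_000,
    "ring_binder": 100,
    "lever_arch_file": 300,
    "storage_box": 2_150,  # assumed midpoint of the 1,800 to 2,500 range
    "bankers_box": 4_000,
}

def estimate_inventory(inventory, duplex_share=0.0):
    """Return (sheets, images) for a dict of {storage unit: count}.

    duplex_share is the fraction of sheets printed on both sides;
    each duplex sheet is assumed to yield two scanned images.
    """
    sheets = sum(PAGES_PER_UNIT[unit] * count for unit, count in inventory.items())
    images = round(sheets * (1 + duplex_share))
    return sheets, images

# Hypothetical office: 3 vertical cabinets, 20 lever-arch files, 5 banker's boxes,
# with a quarter of the sheets printed on both sides.
office = {"vertical_cabinet_4_drawer": 3, "lever_arch_file": 20, "bankers_box": 5}
sheets, images = estimate_inventory(office, duplex_share=0.25)
print(sheets, images)  # 59000 73750
```

Keeping the sheet count and the image count separate is useful later, since preparation effort tends to follow the number of sheets while scanning and storage follow the number of images.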

Hopefully, these will give you a jump-start to developing that killer business case, but you should approach the document scanning experts if you require assistance.

Paper Scanning Techniques

Backlog Scanning vs Today Forward Scanning – Should we treat them differently?

Most companies that are considering implementing document image scanning receive paper documents (such as application forms) every business day and consequently have both “current” documents (still to be processed) and “historical” documents (that have already been processed). The question often asked is whether to treat the two sets of documents the same or not. More particularly, should we be using the same paper scanning techniques and methods to process these documents?

To answer the question we should really look at the two scenarios in more detail.

Processing historical backlogs is usually fairly simple – in most cases all we need to do is tie the document image back to an existing transaction (in the company’s ERP, CRM or similar system) and then archive the document image in such a way that it can be retrieved as and when necessary. Generally this entails someone, at the time of processing the paper-based transaction, physically writing a transaction identifier (or sticking a pre-generated barcoded label) onto the front of the paper document or, if such foresight was lacking at the time, searching through an internal system for the corresponding transaction at the time of indexing. Although I say “simple”, doing this task accurately can be quite difficult, labour intensive and time consuming (which is why outsourcing the task to a professional scanning services company is usually the best approach). A good solution would probably have the capability of doing this search automatically through database “lookups”.

On the other hand, scanning current documents, such as application forms, on a daily basis requires another set of company policy decisions to be made. The simplest method is to process the transaction manually by capturing the relevant data off the paper document and, once the process is complete, to send the document to the scanning department to be scanned, attached to the transaction via an index, and archived. This is a very similar approach to scanning historical documents, with the only difference being the timing involved. Because of the minimal changes required to the company’s procedures, this is often the approach taken, but it is not necessarily the most productive or efficient one. However, because of the similarities to historical backlog scanning, it does mean that a centralised scanning department can be set up that can easily cope with both.

If you are considering taking this approach, I would recommend introducing the barcoded label approach into the process as soon as possible, as it would certainly make indexing a lot simpler and more accurate when you eventually do decide to implement scanning.

However, to make the best use of the benefits of document scanning, as discussed in my previous blog, it would make more sense to use the image of the document as the input into creating the transaction. In other words, capture the data off the image instead of off the paper document. Applying our minds to this, we should realise that this will entail some fairly drastic changes to the way you might have processed these documents in the past.

To name a few, there is the very real issue of handling the fear of change among staff members who have probably been working with paper in a specific manner for many years. This is an issue that shouldn’t be underestimated, and you should plan to spend close to 25% of your project budget on handling change management correctly.

Then there is a good possibility that your current infrastructure wouldn’t be able to handle the traffic of moving electronic images around your network, nor would your existing computer monitors likely be of the quality or size to optimally view the document images.

To do things properly, your clerks should be able to view the document image at the same size as the original paper document, or larger. Given that they also need to capture the information off the document into the company’s systems, it typically means that their screen size should also be large enough to display the capture screen at the same time. Many companies provide their data capture clerks with two monitors configured to work side by side, where they display the document image on the one and the capture screen on the other.

This brings us to the automated data capture tools which “read” the document image and transfer the appropriate text directly into the data capture screen without human intervention. There are a number of technologies involved in this type of technique, and they will be the subject of another topic later in my blog series, given the advances that have happened in this area over the last few years. It is certainly worth considering these technologies, given the benefits to productivity and efficiency that can be obtained through minimising the amount of human interaction involved. However, for the sake of this discussion, I will treat them as a “black box”, where you simply provide them with a document image and they return a set of completed fields in your ERP or CRM system, from which you can kick off a transaction.

As you might expect, this is the domain of specialised software techniques and products, which are generally more expensive than your run of the mill scanning product and become significantly more expensive as you add the “bells and whistles”. More importantly, the vendors of these products generally charge for each document that is processed through their software. This actually makes sense because companies that process applications on a daily basis typically have calculated an average cost for processing an individual document and if the software can improve the speed and accuracy in capturing the document, it will reduce the company’s costs and generally improve the customer service it can deliver. Importantly, this could result in the company winning more business than their competitors if they can respond more quickly. Based on this, many companies can gain significant competitive benefits by implementing such solutions and are willing to pay to do so.

Now if you decide to go this route in order to process your Today Forward documents, can you use the same solutions to process your backlog?

The answer, in theory, is yes, but unfortunately the practical situation seems to be different. Theoretically, these solutions use the same scanning and indexing techniques as their cheaper counterparts and should therefore be able to process backlogs, but the problem lies in their licensing procedures. As discussed above, because they charge per document scanned, it becomes prohibitively expensive to use these solutions to tackle vast numbers of documents where all you want to do is to scan, index and archive them, which is typically the situation with backlogs.
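To see why per-document licensing bites so hard on a backlog, here is a minimal sketch. Every figure in it (the fee, the daily volume, the number of working days and the backlog size) is hypothetical and chosen only to show the scale effect; real vendor pricing will differ and should be checked with the vendor.

```python
def per_document_licence_cost(documents, fee_cents_per_document):
    """Total licence cost in cents when the vendor charges per document."""
    return documents * fee_cents_per_document

# All figures below are hypothetical assumptions, not real vendor pricing.
FEE_CENTS = 10          # assumed charge per processed document, in cents
DAILY_VOLUME = 500      # assumed Today Forward documents per working day
WORKING_DAYS = 250      # assumed working days per year
BACKLOG = 2_000_000     # assumed number of documents in the backlog

yearly_volume = DAILY_VOLUME * WORKING_DAYS
today_forward_cents = per_document_licence_cost(yearly_volume, FEE_CENTS)
backlog_cents = per_document_licence_cost(BACKLOG, FEE_CENTS)

print(f"Today Forward, one year: {today_forward_cents / 100:,.2f}")  # 12,500.00
print(f"Backlog, once-off:       {backlog_cents / 100:,.2f}")        # 200,000.00
print(f"Backlog equals {BACKLOG // yearly_volume} years of daily licence fees")  # 16
```

With these made-up numbers the backlog alone costs as much in licence fees as sixteen years of daily processing, without delivering any of the data capture benefits that justify the fee in the first place.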

So, the solution seems to be to consider the two situations as two separate projects and to evaluate exactly what you want to achieve with each. In many situations you could use the same solution, either separately for each project or by simply using the same infrastructure for backlog scanning as and when there are opportunities to do so, but in other cases two separate solutions might be more financially feasible. This is especially true if you consider using an outsourcing service provider, as suggested previously.

The Document Scanning Business Case

Why Scan at all?

So, now we know that document scanning / imaging is not as simple as we first thought – is there a document scanning business case / Return on Investment (RoI) at all?

In some cases there might not be. To determine if it is appropriate in your case, you would need to consider the inherent value of the information contained on your paper documents. It might be easier to simply keep a record of the relevant information somewhere in a database or spreadsheet and physically go look for the paper document in a filing system if and when you need it. However, there are inherent problems with this approach, especially if you have a lot of documents.

Let’s take a look at some of these.

The obvious one is the speed of retrieval of your documents. A case I am particularly fond of is one described in The Imaging Product News Magazine which quoted a Price Waterhouse study of a paralegal being tasked with finding 20 documents out of 20,000. The task took him 67 hours and even then he could only find 15 documents. You can imagine the problems involved at sites with millions of unfiled documents! The result is that many companies are turning to document imaging simply to improve their searching capabilities. And, as someone involved in providing scanning solutions, I find it such a pleasure to see staff morale jump when they can respond to their customers’ queries virtually instantaneously because they have the documents electronically available at the touch of a few keys.

A side effect of creating document images is that it allows multiple people to have access to the same document at the same time while still controlling the versioning of the document. I remember, before I got into imaging, how we would make photocopies of a document and would quickly lose track of which was the master document and which were the copies. We would spend hours trying to reconcile the various copies. Electronic document management, on the other hand, has now progressed to a point where each user can have their own (or even multiple) annotation layer/s on a document on which to make their notes; while still being able to view other users’ annotations and, if security allows, to edit them, while everything is managed correctly by the system. Try doing that with a paper-based filing system!

The next is often the sheer space (and often weight) involved in storing paper. I have just seen a situation where a company has been forced to move towards document imaging because the weight of the paper stored in their fourth-floor office had put so much stress on the floor that the windows cracked! My first thought was that I was glad I didn’t work on the floor beneath them!

Besides, storing paper is expensive. A recent Coopers & Lybrand study suggests it costs around $20 to simply file a document over its lifetime; $120 to find a misfiled document; and $250 to recreate a lost document. Compare that with the costs of scanning and storing your documents electronically in a document management system, and you will discover you can do so much more at a fraction of the cost. If you don’t want the asset expense of buying scanners, servers and software, then investigate the costs of using a Scanning Services Company like our own, where we will do the scanning for you, even on-site if you so wish. Many of these services companies will even host your documents on the web for you so that you don’t have to concern yourself with managing the storage of the documents at all. I guarantee that if you do your sums correctly you will quickly realise you could be saving your company a lot of money by moving to document imaging.
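If you want to “do your sums”, a minimal sketch using the three study figures quoted above might look like this. The document counts in the example are hypothetical; substitute your own filing volumes and your own counts of misfiled and lost documents.

```python
# Per-document costs quoted from the study mentioned above (US dollars).
COST_TO_FILE = 20
COST_TO_FIND_MISFILED = 120
COST_TO_RECREATE_LOST = 250

def paper_handling_cost(filed, misfiled, lost):
    """Estimated cost of keeping documents on paper.

    filed    -- number of documents filed
    misfiled -- how many of those had to be hunted down after misfiling
    lost     -- how many had to be recreated because they were lost
    """
    return (filed * COST_TO_FILE
            + misfiled * COST_TO_FIND_MISFILED
            + lost * COST_TO_RECREATE_LOST)

# Hypothetical example: 10,000 documents, 300 misfiled, 100 lost.
paper_cost = paper_handling_cost(filed=10_000, misfiled=300, lost=100)
print(paper_cost)  # 261000
```

The resulting figure is the number to set against a quote for scanning, indexing and hosting the same documents.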

Of course, this brings up the issue of what to do with your documents after they have been converted to images. Can you destroy them once you have scanned them, or do you need to keep them for a period of time? This is a whole new topic which should be addressed by itself and of course I will do that in due course. There are many different views to take into consideration and it should be an interesting discussion.

Then there is the aspect that many people try their hardest to ignore, because the consequences are so dire and it just doesn’t bear thinking about – a natural disaster. How many of you heard about the fire at one of the document warehouses in Midrand a few years back and thought “thank goodness that wasn’t us”? But think about it – what if it were your documents that were destroyed? So many of us have had it drummed into us over the years to do backups of our PC / laptop / server, but what of our paper documents? They just don’t seem to be viewed in the same way. However, once we have gone to the effort of scanning our documents, the idea is for it to become standard practice to make a backup copy of our electronic images onto DVD or CD and to store them off-site with our backup copies of our electronic data. In this way we can recreate the documents if it is ever needed, which hopefully never happens.

Of course, I could continue with this blog (it’s already longer than I thought it would be) by starting to talk about other aspects such as the environmental aspect of using 3.4 billion pages of paper every day in creating new documents and how that number is growing by 40% every year, but then you might call me a tree-hugger and I’m afraid I haven’t got there yet…

If you need help developing a case for document scanning RoI, then give us a call.

Introduction to Document Scanning / Indexing

A quick look into the World of Document Scanning & Indexing

The idea here is to have a bash at introducing you to the world of document scanning (or document imaging, as it is sometimes called).

Many people have this notion that, in order to convert their paper documents to digital images, all they need to do is buy a scanner and connect it to their “computer”, use the “software” that comes with the scanner and, voila, they are off and running…

That is sometimes true when you are scanning your private documents or photographs, but it is seldom true when it comes to document management in business. Unfortunately, many only find this out once they have bought all the other components and are ready to begin capturing their documents. They then find out that they have no budget left to do it properly and so look for the cheapest approach. As usual, as with anything in life, you get what you pay for and the project is probably doomed to failure.

On the other hand, doing your research can be overwhelming, as you start encountering:
(And this isn’t close to being all of them…)

  • document logging and tracking, prep, post-prep, batch headers
  • structured, semi-structured and unstructured documents
  • dynamic, static and legal documents
  • 2-D and 3-D barcodes, patchcodes, separator sheets
  • forward scanning, back scanning, centralised scanning, distributed scanning, bureau scanning, outsourced scanning
  • flatbeds, ADFs, MFDs, handhelds, camera-based
  • ISIS, TWAIN, simplex, duplex
  • auto-rotation, deskew, despeckle, ACD, cropping, dithering, drop-out colours, endorsing
  • quality control, form recognition, data capture, forms processing, data entry, CADE, ODBC, data extraction, indexing
  • double keying, zonal locators, validation, verification, thresholds
  • OCR, ICR, IMR, MICR, full text, fuzzy logic, metadata
  • greyscale, bitmaps, pixels, bpi, bpp, thumbnails
  • blobs, BMPs, JPEGs, PDFs, TIFFs, compression ratios
  • SANs, NASs, WORMs, DASD, RAID, CD/R, CD-ROM, DVD 
  • document control, document management, content management, revisioning, document archiving, records management, retention policies, taxonomies
  • collaboration, BPA, BPM, workflow

Our intention is to try to shed some light on these and other issues (hopefully making it a little more interesting than it looks…)

So sit back and let’s see where this journey into Document Scanning takes us…