How OCR scanning helps with preservation
OCR scanning services are a vital step in digitising written documents: they not only preserve the documents for future use, but also turn the printed text into digital, editable text.
Optical Character Recognition is the ability to automatically detect printed characters on a scanned page and convert them into their digital equivalent.
It offers all of the usual benefits of document preservation, including the ability to make backup copies of the document, store them in multiple secure locations and dispose of the originals.
But it also makes static documents dynamic, so that you can edit them – if appropriate – or convert them into digital text that is static but searchable.
How OCR document scanning works
OCR scanning starts like any document scanning and digitisation process, by capturing a digital image of the page – just like making a photocopy, but with the image saved as a file instead of printed onto another sheet of paper.
From there, the software takes over, using sophisticated optical character recognition to detect individual letters, numbers and punctuation marks on the page.
These are then automatically converted to their digital equivalents, which can be manually checked for errors and then saved as a text document in an appropriate format such as .txt or Microsoft Word .docx, or as a static but searchable PDF.
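As a rough illustration of the detect-and-convert step, here is a toy sketch in Python. It assumes a tiny hand-made "font" of 3x5 bitmaps (the `TEMPLATES` table, `render`, `segment`, `recognise` and `ocr` are all hypothetical names invented for this example) and matches each glyph on the page against those templates. Commercial OCR software uses far more robust techniques; this only shows the general idea of turning shapes in an image into characters.

```python
# Toy glyph templates: each character is a 3x5 bitmap, '#' = ink, '.' = blank.
# This is an assumed, hand-made font for illustration only.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def render(text):
    """Draw text as a 5-row 'scanned image' with a blank column after each glyph."""
    rows = [""] * 5
    for ch in text:
        glyph = TEMPLATES[ch]
        rows = [row + glyph[i] + "." for i, row in enumerate(rows)]
    return rows


def segment(image):
    """Split the image into separate glyphs wherever a column is fully blank."""
    width = len(image[0])
    glyphs, start = [], None
    for x in range(width + 1):
        blank = x == width or all(row[x] == "." for row in image)
        if not blank and start is None:
            start = x  # first inked column of a new glyph
        elif blank and start is not None:
            glyphs.append([row[start:x] for row in image])
            start = None
    return glyphs


def recognise(glyph):
    """Return the template character that shares the most pixels with the glyph."""
    def score(template):
        return sum(
            a == b
            for t_row, g_row in zip(template, glyph)
            for a, b in zip(t_row, g_row)
        )
    return max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))


def ocr(image):
    """Convert a whole 'scanned' line into a string of characters."""
    return "".join(recognise(g) for g in segment(image))


page = render("1707")
print("\n".join(page))  # the 'image' of the page
print(ocr(page))        # the recognised digital text
```

Because `recognise` picks the closest template rather than demanding an exact match, a glyph with a damaged pixel can still be read correctly, which is also why faint or broken print lowers accuracy: the closest match is more often the wrong one.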
Why is OCR scanning important?
OCR document scanning services have several advantages when scanning and digitising text documents, or the text content of mixed media documents.
Some of the benefits include:
- Text can be edited once it has been digitised, allowing old documents to be brought up to date and saved to disk or reprinted as hardcopy.
- Text can be searched – either in editable format or in a PDF file – making retrieval of important information much faster.
- Text takes up much less disk space than an image of a page of text, even when compressed and saved as a monochrome image.
This all adds up to document archiving that is more streamlined and faster to use, and more dynamic if you are digitising records that may need to be edited in the future.
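To put the disk space point in rough numbers, the sketch below compares an uncompressed monochrome scan with the same page stored as plain text. The figures are assumptions chosen for illustration, not measurements: an A4 page (210 x 297 mm) scanned at 300 dpi with one bit per pixel, and about 3,000 characters of text on the page. Compression narrows the gap, but by how much depends on the page and the method used.

```python
MM_PER_INCH = 25.4


def mono_scan_bytes(width_mm, height_mm, dpi):
    """Size of an uncompressed 1-bit-per-pixel scan, in bytes."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px // 8  # 8 pixels fit in one byte


def text_bytes(characters):
    """Size of plain text, assuming one byte per character."""
    return characters


image_size = mono_scan_bytes(210, 297, 300)  # assumed: A4 at 300 dpi
text_size = text_bytes(3000)                 # assumed: 3,000 characters per page

print(f"Image: {image_size:,} bytes")
print(f"Text:  {text_size:,} bytes")
print(f"The image is roughly {image_size // text_size} times larger")
```

Under these assumptions the scan comes to a little over 1 MB, while the text needs about 3 KB.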
OCR document scanning for numbers and tables
OCR scanning can detect numbers and symbols as well as letters and punctuation, so it can also be used to digitise numbers and tables from printed documents.
Tables can be saved in formats including Microsoft Excel and .csv, an open format whose name stands for Comma Separated Values and which saves tabular data as plain text that spreadsheet applications can open and understand.
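As a minimal sketch of what that looks like in practice, the Python below writes a small table to CSV text and reads it back using the standard `csv` module. The invoice rows are made-up sample data, and `table_to_csv` and `csv_to_table` are hypothetical helper names used only for this example.

```python
import csv
import io

# Made-up sample data standing in for a table recognised from a printed page.
rows = [
    ["Invoice", "Date", "Amount"],
    ["1001", "2023-01-15", "1,250.00"],
    ["1002", "2023-02-03", "980.50"],
]


def table_to_csv(table):
    """Serialise a list of rows as CSV text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(table)
    return buffer.getvalue()


def csv_to_table(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


csv_text = table_to_csv(rows)
print(csv_text)
print(csv_to_table(csv_text) == rows)
```

Note that the amount `1,250.00` contains a comma, so the writer wraps it in double quotes to keep it in a single column. This is one reason to use a CSV library rather than joining values with commas by hand.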
Again, this all makes data searchable and, if appropriate, editable, so that your printed tables are no longer static, but incorporated into your dynamic digital archives.
How accurate are OCR scanning services?
OCR document scanning can be as much as 99% accurate even without any proofreading, provided the original printed document is in good condition, with text that is unbroken and in good contrast to the page background.
Where the condition of the document is less than perfect, the percentage accuracy may start to drop.
However, at Microform we are also able to offer OCR proofreading services, so that once the text is scanned and digitised, we can manually compare it to the original printed version and correct any minor errors.
This is a much more labour-intensive task, but it means we are able to provide a highly accurate digitised version of text and tabular data for clients who need to know that any numbers have been transferred correctly, and for other critical documents.
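To make the accuracy figures concrete: at 99% character accuracy, a page of 3,000 characters (an assumed page length) would still contain around 30 wrong characters, which matters a great deal if some of them are digits. One common way to measure accuracy is to count the single-character edits needed to turn the OCR output into the proofread text. The sketch below does this in Python; the sample strings and the function names `edit_distance` and `character_accuracy` are hypothetical and not a description of any particular provider's process.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, ocr_text):
    """Share of the reference text that was recognised correctly (0.0 to 1.0)."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(reference, ocr_text)
    return max(0.0, 1 - errors / len(reference))


reference = "Total due: 1,250.00"
ocr_text = "Total due: l,250.0O"  # hypothetical errors: "1" read as "l", "0" as "O"

print(edit_distance(reference, ocr_text))
print(f"{character_accuracy(reference, ocr_text):.1%}")
```

Both errors in this example fall inside the number, which shows why a headline accuracy percentage alone cannot confirm that figures have been transferred correctly.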