Converting Paper Documents into Electronic Files

The Document Imaging Group at Headquarters uses high-quality document scanners and top-end Optical Character Recognition (OCR) systems, and maintains quality controls, to provide a thorough and sustainable conversion of paper documents into electronic files.  The conversion produces Acrobat Image + Text files, which contain the actual scans of the pages for viewing and printing and have a layer of searchable text behind the images to facilitate searching and indexing.

The Document Imaging Group at Headquarters operates under the Working Capital Fund (WCF):
   Link to the WCF on DOE's iPortal
   Link to the WCF on Powerpedia

Benefits and Advantages

Converting paper documents into electronic files helps us manage, store, access and archive the organizational information we have “locked up” in paper documents.  The resulting electronic files can be stored on computer systems and document storage systems, and accessed and distributed through standard electronic sharing and communications.

  • Convert corporate knowledge that is locked up in paper documents into electronic files, giving broader access to its information and resources.
  • Save office space by converting cabinets of paper into electronic files that are available on demand.
  • Lose no information during conversion: the recommended Acrobat Image Plus Text format (Acrobat Searchable Image - Exact) retains the scan of the original document.  The scanned page is always used for viewing and printing, while the OCR'd text is available for searching and even basic extraction (with limitations).

Preparation and Processing Steps

Please discuss the pre-scanning preparation process with a Document Imaging representative to coordinate the best method to organize documents and plan for file creation to meet your needs.

The basic steps to coordinate are:

  • Prepare your documents by separating them according to the electronic files to be created; for example, decide whether the 50 pages in a folder make one file or ten files.  This may already be done if the documents are in an organized filing system.
  • Specify the file name to be given to each file, or provide a system for determining it.
    • Document/file names should be no longer than 61 characters to allow for transfer and archival, and should follow universal file naming rules.
  • Help identify whether the documents need to be scanned in full color or black-and-white (B&W).  Full color allows for better readability even if the content is predominantly B&W; however, if you have B&W-only content, B&W scanning produces significantly smaller files.
  • Make sure the documents are securely packed in boxes or other reliable containers for transportation between the customer's office and the Imaging office.  If you are unable to do so, please contact our office for assistance.
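
The file-name guidance above can be checked before a batch is handed over. The following is only a minimal sketch: the 61-character limit comes from the guidance, but the function name check_file_name, the choice to count the extension toward the limit, and the list of rejected characters (those that Windows file systems disallow) are assumptions made for illustration, not rules published by the Document Imaging Group.

```python
import re
from typing import List

MAX_NAME_LENGTH = 61  # limit stated in the guidance above

# Assumption: treat characters that Windows rejects in file names
# (plus control characters) as violating "universal" naming rules.
FORBIDDEN_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def check_file_name(name: str) -> List[str]:
    """Return a list of problems with a proposed file name (empty if none)."""
    problems = []
    if not name:
        problems.append("name is empty")
    # Assumption: the extension (e.g. ".pdf") counts toward the limit.
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"name is {len(name)} characters; limit is {MAX_NAME_LENGTH}"
        )
    if FORBIDDEN_CHARS.search(name):
        problems.append("name contains a character that some file systems reject")
    if name != name.strip() or name.endswith("."):
        problems.append("name starts or ends with a space, or ends with a period")
    return problems


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for proposed in ["FY2021_Budget_Report.pdf", "Memo: Draft?.pdf", "x" * 70 + ".pdf"]:
        issues = check_file_name(proposed)
        print(proposed, "->", "OK" if not issues else "; ".join(issues))
```

Returning a list of problems rather than a single pass/fail value lets a customer see every issue with a name at once and correct a whole naming scheme in one pass.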

The Imaging Group will:

  • Prepare documents for scanning by removing staples, clips, and any binding, including spiral or glue binding.
  • Ready the documents for feeding into a sheet-fed scanner.
  • Scan the documents. This includes Quality Control steps to ensure all the pages are scanned and as readable as possible.
  • Run Optical Character Recognition (OCR) software to create the text layer of the pages.  DOE's standard format is Acrobat Image + Text; other formats are also available.
  • Coordinate delivery of the files.  Because of the restrictions on removable media on DOECOE PCs, delivery may require further involvement to create electronic containers on customer data systems and possibly to ensure that the proper security controls are in place.  Traditional delivery on USB drives and DVDs is possible.
  • Bind, clip, or otherwise group the pages to make sure the documents stay in order after processing and during transport back to the owner. The staff will not re-staple documents or re-bind documents.  Storage, disposition, and/or destruction of the documents is the responsibility of the document owner.

File Format Information

DOE's standard file format for static document archival is Acrobat Image + Text / Acrobat Searchable Image - Exact.  This is an Acrobat file that contains the actual scans of the pages for viewing and printing purposes, and has the recognized text behind the image for indexing and searching.  Because the file contains the actual scanned page, no information is lost: all handwriting, charts, photos, etc. are preserved for viewing and printing.

Other Technical Details

Scanner Hardware: High-speed, scans in black-and-white and in color, capable of scanning pages up to 11"x17".

Use this link to the Contact Us page for the Document Imaging contacts.

Use this link to go to the FAQs for Document Imaging.

MAAdm updated 12/10/2021 - New Format