Document Scans
Documents that are created from scans do not automatically contain machine-readable text, which creates a problem for accessibility. These documents are essentially large images of text. Fortunately, such documents can be remediated so that their text becomes readable. The process of converting images of text into machine-readable text is called Optical Character Recognition (OCR).
OCR is a function built into Adobe Acrobat Pro, and there are several steps to take to get from a scanned PDF to an accessible, text-based PDF. (Note: Your scanning software might already be able to output an OCRed text document when you scan. Check its settings first, since doing so could save you a step.)
How To: Use Optical Character Recognition in Adobe Acrobat
Illinois State University offers an article and video detailing the following steps.
- Open the document scan in Adobe Acrobat Pro.
- Go to the All Tools menu in the top-left menu bar and select “Scan & OCR.”
- In the Scan & OCR pane, select “In this file” underneath the “Recognize Text” heading, and then select the “Recognize Text” button.
- Choose where to save your OCRed file, and whether to replace the original file or keep both the original and the new version.
These steps create a machine-readable version of the document, but the text still needs to be tagged and put in the correct reading order.
To tag and check the machine-readable document, follow these two steps:
- Select the tag icon in the vertical toolbar on the right side of the screen.
- Right-click the text “No tags available” and select “Autotag document.”
The automatically generated tags provide a good starting point for document tagging, but review them manually to make sure that each item is tagged and in the correct reading order. Before finalizing the PDF, run the accessibility checker and address every problem it reports, including issues beyond tagging such as missing file metadata or missing alternative text for images.
Follow the How-To: Create an Accessible PDF from a Word or Google Doc guide beginning at Step 4 to organize tags and run the accessibility checkers.
Ultimately, the goal is to provide access to the information in the document. If remediating the document (or an archive of similar documents) isn't possible, provide an easy way for people to request an accommodation. For example, include a note where the document is accessed explaining how to contact someone who can provide the information in the inaccessible document within a reasonable amount of time.
Alternatives to Scanned Documents
Although OCR allows scanned documents to be remediated so they become accessible, it’s best to avoid scanned documents altogether.
When designing future documents and deciding whether they should be handled digitally or in print, consider the following:
Accessibility: A digital document with a digital signature is more accessible than a printed and scanned document.
- The formatting needed for screen readers is built into the digital document, so there’s no need to remediate the signature within the document for accessibility.
- Remediating a PDF after it has been OCRed is more time-consuming than creating the document in Word or Google Docs and then saving it as a PDF with a signature field.
Security: Digital signatures are more secure than ink signatures.
- PDF readers can verify who signed the document and confirm that it has not been modified since it was signed.
- PDFs can be encrypted and locked so they are only available to designated users, unlike printed documents, which can pose security risks when they exist as physical items.
Efficiency: Digital documents and signatures streamline and simplify the signing process by reducing handling.
- Making the document digital-first and accessible from the beginning avoids printing, scanning, converting to text with OCR, and time-intensive remediation.