Accessible Document Converter Solution

VerseOne's Accessible Document Converter (ADC) is designed to make it easier to convert PDF and Word documents to Accessible HTML web pages. ADC is not designed to reproduce heavily designed documents faithfully, as though you were viewing them in a graphic design application or even a PDF viewer. This is why the module includes a link to download the original document.

Although HTML is continually being improved, it is fundamentally more limited than most word processors. For example, HTML lists have no concept of "tiered numbering": we can use CSS to make it look as though there is, but this will not help people using screen readers.

How the converter works

Our conversions are provided by two services: PDF to Word uses Adobe’s professional PDF Services API, and Word to HTML is handled by a library called Pandoc. A PDF therefore goes through Adobe first and then through Pandoc, whereas a Word document goes through Pandoc only.
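The routing described above can be sketched as follows. This is a minimal illustration only, not the ADC implementation: the function name, the stage labels and the assumption that Word documents arrive as .docx files are all hypothetical.

```python
from pathlib import Path

# Hypothetical stage labels; the real service names are not given in this article.
PDF_TO_WORD = "pdf_to_word (Adobe PDF Services API)"
WORD_TO_HTML = "word_to_html (Pandoc)"


def conversion_steps(filename: str) -> list[str]:
    """Return the ordered conversion stages for a document, based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        # A PDF is first converted to Word, then the Word file is converted to HTML.
        return [PDF_TO_WORD, WORD_TO_HTML]
    if suffix == ".docx":
        # A Word document skips the first stage entirely.
        return [WORD_TO_HTML]
    raise ValueError(f"Unsupported document type: {suffix or filename}")
```

For example, conversion_steps("report.pdf") returns both stages in order, while conversion_steps("report.docx") returns only the Word to HTML stage.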

Once we receive the HTML, we apply a number of transforms ourselves, both to correct some translation errors where possible and to ensure that, among other things, we don’t have multiple identical images. This allows us to finesse some elements of the HTML, provided we have some way of determining the original data.
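As an illustration of one such transform, identical images can be detected by comparing a hash of their contents. The sketch below is an assumption about how this might be done, not a description of the ADC code; the function name and the input format (a mapping of image name to raw bytes) are hypothetical.

```python
import hashlib


def dedupe_images(images: dict[str, bytes]) -> dict[str, str]:
    """Map each image name to the first image name seen with identical bytes."""
    first_seen: dict[str, str] = {}  # content hash -> first image name with that content
    mapping: dict[str, str] = {}     # every image name -> the name to use instead
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        # setdefault stores the name only if this content has not been seen before.
        mapping[name] = first_seen.setdefault(digest, name)
    return mapping
```

Each image reference in the HTML could then be rewritten to the name returned by the mapping, so that only one copy of each distinct image needs to be kept.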

Although we are constantly trying to improve the module, some elements are beyond our control. The sections below outline known issues with the ADC, short-term workarounds, and any development work that we have completed or are researching.

Tables

Blank table cells are not rendered in the conversion, which can leave headings offset from the columns they describe.

Short-term fix

In the short term, as long as every cell contains some content, everything should line up correctly. For example, entering something like "Items" in an empty header cell should make the table render correctly.

Alternatively, enter "Delete" into a blank table cell; then, once the document has been converted by ADC, use the editor to remove the "Delete" from the relevant cell(s). The table will then be properly structured, with the blank cell in place.
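If you prefer to clean up the converted HTML outside the editor, the same removal can be sketched in code. This is a hypothetical helper, not part of ADC, and it assumes that "Delete" is the only content of the cell, with no nested markup such as a paragraph tag around it.

```python
import re

# Matches a td or th cell whose only content is the word "Delete".
# \b stops "<th" from also matching "<thead".
PLACEHOLDER_CELL = re.compile(r"(<t[dh]\b[^>]*>)\s*Delete\s*(</t[dh]>)")


def strip_placeholder(html: str) -> str:
    """Empty every table cell that contains only the "Delete" placeholder."""
    # Keep the opening and closing tags, drop the placeholder text between them.
    return PLACEHOLDER_CELL.sub(r"\1\2", html)
```

Cells that merely contain the word as part of longer text, such as "Delete this record", are left unchanged.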

A note on Accessibility

To make tables Accessible, no cell should be blank: even if there is no content, the cell should state something like "No value". This especially applies to header cells. Consider the example below:

Example of a table with a blank heading cell (th)

When ADC converts the document, the table is transformed: the top left cell is removed, shifting the headers one cell to the left. For example, "White" becomes the heading for the column that starts with "UK Medical Graduate", "Total" becomes the heading for "18%", and so on.

Of course, the top left cell should really have a heading such as "Job Role": this makes the table more discernible for everyone, but especially for those using screen readers.
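The effect of a dropped header cell can be reproduced with a short sketch. The table here is a simplified three-column stand-in for the example, the third cell value is a placeholder, and the function names are hypothetical; the code illustrates the misalignment rather than showing how ADC works internally.

```python
from itertools import zip_longest


def drop_blank_cells(cells: list[str]) -> list[str]:
    """Mimic the conversion issue: cells with no content are not rendered."""
    return [cell for cell in cells if cell.strip()]


def pair_with_headers(headers: list[str], row: list[str]) -> list[tuple]:
    """Pair each row cell with the header that ends up above it (None if no header)."""
    return list(zip_longest(headers, row))


row = ["UK Medical Graduate", "18%", "(total value)"]  # last value is a placeholder

# Blank top left header: it is dropped, so every header shifts one cell left.
shifted = pair_with_headers(drop_blank_cells(["", "White", "Total"]), row)

# Filled top left header: nothing is dropped and the columns line up.
aligned = pair_with_headers(drop_blank_cells(["Job Role", "White", "Total"]), row)
```

In the shifted result, "White" is paired with "UK Medical Graduate" and "Total" with "18%", leaving the last cell with no header at all. In the aligned result, "Job Role" sits above "UK Medical Graduate" and each value keeps its intended heading.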