Accessible Document Converter Solution

VerseOne's Accessible Document Converter (ADC) is designed to make it easier to convert PDF and Word documents to accessible HTML web pages. ADC is not designed to reproduce heavily designed documents as they would appear in a graphic design application or even a PDF viewer, which is why the module includes a link to download the original document.

Although it is continually being improved, HTML is fundamentally more limited than even most word processors. For example, HTML has no concept of "tiered numbering" in lists: we can make it look as though there is by using CSS, but this will not help those using screen-readers.

How the converter works

Our conversions are provided by two services: PDF to Word uses Adobe’s professional PDF Services API, and Word to HTML is handled by a library called Pandoc. So a PDF goes through Adobe first and then through Pandoc; a Word document goes through Pandoc only.
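The routing described above can be sketched as a small lookup on the file extension. This is only an illustration: the function name and the step labels are hypothetical, and it assumes that "Word" means .docx files; none of it is taken from the actual ADC code.

```python
from pathlib import Path

# Hypothetical labels for the two services described above.
ADOBE_STEP = "adobe_pdf_to_word"
PANDOC_STEP = "pandoc_word_to_html"


def conversion_steps(filename: str) -> list[str]:
    """Return, in order, the services a document would pass through."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        # A PDF is first turned into Word, then into HTML.
        return [ADOBE_STEP, PANDOC_STEP]
    if suffix == ".docx":
        # Assumption: Word input is .docx and skips the Adobe step.
        return [PANDOC_STEP]
    raise ValueError(f"Unsupported document type: {filename}")
```

For example, `conversion_steps("policy.pdf")` lists both steps, while `conversion_steps("policy.docx")` lists only the second.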

Once we receive the HTML, we apply a number of transforms ourselves to make up for some translation errors where possible, and to ensure that we don’t have multiple identical images, etc. This allows us to finesse some elements of the HTML, provided we have some way of determining the original data.
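As an illustration of the "multiple identical images" transform, one possible approach is to hash each image's bytes and point every duplicate at the first copy seen. The sketch below assumes the images are available as a mapping of file name to raw bytes; the function name and that data shape are assumptions for the example, not a description of how ADC actually does it.

```python
import hashlib


def dedupe_images(images: dict[str, bytes]) -> dict[str, str]:
    """Map each image name to the first image name with identical content."""
    first_seen: dict[str, str] = {}  # content hash -> first file name seen
    mapping: dict[str, str] = {}     # file name -> canonical file name
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the earlier name if this content was seen before.
        mapping[name] = first_seen.setdefault(digest, name)
    return mapping
```

The returned mapping could then be used to rewrite the image references in the HTML so that each distinct image is stored only once.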

Although we are constantly trying to improve the module, some elements are beyond our control. The sections below outline known issues with the ADC, short-term workarounds, and any development progress that we have made or are researching.

Tiered Numbered Lists

Although it is continually being improved, HTML is fundamentally more limited than even most word processors. For example, HTML has no concept of "tiered numbering" in lists, e.g. 1.1, 1.1.1: we can make it look as though there is by using CSS, but this will not help those using screen-readers (HTML renders sub-lists as shown below). So although this looks like a basic issue with ADC, it is actually a fundamental limitation of HTML itself.

HTML and preview showing lack of tiered list numbers

We are going to introduce CSS to make it look like there are tiered numbers if there is an “ordered list” (<ol>) inside another “ordered list” (I use them frequently myself in spec documents), but it is important to stress that this will not necessarily provide semantic meaning to those using screen-readers: some screen-readers will read the tiered numbers, some will not.

We hope to have this in place by end of November 2023.
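For reference, a common way to get this visual effect is with CSS counters. The sketch below holds such a stylesheet in a Python string and injects it into a converted page. The stylesheet, the counter name `item`, and the helper function are illustrative assumptions rather than the CSS that ADC will ship. This simple version also restyles top-level lists (1., 2., ...), not only nested ones.

```python
# Illustrative stylesheet: each <ol> starts its own counter, and
# counters() joins the counters of all enclosing lists with ".".
TIERED_LIST_CSS = """
ol { counter-reset: item; list-style: none; }
ol > li { counter-increment: item; }
ol > li::before { content: counters(item, ".") ". "; }
"""


def add_tiered_list_styles(html: str) -> str:
    """Insert the stylesheet before </head>, or prepend it if there is no head."""
    style = f"<style>{TIERED_LIST_CSS}</style>"
    if "</head>" in html:
        return html.replace("</head>", style + "</head>", 1)
    return style + html
```

Because the numbers are generated by CSS rather than present in the markup, the caveat above still applies: some screen-readers will announce them and some will not.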