It's surprisingly difficult to find information on the use of PDF tags outside the deeply technical ISO PDF standards. This list serves as a quick reference for common elements, with some examples of how they should be nested to maximize accessibility.
What are PDF tags?
In a PDF document, tags provide information about structure and content to assistive devices. Each element is assigned a tag that describes the type of content found within it. Examples of semantic tags include headings, paragraphs, tables, and lists.
Tags are invisible to the end user but are a critical component of an accessible PDF document. The tag tree is used by many screen readers to not only provide structural information but also define the document's reading order.
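As a rough illustration of how a tag tree can determine reading order, the sketch below walks a small tree depth-first and collects its text in the order encountered. The `(tag, children)` tuple layout and the `reading_order` helper are assumptions made for this example. They are not part of any PDF library, and real assistive technology may behave differently.

```python
# Hypothetical in-memory model of a tag tree: each node is (tag, children),
# where children is a list of nested nodes or plain text strings.
def reading_order(node):
    """Collect text content depth-first, in the order the tags appear."""
    _tag, children = node
    texts = []
    for child in children:
        if isinstance(child, str):
            texts.append(child)  # text leaf: read it now
        else:
            texts.extend(reading_order(child))  # nested tag: descend first
    return texts


document = ("Document", [
    ("H1", ["Annual report"]),
    ("P", ["Opening paragraph."]),
    ("L", [
        ("LI", [("Lbl", ["1."]), ("LBody", ["First point"])]),
    ]),
])

for text in reading_order(document):
    print(text)
```

Moving a tag within the tree changes the order of the collected text, which is why the position of each tag matters as much as its type.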
List of common tags
The following table contains a curated list of the PDF tags you're likely to encounter in your documents. I've attempted to provide a short description of each element, though many are self-explanatory.
Some common tags have been omitted because they are either no longer supported or their use is discouraged by the Tagged PDF Best Practice Guide: Syntax.
|<Art>||Separates individual articles within the same document. Often found in the
|<Aside>||Content that is indirectly related to the current topic, like a sidenote or a tip.|
|<BlockQuote>||Block-level quotation. Can contain several paragraphs and a caption.|
|<Caption>||Used to title an element. Should ideally be the first child of its parent but can also be used as the last child or placed outside the parent.|
|<Code>||Inline text of programming code. Found in a block-level element.|
|<Div>||Semantically-empty container. Typically used to apply styles to grouped elements.|
|<Document>||The container of a complete document. A PDF may contain multiple documents. Can be left empty to indicate a blank page.|
|<Figure>||Images, charts, and other graphical elements. May contain various elements but will be interpreted as a single image by screen readers.|
|<Form>||Form elements. Can contain text when multiple fields are grouped but typically only contains an attributed object.|
|<Formula>||Mathematical or scientific formulas. Can be used inline or at block level.|
|<H(X)>||Section, document, or page titles. Should appear hierarchically, without skipping a level.|
|<Index>||Container for a subject index, usually found at the end of a publication.|
|<Lbl>||Labels for list markers such as bullets and numbers found in the
|<Link>||A link to a web page or another location in the document.|
|<L>||Parent list container. Contains <LI> elements.|
|<LI>||Individual list items found in a parent <L>.|
|<LBody>||The contents of a list item, found in the <LI>.|
|<Note>||An explanatory note like a footnote or endnote. Typically found under
|<P>||An ordinary paragraph.|
|<Part>||Used to divide larger documents into parts.|
|<Quote>||Inline quote in a block-level parent.|
|<Reference>||A citation to text or data found elsewhere in the document. Can include a
|<Sect>||Used to divide a document into small sections. Often found in
|<Span>||Semantically-empty inline container, often wrapped around styled text.|
|<Table>||Table parent container. Contains
|<TBody>||Designates a section of the table as the content area. Optional.|
|<TFoot>||Designates a table section as the footer, typically a total row. Optional.|
|<THead>||Designates a table section as the header. Typically contains the table's
|<TOC>||Parent container for a table of contents that can be found at the root or in a
|<TOCI>||Individual table of contents items. Can contain another
|<TR>||Table row used to group
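The `<H(X)>` rule from the table above (headings should not skip a level) can be checked mechanically. The following is a minimal sketch that assumes the document's tags have already been extracted into a flat list of tag names in reading order. The `find_skipped_headings` helper is hypothetical, and treating a first heading other than `H1` as a skip is an assumption of this example.

```python
import re


def find_skipped_headings(tags):
    """Return (index, tag) pairs where a heading jumps more than one level deeper."""
    problems = []
    previous_level = 0  # assumption: no heading seen yet, so H1 is expected first
    for index, tag in enumerate(tags):
        match = re.fullmatch(r"H(\d+)", tag)
        if match is None:
            continue  # not a numbered heading tag
        level = int(match.group(1))
        if level > previous_level + 1:
            problems.append((index, tag))
        previous_level = level
    return problems


sample = ["H1", "P", "H2", "P", "H4", "P", "H2", "H3"]
print(find_skipped_headings(sample))
```

Moving back up the hierarchy (for example from `H4` to `H2`) is not flagged; only jumps that go deeper by more than one level are reported.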
Syntax and hierarchy
The following are sample tag trees for commonly nested elements. The format is flexible, so some structures can be tagged in more than one way, but the layouts presented below are the ideal ones.
<Caption> element is used to provide a title to an element, commonly used with
It should ideally be placed as the first child of its parent, and it can contain other tags. It can also be placed as the last child or, if required, outside the parent element.
<P>Table 1. Example of a caption
A three-column table with a caption, a row of column header cells, and a single row of data cells. Note that the
<TFoot> elements are optional.
<P>Row 1, column 1
<P>Row 1, column 2
<P>Row 2, column 1
<P>Row 2, column 2
<P>Row 3, column 1
<P>Row 3, column 2
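To make the nesting concrete, here is a small sketch that builds a table tree matching the prose description above (a caption, one row of three header cells, one row of three data cells) and prints it with indentation. The `(tag, content)` tuple model and the `render` helper are assumptions for illustration only. `<TH>` and `<TD>` are the standard PDF cell tags, and the header texts are placeholders.

```python
# Hypothetical model: a node is (tag, content); content is either a text
# string or a list of child nodes.
def render(node, depth=0):
    """Return the tree as a list of indented lines, one per tag."""
    tag, content = node
    indent = "    " * depth
    if isinstance(content, str):
        return [f"{indent}<{tag}>{content}"]
    lines = [f"{indent}<{tag}>"]
    for child in content:
        lines.extend(render(child, depth + 1))
    return lines


table = ("Table", [
    ("Caption", [("P", "Table 1. Example of a caption")]),
    # <THead> and <TBody> are optional wrappers around the rows.
    ("THead", [
        ("TR", [("TH", [("P", f"Header {c}")]) for c in (1, 2, 3)]),
    ]),
    ("TBody", [
        ("TR", [("TD", [("P", f"Row 1, column {c}")]) for c in (1, 2, 3)]),
    ]),
])

print("\n".join(render(table)))
```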
Every input must have its own <Form> element unless it belongs to a group, such as a set of checkboxes or radio buttons. The <Form> element should appear at the same level as the primary label, with both inside a common parent element. The OBJR notation stands for Object Reference, which means the tag represents the actual field element.
Note that there is no mechanism to assign an input to a particular label. Rely on the reading order and on tooltips to convey which label belongs to which field.
A single parent element can contain multiple inputs.
Field Name - OBJR
Label text 2
Field 2 Name - OBJR
Checkboxes and radio buttons
Individual form labels should be found directly before or after their object.
Checkbox 1 Name - OBJR
Checkbox 1 label text
Checkbox 2 Name - OBJR
Checkbox 2 label text
Checkbox 3 Name - OBJR
Checkbox 3 label text
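Because labels and fields are only related by their position, the adjacency rule can be checked with a simple scan. The sketch below assumes the children of a common parent have been reduced to `(kind, text)` pairs, where `"Form"` marks a field object and `"Label"` marks its label text. Both the pair layout and the `forms_without_adjacent_label` helper are hypothetical.

```python
def forms_without_adjacent_label(children):
    """Return the names of Form entries with no label directly before or after."""
    missing = []
    for i, (kind, text) in enumerate(children):
        if kind != "Form":
            continue
        label_before = i > 0 and children[i - 1][0] == "Label"
        label_after = i + 1 < len(children) and children[i + 1][0] == "Label"
        if not (label_before or label_after):
            missing.append(text)
    return missing


checkbox_group = [
    ("Form", "Checkbox 1 Name"), ("Label", "Checkbox 1 label text"),
    ("Form", "Checkbox 2 Name"), ("Label", "Checkbox 2 label text"),
    ("Form", "Checkbox 3 Name"), ("Label", "Checkbox 3 label text"),
]
print(forms_without_adjacent_label(checkbox_group))
```

This only checks that some label sits next to each field. It cannot tell whether the neighboring label is the intended one, so a manual review of the reading order is still needed.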
A simple nested list. Another way to do a list tree is to include the bullet character in the
<LBody> and wrap the text in a non-semantic element like a
<LBody>List item text
<LBody>List item text
<LBody>List item text
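The nesting rules for lists lend themselves to a quick structural check. The sketch below is a simplified validator that assumes `<L>` contains only `<LI>` and `<LI>` contains only `<Lbl>` and `<LBody>`. That rule set is deliberately narrower than what the PDF specification permits, and the tuple model and `list_errors` helper are hypothetical.

```python
# Simplified, assumed rule set: which child tags are expected under each parent.
ALLOWED_CHILDREN = {
    "L": {"LI"},
    "LI": {"Lbl", "LBody"},
}


def list_errors(node):
    """Return messages for child tags that break the simplified nesting rules."""
    tag, children = node
    allowed = ALLOWED_CHILDREN.get(tag)  # None means "anything goes"
    errors = []
    for child in children:
        if isinstance(child, str):
            continue  # plain text content, nothing to validate
        child_tag = child[0]
        if allowed is not None and child_tag not in allowed:
            errors.append(f"<{child_tag}> is not expected directly inside <{tag}>")
        errors.extend(list_errors(child))
    return errors


nested_list = ("L", [
    ("LI", [("Lbl", ["•"]), ("LBody", ["List item text"])]),
    ("LI", [("Lbl", ["•"]), ("LBody", [
        "List item text",
        ("L", [("LI", [("Lbl", ["-"]), ("LBody", ["List item text"])])]),
    ])]),
])
print(list_errors(nested_list))
```

Note that the nested `<L>` sits inside the parent item's `<LBody>`, which the validator accepts because `<LBody>` has no restrictions in this rule set.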
Table of Contents
A table of contents can be nested or presented in a single, flat level. Both approaches are acceptable.
<TOC> element can be nested as a child of another
<TOC> or in a
Tables of contents can also be flattened and displayed linearly.
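The relationship between the nested and flat forms can be sketched as a simple traversal. The example below assumes a nested `<TOC>` is modeled as `("TOC", [items])`, with each `<TOCI>` holding a title string and, optionally, a child `<TOC>`. The model and the `flatten_toc` helper are hypothetical and only illustrate the idea.

```python
def flatten_toc(toc, depth=0):
    """Turn a nested TOC into a flat list of (depth, title) entries."""
    _tag, items = toc
    entries = []
    for _item_tag, parts in items:
        for part in parts:
            if isinstance(part, str):
                entries.append((depth, part))  # the entry's own title
            else:
                entries.extend(flatten_toc(part, depth + 1))  # child <TOC>
    return entries


nested_toc = ("TOC", [
    ("TOCI", ["Chapter 1", ("TOC", [
        ("TOCI", ["Section 1.1"]),
        ("TOCI", ["Section 1.2"]),
    ])]),
    ("TOCI", ["Chapter 2"]),
])

for depth, title in flatten_toc(nested_toc):
    print("  " * depth + title)
```

The flat result keeps the same reading order as the nested tree. Only the hierarchy is dropped, although this sketch keeps it in the `depth` value.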
Tagged PDF Best Practice Guide: Syntax