Binder Document Specification, Version 1.0

OpenReader Consortium Preliminary Working Draft (13 May 2006)

This version:
http://openreader.org/spec/bnd10-2006-05-13.html
Latest version:
http://openreader.org/spec/bnd10.html
Previous version:
(archived)
Contributors:
Jon Noring (editor)


1. Introduction

This specification details the structure, conformance requirements and recommendations of a Binder Document (“Binder”). It is a module specification within the suite of specifications used to define the OpenReader Publication Framework Specification.

The Binder is an XML document whose purpose is to organize a textual publication represented by multiple resources. The word “Binder” is used since the Binder Document is, in a loose sense, the digital equivalent of a book binder, or a three-ring binder, where paper pages (which can be thought of as discrete resource fragments) are ordered and “bound together” to create a single, coherent publication.

Although the Binder is primarily designed for the OpenReader Publication Framework (a framework centered on XML-conforming content documents and CSS), this specification has been authored in sufficiently generic fashion so it may be used for other similar publication frameworks.

The Binder Document is significantly influenced by the Package Document of the OEBPS Specification, first formulated in 1999, which serves a similar purpose. Since 1999, much has been learned about the shortcomings of OEBPS (expected for any first-generation technology), and new requirements have been identified. The Binder incorporates improvements long requested by publishers, the accessibility community, and other digital publication stakeholders. It also enables new capabilities intended to benefit both publishers and end-users.

1.1 Normative Edition

The normative edition of this specification is the XHTML 1.1 document located at http://openreader.org/spec/bnd10.html .

Other formatted editions may be offered besides the normative edition, but they will not be considered normative.

The XHTML 1.1 normative edition of this specification is authored so that the markup in the document body (that contained in the body element) conforms with the Basic Content Document Specification, Version 1.0.

1.2 Definitions

Several important words and terms used in this specification are defined in the Common Definitions Document, Version 1.0.

1.3 Requirement Levels

The following key words (“imperatives”) are used in this specification to denote requirement level consistent with RFC 2119:

  • must
  • must not
  • required
  • should
  • should not
  • recommended
  • may
  • not required
  • optional

1.4 Highlighting Conventions

To aid in readability and understandability, special text highlighting conventions are used in this specification (in addition to ordinary text emphasis) to emphasize important items.

1.4.1 Imperative Level

The requirement level imperatives described in Section 1.3 are highlighted based on three basic imperative levels: required, recommended, and optional.

1.4.2 Elements, Attributes and Attribute Values

The normative XHTML 1.1 edition of this specification includes special markup for every mention of elements, attributes, attribute values, and other related code. (For details, refer to the comment in the source document header.) This allows these markup constructs to be specially highlighted, using CSS, during presentation (including their status and requirement level) so they may be more easily recognized.

Since the normative edition of this specification may be rendered with different CSS style sheets, converted into other formats, rendered on visually limited hardware, or presented with text-to-speech engines, some or all of this highlighting may be lost. Care has been taken to assure that, in the absence of highlighting, every mention of these markup constructs will be clear and unambiguous.

Element Highlighting (status by requirement level)
Status | Required | Cond. Req. | Optional
Normal | pubid | title | dublincore
Deprecated | | |
Removed | | |
Attribute Highlighting
Status Requirement Level
Required Cond. Req. Optional Fixed
Normal idns event comment
Deprecated
Removed

In the above tables, there are four requirement levels:

  1. “Required” means the element/attribute must appear, in some capacity, in all Binder documents.

  2. “Conditionally Required” means the element/attribute must appear under certain element usage situations, and is optional in other situations.

  3. “Optional” means the element/attribute is optional under all situations.

  4. “Fixed” (applicable only to attributes) means the attribute is fixed to a certain value in the DTD and there is no separate requirement the attribute must appear in the associated element.

Similarly, there are three status levels:

  1. “Normal” means the element/attribute has normal status in this specification.

  2. “Deprecated” means the element/attribute has been deprecated, and support for it may be removed in a future version of this specification.

  3. “Removed” means the element/attribute is no longer supported in this specification, but is nevertheless mentioned.

An empty cell in the tables means there is no mention in this specification of an element/attribute having the associated status and requirement level.

Attribute values are highlighted as en-US.

Other types of “code” are highlighted as PCDATA.

1.5 Referenced Specifications and Standards

This specification is built upon a wide and stable base of compatible open specifications and standards. Following are the various specifications and standards referenced in some manner by this specification.

OpenReader Specifications:

W3C Specifications and Notes:

Internet Engineering Task Force (IETF):

International Organization for Standardization (ISO):

National Information Standards Organization (NISO):

Internet Assigned Numbers Authority (IANA):

Others:

2. Binder Document: MIME Media Type “application/x-orp-bnd1+xml”

The MIME Media Type of a conforming Binder Document is “application/x-orp-bnd1+xml”. This MIME media type is not IANA registered.

Other specifications and applications using or referencing Binder Documents by MIME media type should use “application/x-orp-bnd1+xml”, rather than one based on the “text/” media type name. The reason is that Binder Documents may be encoded in UTF-16 (see Section 3.1) and “application/” media type names are more appropriate when both UTF-8 and UTF-16 encodings are allowed (RFC 3023).

3. Binder Document: General Requirements

The Binder Document is an XML document valid to the Binder Document DTD, Version 1.0. The Binder, as an XML document, may use the various vocabulary-independent constructs that are specified in XML 1.0.

In addition to XML conformance, the Binder Document must meet a set of general requirements that go beyond what XML specifies. This section outlines these general requirements and provides an overview of the more important XML-provided vocabulary-independent constructs, which are useful to both Binder Document authors and user agent developers.

3.1 General Conformance Requirements

A conformant Binder Document must meet all of the following general and top-level requirements:

  1. Fully conforms to XML 1.0 (e.g., it is well-formed)

  2. Text encoding is UTF-8 or UTF-16 as specified in the latest Unicode standard.

  3. Includes an XML declaration with a text encoding declaration:

    <?xml version="1.0" encoding="UTF-8" ?>
    

    or

    <?xml version="1.0" encoding="UTF-16" ?>
    
  4. Valid to the Binder Document DTD, Version 1.0, which is externally referenced by public identifier as follows:

    <!DOCTYPE binder PUBLIC
         "-//OpenReader//DTD Binder Document 1.0//EN"
         "http://openreader.org/dtd/bnd10.dtd">
    
  5. Does not include a DTD internal subset.

  6. For the document root element binder, the default namespace is explicitly declared to be http://openreader.org/namespace/orp-binder/1.0/. Two prefixed namespace declarations are also required: Dublin Core, and XHTML. The required attribute xml:lang (see Section 4.2.1.1) specifies the default language of the Binder Document (but not of the Publication itself, also discussed in Section 4.2.1.1.)

    Example where the default language of the Binder Document is U.S. English:

    <binder xmlns="http://openreader.org/namespace/orp-binder/1.0/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xhtml="http://www.w3.org/1999/xhtml"
            xml:lang="en-US">
    
  7. Does not declare any other namespaces, whether default or prefixed.

  8. Conforms with all the specific requirements and constraints described elsewhere in this specification.
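Several of the requirements above (items 1, 6 and 7) can be checked mechanically. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree; the function name check_binder_root is hypothetical and not part of this specification. It does not validate against the DTD, inspect the text encoding or the DOCTYPE declaration, or confirm that the namespace declarations sit on the root element itself.

```python
import io
import xml.etree.ElementTree as ET

BINDER_NS = "http://openreader.org/namespace/orp-binder/1.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_binder_root(data):
    """Return a list of problems found in a Binder Document given as bytes.

    Hypothetical helper: covers only the root element and the namespace
    declarations (Section 3.1, items 1, 6 and 7).
    """
    declared = []  # (prefix, uri) pairs in document order
    root = None
    # Reading the whole stream also confirms well-formedness; an
    # ET.ParseError propagates to the caller if the document is not.
    for event, payload in ET.iterparse(io.BytesIO(data),
                                       events=("start-ns", "start")):
        if event == "start-ns":
            declared.append(payload)
        elif root is None:
            root = payload  # the first "start" event is the root element

    problems = []
    if root.tag != "{%s}binder" % BINDER_NS:
        problems.append("root element is not binder in the Binder namespace")
    if ("", BINDER_NS) not in declared:
        problems.append("Binder namespace is not the declared default namespace")
    if XML_LANG not in root.attrib:
        problems.append("root element lacks xml:lang")

    prefixed = {uri for prefix, uri in declared if prefix}
    for required in (DC_NS, XHTML_NS):
        if required not in prefixed:
            problems.append("missing prefixed namespace declaration: " + required)

    all_uris = {uri for _prefix, uri in declared}
    for uri in sorted(all_uris - {BINDER_NS, DC_NS, XHTML_NS}):
        problems.append("unexpected namespace declaration: " + uri)
    return problems
```

An empty list means none of the checked requirements was violated; it does not by itself establish conformance.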

3.2 Binder Document Template

Based on the general conformance requirements in Section 3.1, the following Binder Document template is constructed. This template includes the required:

  1. XML and DOCTYPE declarations,
  2. root and top-level elements,
  3. attributes and attribute values.

Binder Document authors will find it a useful template (or “boilerplate”) to use as a starting point to build conforming Binder Documents.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE binder PUBLIC
     "-//OpenReader//DTD Binder Document 1.0//EN"
     "http://openreader.org/dtd/bnd10.dtd">
<binder xmlns="http://openreader.org/namespace/orp-binder/1.0/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xhtml="http://www.w3.org/1999/xhtml"
        xml:lang="en-US">

     <pubid>
          <!-- pubid content goes here -->
     </pubid>

     <resources>
          <!-- resources content goes here -->
     </resources>

     <userset usid="1" mode="oeb" lang="en-US">
          <!-- userset content goes here. The <userset>
               attribute values are example only. -->
     </userset>

     <!-- Note: There may be more than one <userset> in <binder>. For
          multiple <userset>, the 'title' attribute is also required. -->

</binder>

Notes on the above example:

  1. The text encoding declaration in the XML declaration may either be UTF-8 or UTF-16.

  2. The value of the xml:lang attribute in binder will vary depending upon the default language of the Binder Document (see Section 4.2.1.1).

  3. All the elements shown may include certain optional attributes.

  4. The required element userset, which may appear more than once, must include the three required attributes as shown. However, the given attribute values are example only and may vary as allowed by this specification.

3.3 Some XML Requirements and Vocabulary-Independent Constructs

This specification is not intended to be a tutorial on how to author XML-conforming Binder Documents. Nevertheless, to aid in the authoring of Binder Documents, which must be well-formed XML, and which may include the useful vocabulary-independent constructs that XML allows, this section presents a sampling of the most important markup-related XML requirements, vocabulary-independent constructs, and related useful topics.

Note: A few Binder Document and user agent requirements (beyond what XML requires) are specified in this section.

3.3.1 Important XML Markup Requirements

As specified in Section 3.1, all Binder Documents must be conforming XML 1.0 documents, which means, for example, that they are well-formed. Following is a list of several specific XML markup requirements; the list is by no means exhaustive. These requirements are mentioned because violations of them account for a large fraction of the XML well-formedness and validation errors encountered in practice.

  • Element and attribute names are case-sensitive. For example <binder> and <BINDER> are different elements.

  • Attribute values must be enclosed in either single or double straight quotes. For example, lang="en-US" and lang='en-US' are conforming, but lang=en-US, lang="en-US' and lang='en-US" are not.

  • All non-empty elements must have properly formed starting and closing tags.

  • All declared empty elements must be properly formed (see Section 3.3.5).

  • All elements must properly nest.

  • Depending upon the circumstance, certain markup characters, when used literally (e.g. & and <), must be escaped. Refer to Section 3.3.2 for details.

3.3.2 Character Entity and Numeric Character References

For XML 1.0 documents (and this includes Binder Documents), each individual character within character data is represented in one of two ways:

  1. directly in the document’s text encoding, and

  2. indirectly using a numeric character reference, or by a predefined or declared character entity reference which points to a numeric character reference.

For example, it is convenient to use numeric character references and allowed character entity references when the tool used to create an XML document is limited to ASCII encoding (UTF-8 conformant but limited to the Basic Latin script), and some characters fall outside of the ASCII range.

In certain circumstances, five of the characters used to define XML markup constructs (specifically & < > " and '), when used literally, must be represented (or “escaped”) by their numeric character references, or by their declared character entity reference equivalents. For this purpose, XML predefines entity references for these five characters which all XML processors must recognize:

Predefined XML Character Entity References
Character Predefined Entity Numeric Reference (hex) Numeric Reference (dec)
& &amp; &#x0026; &#38;
< &lt; &#x003C; &#60;
> &gt; &#x003E; &#62;
" &quot; &#x0022; &#34;
' &apos; &#x0027; &#39;

Listed below are the circumstances when the five markup characters, used literally and not as part of markup, must be escaped:

  1. The & and < characters, except when used within CDATA sections and Comments.

  2. The " and ' characters when they appear within an attribute value and match the attribute value delimiting quote mark. It is recommended that both always be escaped in attribute values.

  3. The > character in the very rare instance it appears in the string “]]>” when that string is not marking the end of a CDATA section. It is considered good practice to escape the > character wherever it is used literally.

Example of Binder Document markup with both required and optional numeric and character entity references:

<title comment='Jane&apos;s AT&amp;T R&#x00E9;sum&#x00E9;'>Jane's AT&amp;T R&#x00E9;sum&#x00E9;</title>

A user agent will render the above content as:

Jane's AT&T Résumé
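The resolution of these references is performed by the XML processor, not by the user agent. As an illustration only, the following Python sketch (assuming the standard library's xml.etree.ElementTree as the XML processor) parses the example markup and shows that the attribute value and the character data resolve to the same string.

```python
import xml.etree.ElementTree as ET

# The example markup from this section, split over two string literals.
markup = (
    "<title comment='Jane&apos;s AT&amp;T R&#x00E9;sum&#x00E9;'>"
    "Jane's AT&amp;T R&#x00E9;sum&#x00E9;</title>"
)

title = ET.fromstring(markup)
attribute_value = title.get("comment")  # references already resolved
character_data = title.text             # references already resolved
print(character_data)
```

Both attribute_value and character_data hold the string Jane's AT&T Résumé.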

3.3.3 CDATA Sections

CDATA sections may be used in XML documents (which includes Binder Documents) to escape blocks of text containing markup characters (e.g. “<” and “&”) when used literally. This is an alternative to individually escaping each markup character (see Section 3.3.2).

A CDATA section starts with “<![CDATA[” and terminates with “]]>”.

CDATA sections may be used anywhere character data may occur, except that they must not appear within an attribute value. They must not nest; the text content within a CDATA section must not contain the literal character sequence “]]>”.

Example:

<xhtml:span>Insert the following: &lt;h1&gt;Greetings!&lt;/h1&gt;</xhtml:span>

is equivalent to

<xhtml:span>Insert the following: <![CDATA[<h1>Greetings!</h1>]]></xhtml:span>

A user agent will render both of the above as:

Insert the following: <h1>Greetings!</h1>
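The equivalence of the two forms can be confirmed with any XML processor. The sketch below is illustrative only and assumes Python's standard xml.etree.ElementTree; the XHTML namespace declaration is added to each fragment so that it can be parsed on its own.

```python
import xml.etree.ElementTree as ET

# Declaration needed because the fragments are parsed outside a Binder.
XHTML_DECL = 'xmlns:xhtml="http://www.w3.org/1999/xhtml"'

escaped = ET.fromstring(
    "<xhtml:span %s>Insert the following: "
    "&lt;h1&gt;Greetings!&lt;/h1&gt;</xhtml:span>" % XHTML_DECL
)
cdata = ET.fromstring(
    "<xhtml:span %s>Insert the following: "
    "<![CDATA[<h1>Greetings!</h1>]]></xhtml:span>" % XHTML_DECL
)

# The processor hands the same character data to the application
# regardless of which escaping mechanism the author chose.
same_text = escaped.text == cdata.text
```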

3.3.4 Comments

Comments may appear anywhere in an XML document (including a Binder Document) except before the XML declaration and within other markup. Comments are not part of the character data; they are primarily intended for Binder Document authors to insert private commentary (notes) within the document.

A comment starts with “<!--” and terminates with “-->”; the comment text is between these two delimiters. The comment text must not contain the string “--” (two hyphens), but otherwise may include, without escaping, all the Unicode characters recognized in XML 1.0, including the XML markup characters. A comment must not terminate with the literal string “--->”.

Examples of valid comments:

<!-- This is a comment -->
<!-- & < > " ' -->

To conform to this specification, user agents must not:

  1. render the contents of comments,
  2. execute any script or code (such as JavaScript) contained in comments,
  3. take any action based on the content in comments.
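Many XML processors discard comments unless explicitly asked to keep them, so a user agent built on such a processor never sees comment text. The following sketch is illustrative only and assumes Python's standard xml.etree.ElementTree with its default settings.

```python
import xml.etree.ElementTree as ET

# A comment containing unescaped markup characters, followed by character data.
element = ET.fromstring("<title><!-- & < > \" ' -->A Title</title>")

# With the default tree builder the comment is dropped during parsing,
# so only the character data remains available for rendering.
visible_text = "".join(element.itertext())
```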

3.3.5 Empty Elements and Empty Content

Some elements in a DTD may be declared EMPTY. When used in an XML document, these elements must not contain any content and must use the empty-element syntax (also known as “minimized form”) as specified in XML 1.0.

Example of correct usage of declared empty element syntax (the element item is declared EMPTY in this specification):

<item resid="css1" resource="style1.css" media-type="text/css"/>

Note: a sequence of one or more white space characters may appear before the closing “/” in empty-element syntax.

In this specification, the empty-element syntax must only be used for declared empty elements; it must not be used for declared non-empty elements when they contain no content.

Example of correct and incorrect usage when a declared non-empty element contains no content (the element dc:title is declared non-empty in this specification):

<dc:title/>             <!-- not allowed! -->
<dc:title></dc:title>   <!-- correct usage -->

When declared non-empty elements contain no content, or only a sequence of one or more white space characters, this occurrence is referred to as “empty content.”

3.3.6 White Space Handling

White space characters, and their handling by user agents, are important considerations for both Binder Document authors and user agent developers.

In XML, the white space characters are:

  • space (&#x0020;)

  • tab (&#x0009;)

  • carriage return (&#x000D;)

  • line feed (&#x000A;)

The rules for white space handling of both character data and attribute values by XML processors are addressed in Sections 2.10, 3.2.1 and 3.3.3 of the XML 1.0 Specification.

User agent requirements:

  1. For character data, XML processors are required to pass to the user agent all characters in a document that are not markup. This includes white space characters.

    Except where the XML attribute xml:space is specifically set to the value of preserve, or a similar override mechanism is applied (e.g., the CSS white-space property), in this specification user agents must normalize the character data of an element as follows:

    • Replace every sequence of one or more white space characters with a single space character (&#x0020;), and

    • Remove all leading and trailing spaces.

    Example:

    <dc:title>
       <xhtml:em> This
       </xhtml:em>    is a
               Title
    </dc:title>
    

    and

    <dc:title> <xhtml:em> This    </xhtml:em>   is   a Title </dc:title>
    

    are both equivalent to:

    <dc:title><xhtml:em>This</xhtml:em> is a Title</dc:title>
    
  2. For white space in attribute values, XML requires that all XML processors normalize attribute values before sending the attribute value data to the user agent. Note that this normalization process treats attribute values not of type CDATA differently from those of type CDATA.

    To conform to this specification, user agents must normalize CDATA attribute values as if they were not of type CDATA. That is, for attribute values of type CDATA, user agents must replace a sequence of space (&#x0020;) characters with a single space (&#x0020;) character, and remove any leading and trailing space (&#x0020;) characters.

    Example (the comment attribute is of datatype CDATA):

    <style cssrefs="css1" comment=" This is   a
    
        style sheet "/>
    

    is equivalent to:

    <style cssrefs="css1" comment="This is a style sheet"/>
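The attribute value normalization described above can be sketched as follows. This is an illustration under stated assumptions, not a normative algorithm: the function name normalize_white_space is hypothetical, Python's standard xml.etree.ElementTree stands in for the XML processor, and the function collapses every run of XML white space (including a lone tab or line feed) so that it also works on values the processor has not yet normalized. It handles a flat string only; applying the character data rules to an element that contains inline child elements requires more care.

```python
import re
import xml.etree.ElementTree as ET

# The four XML white space characters: space, tab, carriage return, line feed.
_XML_WHITE_SPACE = re.compile(r"[ \t\r\n]+")


def normalize_white_space(value):
    """Collapse runs of XML white space to one space and trim both ends."""
    return _XML_WHITE_SPACE.sub(" ", value).strip(" ")


# The example from this section, with the line breaks inside the attribute.
style = ET.fromstring(
    '<style cssrefs="css1" comment=" This is   a\n\n    style sheet "/>'
)
normalized_comment = normalize_white_space(style.get("comment"))
```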
    

3.3.7 Unicode Space Characters and Related Topics

Binder Document authors are free to use all the Unicode characters in character data, except those disallowed by XML and this specification (refer to "Unicode in XML and other Markup Languages" for recommendations on the Unicode characters not suitable for use in XML, and related topics.) This flexibility allows for the richest content in Binder Documents, meeting nearly all international needs, but in certain situations will create a few complexities for Binder Document authors and user agent developers.

One of the more complex topics concerns the spacing characters used for inter-word separation. Because the concept of a “word” in most languages plays a fundamental role in various word-related operations, such as text searching, line breaking, etc., Binder Document authors and user agent developers need to understand how the Unicode space characters are used to enable inter-word separation, plus the related topics of line breaking (primarily for the purpose of visual presentation), and soft hyphens.

3.3.7.1 Unicode Space Characters

The Unicode Space Characters set (see Section 6.2 in the Unicode 4.1.0 specification) includes:

(For more details on these spacing characters, and other space-like characters, refer to Section 6.2 in the Unicode 4.1.0 standard. This specification does not specify how user agents are to exactly render these different space characters.)

3.3.7.2 Inter-word Separation

In this specification, user agents must treat any sequence of Unicode Space Characters and/or XML white space characters within character data as an inter-word separator.
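A word-splitting routine following this rule might look like the sketch below. It is illustrative only. As an assumption, the Unicode Space Characters set is approximated by the Unicode general category Zs (space separator) as reported by Python's unicodedata module, which reflects a later Unicode version than 4.1.0 and may not match the set intended by this specification exactly. The function names are hypothetical.

```python
import unicodedata

XML_WHITE_SPACE = {" ", "\t", "\r", "\n"}


def is_word_separator(ch):
    """True for XML white space and for characters in category Zs (assumed set)."""
    return ch in XML_WHITE_SPACE or unicodedata.category(ch) == "Zs"


def split_words(text):
    """Split text into words, treating any run of separators as one boundary."""
    words, current = [], []
    for ch in text:
        if is_word_separator(ch):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words
```

For example, a no-break space (&#x00A0;) followed by an em space (&#x2003;) between two words yields a single word boundary.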

3.3.7.3 Line Breaking

The related topic of line breaking is important for the purpose of visual rendering. This topic is covered in detail in the Unicode Standard Annex #14 Technical Report: Line Breaking Properties, which provides a comprehensive set of guidelines. User agents should follow, as closely as possible, the line break recommendations in this Unicode technical report.

In general, line breaking is allowed between words except where one or more no-break space characters are used between the words. The no-break space characters include:

  • No-break space (&#x00A0;, or &nbsp;, as defined in XHTML and in the OpenReader Character Entity References Common Set — this is the preferred character to use for no line breaking between words)

  • Figure space (&#x2007;)

  • Narrow no-break space (&#x202F;)

  • Zero width no-break space (&#xFEFF;)

User agents should not line break between two words separated by a sequence of one or more no-break space characters.

Binder Document authors should not use a no-break space character for any purpose other than indicating that no line break should occur between words.

[Informative Commentary] In XML document authoring, particularly XHTML, some authors inappropriately use a no-break space (primarily &nbsp;) to “pad” spacing in order to force a desired visual presentation, thereby working against the reflowability and adaptability of the content to various hardware, applications and end-user presentation settings.

Regarding line breaking within a word, user agents may do so per the allowance and the conventions of the language as detailed in the above referenced Unicode technical document on line breaking.

3.3.7.4 Soft Hyphen

Binder Document authors may insert within a word the “soft hyphen” character (&#x00AD; or &shy; as defined in XHTML and the OpenReader Character Entity References Common Set) to signal that the user agent may line break the word at that point.

In this specification, user agents must not render the soft hyphen character but may add the appropriate end-of-line character(s) (and other necessary text adjustments, depending upon language and conventions) for a line break placed after a soft hyphen. For all other purposes, such as word searching, user agents must ignore the soft hyphen character since it is technically not part of the word.

Note: The soft hyphen is not the same character as the plain hyphen (&#x002D;.) The plain hyphen character is considered a part of the word, and user agents must process it like any other character in the word.

Example of the use of a soft hyphen:

<dc:title>How to Insert a Soft Hy&shy;phen Within a Word</dc:title>

Should a user agent, in presenting the contents to the end-user, line break before the word “Hyphen”, it will render the above example as follows:

How to Insert a Soft
Hyphen Within a Word

If the user agent line breaks at the soft hyphen, it will render the above example as follows (using the common English language convention for hyphenation):

How to Insert a Soft Hy-
phen Within a Word
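The two behaviors described in this section (ignoring the soft hyphen for searching, and showing a visible hyphen only when a line break is taken at it) can be sketched as follows. The function names are hypothetical, the sketch assumes the XML processor has already resolved the reference to the character U+00AD, and it applies the common English convention; a real user agent would choose the break point from the available line width and the language's conventions.

```python
SOFT_HYPHEN = "\u00ad"


def searchable_text(text):
    """Drop soft hyphens so that searching sees each word whole."""
    return text.replace(SOFT_HYPHEN, "")


def break_at_soft_hyphen(text):
    """Split text at its first soft hyphen, adding a visible hyphen at the break."""
    before, found, after = text.partition(SOFT_HYPHEN)
    if not found:
        return [text]
    # Any later soft hyphens stay unrendered on the second line.
    return [before + "-", searchable_text(after)]


title = "How to Insert a Soft Hy\u00adphen Within a Word"
lines = break_at_soft_hyphen(title)
```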

4. Binder Document: Vocabulary Description

This section describes the Binder Document vocabulary. As noted in Section 3.1, a Binder Document must be valid to the Binder Document DTD. The Binder DTD is the normative vocabulary reference with respect to:

  • allowed elements, attributes and attribute values, and

  • element content model.

The root element binder is described in Section 3.1, item 6.

Although the overarching framework which references this specification is the ultimate authority with respect to user agent processing of Binder Documents, this section (as well as elsewhere in this specification) includes a number of user agent requirements and recommendations deemed necessary to the integrity of the Binder Document paradigm. All these user agent requirements and recommendations are authoritative in the overarching framework except where explicitly overridden or amended by the framework.

4.1 Structural Overview

The Binder Document is divided into functional parts, each of which performs some function in the organization and/or use of the associated Publication. Some functional parts are vital and thus are required; others are optional and provide for enhancements to the end-user experience.

The following table provides an overview of the various Binder parts. Each functional part is represented in markup by a head element (shown in the second column). In the Binder Document, the head elements appear in the order listed in the table, and that order is significant; except for the User Set, each functional part’s head element must not appear more than once.

Binder Document Functional Parts

Functional Part Name | Head Element Name | Description | Requirement Level
Publication Identifier | pubid | Primary identifier of the Publication | required
Resource Manifest | resources | List of resources that make up the Publication | required
Publication Metadata | metadata | Dublin Core metadata for the Publication | optional but recommended
User Set | userset | Provides for multiple, end-user selectable “views” of the Publication | at least one required

User Set Functional Parts

Functional Part Name | Head Element Name | Description | Requirement Level
Publication Title | pubtitle | Title of the Publication | required
Character Manifest | characters | Unicode characters and character blocks appearing in the content documents associated with the User Set | optional but recommended
Spine | spine | Primary reading order (similar to OEBPS Spine), or home page for “web” option | required
Linear Order | linear | Linearization of the content documents in the User Set for printing and related purposes | optional
Composite Documents | composites | Merging of selected out-of-spine content documents into a “virtual” composite document | optional
Element Substitution | elemsub | Associating substitute non-text content media resources to specific content document elements | optional
Non-Text Content Media Equivalents | equivs | Associating non-text content media resources with each other | optional
Cover | cover | Designating non-text content media cover resources | optional but recommended
Thumbnail | thumbnail | Designating thumbnail images | optional but recommended
Styling | styling | Assigning cascading style sheets (such as CSS) to content document resources | optional
Navigation | navigation | Assigning the primary navigation index (required), and optional alternative navigation indices | required
Resource Descriptions | resdescs | Providing textual descriptions for non-text content media resources | required for non-text content media resources
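The ordering and occurrence rules for the four top-level head elements can be sketched as a simple check. This is an illustration only: the function name check_top_level is hypothetical, the order and requirement levels are taken from the table above, and the Binder Document DTD remains the normative reference for the content model. The head elements inside a User Set are not covered.

```python
# Top-level head elements in the order given by the table.
TOP_LEVEL_ORDER = ["pubid", "resources", "metadata", "userset"]
REQUIRED = {"pubid", "resources", "userset"}
REPEATABLE = {"userset"}


def check_top_level(names):
    """Return a list of problems for the child element names of binder."""
    problems = []
    for name in names:
        if name not in TOP_LEVEL_ORDER:
            problems.append("unexpected head element: " + name)

    known = [name for name in names if name in TOP_LEVEL_ORDER]
    ranks = [TOP_LEVEL_ORDER.index(name) for name in known]
    if ranks != sorted(ranks):
        problems.append("head elements are out of order")

    for name in TOP_LEVEL_ORDER:
        count = known.count(name)
        if name in REQUIRED and count == 0:
            problems.append("missing required head element: " + name)
        if name not in REPEATABLE and count > 1:
            problems.append("head element appears more than once: " + name)
    return problems
```

For example, the sequence pubid, resources, userset, userset passes, while a Binder with no userset is reported as missing a required head element.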

4.2 General Constructs

This specification supports certain vocabulary-specific constructs that may be used in various functional parts of Binder Documents. (Refer to Section 3.3 for vocabulary-independent constructs.)

4.2.1 The Common Attribute Set

The Binder Document vocabulary defines three [Common] attributes that may be applied to most elements. They are xml:lang, xml:id, and comment.

4.2.1.1 xml:lang

The xml:lang attribute, specially defined in XML 1.0, may be used to specify the default language of the element’s content and attribute values. This attribute must not be used to specify Publication-related languages, which are instead specified using the lang attribute (see Section 4.2.2.)

The value of xml:lang must comply with RFC 3066, or its successor on the IETF Standards Track. Thus, the value will also conform to the separate requirement that xml:lang be an XML Name. (Language Codes) (Country Codes)

While xml:lang is optional for most elements, it is required for the root element binder (see Section 3.1, item 6), specifying the default language of the Binder Document.

4.2.1.2 xml:id

The xml:id attribute, based on the W3C Recommendation xml:id Version 1.0, is used to give a unique identifier to an element. Its value must:

  1. be unique across all elements in a Binder Document,

  2. be an XML Name,

  3. not contain the “:” character,

  4. start with a Letter as defined in Appendix B of the XML 1.0 Specification — it cannot start with an underscore (“_”),

  5. not start with the string “xml” (and all its case variants), since this is reserved in XML 1.0 for possible future standardization, and

  6. not start with the string “orp” (and all its case variants), since this is reserved for possible use in future versions of this specification.
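The six constraints above can be sketched as a pair of checks. This is an illustration under a deliberate simplification: the regular expression accepts only ASCII names, whereas the XML Name production and the Letter class of Appendix B admit many more characters, so the sketch rejects some values that are in fact allowed. The function names are hypothetical.

```python
import re

# ASCII-only simplification of an XML Name that starts with a Letter and
# contains no ":" (rules 2, 3 and 4). Non-ASCII names are rejected here
# even though XML 1.0 allows many of them.
_ASCII_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")

# Reserved leading strings, compared case-insensitively (rules 5 and 6).
_RESERVED_PREFIXES = ("xml", "orp")


def is_acceptable_xml_id(value):
    """Check a single xml:id value against rules 2 to 6 (ASCII subset)."""
    if not _ASCII_NAME.fullmatch(value):
        return False
    return not value.lower().startswith(_RESERVED_PREFIXES)


def duplicated_ids(values):
    """Return the set of values that occur more than once (rule 1)."""
    seen, duplicates = set(), set()
    for value in values:
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates
```

Note that rule 6 rejects any value beginning with the letters “orp” in any case, so a value such as orpheus is not acceptable.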

This specification assigns no special meaning or purpose to xml:id (although a future version of this specification may do so); Binder Document authors and authoring systems may freely use xml:id for special identification of elements. However, xml:id must not be used as a “back door” to actuate special features or functions in user agents.

Note that xml:id must not be applied to the item and userset elements, since these elements already require attributes with datatype ID (see Sections 4.4 and 4.6, respectively.) XML forbids an element having more than one ID.

4.2.1.3 comment

The comment attribute may be used by Binder Document authors to provide commentary within an element. The attribute value of comment is datatype text (CDATA).

The allowed Unicode character range for the value of the comment attribute, as with all attribute values of datatype text (CDATA), is given in XML 1.0, with the following constraints:

  1. The literal < character must not appear. If this character is to be used literally, it must be escaped.

  2. The literal & character must not appear except as part of a predefined or declared character entity reference. If this character is to be used literally (not part of an entity reference), it must be escaped.

  3. The literal " and ' characters must not appear when they match the attribute delimiter quote marks; they must be escaped. It is recommended that both of these characters always be escaped when they appear in attribute values.
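An authoring tool can satisfy these three constraints by escaping raw text before writing it into an attribute value. The sketch below is illustrative only; it uses the escape function from Python's standard xml.sax.saxutils module, and the helper name escape_attribute_value is hypothetical. It follows the recommendation to escape both quote characters always. Because every & is escaped, the input must be raw text and must not already contain entity references.

```python
from xml.sax.saxutils import escape

# Extra replacements so that both quote characters are always escaped.
_QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_attribute_value(value):
    """Escape &, <, > and both quote characters in raw attribute text."""
    return escape(value, _QUOTES)


comment = "Jane's AT&T <draft>"
attribute = 'comment="%s"' % escape_attribute_value(comment)
```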

Similar to XML comments (see Section 3.3.4), the comment attribute is intended for private use by Binder Document authors. User agents must not use the value of the comment attribute.

4.2.2 lang Attribute

The lang attribute may be applied to the required resources, item, and userset elements. Its purpose is to specify Publication-related languages.

The lang attribute differs from the xml:lang attribute (see Section 4.2.1.1) in that xml:lang is intended only to specify the language of element character data and attribute values within a Binder Document.

lang also differs from xml:lang in that lang may contain multiple language assignments which are separated by one or more white space characters. Each assigned language follows the rules given for xml:lang in Section 4.2.1.1.

Example:

lang="en-US de-DE"

It is possible for both xml:lang and lang to appear in the same element, and their assigned language values may differ.

Example:

<item resid="image1"
      resource="image1.png"
      media-type="image/png"
      lang="fr-FR"
      xml:lang="en-US"
      comment="Photograph of Paris (caption in French)"/>

In the above example, xml:lang is assigned the value en-US since the value of the comment attribute is in English. The lang attribute is assigned the value fr-FR since the Publication resource, image1.png, is a photograph with a caption in French.
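
Informative sketch (not normative): one way a processing tool might read the two attributes from the item element above. The helper `split_langs` is hypothetical; the sketch relies on Python's ElementTree, which exposes xml:lang under the predefined XML namespace URI, and it does not check the individual language tags.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined; ElementTree reports it with this URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_langs(value: str) -> list[str]:
    """Split a lang attribute value on runs of white space."""
    return value.split()


item = ET.fromstring(
    '<item resid="image1" resource="image1.png" media-type="image/png" '
    'lang="fr-FR" xml:lang="en-US" '
    'comment="Photograph of Paris (caption in French)"/>'
)

resource_langs = split_langs(item.get("lang", ""))  # languages of the resource
markup_lang = item.get(XML_LANG)                    # language of the attribute text
print(resource_langs, markup_lang)
```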

4.2.3 residrefs Attribute

The required attribute residrefs is applied to several elements to reference one or more resource identifiers assigned in the Resource Manifest (see Section 4.4.3.) It is of datatype IDREFS, and must be a white space separated list of resource identifiers.

(See the XML 1.0 discussion of datatype IDREFS.)
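
Informative sketch (not normative): a non-validating tool might check that every identifier listed in a residrefs value was declared in the Resource Manifest. The function name and the sample identifiers are hypothetical.

```python
def unresolved_residrefs(residrefs: str, declared_resids: set[str]) -> list[str]:
    """Return the referenced identifiers that the manifest never declares."""
    return [ref for ref in residrefs.split() if ref not in declared_resids]


# Hypothetical manifest identifiers and residrefs value.
declared = {"chap1", "chap2", "css-a"}
missing = unresolved_residrefs("chap1 css-a chap9", declared)
print(missing)
```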

4.2.4 title Element

The required title element is used in the Publication Title, Styling, and Navigation functional parts of the User Set.

Each title element contains character data (#PCDATA) representing a title description, and may also contain the XHTML Namespace Inline Elements (see Section 4.2.5).

Example:

<title>Title of <xhtml:em>This</xhtml:em> Book</title>

4.2.5 XHTML Namespace Inline Elements

All of the elements in the Binder Document vocabulary which may contain character data (#PCDATA), with the exception of the id element, may also contain inline elements drawn from the XHTML Namespace.

The nine supported inline elements allow Binder Document authors to add semantic richness to the character data. When these inline elements are used in character data, user agents should apply appropriate styling when presenting the character data.

As in XHTML, the inline elements may be nested. The Binder Document Common Attributes may be applied to these elements, as well as the optional class attribute which is drawn from XHTML. This specification does not specify how user agents are to use the value of the class attribute when present. However, a future version of this specification may add support for author-supplied CSS for styling the XHTML Inline elements, and the class attribute will be useful for CSS styling purposes.

The following table lists the supported XHTML Namespace Inline elements. Each supported element is linked to the general description in the HTML 4.01 specification, providing a good overview of the purpose and use of the element.

Supported XHTML Inline Elements

Element       Short Description
------------  ------------------------------------------
xhtml:code    Computer Code Fragment
xhtml:em      Emphasis
xhtml:kbd     Text Entered by the User
xhtml:samp    Program, Script, and Similar Output
xhtml:span    Generic Inline Level Container
xhtml:strong  Strong Emphasis
xhtml:sub     Subscript
xhtml:sup     Superscript
xhtml:var     Instance of a Variable or Program Argument

Example of the use of XHTML Namespace Inline elements:

<pubtitle>
   <title>Title of <xhtml:em>This</xhtml:em> Book</title>
</pubtitle>

A user agent may visually present the above character data as:

Title of This Book

4.2.6 “Mnemonic” Character Entity References

The Binder Document DTD declares the 253 character entity references specified in the Character Entity References Common Set Specification, Version 1.0. These character entity references are identical to those supported in XHTML 1.1 (which, in turn, are inherited from HTML 4.01.) They include the five XML predefined character entity references (see Section 3.3.2.)

Binder Document authors may use these “mnemonic” character entities instead of the equivalent numeric character references, as explained in Section 3.3.2. User agents must recognize these character entities.

Example using numeric character references:

<title>Jane&#x2019;s AT&#x0026;T R&#x00E9;sum&#x00E9;</title>

The same example using “mnemonic” character entity references:

<title>Jane&rsquo;s AT&amp;T R&eacute;sum&eacute;</title>

Both the above examples will render as:

Jane’s AT&T Résumé
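
Informative sketch (not normative): the equivalence of the two forms can be checked with Python's `html.unescape`. Note the assumption: that function resolves names from the HTML5 entity table, which is larger than the set declared by the Binder Document DTD, so it stands in for a DTD-aware parser only for the entities used here.

```python
from html import unescape

numeric = "Jane&#x2019;s AT&#x0026;T R&#x00E9;sum&#x00E9;"
mnemonic = "Jane&rsquo;s AT&amp;T R&eacute;sum&eacute;"

# Both spellings should decode to the same character data.
decoded_numeric = unescape(numeric)
decoded_mnemonic = unescape(mnemonic)
print(decoded_numeric == decoded_mnemonic)
```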

Future versions of this and related Binder Document specifications may support an expanded common set of “mnemonic” character entity references derived from other document markup vocabularies such as TEI and DocBook.

4.3 Publication Identifier

The required Publication Identifier functional part, headed by the element pubid, is used to assign a unique, primary identifier to the Publication. Since the primary identifier may be used for public identification purposes, such as for external linking into the Publication, it should be globally unique.

The element pubid must contain one and only one id element. The id element must not be empty: it must contain character data (#PCDATA), which gives the actual value of the primary identifier.

Three attributes are required for the id element: type, idns and ver.

4.3.1 Character Restrictions

Since the primary Publication Identifier may be used with Internationalized Resource Identifiers (RFC 3987) as part of an absolute path segment, the characters used should be restricted, whenever possible, to the ipchar production of RFC 3987 (page 7). (This is not always possible with certain formalized identifier namespaces. An example is the Digital Object Identifier namespace, DOI, which includes the “/” character in its identifier — the “/” character is not included in the ipchar production.)

When Binder Document authors devise their own unique primary identifier namespace, it is recommended that the namespace restrict the allowed primary identifier characters to some subset of the iunreserved production of RFC 3987 (page 7).

The primary identifier must not contain any white space characters. Also, the primary identifier must not contain any percent encodings except when the primary identifier namespace requires it. (Note that applications which extract the primary identifier from a Binder Document and embed it into an IRI or URI may convert certain characters, as necessary, to their percent encoded equivalents.)
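
Informative sketch (not normative): a simple check of the restrictions in this section. The function name is hypothetical. The first two checks mirror the "must not" rules; the third is only advisory, and it assumes an author-devised namespace limited to the ASCII part of iunreserved (letters, digits, "-", ".", "_", "~"), so identifiers from namespaces such as DOI are expected to trip it.

```python
import re

_PERCENT_ENCODING = re.compile(r"%[0-9A-Fa-f]{2}")
_ASCII_UNRESERVED = re.compile(r"[A-Za-z0-9._~-]+")


def check_primary_identifier(value: str) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    if any(ch.isspace() for ch in value):
        findings.append("contains white space")
    if _PERCENT_ENCODING.search(value):
        findings.append("contains a percent encoding")
    if not _ASCII_UNRESERVED.fullmatch(value):
        findings.append("uses characters outside the ASCII unreserved set")
    return findings


print(check_primary_identifier("6a2014b0-87a2-11da-a72b-0800200c9a66"))
print(check_primary_identifier("my id%20here"))
```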

4.3.2 type Attribute

The required type Attribute for the id element must be given the value of primary.

This requirement is for future compatibility when the Publication Identifier functional part is expanded to include alternate, antecedent, and other types of Publication identifiers.

4.3.3 idns Attribute

The required idns attribute for the id element specifies the scheme and namespace associated with the primary identifier. It must take one of the following forms:

  1. Uniform Resource Name (URN) Scheme

    The current list of registered URN Scheme Namespaces is maintained by IANA, and governed by RFC 3406.

    The syntax for the value of idns, following RFC 2141, is given as

    <IDNS> ::= "urn:" <NID>
    

    Where “urn:” is the scheme, and <NID> is one of the formally registered URN Namespaces. Note that the leading “urn:” and the Namespace Identifier <NID> are case-insensitive per RFC 2141 — however, for consistency, Binder Document authors must use all lower case.

    Not all of the registered URN Scheme Namespaces are suitable for use as a primary identifier. At least two of them, ISBN and UUID, are suitable, and these two are the most likely to be used of those currently registered.

    Example use of the URN Scheme with a UUID Namespace for the idns attribute value:

    idns="urn:uuid"
    
  2. “info” Scheme

    The “info” URI Scheme, managed by OCLC, currently supports a number of identifier namespaces that may be usable for Publication Identifiers. Notably, these include Digital Object Identifiers (“doi”), Fedora Digital Objects and Disseminations (“fedora”), and Serial Item and Contribution Identifiers (“sici”).

    The syntax for the value of idns is given as

    <IDNS> ::= "info:" <NID>
    

    Where “info:” is the scheme, and <NID> is one of the formally registered “info” Namespaces. Note that the leading “info:” and the Namespace Identifier <NID> are case-insensitive — however, for consistency, Binder Document authors must use all lower case. The trailing “/” character, specified by the “info” URI Scheme, must not be included in the idns attribute value.

    Example use of the “info” Scheme with a DOI Namespace for the idns attribute value:

    idns="info:doi"
    
  3. “x-other:” Scheme

    The “x-other:” Scheme is intended for unregistered, private identifier namespaces, such as one designed by the Binder Document author. The syntax essentially follows that of the formal schemes already described (see example below.)

    It is strongly recommended that instead of using the “x-other:” Scheme, Binder Document authors should generate a globally unique Publication Identifier using the freely usable UUID Namespace with the URN Scheme. There are a number of free UUID generators, as well as UUID registration services.

    Example use of the “x-other:” Scheme for the idns attribute:

    idns="x-other:my-own-pubid-namespace"
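
Informative sketch (not normative): a syntactic check of the three idns forms described above. The function name is hypothetical, and the namespace character set (lower-case letters, digits, and hyphens, starting with a letter or digit) is an assumption modeled on URN Namespace Identifiers. The check does not consult the IANA or "info" registries, so it cannot tell whether a namespace is actually registered.

```python
import re

# scheme ":" namespace, all lower case, no trailing "/".
_IDNS = re.compile(r"(urn|info|x-other):[a-z0-9][a-z0-9-]*")


def is_wellformed_idns(value: str) -> bool:
    """True if value looks like one of the three idns forms."""
    return _IDNS.fullmatch(value) is not None


for candidate in ("urn:uuid", "info:doi", "x-other:my-own-pubid-namespace",
                  "URN:UUID", "info:doi/"):
    print(candidate, is_wellformed_idns(candidate))
```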
    

4.3.4 ver Attribute

The required ver attribute for the id element provides for versioning of a given Publication Identifier. Its value is an integer; on first use with a particular Publication Identifier, it must be assigned the value “1”.

When small or minor changes are made to any of the resources of a Publication, but not enough to warrant a change in Publication Identifier, the publisher should increment the version number by one.

Changes which qualify as small or minor include any edits that break few, if any, already-established links into the Publication based on resource and element IDs. However, if the Publication is significantly edited, such as substantive changes to its markup structure or major content revisions, then the publisher should consider assigning a new Publication Identifier (with a reset of ver back to “1”.)
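
Informative sketch (not normative): how a publishing tool might apply the versioning guidance above. The class and function names are hypothetical, whether a revision counts as major remains the publisher's judgment, and the choice of a random UUID for the new identifier is an assumption for illustration.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicationId:
    idns: str
    value: str
    ver: int = 1


def revise(pubid: PublicationId, major: bool) -> PublicationId:
    """Minor edits bump ver by one; a major revision gets a new identifier."""
    if major:
        # New identifier, so the version restarts at 1.
        return PublicationId("urn:uuid", str(uuid.uuid4()), 1)
    return PublicationId(pubid.idns, pubid.value, pubid.ver + 1)


original = PublicationId("urn:uuid", "6a2014b0-87a2-11da-a72b-0800200c9a66")
minor = revise(original, major=False)
major = revise(original, major=True)
print(minor.ver, major.ver, major.value != original.value)
```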

4.3.5 Markup Example

Example markup of the Publication Identifier:

<pubid>
   <id type="primary"
       idns="urn:uuid"
       ver="1">6a2014b0-87a2-11da-a72b-0800200c9a66</id>
</pubid>

4.4 Resource Manifest

The required Resource Manifest functional part of the Binder Document provides a list of all the resources (excluding the Binder Document itself) which make up the Publication. It assigns one or more unique resource identifiers to each resource.

The Resource Manifest may declare resources that remain unused (“orphaned”) by the Publication. However, all declared resources must exist in the resource pool associated with the Publication.

The overarching framework which references the Binder Document Specification, such as the OpenReader Publication Framework Specification, defines what resources are allowed in the Resource Manifest, their MIME media types, the resource locator scheme, user agent error handling in the event of missing resources, and related requirements. In this section (and elsewhere in this specification), all the markup examples are based on the OpenReader Publication Framework Specification requirements.

The Resource Manifest is headed by the required element resources, which must contain one or more empty item elements. The order of multiple item elements is not significant.

Three attributes are required for the item element: resource, media-type, and resid.

One optional attribute, lang, may be applied to both the resources and item elements. It is used to assign the language(s) associated with the resources.

4.4.1 resource Attribute

The required resource attribute for the item element gives the path and name for a Publication resource. The overarching publication framework, which references this Binder Document specification, defines the resource path/filename scheme, and allowed characters.

Example:

resource="documents/chapter1.xml"

4.4.2 media-type Attribute

The required media-type attribute for the item element gives the MIME media type of the resource.

Example:

media-type="application/x-orp-bcd1+xml"

4.4.3 resid Attribute

The required resid attribute for the item element assigns a resource identifier for the resource. As noted in Section 4.4.5, a resource may be assigned multiple resource identifiers by applying multiple instances of the item element to that resource.

The datatype of the resid attribute value is ID, and the allowed characters are the same as those specified for xml:id (see Section 4.2.1.2). Each resid must be unique across all ID values in the Binder Document.

Example:

resid="chap1"

Two important benefits of assigning resource identifiers to resources are:

  1. Provides greater flexibility to the publishing work flow by allowing resource paths and names to change without disrupting the Publication organization, and

  2. Preserves the integrity of both inter-document links and external links into Publications, provided the link addressing is enabled with resource identifiers rather than resource names.
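
Informative sketch (not normative): a non-validating check for repeated resid values in a Resource Manifest. The function name and the sample manifest are hypothetical. The sketch compares resid values only; a full check would also compare them against every other ID value in the Binder Document, such as xml:id.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_resids(resources: ET.Element) -> list[str]:
    """Return resid values that appear on more than one item element."""
    counts = Counter(item.get("resid") for item in resources.iter("item"))
    return sorted(resid for resid, n in counts.items() if n > 1)


# Hypothetical manifest in which "chap1" is mistakenly assigned twice.
manifest = ET.fromstring(
    "<resources>"
    '<item resid="chap1" resource="chapter1.xml" media-type="application/x-orp-bcd1+xml"/>'
    '<item resid="chap1" resource="chapter2.xml" media-type="application/x-orp-bcd1+xml"/>'
    '<item resid="css-a" resource="cssdir/a.css" media-type="text/css"/>'
    "</resources>"
)
duplicates = duplicate_resids(manifest)
print(duplicates)
```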

4.4.4 lang Attribute

The optional but recommended lang attribute may be applied to both the resources and item elements. Section 4.2.2 provides an overview of the general purpose and requirements of this attribute.

The specific purpose for the lang attribute in the Resource Manifest is to assign the primary and/or significant language(s) to resources. Since this attribute may contain more than one language assignment, lang may assign multiple languages to a single resource. In this instance, the order of the languages in lang is significant, from highest to lowest priority or significance, but otherwise this specification does not distinguish the relative significance between the multiple languages.

For example, a content document may contain portions of text whose languages differ from the primary language. In this case, the primary language is listed first in the attribute value, followed by the other languages.

When lang is applied to the resources element, it globally applies to all resources in the Resource Manifest except where overridden for particular resources by applying the lang attribute to the associated item elements. The lang attribute may be applied to an item element even when lang is not applied to the resources element.

Although assigning languages to resources in the Resource Manifest is optional, Binder Document authors should do so. A future version of this specification may elevate this recommendation to a requirement.

Example of assigning languages to resources:

<resources lang="en-US">
   <item resid="chap1"
         resource="chapter1.xml"
         media-type="application/x-orp-bcd1+xml"
         lang="en-US de-DE"/>
   <item resid="chap2"
         resource="chapter2.xml"
         media-type="application/x-orp-bcd1+xml"
         lang="fr-FR"/>
   <item resid="chap3"
         resource="chapter3.xml"
         media-type="application/x-orp-bcd1+xml"/>
   ...
</resources>

In the above example, all the resources in the Resource Manifest are globally assigned English (US). For the content document resource with resource identifier “chap1”, the global language has been overridden, with a primary language of English, but with some text in German. For the content document resource “chap2”, the global language has been overridden with the primary language of French. For the content document resource “chap3”, since the lang attribute does not appear, its primary language is the globally assigned language, English.
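
Informative sketch (not normative): the override rule can be expressed as a small lookup in which an item's own lang value, when present, replaces the value given on the resources element. The function name is hypothetical, and the manifest is the three-item example from this section with the elided items left out.

```python
import xml.etree.ElementTree as ET

MANIFEST = """
<resources lang="en-US">
   <item resid="chap1" resource="chapter1.xml"
         media-type="application/x-orp-bcd1+xml" lang="en-US de-DE"/>
   <item resid="chap2" resource="chapter2.xml"
         media-type="application/x-orp-bcd1+xml" lang="fr-FR"/>
   <item resid="chap3" resource="chapter3.xml"
         media-type="application/x-orp-bcd1+xml"/>
</resources>
"""


def effective_langs(resources: ET.Element) -> dict[str, list[str]]:
    """Map each resid to its languages, in the order the author listed them."""
    default = resources.get("lang", "").split()
    result = {}
    for item in resources.iter("item"):
        own = item.get("lang")
        # An item-level lang overrides the manifest-wide value entirely.
        result[item.get("resid")] = own.split() if own is not None else default
    return result


langs = effective_langs(ET.fromstring(MANIFEST))
print(langs)
```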

4.4.5 Assigning Multiple Resource Identifiers To a Resource

As noted in Section 4.4.3, a resource may be assigned multiple resource identifiers by using multiple item elements.

This is a powerful feature which allows Publication authors to efficiently reuse resources when they appear in different contexts in their Publications.

For example, the same content document may appear in various portions of the Publication, and each appearance may use a different set of style sheets.

This mechanism may simplify some publishing work flows where having multiple copies of the same resource, each given a different resource (or file) name, adds an unnecessary complication, particularly in the document editing process.

This mechanism also makes it feasible to link unambiguously to the correct spot within a Publication when a resource is used multiple times, provided the link addresses the resource identifier and not the resource name.

Example of applying two resource identifiers to one resource:

<item resid="intr1"
      resource="document/intro.xml"
      media-type="application/x-orp-bcd1+xml"/>
<item resid="intr2"
      resource="document/intro.xml"
      media-type="application/x-orp-bcd1+xml"/>

In the example above, the content document resource “document/intro.xml” is assigned two resource identifiers: intr1 and intr2.
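
Informative sketch (not normative): a tool can recover the one-to-many relationship by grouping item elements on their resource value. The function name is hypothetical, and the two items are wrapped in a resources element so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def resids_by_resource(resources: ET.Element) -> dict[str, list[str]]:
    """Group resource identifiers by the resource they point to."""
    grouped = defaultdict(list)
    for item in resources.iter("item"):
        grouped[item.get("resource")].append(item.get("resid"))
    return dict(grouped)


manifest = ET.fromstring(
    "<resources>"
    '<item resid="intr1" resource="document/intro.xml" '
    'media-type="application/x-orp-bcd1+xml"/>'
    '<item resid="intr2" resource="document/intro.xml" '
    'media-type="application/x-orp-bcd1+xml"/>'
    "</resources>"
)
grouped = resids_by_resource(manifest)
print(grouped)
```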

4.4.6 Markup Example

A fairly complex and lengthy markup example of the Resource Manifest:

<resources lang="en-US">
   <item resid="intr1"
         resource="intro.xml"
         media-type="application/x-orp-bcd1+xml"
         comment="Special introduction written by Jane Doe"/>
   <item resid="intr2"
         resource="intro.xml"
         media-type="application/x-orp-bcd1+xml"/>
   <item resid="chap1"
         resource="chapter1.xml"
         media-type="application/x-orp-bcd1+xml"/>
   <item resid="chap2"
         resource="chapter2.xml"
         media-type="application/x-orp-bcd1+xml"/>
   <item resid="note1"
         resource="note1.xml"
         media-type="application/x-orp-bcd1+xml"/>
   <item resid="note2"
         resource="note2.xml"
         media-type="application/x-orp-bcd1+xml"/>
   <item resid="css-a"
         resource="cssdir/a.css"
         media-type="text/css"/>
   <item resid="css-b"
         resource="cssdir/b.css"
         media-type="text/css"/>
   <item resid="css-c"
         resource="cssdir/c.css"
         media-type="text/css"/>
   <item resid="css-d"
         resource="cssdir/d.css"
         media-type="text/css"/>
   <item resid="css-e"
         resource="cssdir/e.css"
         media-type="text/css"/>
   <item resid="imag1"
         resource="images/image1.png"
         media-type="image/png"
         lang="fr-FR"
         xml:lang="fr-FR"
         comment="La Tour Eiffel"/>
   <item resid="imag2-jpeg"
         resource="images/image2.jpg"
         media-type="image/jpeg"/>
   <item resid="imag2-png"
         resource="images/image2.png"
         media-type="image/png"/>
   <item resid="imag2-tiff"
         resource="images/image2.tiff"
         media-type="image/tiff"/>
   <item resid="imag3"
         resource="images/image2-thumb.png"
         media-type="image/png"/>
   <item resid="tabl1"
         resource="images/table1.png"
         media-type="image/png"/>
</resources>

4.5 Publication Metadata

The optional but recommended Publication Metadata functional part of the Binder Document assigns metadata for the Publication. In this specification, only the Dublin Core Metadata Element Set, Version 1.1 is supported.

The Publication Metadata functional part is headed by the optional metadata element, which may contain one dublincore element.

The dublincore element may contain any of the fifteen Dublin Core Metadata Elements (prefixed with the Dublin Core namespace, see Section 3.1, requirement 6), in any number (including zero), and in any order. It is required that the attribute xml:lang (see Section 4.2.1.1) always be applied to the dublincore element to designate the default language of the content in the Dublin Core metadata. This allows the Dublin Core metadata to be extracted from, and used independent of, the Binder Document.

The overall structure of the Publication Metadata functional part, with some example Dublin Core metadata element markup, is shown as follows:

<metadata>
   <dublincore xml:lang="en-US">
      <!-- The 15 Dublin Core Elements in any number and order. For example: -->
      <dc:identifier idns="urn:uuid" ver="1">6a2014b0-87a2-11da-a72b-0800200c9a66</dc:identifier>
      <dc:title>The Excellent Adventures of the Markup Kid</dc:title>
      <dc:creator role="aut ill" file-as="Doe, John">John “Markup Kid” Doe</dc:creator>
      <dc:language usage="primary">en-US</dc:language>
   </dublincore>
</metadata>

The following table lists the fifteen supported Dublin Core elements (in alphabetical order), along with their definitions taken from the Dublin Core Metadata Element Set specification, and attributes supported in this specification beyond the Common Attribute Set.

Dublin Core Metadata Element Set

Dublin Core Element  Dublin Core Definition                                                      Attributes (Besides Common)
-------------------  --------------------------------------------------------------------------  ---------------------------
dc:contributor       Entity responsible for making contributions to the content of the resource  file-as, role
dc:coverage          Extent or scope of the content of the resource
dc:creator           Entity primarily responsible for making the content of the resource         file-as, role
dc:date              Date associated with an event in the life cycle of the resource             event
dc:description       Account of the content of the resource
dc:format            Physical or digital manifestation of the resource
dc:identifier        Unambiguous reference to the resource within a given context                idns
dc:language          Language of the intellectual content of the resource                        usage
dc:publisher         Entity responsible for making the resource available
dc:relation          Reference to a related resource
dc:rights            Information about rights held in and over the resource
dc:source            Reference to a resource from which the present resource is derived
dc:subject           Topic of the content of the resource                                        scheme
dc:title             Name given to the resource
dc:type              Nature or genre of the content of the resource

4.5.1 General Usage of the Dublin Core Metadata Element Set

The fifteen Dublin Core metadata elements share the same content model. Each Dublin Core element may contain character data (#PCDATA) representing the metadata information, and may also contain the XHTML Namespace Inline Elements (see Section 4.2.5) for enhancing user agent presentation of the metadata information — especially useful for metadata of a “prose” nature, such as that designated by dc:description and dc:title.

Several of the more important Dublin Core metadata elements, and those with specialized attributes, are described in greater detail in sibling sections; for information on the other elements, refer to the Dublin Core Metadata Element Set, Version 1.1 specification and related documents. The OEBPS 1.2 Specification also discusses the use of the Dublin Core metadata elements (this specification adopts several of the OEBPS innovations in the use of Dublin Core metadata.) Binder Document authors should use the Dublin Core metadata elements consistent with Dublin Core recommendations, except where they conflict with the requirements of this specification.

The Binder Document already requires three critical metadata items which are designated elsewhere in the Binder: the Publication Identifier, the Publication Title (required for each User Set), and the Publication primary and secondary languages. Although redundant, Binder Document authors should replicate this information in the Dublin Core Metadata — the details are discussed in the relevant sibling sections.

Binder Document authors should specify the language, using xml:lang (see Section 4.2.1.1), for the “prose” containing metadata elements, such as dc:description and dc:title, when their language differs from the default language assigned to dublincore.

User agents should provide a mechanism allowing users, on demand, to access and review the Dublin Core metadata information.

4.5.2 dc:identifier

The dc:identifier element designates an identifier for the Publication. It should be included in the Dublin Core metadata. The first instance of use of this element in the Dublin Core metadata must replicate the primary identifier assigned in the Publication Identifier functional part.

The dc:identifier element supports two identifier-related attributes required by the id element in the Publication Identifier functional part: idns and ver.

The value of the required idns attribute, which specifies the identifier namespace, must follow the requirements in Section 4.3.3. Likewise, the value of the sometimes required (as noted below) ver attribute, which specifies the versioning of the identifier, must follow the requirements in Section 4.3.4.

In the first dc:identifier instance, which replicates the primary Publication Identifier, the idns and ver attributes must assume the values identical to their counterparts for the id element.

For the Publication Identifier markup example in Section 4.3.5, the dc:identifier equivalent is:

<dc:identifier idns="urn:uuid" ver="1">6a2014b0-87a2-11da-a72b-0800200c9a66</dc:identifier>

4.5.3 dc:title

The dc:title element designates the Publication title. It should replicate, whenever possible, the title given in the Publication Title functional part of the User Set.

A Publication title may comprise multiple lines, as the markup example in Section 4.7 illustrates. Each line in the Publication title will be expressed with its own dc:title element, and the order of appearance of dc:title elements is significant (they are not required to be adjacent to each other, although that is recommended for readability.) For the same primary language (as explained below), the dc:title elements form the complete Publication title by their order of appearance in the Dublin Core metadata.

A complication arises when there are multiple User Sets and the Publication title varies between them (which is allowed.) The title variation may be in the same primary language, or in two or more primary languages.

For multiple User Sets with variations of the Publication title, the Binder Document author should include in the Dublin Core metadata an appropriate Publication title (which may comprise one or more lines) for each primary language represented by the User Set titles. When there are multiple primary languages, the Binder Document author must apply the xml:lang attribute (designating the primary language) to all instances of dc:title in the Dublin Core metadata.

For the Publication Title markup example given in Section 4.7, the Dublin Core equivalent is:

<dc:title>Dr. Strangelove</dc:title>
...
<dc:title>Or: How I Learned to Stop Worrying and <xhtml:em>Love the Bomb</xhtml:em></dc:title>

In the above example, since both the dc:title elements are of the same language (inherited from a containing element where xml:lang is assigned US English), they form the two line Publication title:

Dr. Strangelove
Or: How I Learned to Stop Worrying and Love the Bomb

Note in the markup example above that even though appearance order is significant, the two dc:title elements need not be adjacent.

Markup example of dc:title with multiple lines and in multiple languages mixed together:

<dc:title xml:lang="en-US">A Trip To Paris:</dc:title>
...
<dc:title xml:lang="fr-FR">Un Voyage Vers Paris:</dc:title>
...
<dc:title xml:lang="en-US">Visiting The Eiffel Tower</dc:title>
...
<dc:title xml:lang="fr-FR">Visiter La Tour D'Eiffel</dc:title>

In the above markup example, the Dublin Core metadata designates two language-specific Publication titles (each comprising two lines.) The English version of the Publication title is:

A Trip To Paris:
Visiting The Eiffel Tower

The French version of the Publication title is:

Un Voyage Vers Paris:
Visiter La Tour D'Eiffel
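
Informative sketch (not normative): assembling language-specific titles from interleaved dc:title elements. The function name is hypothetical. The sketch assumes the standard Dublin Core 1.1 element namespace URI, leaves the dublincore wrapper in no namespace for brevity, and inserts an illustrative dc:creator element to show that the dc:title elements need not be adjacent.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"  # assumed Dublin Core namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = f"""
<dublincore xmlns:dc="{DC}" xml:lang="en-US">
   <dc:title xml:lang="en-US">A Trip To Paris:</dc:title>
   <dc:creator>Jane Doe</dc:creator>
   <dc:title xml:lang="fr-FR">Un Voyage Vers Paris:</dc:title>
   <dc:title xml:lang="en-US">Visiting The Eiffel Tower</dc:title>
   <dc:title xml:lang="fr-FR">Visiter La Tour D'Eiffel</dc:title>
</dublincore>
"""


def titles_by_language(dublincore: ET.Element) -> dict[str, list[str]]:
    """Collect title lines per language, preserving document order."""
    default_lang = dublincore.get(XML_LANG)
    titles = defaultdict(list)
    for title in dublincore.iter(f"{{{DC}}}title"):
        lang = title.get(XML_LANG, default_lang)
        # itertext() also picks up text inside nested inline elements.
        titles[lang].append("".join(title.itertext()))
    return dict(titles)


titles = titles_by_language(ET.fromstring(SAMPLE))
for lang, lines in titles.items():
    print(lang, lines)
```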

4.5.4 dc:language

The dc:language element designates a significant language used in the Publication. For multiple Publication languages, use one instance of dc:language for each language. Binder Document authors should designate at least one primary language (see below) for the Publication.

Similar to the requirements for xml:lang (see Section 4.2.1.1), the character content of dc:language must comply with RFC 3066, or its successor on the IETF Standards Track. (Language Codes) (Country Codes)

The required attribute usage must be assigned either the value of primary or secondary, which indicates whether the language is considered a primary language of the Publication, or a secondary language. (Note that there may be more than one primary language of the Publication.)

The designated primary languages of the Publication should be all the unique languages specified in the required lang attribute for all instances of the userset element (see Section 4.6.3.)
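
Informative sketch (not normative): deriving the list of primary languages from the lang values of the userset elements. The function name and the sample values are hypothetical. Tags are compared exactly as written, so an authoring tool would need to normalize letter case first if authors are inconsistent.

```python
def primary_languages(userset_langs: list[str]) -> list[str]:
    """Return the unique language tags, in order of first appearance."""
    unique = []
    for value in userset_langs:
        for tag in value.split():
            if tag not in unique:
                unique.append(tag)
    return unique


# Hypothetical lang values taken from three userset elements.
primary = primary_languages(["en-US fr-FR", "fr-FR", "de-DE en-US"])
for tag in primary:
    print(f'<dc:language usage="primary">{tag}</dc:language>')
```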

The designated secondary languages of the Publication should be all the unique languages (other than those designated as primary), specified in the lang attribute for the resources and all the item eleme