Docutils To Do List

Author: David Goodger (with input from many); open to all Docutils developers
Contact: goodger@python.org
Date: 2005-12-23
Revision: 4229
Copyright: This document has been placed in the public domain.

Contents

Priority items are marked with "@" symbols. The more @s, the higher the priority. Items in question form (containing "?") are ideas which require more thought and debate; they are potential to-do's.

Many of these items are awaiting champions. If you see something you'd like to tackle, please do! If there's something you'd like to see done but are unable to implement it yourself, please consider donating to Docutils: Support the Docutils project!

Please see also the Bugs document for a list of bugs in Docutils.

Release 0.4

We should get Docutils 0.4 out soon, but we shouldn't just cut a "frozen snapshot" release. Here's a list of features (achievable in the short term) to include:

Anything else?

Once released,

Minimum Requirements for Python Standard Library Candidacy

Below are action items that must be added and issues that must be addressed before Docutils can be considered suitable to be proposed for inclusion in the Python standard library.

General

Documentation

User Docs

  • Add a FAQ entry about using Docutils (with reStructuredText) on a server, noting that it's terribly slow. See the first paragraphs in <http://article.gmane.org/gmane.text.docutils.user/1584>.
  • Add document about what Docutils has previously been used for (web/use-cases.txt?).

Developer Docs

  • Complete Docutils Runtime Settings.
  • Improve the internal module documentation (docstrings in the code). Specific deficiencies listed below.
    • docutils.parsers.rst.states.State.build_table: data structure required (including StringList).
    • docutils.parsers.rst.states: more complete documentation of parser internals.
  • docs/ref/doctree.txt: DTD element structural relationships, semantics, and attributes. In progress; element descriptions to be completed.
  • Document the pending elements, how they're generated and what they do.
  • Document the transforms (perhaps in docstrings?): how they're used, what they do, dependencies & order considerations.
  • Document the HTML classes used by html4css1.py.
  • Write an overview of the Docutils architecture, as an introduction for developers. What connects to what, why, and how. Either update PEP 258 (see PEPs below) or write it as a separate doc.
  • Give information about unit tests. Maybe as a howto?
  • Document the docutils.nodes APIs.
  • Complete the docs/api/publisher.txt docs.

How-Tos

  • Creating Docutils Writers
  • Creating Docutils Readers
  • Creating Docutils Transforms
  • Creating Docutils Parsers
  • Using Docutils as a Library

PEPs

  • Complete PEP 258 Docutils Design Specification.

    • Fill in the blanks in API details.

    • Specify the nodes.py internal data structure implementation?

      [Tibs:] Eventually we need to have direct documentation in there on how it all hangs together - the DTD is not enough (indeed, is it still meant to be correct? [Yes, it is. --DG]).

  • Rework PEP 257, separating style from spec from tools, wrt Docutils? See Doc-SIG from 2001-06-19/20.

Python Source Reader

General:

Miscellaneous ideas:

reStructuredText Parser

Also see the ... Or Not To Do? list.

Directives

Directives below are often referred to as "module.directive", the directive function. The "module." is not part of the directive name when used in a document.

  • Make the directive interface object-oriented (http://article.gmane.org/gmane.text.docutils.user/1871).

  • Allow for field lists in list tables. See <http://thread.gmane.org/gmane.text.docutils.devel/3392>.

  • Unify table implementations and unify options of table directives (http://article.gmane.org/gmane.text.docutils.user/1857).

  • Allow directives to be added at run-time?

  • Use the language module for directive option names?

  • Add "substitution_only" and "substitution_ok" function attributes, and automate context checking?

  • Change directive functions to directive classes? Superclass' __init__() could handle all the bookkeeping.

  • Implement options or features on existing directives:

    • Add a "name" option to directives, to set an author-supplied identifier?

    • All directives that produce titled elements should grow implicit reference names based on the titles.

    • Allow the :trim: option for all directives when they occur in a substitution definition, not only the unicode directive.

    • images.figure: "title" and "number", to indicate a formal figure?

    • parts.sectnum: "local"?, "refnum"

      A "local" option could enable numbering for sections from a certain point down, and sections in the rest of the document are not numbered. For example, a reference section of a manual might be numbered, but not the rest. OTOH, an all-or-nothing approach would probably be enough.

      The "sectnum" directive should be usable multiple times in a single document. For example, in a long document with "chapter" and "appendix" sections, there could be a second "sectnum" before the first appendix, changing the sequence used (from 1,2,3... to A,B,C...). This is where the "local" concept comes in. This part of the implementation can be left for later.

      A "refnum" option (better name?) would insert reference names (targets) consisting of the reference number. Then a URL could be of the form http://host/document.html#2.5 (or "2-5"?). Allow internal references by number? Allow name-based and number-based ids at the same time, or only one or the other (which would the table of contents use)? Usage issue: altering the section structure of a document could render hyperlinks invalid.

    • parts.contents: Add a "suppress" or "prune" option? It would suppress contents display for sections in a branch from that point down. Or a new directive, like "prune-contents"?

      Add an option to include topics in the TOC? Another for sidebars? The "topic" directive could have a "contents" option, or the "contents" directive could have an "include-topics" option. See docutils-develop 2003-01-29.

    • parts.header & parts.footer: Support multiple, named headers & footers? For example, separate headers & footers for odd, even, and the first page of a document.

      This may be too specific to output formats which have a notion of "pages".

    • misc.class:

    • misc.include:

      • Option to select a range of lines?

      • Option to label lines?

      • How about an environment variable, say RSTINCLUDEPATH or RSTPATH, for standard includes (as in .. include:: <name>)? This could be combined with a setting/option to allow user-defined include directories.

      • Add support for inclusion by URL?

        .. include::
           :url: http://www.example.org/inclusion.txt
        
    • misc.raw: add a "destination" option to the "raw" directive?

      .. raw:: html
         :destination: head
      
         <link ...>
      

      It needs thought & discussion though, to come up with a consistent set of destination labels and consistent behavior.

      And placing HTML code inside the <head> element of an HTML document is rather the job of a templating system.

    • body.sidebar: Allow internal section structure? Adornment styles would be independent of the main document.

      That is really complicated, however, and the document model greatly benefits from its simplicity.

  • Implement directives. Each of the list items below begins with an identifier of the form, "module_name.directive_function_name". The directive name itself could be the same as the directive_function_name, or it could differ.

    • html.imagemap

      It has the disadvantage that it's only easily implementable for HTML, so it's specific to one output format.

      (For non-HTML writers, the imagemap would have to be replaced with the image only.)

    • parts.endnotes (or "footnotes"): See Footnote & Citation Gathering.

    • parts.citations: See Footnote & Citation Gathering.

    • misc.language: Specify (= change) the language of a document at parse time.

    • misc.settings: Set any(?) Docutils runtime setting from within a document? Needs much thought and discussion.

    • misc.gather: Gather (move, or copy) all instances of a specific element. A generalization of the "endnotes" & "citations" ideas.

    • Add a custom "directive" directive, equivalent to "role"? For example:

      .. directive:: incr
      
         .. class:: incremental
      
      .. incr::
      
      "``.. incr::``" above is equivalent to "``.. class:: incremental``".
      

      Another example:

      .. directive:: printed-links
      
         .. topic:: Links
            :class: print-block
      
            .. target-notes::
               :class: print-inline
      

      This acts like macros. The directive contents will have to be evaluated when referenced, not when defined.

      • Needs a better name? "Macro", "substitution"?
      • What to do with directive arguments & options when the macro/directive is referenced?
    • Docutils already has the ability to say "use this content for Writer X" (via the "raw" directive), but it doesn't have the ability to say "use this content for any Writer other than X". It wouldn't be difficult to add this ability though.

      My first idea would be to add a set of conditional directives. Let's call them "writer-is" and "writer-is-not" for discussion purposes (don't worry about implementation details). We might have:

      .. writer-is:: text-only
      
         ::
      
             +----------+
             |   SNMP   |
             +----------+
             |   UDP    |
             +----------+
             |    IP    |
             +----------+
             | Ethernet |
             +----------+
      
      .. writer-is:: pdf
      
         .. figure:: protocol_stack.eps
      
      .. writer-is-not:: text-only pdf
      
         .. figure:: protocol_stack.png
      

      This could be an interface to the Filter transform (docutils.transforms.components.Filter).

      The ideas in adaptable file extensions above may also be applicable here.

      SVG's "switch" statement may provide inspiration.

      Here's an example of a directive that could produce multiple outputs (both raw troff pass-through and a GIF, for example) and allow the Writer to select.

      .. eqn::
      
         .EQ
         delim %%
         .EN
         %sum from i=0 to inf c sup i~=~lim from {m -> inf}
         sum from i=0 to m sup i%
         .EQ
         delim off
         .EN
      
    • body.example: Examples; suggested by Simon Hefti. Semantics as per Docbook's "example"; admonition-style, numbered, reference, with a caption/title.

    • body.index: Index targets.

      See Index Entries & Indexes.

    • body.literal: Literal block, possibly "formal" (see object numbering and object references above). Possible options:

      • "highlight" a range of lines

      • include only a specified range of lines

      • "number" or "line-numbers"

      • "styled" could indicate that the directive should check for style comments at the end of lines to indicate styling or markup.

        Specific derivatives (i.e., a "python-interactive" directive) could interpret style based on cues, like the ">>> " prompt and "input()"/"raw_input()" calls.

      See docutils-users 2003-03-03.

    • body.listing: Code listing with title (to be numbered eventually), equivalent of "figure" and "table" directives.

    • colorize.python: Colorize Python code. Fine for HTML output, but what about other formats? Revert to a literal block? Do we need some kind of "alternate" mechanism? Perhaps use a "pending" transform, which could switch its output based on the "format" in use. Use a factory function "transformFF()" which returns either an "HTMLTransform()" instance or a "GenericTransform()" instance?

      If we take a Python-to-HTML pretty-printer and make it output a Docutils internal doctree (as per nodes.py) instead of HTML, then each output format's stylesheet (or equivalent) mechanism could take care of the rest. The pretty-printer code could turn this doctree fragment:

      <literal_block xml:space="preserve">
      print 'This is Python code.'
      for i in range(10):
          print i
      </literal_block>
      

      into something like this ("</>" is end-tag shorthand):

      <literal_block xml:space="preserve" class="python">
      <keyword>print</> <string>'This is Python code.'</>
      <keyword>for</> <identifier>i</> <keyword
      >in</> <expression>range(10)</>:
          <keyword>print</> <expression>i</>
      </literal_block>
      

      But I'm leaning toward adding a single new general-purpose element, "phrase", equivalent to HTML's <span>. Here's the example rewritten using the generic "phrase":

      <literal_block xml:space="preserve" class="python">
      <phrase class="keyword">print</> <phrase
       class="string">'This is Python code.'</>
      <phrase class="keyword">for</> <phrase
       class="identifier">i</> <phrase class="keyword">in</> <phrase
       class="expression">range(10)</>:
          <phrase class="keyword">print</> <phrase
           class="expression">i</>
      </literal_block>
      

      It's more verbose but more easily extensible and more appropriate for the case at hand. It allows us to edit style sheets to add support for new formats, not the Docutils code itself.

      Perhaps a single directive with a format parameter would be better:

      .. colorize:: python
      
         print 'This is Python code.'
         for i in range(10):
             print i
      

      But directives can have synonyms for convenience. "format:: python" was suggested, but "format" seems too generic.

    • pysource.usage: Extract a usage message from the program, either by running it at the command line with a --help option or through an exposed API. [Suggestion for Optik.]
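
The "writer-is" and "writer-is-not" idea discussed above could be sketched roughly as follows. This is only an illustration of the selection logic, in plain Python: the directives are proposals, and the function names and the (directive, writer names, content) triples are assumptions made for this sketch, not Docutils APIs.

```python
# Hypothetical sketch: "writer-is" and "writer-is-not" are proposed
# directives, and none of the names below exist in Docutils.

def keep_block(directive, writer_names, active_writer):
    """Return True if a conditional block applies to the active Writer."""
    if directive == "writer-is":
        return active_writer in writer_names
    if directive == "writer-is-not":
        return active_writer not in writer_names
    raise ValueError("unknown conditional directive: %r" % directive)


def select_content(blocks, active_writer):
    """Keep the content of every block that applies to one Writer.

    `blocks` is a list of (directive, writer_names, content) triples
    in document order.
    """
    return [content for directive, names, content in blocks
            if keep_block(directive, names, active_writer)]


# The three blocks of the protocol-stack example, reduced to labels.
PROTOCOL_STACK = [
    ("writer-is", ["text-only"], "ASCII-art literal block"),
    ("writer-is", ["pdf"], "protocol_stack.eps"),
    ("writer-is-not", ["text-only", "pdf"], "protocol_stack.png"),
]
```

With these definitions, any Writer other than "text-only" or "pdf" would receive only the PNG figure. A real implementation would more likely hang off the Filter transform mentioned above than filter a flat list.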

Interpreted Text

Interpreted text is entirely a reStructuredText markup construct, a way to get around built-in limitations of the medium. Some roles are intended to introduce new doctree elements, such as "title-reference". Others are merely convenience features, like "RFC".

All supported interpreted text roles must already be known to the Parser when they are encountered in a document. Whether pre-defined in core/client code, or in the document, doesn't matter; the roles just need to have already been declared. Adding a new role may involve adding a new element to the DTD and may require extensive support, therefore such additions should be well thought-out. There should be a limited number of roles.
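
The "declare before use" rule above might be pictured with a small registry. This is a minimal sketch under stated assumptions: the class and method names are invented for illustration and are not the Docutils role API.

```python
# Hypothetical sketch of "roles must be declared before use".
# RoleRegistry, declare() and interpret() are illustrative names only.

class UnknownRoleError(KeyError):
    """Raised when a role is used before it has been declared."""


class RoleRegistry:
    def __init__(self):
        self._roles = {}

    def declare(self, name, handler):
        """Make a role known, whether from core code or from a document."""
        self._roles[name] = handler

    def interpret(self, name, text):
        """Apply a previously declared role to interpreted text."""
        try:
            handler = self._roles[name]
        except KeyError:
            raise UnknownRoleError(name) from None
        return handler(text)
```

For example, after `registry.declare("RFC", lambda text: "RFC " + text)`, a call to `registry.interpret("RFC", "2822")` succeeds, while interpreting an undeclared role raises `UnknownRoleError`. It does not matter who declared the role, only that the declaration came first.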

The only place where no limit is placed on variation is at the start, at the Reader/Parser interface. Transforms are inserted by the Reader into the Transformer's queue, where non-standard elements are converted. Once past the Transformer, no variation from the standard Docutils doctree is possible.

An example is the Python Source Reader, which will use interpreted text extensively. The default role will be "Python identifier", which will be further interpreted by namespace context into <class>, <method>, <module>, <attribute>, etc. elements (see pysource.dtd), which will be transformed into standard hyperlink references, which will be processed by the various Writers. No Writer will need to have any knowledge of the Python-Reader origin of these elements.

  • Add explicit interpreted text roles for the rest of the implicit inline markup constructs: named-reference, anonymous-reference, footnote-reference, citation-reference, substitution-reference, target, uri-reference (& synonyms).

  • Add directives for each role as well? This would allow indirect nested markup:

    This text contains |nested inline markup|.
    
    .. |nested inline markup| emphasis::
    
       nested ``inline`` markup
    
  • Implement roles:

    • "raw-wrapped" (or "raw-wrap"): Base role to wrap raw text around role contents.

      For example, the following reStructuredText source ...

      .. role:: red(raw-wrapped)
         :prefix:
             :html: <font color="red">
             :latex: {\color{red}
         :suffix:
             :html: </font>
             :latex: }
      
      colored :red:`text`
      

      ... will yield the following document fragment:

      <paragraph>
          colored
          <inline classes="red">
              <raw format="html">
                  <font color="red">
              <raw format="latex">
                  {\color{red}
              <inline classes="red">
                  text
              <raw format="html">
                  </font>
              <raw format="latex">
                  }
      

      Possibly without the intermediate "inline" node.

    • "acronym" and "abbreviation": Associate the full text with a short form. Jason Diamond's description:

      I want to translate `reST`:acronym: into <acronym title='reStructuredText'>reST</acronym>. The value of the title attribute has to be defined out-of-band since you can't parameterize interpreted text. Right now I have them in a separate file but I'm experimenting with creating a directive that will use some form of reST syntax to let you define them.

      Should Docutils complain about undefined acronyms or abbreviations?

      What to do if there are multiple definitions? How to differentiate between CSS (Content Scrambling System) and CSS (Cascading Style Sheets) in a single document? David Priest responds,

      The short answer is: you don't. Anyone who did such a thing would be writing very poor documentation indeed. (Though I note that somewhere else in the docs, there's mention of allowing replacement text to be associated with the abbreviation. That takes care of the duplicate acronyms/abbreviations problem, though a writer would be foolish to ever need it.)

      How to define the full text? Possibilities:

      1. With a directive and a definition list?

        .. acronyms::
        
           reST
               reStructuredText
           DPS
               Docstring Processing System
        

        Would this list remain in the document as a glossary, or would it simply build an internal lookup table? A "glossary" directive could be used to make the intention clear. Acronyms/abbreviations and glossaries could work together.

        Then again, a glossary could be formed by gathering individual definitions from around the document.

      2. Some kind of inline parameter syntax?

        `reST <reStructuredText>`:acronym: is `WYSIWYG <what you
        see is what you get>`:acronym: plaintext markup.
        
      3. A combination of 1 & 2?

        The multiple definitions issue could be handled by establishing rules of priority. For example, directive-based lookup tables have highest priority, followed by the first inline definition. Multiple definitions in directive-based lookup tables would trigger warnings, similar to the rules of implicit hyperlink targets.

      4. Using substitutions?

        .. |reST| acronym:: reST
           :text: reStructuredText
        

      What do we do for formats other than HTML, which do not support tool tips? Put the full text in parentheses?

    • "figure", "table", "listing", "chapter", "page", etc: See object numbering and object references above.

    • "glossary-term": This would establish a link to a glossary. It would require an associated "glossary-entry" directive, whose contents could be a definition list:

      .. glossary-entry::
      
         term1
             definition1
         term2
             definition2
      

      This would allow entries to be defined anywhere in the document, and collected (via a "glossary" directive perhaps) at one point.
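
The priority rules suggested under possibility 3 for acronym definitions could look something like the sketch below. The function name and the (short form, full text) pair representation are assumptions. Keeping the first of several conflicting directive-table entries is also an assumption, since the text above only says that such duplicates would trigger warnings.

```python
# Hypothetical sketch of the proposed priority rules; not Docutils code.

def resolve_acronyms(directive_defs, inline_defs):
    """Build an acronym lookup table from two kinds of definitions.

    Both arguments are lists of (short_form, full_text) pairs in
    document order.  Directive-based definitions take priority; among
    inline definitions, the first one wins.  Returns (table, warnings).
    """
    table = {}
    warnings = []
    for short, full in directive_defs:
        if short in table:
            # Assumption: keep the first entry and warn about the rest.
            warnings.append('duplicate definition of "%s"' % short)
        else:
            table[short] = full
    for short, full in inline_defs:
        # Only fills gaps: directive entries and earlier inline
        # definitions are left untouched.
        table.setdefault(short, full)
    return table, warnings
```

Under these rules, a document that defines CSS twice in a directive table would get a warning, and an inline definition of CSS elsewhere would be ignored in favour of the table.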

Unimplemented Transforms

HTML Writer

PEP/HTML Writer

LaTeX writer

HTML SlideShow Writer

Add a Writer for presentations, derivative of the HTML Writer. Given an input document containing one section per slide, the output would consist of a master document for the speaker, and a slide file (or set of files, one or more for each slide). Each slide would contain the slide text (large, stylesheet-controlled) and images, plus "next" and "previous" links in consistent places. The speaker's master document would contain a small version of the slide text with speaker's notes interspersed. The master document could use target="whatever" to direct links to a separate window on a second monitor (e.g., a projector).
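
As a rough illustration of the "next" and "previous" links, here is a sketch that assigns one output file per section and links neighbouring slides. The file naming scheme and the function name are assumptions made for this example; nothing here is part of an existing Writer.

```python
# Hypothetical sketch: one slide file per section, with navigation links.

def slide_links(titles):
    """Return one dict per slide with its file and neighbouring files."""
    # Assumed naming scheme: slide001.html, slide002.html, ...
    files = ["slide%03d.html" % (i + 1) for i in range(len(titles))]
    slides = []
    for i, title in enumerate(titles):
        slides.append({
            "title": title,
            "file": files[i],
            # The first slide has no "previous"; the last has no "next".
            "previous": files[i - 1] if i > 0 else None,
            "next": files[i + 1] if i + 1 < len(files) else None,
        })
    return slides
```

The speaker's master document could then be generated from the same list, pairing each slide's file with its speaker's notes.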

Ideas:

Below, "[S5]" indicates that S5 already implements the feature or may implement all or part of the feature. "[S5 1.1]" indicates that S5 version 1.1 implements the feature (a preview of the 1.1 beta is available in the S5 testbed).

Features & issues:

Here's an example that I was hoping to show at PyCon DC 2005:

========================
 The Docutils SlideShow
========================

Welcome To The Docutils SlideShow!
==================================

.. pause::

David Goodger

goodger@python.org

http://python.net/~goodger

.. (introduce yourself)

   Hi, I'm David Goodger from Montreal, Canada.

   I've been working on Docutils since 2000.
   Time flies!

.. pause::

Docutils

http://docutils.sourceforge.net

.. I also volunteer as a Python Enhancement Proposal (or PEP)
   editor.

.. SlideShow is a new feature of Docutils.  This presentation was
   written using the Docutils SlideShow system.  The slides you
   are seeing are HTML, rendered by a standard Mozilla Firefox
   browser.


The Docutils SlideShow System
=============================

.. The Docutils SlideShow System provides

Easy and open presentations.


Features
========

* reStructuredText-based input files.

  .. reStructuredText is a what-you-see-is-what-you-get
     plaintext format.  Easy to read & write, non-proprietary,
     editable in your favourite text editor.

  .. Parsers for other markup languages can be added to Docutils.
     In the future, I hope some are.

  .. pause:: ...

* Stylesheet-driven HTML output.

  .. The format of all elements of the output slides is
     controlled by CSS (cascading stylesheets).

  .. pause:: ...

* Works with any modern browser.

  .. that supports CSS, frames, and JavaScript.
     Tested with Mozilla Firefox.

  .. pause:: ...

* Works on any OS.


Etc.
====

That's as far as I got, but you get the idea...

Front-End Tools