:author: Dave Kuhlman
:address: dkuhlman@rexx.com \\\\ http://www.rexx.com/~dkuhlman
:revision: 1.0a
:date: July 22, 2003
:copyright: Copyright (c) 2003 Dave Kuhlman.

    Permission is hereby granted, free of charge, to any person
    obtaining a copy of this software and associated documentation
    files (the "Software"), to deal in the Software without
    restriction, including without limitation the rights to use,
    copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the
    Software is furnished to do so, subject to the following
    conditions:

    The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
    OTHER DEALINGS IN THE SOFTWARE.

:abstract: This document describes extract_doc.py which is a program
    for extracting documentation from Python source code files and
    producing reStructuredText output.

=============================================================
*extract_doc.py* --- Extract Python Source Code Documentation
=============================================================

Description
===========

*extract_doc* extracts documentation embedded in Python source code
files. Currently, it generates reStructuredText. An extension that
generates LaTeX for the Python LaTeX documentation system is being
investigated.

*extract_doc* is derived from and uses code in *pydoc.py* from the
Python standard library.

One goal of *extract_doc* is to provide code that is simple enough
that both the implementation and the output it produces can be
customized for specific applications or by specific users.

Where to Get It
===============

You can find a distribution file for ``extract_doc`` at:
http://www.rexx.com/~dkuhlman/extract_doc.zip

How to Use extract_doc
======================

Here is the usage information from ``extract_doc``::

    Usage:
        python extract_doc.py [options]
    Options:
        -h, --help      Display this help message.
        -r, --rest      Extract to decorated reST.
        -l, --latex     Extract to Python LaTeX (module doc type).
                        Not implemented.
        -p, --pager     Use a pager; else write to stdout.
        -o, --over      Use over *and* under title adornment, else
                        only under.
    Example:
        python extract_doc.py -r mymodule1
        python extract_doc.py -p -o -r mymodule2

Command line flag descriptions
------------------------------

-r, --rest     Generate reStructuredText output.

-l, --latex    Generate LaTeX for the Python LaTeX documentation
               system. Not yet implemented.

-p, --pager    Use a pager, else write to stdout. Selects a pager and
               pushes generated output through the pager. On my
               system it selects *less*.

-o, --over     Generate over *and* under title adornment, else
               generate under title adornment only.

How to Modify *extract_doc*
===========================

*extract_doc* contains one important class: ``ReSTDoc``. It is a
subclass of class ``Doc`` in module ``pydoc`` in the Python standard
library. As such, it should have followed the other subclasses of
class ``Doc`` closely. However, it does not. ``ReSTDoc`` is a fairly
radical rewrite of ``TextDoc``. This rewrite had these goals:

- Produce reStructuredText (rather than text or HTML).

- Provide code that is simple, consistent, and clear enough so that
  others can understand and modify it.

Basically, I want it to produce reStructuredText and to enable others
to customize the reStructuredText that it produces for their
individual needs. The current class ``ReSTDoc`` produces
reStructuredText.
You can try it for yourself.

Here is a bit of guidance for the second aspect of the goal, i.e.
modifiability:

- Output is accumulated by calling ``self.push(line)`` for each line
  of text to be produced.

- Four methods produce output:

  - ``docmodule`` -- Called for the module. It is responsible for
    producing the documentation for a module.

  - ``docclass`` -- Called for each class. It is responsible for
    producing the documentation for a class.

  - ``docroutine`` -- Called for each method (in a class) and each
    function (at top level in a module). It is responsible for
    producing the documentation for a method or a function.

  - ``docother`` -- Called for data members. It is responsible for
    producing the documentation for a data member.

- Module ``inspect`` from the Python standard library is used to
  obtain the internals of an object such as its members, to determine
  the type of an object (e.g. method or function), to format the
  arguments for a function, etc.

- Function ``getdoc`` in module ``pydoc`` is called to get the
  documentation for an object, for example the documentation for a
  module, a class, a method, or a function.

- There is a method (``emphasize``) to emphasize a piece of text. It
  adds asterisks around the text.

In order to produce your own customized documentation extraction
capability, you might want to do the following:

- Copy class ``ReSTDoc``.

- Modify methods ``docmodule``, ``docclass``, ``docroutine``, and
  ``docother`` in your copy of class ``ReSTDoc``.

- Copy function ``extract_to_rest``.

- Modify your copy of function ``extract_to_rest``:

  - Add your own title, prefatory material, etc. Note where method
    ``genTitle`` is called and where the "Generated by ..." content
    is added.

  - Add your own end-of-doc content. Add this after the call to
    ``formatter.document()``.

Related Work
============

*PySource* -- Python Source Reader
----------------------------------

This documentation extractor takes a very different approach.
It is *not* modelled on pydoc in the Python standard library. It does
not use the ``inspect`` module from the Python standard library. (I
grepped for "inspect" in ``sandbox/davidg/pysource_reader``.) The
documentation says that it:

    "... scans a parsed Python module, and returns an ordered tree
    containing the names, docstrings (including attribute and
    additional docstrings), and additional info ..."

The approach followed by ``PySource`` appears more complex than that
of ``extract_doc``, but also more powerful. I'm going to guess that
the start-up time for a simple-minded programmer (like me) to begin
modifying and customizing ``PySource`` for user-specific needs would
be longer than for ``extract_doc``.

I'd appreciate any comments and comparisons that others might have.

Credits
=======

Thanks to the developers of Docutils, in particular, David Goodger,
project lead.

Thanks to Ka-Ping Yee for *pydoc*.

See Also
========

`Docutils: Python Documentation Utilities`_

.. _`Docutils: Python Documentation Utilities`:
    http://docutils.sourceforge.net/

`pydoc -- Documentation generator and online help system`_

.. _`pydoc -- Documentation generator and online help system`:
    http://www.python.org/doc/current/lib/module-pydoc.html
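
Customization Sketch
====================

The customization steps described under "How to Modify *extract_doc*"
(accumulate output with ``push``, and adjust ``docmodule``,
``docclass``, ``docroutine``, and ``docother``) might look roughly
like the following in current Python. This is only a minimal sketch
and not the code shipped in ``extract_doc.py``. The class name
``MiniReSTDoc``, the helpers ``gen_title``, ``push_doc``,
``public_members``, and ``text``, and the demo module are all
hypothetical. For brevity the sketch does not subclass ``pydoc.Doc``,
and it assumes Python 3, where ``inspect.signature`` is available.

```python
import inspect
import pydoc
import types


class MiniReSTDoc:
    """Hypothetical, simplified stand-in for ReSTDoc; not the original code."""

    def __init__(self, over=False):
        self.over = over   # mirrors the -o/--over command line flag
        self.lines = []    # output accumulates here, one entry per line

    def push(self, line=""):
        """Append one line of output."""
        self.lines.append(line)

    def emphasize(self, text):
        """Wrap text in asterisks (reST emphasis)."""
        return "*%s*" % text

    def gen_title(self, text, char):
        """Push a title with under (and optionally over) adornment."""
        bar = char * len(text)
        if self.over:
            self.push(bar)
        self.push(text)
        self.push(bar)
        self.push()

    def push_doc(self, obj):
        """Push the docstring of obj, if any, followed by a blank line."""
        doc = pydoc.getdoc(obj)
        if doc:
            for line in doc.splitlines():
                self.push(line)
            self.push()

    def public_members(self, obj):
        """Return (name, value) pairs whose names lack a leading underscore."""
        return [(name, value) for name, value in inspect.getmembers(obj)
                if not name.startswith("_")]

    def docmodule(self, module):
        self.gen_title("Module " + module.__name__, "=")
        self.push_doc(module)
        # Simplification: imported names are not filtered out here.
        members = self.public_members(module)
        # Stable sort: routines and data first, classes (new sections) last.
        members.sort(key=lambda pair: inspect.isclass(pair[1]))
        for name, value in members:
            if inspect.isclass(value):
                self.docclass(value, name)
            elif inspect.isroutine(value):
                self.docroutine(value, name)
            elif not inspect.ismodule(value):
                self.docother(value, name)

    def docclass(self, cls, name):
        self.gen_title("Class " + name, "-")
        self.push_doc(cls)
        for attr, value in self.public_members(cls):
            if inspect.isroutine(value):
                self.docroutine(value, attr)
            else:
                self.docother(value, attr)

    def docroutine(self, func, name):
        try:
            signature = str(inspect.signature(func))
        except (TypeError, ValueError):
            signature = "(...)"   # some builtins expose no signature
        self.push("%s ``%s``" % (self.emphasize(name), signature))
        self.push()
        self.push_doc(func)

    def docother(self, value, name):
        self.push("%s = ``%r``" % (self.emphasize(name), value))
        self.push()

    def text(self):
        """Return the accumulated output as one string."""
        return "\n".join(self.lines)


class Shape:
    """A demo class."""
    sides = 4

    def area(self, width, height):
        """Return the area."""
        return width * height


# Build a throwaway module object so the example needs no file on disk.
demo = types.ModuleType("demo", "A demo module.")
demo.Shape = Shape
demo.VERSION = "1.0"

formatter = MiniReSTDoc(over=True)
formatter.docmodule(demo)
print(formatter.text())
```

Changing what a class entry or a function entry looks like then means
editing just the corresponding ``doc*`` method, and extra front or
end matter can be added with further ``push`` calls before or after
``docmodule`` runs.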