.. include::

.. footer:: Lea Wiemann

================
 Document Trees
================

------------------------------------------
 Patterns and anti-patterns in your code.
------------------------------------------

.. raw:: html

   — Lea Wiemann —
.. role:: incremental
.. default-role:: incremental


PART A: How to Convert
======================

.. class:: incremental

   Format A (Wiki) --> Format B (XHTML)

   DocBook --> LaTeX

   OpenDocument --> CHM

   RTF --> Troff

.. class:: handout

   Just examples, could also be DocBook to LaTeX.

.. |event| image:: graphics/parsing-event-small.png
   :class: bordered


Input
=====

.. image:: graphics/downscale/input.png


Output
======

.. image:: graphics/downscale/output.png


1. Direct Conversion
====================

.. container:: incremental

   .. image:: graphics/downscale/direct.png

   (e.g. used in MoinMoin wiki)


1. Direct Conversion
====================

.. image:: graphics/downscale/direct.png

.. class:: incremental

   1. Parser encounters: |event|
   2. Parser calls ``writer.visit_emphasis()``.
   3. Writer writes ``self.output.append('<em>')``.

`<p>`\ `This is` `<em>`\ `emphasized`\ `</em>` `text.`\ `</p>`

1. Direct Conversion
====================

.. class:: incremental

   Inflexible, but fast.


2. Using Document Tree
======================

.. container:: incremental

   .. image:: graphics/downscale/indirect.png

   1. Input format --> internal document tree.
   2. Internal document tree --> output format.

   (e.g. used in Docutils)


2. Using Document Tree
======================

.. class:: incremental

   Slow, but can do transformations:

   .. image:: graphics/downscale/indirect-plus-transform.png


Which Approach?
===============

| **Performance** (direct conversion)
| vs. **features** (using document tree).

Choose carefully at the start (hard to refactor).


\ 
=

| Assume we have a document tree
| how do we write it out?


PART B
======

Writing out the Document Tree
-----------------------------

.. image:: graphics/downscale/part-b-writing-out.png


The Bad Way
===========

.. class:: handout

   I'll show you later why this is bad.

Obvious approach: For each node, call ``visit_`` and ``depart_`` methods::

    def visit_emphasis(self, node):
        self.output.append('<em>')

    def depart_emphasis(self, node):
        self.output.append('</em>')


The Bad Way
===========

.. class:: incremental

   ::

       <paragraph>This is <emphasis>emphasized</emphasis> text.</paragraph>

.. class:: handout

   So this document tree gets rendered using this call sequence:


The Bad Way
===========

.. class:: incremental

   ::

       Call                     Output
       -----------------------  ----------
       visit_paragraph()        <p>
       visit_Text()             This is
       visit_emphasis()         <em>
       visit_Text()             emphasized
       depart_emphasis()        </em>
       visit_Text()             text.
       depart_paragraph()       </p>
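
The call sequence above can be reproduced with a small, self-contained sketch. It assumes a hypothetical ``Node`` class and a hypothetical ``render`` method; neither is Docutils API, and real document tree nodes carry far more information. Only the ``visit_``/``depart_`` naming and the ``self.output`` list are taken from the slides.

```python
class Node:
    """Hypothetical minimal tree node (not the Docutils node class)."""

    def __init__(self, name, children=(), text=""):
        self.name = name                # e.g. "paragraph", "emphasis", "Text"
        self.children = list(children)
        self.text = text                # only used by Text nodes


class HTMLWriter:
    def __init__(self):
        self.output = []

    def render(self, node):
        """Call visit_<name>, recurse into the children, then depart_<name>."""
        getattr(self, "visit_" + node.name)(node)
        for child in node.children:
            self.render(child)
        # Text nodes have no depart method in this sketch.
        depart = getattr(self, "depart_" + node.name, None)
        if depart is not None:
            depart(node)

    def visit_paragraph(self, node):
        self.output.append("<p>")

    def depart_paragraph(self, node):
        self.output.append("</p>")

    def visit_emphasis(self, node):
        self.output.append("<em>")

    def depart_emphasis(self, node):
        self.output.append("</em>")

    def visit_Text(self, node):
        self.output.append(node.text)


# The example paragraph from the slides, built by hand.
tree = Node("paragraph", [
    Node("Text", text="This is "),
    Node("emphasis", [Node("Text", text="emphasized")]),
    Node("Text", text=" text."),
])

writer = HTMLWriter()
writer.render(tree)
html = "".join(writer.output)
print(html)
```

Each method only appends to ``self.output``, so the output order mirrors the tree order exactly. That is what makes this approach simple, and it is also the property that stops holding once the output structure differs from the tree structure.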

The Bad Way
===========

.. container:: incremental

   We can push and pop from a stack:

   ::

       def visit_reference(self, node):
           if use_superscript:
               self.output.append('<sup>')
               self.stack.append('</sup>')
           else:  # use brackets
               self.output.append('[')
               self.stack.append(']')

       def depart_reference(self, node):
           self.output.append(self.stack.pop())

.. class:: handout

   Still very legible.


The Bad Way
===========

.. class:: incremental

   Works well for a long time...

   Until output structure != document tree structure.

   E.g.:


The Bad Way
===========

Consider how footnotes are rendered:

.. container:: incremental

   .. [1] Unreferenced footnote.
   .. [2] Referenced (one backlink).
   .. [3] Referenced (three backlinks).

.. class:: handout

   This is only a difference in *rendering*.  In the document tree,
   they look the same.


The Bad Way
===========

::