Friday, March 18, 2011

Working on API docs

I've spent pretty much the last week (full time - I took a week of vacation) working on something that has needed doing for a long time: API documentation for the Tart standard library.

Unfortunately, I spent the first two days chasing a dead end and had to backtrack. The issue revolves around the use of Sphinx for generating docs. I've been using it to generate the Tart language docs, and I'm very happy with it, both in the authoring process and in the way the output looks.

The question was whether I should use Sphinx for the API documentation as well. Now, I've been adding doc comments to classes from the beginning, so all I need is a way to extract and format those docs. While Sphinx does have an auto-doc facility, it is pretty closely tied to the Python language - the various autodoc directives import Python modules and introspect them.
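
For reference, this is what the built-in directives look like in a .rst file (this is standard Sphinx autodoc; the module name is just a placeholder) - at build time, Sphinx literally imports the named Python module and walks its members:

  .. automodule:: mypackage.mymodule
     :members:
     :undoc-members: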

My first approach was to try to use the existing autodoc directives. My plan was to parse Tart source code into Python objects, and then subclass the Sphinx documenter classes to look at the various attributes of those objects to generate the documentation. I essentially re-coded my entire Tart parser in Python to do this.
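
To give a flavor of what that subclassing looks like, here's a rough sketch. The Documenter hooks are the real sphinx.ext.autodoc API, but TartClass and the tart_ast lookup are hypothetical stand-ins for my Python-side parser objects:

  from sphinx.ext.autodoc import Documenter

  class TartClassDocumenter(Documenter):
      # Documents declarations parsed from Tart source, rather than
      # members introspected from a live Python module.
      objtype = 'tartclass'

      @classmethod
      def can_document_member(cls, member, membername, isattr, parent):
          return isinstance(member, TartClass)  # hypothetical parser object

      def import_object(self):
          # No Python import happens here; just look up the parsed AST node.
          self.object = tart_ast.lookup(self.name)  # hypothetical lookup
          return self.object is not None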

However, I ran into a number of problems with this approach. The first was that this part of Sphinx is very complicated, and not at all documented. Another problem is that Sphinx is not able to generate multiple output files from a single input file, where by "input file" I mean a .rst file. This is more of a limitation of Docutils than of Sphinx. So for example, if I put an "autopackage" directive in a file that tells it to extract documentation from all of the modules in a package, it will put all of the documentation in a single HTML file, rather than having one HTML file per class, which is what I wanted. To get that, I would have to create a separate .rst file for each class. One way to do this would be to auto-generate the input files (sketched below), but by that point I had run into so many problems that I began to question the whole approach.
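
For what it's worth, the auto-generation workaround would have looked something like this - a minimal sketch, where the parsed-package objects and the "tartclass" directive are hypothetical:

  import os

  def write_class_pages(package, out_dir='api'):
      # One .rst stub per class, so that Sphinx emits one HTML page per class.
      if not os.path.isdir(out_dir):
          os.makedirs(out_dir)
      for cls in package.classes:  # hypothetical parsed-Tart package object
          title = cls.qualified_name
          with open(os.path.join(out_dir, title + '.rst'), 'w') as f:
              f.write(title + '\n')
              f.write('=' * len(title) + '\n\n')
              f.write('.. tartclass:: %s\n   :members:\n' % title)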

A third problem was that my Python-based parser was not as smart as the real Tart compiler. Unless I was willing to re-code the entire frontend in Python (and not just the parser), it was never going to be able to do things like proper template argument resolution (i.e. being able to hyperlink to the proper page when you click on a template argument).

My next thought was to write a separate program in C++ that used the real compiler to parse the code, replacing the code-generation module with a backend that would generate .rst source files. The advantage here is that I wouldn't be dependent on any of the internals of Sphinx. However, I decided that before doing that, I should really think more about what I actually wanted - that is, what I wanted the API docs to look like. I decided to manually write some API docs as .rst files, run them through Sphinx, and tweak them until they looked the way I wanted.

However, as I did this, I started to run into some fairly fundamental limitations of Sphinx. For example, I was never going to get Sphinx to generate the kind of class index I wanted - Sphinx tends to format things as hierarchical tables of contents, and what I wanted was something more like JavaDoc - in particular, I wanted to separate out the classes that were exceptions and attributes into their own categories. Also, it's difficult to embed arbitrary styles in the docs; you have to fit everything into the ReST document model.

So at this point I'm wondering whether I want to use Sphinx for my API docs at all. This is not a decision I take lightly, because there's a lot of good stuff that you get from Sphinx that would be hard to get otherwise, such as having your intro/language docs share the same namespace as your API docs, making it easy to cross-link between them. Also, you get ReST-style inline markup, which is not bad.

Now, I have my own markup language which I have been using. Most doc-comment markup languages (JavaDoc, doxygen, etc.) are based around the principle of maximizing the readability of the generated documentation, but at the expense of adding a lot of syntax to the source-code comments, making the source code less readable. However, it's been my experience that these days programmers are more likely to go to the source code than to a web page for documentation - assuming the source code is available. Most modern IDEs and even some "smart" text editors let you navigate to a particular declaration with one or two keystrokes, whereas going to a web page and looking up the symbol you are interested in is generally an order of magnitude more work. So it makes sense that the doc comments should not impact source-code readability if possible.

I was inspired by Doc-o-matic, a commercial system in use at EA, which has a very lightweight syntax that doesn't detract from the readability of the source, and my markup is based on that. If I were to continue to use Sphinx, I would want to translate my markup language into the ReST equivalent.

In any case, I switched gears last Wednesday, and decided to go down the path of generating the API documents with my own programs. I would divide the problem into two stages. The first stage would extract all the doc comments and generate XML files containing all of the comments (with the markup converted to XML), as well as all of the class, function, and type declarations, all of which would have fully-qualified names. This is similar to the previous scheme, but outputting XML instead of ReST. I refactored parts of the Compiler class into an AbstractCompiler, which parses and analyzes input files but doesn't generate any code, and then created a new DocExtractor subclass to output the XML. The output looks something like this:

  <module name="tart.annex.GenerateStackTrace" visibility="public">
    <typedef type="class" name="GenerateStackTrace" visibility="public">
      <method name="apply" visibility="public">
        <param name="t">
          <type>
            <typename kind="class">tart.reflect.Method</typename>
          </type>
        </param>
      </method>
      <method name="construct" visibility="public"/>
    </typedef>
  </module>

The second pass is to load this XML file and run it through some HTML templates. I noticed that both Jinja and Genshi now work in Python 3, so I grabbed a copy of Genshi (since I prefer its style of directives). I have not actually written the templates yet.
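
Roughly, the second pass will look something like this - a sketch only, since the templates don't exist yet; "module.html" and the output file naming are assumptions, but TemplateLoader and ElementTree are the real Genshi and stdlib APIs:

  import xml.etree.ElementTree as ET
  from genshi.template import TemplateLoader

  loader = TemplateLoader(['templates'])

  def render_module(xml_path, out_path):
      module = ET.parse(xml_path).getroot()  # the <module> element shown above
      tmpl = loader.load('module.html')      # hypothetical template name
      html = tmpl.generate(module=module).render('html', encoding='utf-8')
      with open(out_path, 'wb') as f:
          f.write(html)

  render_module('tart.annex.GenerateStackTrace.xml',
                'tart.annex.GenerateStackTrace.html')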

Note that the XML approach means that I still have the option of translating the XML into ReST if I decide to use Sphinx after all. So I haven't burned any bridges yet.
