Thursday, August 11, 2005

Transforming Bibliographies

Among the useless things one can do in life, maintaining one or more publication lists ranks high. My tendency to waste time on my publication list probably dates back to my days as a PhD student, when I badly needed publications to put on a list. On the other hand, as publications are the measure of achievement in research, other researchers may well have the same problem.

Anyway, maintaining a list of publications can be quite tedious, in particular if you want to provide multiple views on the publications: for example, a listing with the most recent publications first, one with the most important ones first, one organized by research topic, and finally a separate list for each project. Also, your department may require regular submission of lists. In the web version the entries should come with links to the PDF files and/or the webpage of the publisher, but these links should not be displayed in the version for printing, since they are quite useless there.

Being a computer scientist, I elevated the activity of maintaining content to maintaining a program for generating the various lists. This is still a waste of time, of course, but the excuse is that it will save me time in the future. Another excuse is that I developed my program as a case study for the transformation language Stratego.

In fact, the bibtex-tools package has emerged over a long time, starting with a syntax definition for BibTeX first written in 1999. It turns out that BibTeX has quite an intricate syntax that is not so easily formalized with a traditional approach based on a separate lexical analyzer and context-free parser. With the scannerless approach of SDF this poses no problems at all.

Also, the use of Stratego to perform transformations on a structured representation of a BibTeX file is a definite improvement over directly transforming its text representation. Moreover, these transformations can be expressed quite concisely. For example, the following strategy definitions define an inliner for BibTeX that replaces occurrences of string identifiers with their body. (BibTeX allows the definition of strings such as @string{LNCS={Lecture Notes in Computer Science}}, which can then be used in entry fields via the identifier, e.g., series = LNCS.)

  bib-inline = 
    bottomup(try(DeclareInlineString + InlineString + FoldWords))

  DeclareInlineString =
    ?String(_, StringField(key, value))
    ; rules( InlineString : Id(key) -> value )

  FoldWords :
    ConcValue(Words(ws1), Words(ws2)) -> Words(<conc>(ws1, ws2))
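
For readers unfamiliar with Stratego, the same idea can be approximated in Python. This is only a rough sketch, not how bibtex-tools is implemented: terms are modelled as tuples, the dictionary `strings` stands in for the dynamic rule InlineString, and the `Field` constructor, the `None` placeholder in `String`, and the sample data are all assumptions made for illustration.

```python
# Terms are plain tuples: (constructor, child1, child2, ...).
# Constructor names mirror the Stratego snippet; the overall shape of
# the sample document is an assumption, not the real bibtex-tools AST.

def is_con(t, name):
    """True if t is a term built with the given constructor name."""
    return isinstance(t, tuple) and len(t) > 0 and t[0] == name


def bottomup(rewrite, term):
    """Rewrite all children first (left to right), then the term itself."""
    if isinstance(term, tuple) and term:
        term = (term[0],) + tuple(bottomup(rewrite, c) for c in term[1:])
    elif isinstance(term, list):
        term = [bottomup(rewrite, c) for c in term]
    return rewrite(term)


def bib_inline(term):
    strings = {}  # stands in for the dynamic rule InlineString

    def step(t):
        # DeclareInlineString: record key -> value, keep the term as is.
        if is_con(t, "String") and len(t) == 3 and is_con(t[2], "StringField"):
            _, key, value = t[2]
            strings[key] = value
            return t
        # InlineString: replace a previously declared identifier.
        if is_con(t, "Id") and t[1] in strings:
            return strings[t[1]]
        # FoldWords: merge two adjacent word lists into one.
        if is_con(t, "ConcValue") and is_con(t[1], "Words") and is_con(t[2], "Words"):
            return ("Words", t[1][1] + t[2][1])
        return t  # corresponds to try(...): leave the term unchanged

    return bottomup(step, term)


# Hypothetical sample input: one @string declaration and two fields.
lncs = ["Lecture", "Notes", "in", "Computer", "Science"]
doc = [
    ("String", None, ("StringField", "LNCS", ("Words", lncs))),
    ("Field", "series", ("Id", "LNCS")),
    ("Field", "note", ("ConcValue", ("Words", ["Vol."]), ("Id", "LNCS"))),
]

result = bib_inline(doc)
print(result[1])
print(result[2])
```

As in the Stratego version, a declaration must be visited before its uses, which the left-to-right bottom-up traversal provides as long as the @string definition precedes the entries that refer to it.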

After having developed my own set of BibTeX tools using the Stratego transformation language over the last couple of years, I decided to make them into a proper software package that could be used by others, complete with a manual that explains the LaTeX/BibTeX/Hevea techniques used to get a publication list into HTML. The currently available version is a pre-release of the first official release, 0.2. I'm waiting for a new stable version of Stratego/XT and for some feedback from users before I make the release official.

So if you don't want to waste time on editing your publication list webpage, but instead want to waste time learning to use my tools, you now know where to find them.

1 comment:

Unknown said...

Your parenthetical remark `or you must have the list already in bibtex format' is the key here. As I point out in the paper/manual, BibTeX is a legacy format in multiple respects:

(1) Indeed I have lots of files with BibTeX entries. I could convert these to some other format, for instance, to an XML format. As we have recently started using DocBook for the Stratego/XT manual, I am indeed contemplating a conversion from BibTeX to the DocBook bibliography format. (But for that I need to transform BibTeX files again.)

(2) But using XML is not a solution if you write your papers using the LaTeX typesetting system, as is customary for lots of computer scientists. BibTeX works great in combination with LaTeX.

(3) A final reason for using BibTeX is that there is a large number of style files for formatting BibTeX entries. With a single data source one can produce a large variety of typeset bibliographies. Of course, this is possible with XML as well in principle, but I don't currently know of any actual style files for doing this with an XML format. They may already exist (please let me know), and otherwise they will undoubtedly be created in the future. In the meantime I stick with BibTeX.