Saturday, May 07, 2005

When the open-source model breaks down

I have a fairly simple requirement: I want to write a long technical document, using DocBook as the markup language, and I want to conveniently cite references to other publications. And I want it to run on Linux. That's a simple requirement to state; I've discovered, to a reasonable standard of proof, that it's by no means easy to achieve. In fact, I've spent an immensely frustrating few days just trying - and failing - to cobble together a working solution.

There are tools out there. I've been looking mostly at two: RefDB and JReferences. Both are open-source reference managers that import a variety of formats and can produce DocBook marked-up bibliographies for a given input document. In theory. RefDB is the more polished and complete of the two tools. JReferences is a one-man effort that seems to have been moribund for the last couple of years.

I actually got a fairly long way with RefDB. I have installed the software, imported my references in RIS format, and can query for particular refs and generate output in DocBook format. What I really want, however, is to submit a document with <citation> elements in it and have the tool comb my database for matching references and generate formatted output for just those. This process depends on having styles imported into RefDB, and this is where the problems started.

When I import a style, the RefDB client goes into a busy-wait state, soaking up 100% of the CPU and never terminating. I tried again in verbose mode, to see what the problem might be, and got a cryptic three-character error code that told me nothing. I surmised that there might be a problem with the libdbi database drivers, so I got the latest libdbi CVS head and tried to build it. After a bit of a false start, I can build the drivers but can't install them: the make install target breaks when installing the documentation, because it doesn't recognise the DocBook schema for the SGML files, for wholly non-obvious reasons.

It's at this point that the "many eyeballs make all bugs shallow" maxim falls apart. For my particular configuration, there is only my pair of eyes to look at the problem. At this point, I would gladly pay for either a licensed product or a support contract, if I felt confident that I was going to get working functionality at the end of the day. The notion that I have the source so I can fix the problem myself doesn't work here: I have limited time to spare for the task of getting references into my document. There quickly comes a point when it's easier just to code the citations by hand than to spend time grokking someone else's code and fixing problems.
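To make concrete what I was asking of the tool, here is a minimal Python sketch of the citation-resolving step. It only illustrates the workflow and is not how RefDB does it. The in-memory REFERENCES dictionary, the function names and the choice of <biblioentry> output are all my own assumptions. It also assumes that the citation key is the text content of each <citation> element, and that the document is namespace-free (DocBook 4 style) with no undeclared entities.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the reference database, keyed by citation key.
# A real reference manager would query its SQL backend here instead.
REFERENCES = {
    "knuth1984": {"surname": "Knuth", "title": "Literate Programming", "year": "1984"},
    "raymond1999": {"surname": "Raymond", "title": "The Cathedral and the Bazaar", "year": "1999"},
}


def cited_keys(docbook_xml):
    """Return the citation keys in document order, without duplicates."""
    root = ET.fromstring(docbook_xml)
    keys = []
    for element in root.iter("citation"):
        key = (element.text or "").strip()
        if key and key not in keys:
            keys.append(key)
    return keys


def build_bibliography(docbook_xml, references):
    """Build a <bibliography> element containing only the cited references.

    Returns the element plus a list of keys that had no matching reference,
    so that a failed lookup is reported rather than silently dropped.
    """
    bibliography = ET.Element("bibliography")
    missing = []
    for key in cited_keys(docbook_xml):
        ref = references.get(key)
        if ref is None:
            missing.append(key)
            continue
        entry = ET.SubElement(bibliography, "biblioentry", {"id": key})
        ET.SubElement(entry, "abbrev").text = key
        author = ET.SubElement(entry, "author")
        ET.SubElement(author, "surname").text = ref["surname"]
        ET.SubElement(entry, "title").text = ref["title"]
        ET.SubElement(entry, "pubdate").text = ref["year"]
    return bibliography, missing


if __name__ == "__main__":
    document = (
        "<article><para>See <citation>raymond1999</citation> and "
        "<citation>unknown2005</citation>.</para></article>"
    )
    bibliography, missing = build_bibliography(document, REFERENCES)
    print(ET.tostring(bibliography, encoding="unicode"))
    print("unresolved keys:", missing)
```

Returning the unresolved keys alongside the output is a deliberate choice: a citation with no matching reference is exactly the kind of failure I would want reported in plain words rather than as an error code.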

The other tool I looked at briefly was JReferences. It's a Java program, so I felt comfortable that I could ascend the learning curve a bit more quickly if I needed to. It has the right features on paper: a RIS importer (among others), a DocBook exporter (among others), and a simple editor for viewing and updating stored references. So, task one: convert my RIS-formatted collection to BibTeXML, which is JReferences' preferred internal format. There's a command-line utility to do just that. Run it, and it produces ... nothing. No output and no errors. OK, so it's a program that hasn't been touched for at least two years, according to the CVS log at SourceForge. Maybe running it inside a Java debugger will reveal a simple fix. So, I get the source, drop it into Eclipse and ... it won't compile. Not even close. There seem to be two different package layouts competing with each other, and lots of inconsistencies that Eclipse barfs on. Worse, many of the problems are incompatibilities with bibtexml.jar, which, as far as I can find, is distributed only in binary form. I can't imagine how this program ever worked properly. I can't see any test code at all. There's not actually that much code all told, and I'm fairly sure I could, if I wanted, fix it up.

But why bother? It would be significantly easier for me to start over with an empty project in Eclipse than to spend time understanding, fixing, and extending the existing codebase. Open source, code re-use, and so on only help if the code is comprehensible and actually works. Lose that, and you have worse than nothing.
