xmlroff 0.5.4

February 12th, 2008

xmlroff 0.5.4 is at http://xmlroff.org/download/xmlroff-0.5.4.tar.gz.

This release fixes some table bugs and adds linefeed-treatment and white-space-collapse properties (actually added in 0.5.3, but that release was only announced on the xmlroff-list).

Customer Oppressions Department?

February 11th, 2008

I received a politely worded letter from Eircom telling me how happy they are to spam me with “special offers, price reductions and new products and services”. In fact, they’re so happy to do it that they’re going to do it even if I stop using Eircom.

Since I want to contribute to Eircom’s bottom line on my terms, not theirs, I’m sending back the opt-out form to the curiously named “Customer Suppressions Department”.

What, then, do they call the people who do the spamming? The “Customer Oppressions Department”?

Building xmlroff on Ubuntu 7.10

February 11th, 2008

Building xmlroff on Ubuntu 7.10 is straightforward once you install some build tools and the required ‘-dev’ packages.

Starting with a clean installed system, install the following packages (and their dependencies):

  • libtool
  • autoconf
  • automake1.9
  • libglib2.0-dev
  • libxslt1-dev
  • libcairo2-dev and/or libgnomeprint2.2-dev
  • libpango1.0-dev
  • libgtk2.0-dev (not libgdk-pixbuf-dev)

The Commonwealth of Thieves

February 1st, 2008

This is the second time that I’ve picked up a Tom Keneally book at Sydney Airport for the long flight out of Australia. The other was The Great Shame, and the great shame there is that I didn’t get around to reading it until several years later. By that time I was living in Ireland, so at least I could then better understand the Irish aspects of that account.

The Commonwealth of Thieves covers the period from just before to just after the arrival of the First Fleet in Botany Bay. I enjoyed it, and I look forward to the likely future volumes covering the next stages in Australia’s history.

Emacs at 100% CPU usage with Semantic

December 23rd, 2007

Another computer, another instance of the Emacs 22 snapshot heading towards 100% CPU usage when Semantic is active.

The solution I tried before makes the Synaptic package manager complain about the Semantic version whenever I install any package. This time, I followed advice from the Emacs wiki and applied a patch to semantic-idle.el in place, byte-compiled the file, and moved the new .elc file over the existing .elc file (in a different directory).

It’s working fine, and Synaptic isn’t complaining.

Slides for two XML 2007 training courses

December 14th, 2007

Slides for my two XML 2007 training courses are now available on the Menteith Consulting website:

A year out of Sun

October 22nd, 2007

I was going to do this on the day of the anniversary of my leaving Sun but, you know, I was too busy at the time.

In the past year I have:

  • Helped companies and organisations in the USA, England, and France with their XSLT, XSL, and XML, including:
    • Making transforming to HTML go faster for an online retailer
    • Reviewing XSLT stylesheets and suggesting improvements for a major library
    • Writing XSLT for XML-XML transformations for a journal publisher
    • Specifying and implementing XSLT transformations for an archive service
    • Augmenting an XSLT-based automated schema documentation system that produces both HTML and PDF
    • Providing expert help to get a Perl XML::LibXSLT project off the ground
  • Presented on XSLT profiling and unit testing at XTech 2007 in Paris in May
  • Been selected to present training sessions on transitioning to XSLT 2.0 and on testing XSLT at XML 2007 in Boston in December
  • Rejoined the W3C XSL FO subgroup as an invited expert
  • Made four xmlroff releases, with another happening any day now
  • Learned more about VAT and PRSI than I ever wanted to know (okay, maybe that’s not such a high point)
  • Participated in the Workshop of the W3C Japanese Layout Taskforce in Tokyo in September
  • Helped kids by completing two projects with the International Telementor Program

This is also the point at which I retire the “RIF” blog category as it has become irrelevant.

BOM in UTF-8: good, bad, or ugly?

October 3rd, 2007

The usefulness or otherwise of U+FEFF (ZERO WIDTH NO-BREAK SPACE and BYTE ORDER MARK) in UTF-8 has been subject to reinterpretation over the years. It wasn’t mentioned in the original XML 1.0 Recommendation but was added later, rather like how its use was added to the Unicode Standard.

In the Unicode Standard 2.0, there was no mention of U+FEFF with UTF-8, either in the section on the BOM or in the appendix defining UTF-8.

In the Unicode Standard 3.0, section 13.6, “Specials”, includes:

Although there are never any questions of byte-order with UTF-8 text, this sequence can serve as signature for UTF-8 encoded text where the character set is unmarked.

In the Unicode Standard 5.0, section 3.10, “Unicode Encoding Schemes”, includes:

While there is obviously no need for a byte order signature when using UTF-8, there are occasions when processes convert UTF-16 or UTF-32 data containing a byte order mark into UTF-8. When represented in UTF-8, the byte order mark turns into the byte sequence <EF BB BF>. Its usage at the beginning of a UTF-8 data stream is neither required nor recommended by the Unicode Standard, but its presence does not affect conformance to the UTF-8 encoding scheme. Identification of the <EF BB BF> byte sequence at the beginning of a data stream can, however, be taken as a near-certain indication that the data stream is using the UTF-8 encoding scheme.

So in the Unicode Standard it’s gone from irrelevant to useful to “Oh, if you must”.
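The byte-level facts in the Unicode 5.0 passage are easy to check for yourself. The following is a minimal Python sketch. The codec names (utf-16-be, utf-16-le, utf-8-sig) and codecs.BOM_UTF8 are standard Python. The helper has_utf8_signature is just an illustrative name of my own. The sketch converts UTF-16 text that starts with a BOM into UTF-8 and then looks for the EF BB BF signature:

```python
import codecs

# U+FEFF at the start of UTF-16 text, in each byte order.
utf16_be = "\ufeffHello".encode("utf-16-be")  # starts with FE FF
utf16_le = "\ufeffHello".encode("utf-16-le")  # starts with FF FE

# Converting BOM-carrying UTF-16 into UTF-8 keeps the U+FEFF character,
# which UTF-8 encodes as the three bytes EF BB BF.
text = utf16_be.decode("utf-16-be")  # explicit-endian codecs keep the BOM
utf8_bytes = text.encode("utf-8")


def has_utf8_signature(data: bytes) -> bool:
    """True if data starts with the UTF-8 form of U+FEFF (hypothetical helper)."""
    return data.startswith(codecs.BOM_UTF8)


print(utf16_be[:2].hex(), utf16_le[:2].hex())  # feff fffe
print(utf8_bytes[:3].hex())                    # efbbbf
print(has_utf8_signature(utf8_bytes))          # True
# The "utf-8-sig" codec drops a leading EF BB BF, if present, when decoding.
print(utf8_bytes.decode("utf-8-sig"))          # Hello
```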

(BTW, in other reinterpretations, “Unicode Encoding Scheme” results from splitting the meaning of “UTF”, and the use of U+FEFF to indicate non-breaking is deprecated these days.)

The Unicode FAQ both lists its use as a signature and says to avoid its use where “byte oriented protocols expect ASCII characters at the beginning of a file”. However, I don’t think that XML necessarily counts as one such byte oriented protocol.
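Here is why the distinction matters. A check that expects ASCII at byte 0 trips over a UTF-8 BOM. An XML processor, on the other hand, is expected to recognise the BOM: later editions of XML 1.0 allow a UTF-8 entity to begin with one, and they also describe sniffing the first few bytes in the non-normative Appendix F. The following is a rough Python sketch contrasting the two approaches. The names guess_encoding and naive_is_xml are illustrative, and the signature table is a simplified subset of Appendix F, not a complete implementation:

```python
import codecs

# Simplified subset of the XML 1.0 Appendix F autodetection table.
# Longer signatures come first so that the UTF-32LE BOM (FF FE 00 00)
# is not mistaken for the UTF-16LE BOM (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE with BOM"),
    (codecs.BOM_UTF32_LE, "UTF-32LE with BOM"),
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16BE with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16LE with BOM"),
    (b"\x00<\x00?", "UTF-16BE without BOM"),
    (b"<\x00?\x00", "UTF-16LE without BOM"),
    (b"<?xm", "ASCII-compatible; check the encoding declaration"),
]


def guess_encoding(data: bytes) -> str:
    """Return a rough guess at the encoding family from the first bytes."""
    for prefix, guess in _SIGNATURES:
        if data.startswith(prefix):
            return guess
    return "unknown"


def naive_is_xml(data: bytes) -> bool:
    """What a byte-oriented check expecting ASCII at offset 0 might do."""
    return data.startswith(b"<?xml")


doc = '<?xml version="1.0" encoding="UTF-8"?><doc/>'
plain = doc.encode("utf-8")
signed = doc.encode("utf-8-sig")  # same text, prefixed with EF BB BF

print(naive_is_xml(plain), naive_is_xml(signed))  # True False
print(guess_encoding(signed))                     # UTF-8 with BOM
print(guess_encoding(plain))                      # ASCII-compatible; check the encoding declaration
```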

Two training sessions at XML 2007

September 26th, 2007

I have been selected to present two back-to-back training sessions at the XML 2007 conference in Boston in December: except, for, some, ne, \w, xsl:function: XSLT 2.0 for XSLT 1.0 practitioners and Testing XSLT.

The first one will, as the title says, be for people who know XSLT 1.0 and want to transition to using XSLT 2.0, and the second one will be a more practical expansion of the material covered in my XTech 2007 talk, My Stylesheet Runs, But….

Windows drive names with Cygwin xsltproc & xmllint

September 24th, 2007

Cygwin may be the only way to stay sane while using Windows, but it has its own Unix-like notion for drive names, e.g., “/cygdrive/c/” instead of “c:”. Which is fine, except when you want to use both Java XML tools, which understand only the “c:” form, and Cygwin tools, which tend to understand only the “/cygdrive/c/” form.

The Cygwin xsltproc and xmllint complain when you use them with files containing Windows drive names in system identifiers, so the second time it happened, I wrote a simple XML catalog file to map the Windows drive names to the Cygwin paths.
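The mapping itself is just a string-prefix rewrite. The following is a simplified Python model of what a catalog resolver does with rewriteSystem entries. Per the OASIS XML Catalogs specification, when several entries match, the longest systemIdStartString wins. Real resolvers also normalise the system identifier before matching, which this sketch skips. REWRITES and rewrite_system_id are hypothetical names, not part of libxml2 or any other library:

```python
from typing import Mapping

# Hypothetical model of rewriteSystem entries:
# systemIdStartString -> rewritePrefix, mirroring the C: drive entry.
REWRITES: Mapping[str, str] = {
    "file:///C:/": "file:///cygdrive/c/",
}


def rewrite_system_id(system_id: str, rewrites: Mapping[str, str] = REWRITES) -> str:
    """Apply the longest matching prefix rewrite, or return system_id unchanged."""
    matches = [start for start in rewrites if system_id.startswith(start)]
    if not matches:
        return system_id  # no entry applies, so the identifier is used as-is
    start = max(matches, key=len)  # the longest systemIdStartString wins
    return rewrites[start] + system_id[len(start):]


print(rewrite_system_id("file:///C:/work/doc.xml"))  # file:///cygdrive/c/work/doc.xml
print(rewrite_system_id("file:///D:/work/doc.xml"))  # file:///D:/work/doc.xml
```

The second call is unchanged because there is no D: entry.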

Put this as the contents of /etc/xml/catalog (not catalog.xml!) and the Cygwin xsltproc, etc., will handle Windows drive names:

<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Rewrite C: drive URIs to the equivalent Cygwin /cygdrive/c/ path -->
  <rewriteSystem
    systemIdStartString="file:///C:/"
    rewritePrefix="file:///cygdrive/c/"/>
</catalog>

You will have to add a suitable rewriteSystem for each additional drive that you use.
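Writing those entries out by hand gets tedious with several drives. The following is a small Python sketch that generates the catalog text for a given set of drive letters, using the same file:///X:/ to file:///cygdrive/x/ pattern as the C: entry. The function name make_catalog is my own invention, and the sketch assumes system identifiers use an upper-case drive letter, as in the C: example:

```python
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def make_catalog(drives: str) -> str:
    """Return catalog text with one rewriteSystem entry per drive letter."""
    entries = []
    for letter in drives:
        if not (letter.isascii() and letter.isalpha()):
            raise ValueError(f"not a drive letter: {letter!r}")
        entries.append(
            "  <rewriteSystem\n"
            f'    systemIdStartString="file:///{letter.upper()}:/"\n'
            f'    rewritePrefix="file:///cygdrive/{letter.lower()}/"/>'
        )
    lines = ['<?xml version="1.0"?>', f'<catalog xmlns="{CATALOG_NS}">', *entries, "</catalog>"]
    return "\n".join(lines) + "\n"


print(make_catalog("cd"))
```

The output could then be saved as /etc/xml/catalog, for example with `open("/etc/xml/catalog", "w").write(make_catalog("cd"))`.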