Beginning ThML

Loutzenhiser's picture

There are a number of books on Project Gutenberg (and elsewhere) that would make good additions to the CCEL. To be added, they have to be converted to ThML (the CCEL's XML-based markup language). In fact, some books already at the CCEL still haven't been converted to ThML.

Converting a book requires some knowledge of XML, but it isn't too difficult for someone who can use a text editor and an XML parser (or an XML editor). It also helps to be able to run a couple of Perl scripts that are already written.
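
For anyone who has not used an XML parser before, here is a minimal sketch of the kind of check one performs: it only tests whether a piece of markup is well-formed XML. It uses Python's standard library; the function name `check_well_formed` and the sample element names are made up for illustration and are not part of any CCEL tool.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return False, str(err)
    return True, None


# Hypothetical sample markup: the second string never closes its <p> element.
good = "<book><chapter title='One'><p>Text</p></chapter></book>"
bad = "<book><chapter title='One'><p>Text</chapter></book>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

A check like this only catches structural mistakes such as unclosed or mismatched tags. It does not verify that the markup follows the ThML rules.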

This work is the main bottleneck in adding books to the CCEL.

hplantin's picture

How to convert a book to ThML

This is a large topic, and volunteers who are willing to help will doubtless have lots of questions. I'll be happy to outline the process and answer your questions on this forum.

By way of introduction:

ThML is an XML markup language for electronic books and digital libraries, with added support for scripture references and commentary, hymns, etc.

Books must be converted to this markup language in order to be added to the CCEL and take advantage of the CCEL's capabilities, such as the search engine, annotation, different formats that are automatically created, etc.

You can think of ThML as XHTML with added information about footnotes, table of contents, scripture references, page breaks, metadata, and the like.
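
To make the "XHTML plus extra information" idea concrete, here is a rough sketch in Python that parses a small ThML-like fragment and pulls out the scripture references, footnotes, and page breaks. The element and attribute names (`div1`, `pb`, `scripRef` with `passage`, `note` with `place`) are my assumption of how such markup looks; check them against the ThML description before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed ThML-like fragment: ordinary XHTML-style <p> text, plus elements
# that mark a page break, a scripture reference, and a footnote.
SAMPLE = """<div1 title="Chapter 1">
  <pb n="1"/>
  <p>For God so loved the world
    (<scripRef passage="John 3:16">John 3:16</scripRef>).<note place="foot">A footnote.</note></p>
</div1>"""

root = ET.fromstring(SAMPLE)

# Collect the extra information that plain XHTML would not carry.
passages = [ref.get("passage") for ref in root.iter("scripRef")]
notes = [note.text for note in root.iter("note")]
pages = [pb.get("n") for pb in root.iter("pb")]

print("Scripture references:", passages)
print("Footnotes:", notes)
print("Page breaks:", pages)
```

Because this extra information is explicit in the markup, software can find it mechanically, which is presumably what features such as scripture-aware search build on.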

There is an (old) page describing ThML. Some of that information is out of date; I'll give up-to-date information in this forum where needed.

The first homework assignment for volunteers: read the paper that describes ThML.

Harry Plantinga
CCEL Director
