ThML errors (some systematic)

In converting ThML files to ePub format I have been using epubcheck, which does a very nice systematic examination of the ePub book. Most of the errors I've found are directly attributable to my own XSLT errors, but I've also discovered miscellaneous ThML errors, and some systematic ones. Having written all this out, it seems a bit negative. Obviously this is a tiny subset of the possible errors there could be; I happen to have found a lot because I've checked the output of all the ThML files systematically. Also, I apologize if there is a place I missed where I could have fixed these myself.

For some of these, the definition of "error" depends on the HTML DTD you're using. I've been assuming XHTML Strict, really just because that's what epubcheck holds me to, but I realize that's not an explicit part of the ThML definition. So these may seem nitpicky. I'm not aware that any of them cause display errors, for instance.

--- Miscellaneous errors ---
ccel/schaff/encyc13: in a couple of places the attribute "class" is mistyped as "clas"
bernard/letters: a <colgroup> tag is placed in an inappropriate place
walker/harmony2: many instances of unescaped double-quotes

--- Errors having to do with footnotes ---
ccel/bayly/piety: there are nested <note> elements... <note n="12" id="iv.i-p19.3"><note n="13" id="iv.i-p19.4">
ccel/burgon/mark: Note 127 only has an empty p element: <note n="127" id=""><p class="normal" id=""/></note>
ccel/hewitt/gerhardt: there are nested <note> elements... <note n="38" id="p1_3-p1.4"><note n="39" id="p1_3-p1.5">
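
For anyone who wants to look for more of these, here is a rough sketch of how the two footnote problems could be detected with Python's standard library. This is not the check I actually ran (those came out of epubcheck), and it assumes the ThML file is well-formed XML with un-namespaced <note> elements. The function names and the sample fragment are made up for illustration; only the note numbers and ids are taken from the reports above.

```python
import xml.etree.ElementTree as ET

def find_nested_notes(root):
    """Return (outer_id, inner_id) for every <note> nested inside another <note>."""
    pairs = []
    for outer in root.iter("note"):
        for inner in outer.iter("note"):
            if inner is not outer:  # iter() yields the element itself first
                pairs.append((outer.get("id"), inner.get("id")))
    return pairs

def find_empty_notes(root):
    """Return the n attribute of every <note> that contains no text at all."""
    return [note.get("n") for note in root.iter("note")
            if not "".join(note.itertext()).strip()]

# Hypothetical fragment built around the markup quoted in the reports.
sample = """<div1>
<p>Body text<note n="12" id="iv.i-p19.3"><note n="13" id="iv.i-p19.4"><p>Inner note.</p></note></note></p>
<p>More text<note n="127" id=""><p class="normal" id=""/></note></p>
</div1>"""

root = ET.fromstring(sample)
print(find_nested_notes(root))
print(find_empty_notes(root))
```

On the sample this reports one nested pair (iv.i-p19.3 containing iv.i-p19.4) and one empty note (127).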

--- Files that don't have <DC.Identifier> tags ---

--- Places where different tags were given identical @id values ---
ccel/wace/biodict (tp-p0.1)
ccel/walker/harmony2 (home-p3.16)
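
Duplicate ids are easy to look for mechanically. The following is a minimal sketch, again assuming well-formed, un-namespaced XML; the function name and the sample fragment are invented for illustration, and empty id="" attributes are deliberately skipped rather than counted as duplicates of each other.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def find_duplicate_ids(root):
    """Map each id value used more than once to the tags that carry it."""
    seen = defaultdict(list)
    for element in root.iter():
        id_value = element.get("id")
        if id_value:  # ignore missing and empty ids
            seen[id_value].append(element.tag)
    return {id_value: tags for id_value, tags in seen.items() if len(tags) > 1}

# Hypothetical fragment; only the id value tp-p0.1 comes from the report above.
sample = """<div1 id="tp">
<p id="tp-p0.1">First</p>
<span id="tp-p0.1">Second</span>
<p id="tp-p1">Third</p>
</div1>"""

duplicates = find_duplicate_ids(ET.fromstring(sample))
print(duplicates)
```

On the sample, tp-p0.1 is reported as being shared by a <p> and a <span>.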

It is fairly common for there to be plain text directly in <blockquote> elements, instead of having an intervening <p> (blockquote > p error)

--- blockquote > p error ---
xml2epub ccel/farrar/clouds
xml2epub ccel/herbermann/cathen02
xml2epub ccel/herbermann/cathen03
xml2epub ccel/herbermann/cathen04
xml2epub ccel/herbermann/cathen05
xml2epub ccel/herbermann/cathen06
xml2epub ccel/herbermann/cathen13
xml2epub ccel/therese/autobio
xml2epub ccel/luckock_h/studies
xml2epub ccel/luther/smallcat
xml2epub ccel/manton/manton01
xml2epub ccel/murray/new_life
xml2epub ccel/palgrave/sacredsong
xml2epub ccel/renan/antichrist
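
Here is one way the blockquote > p problem could be detected, as a sketch rather than the check I actually used. It only looks for bare text directly inside a <blockquote>; inline elements placed directly in a <blockquote> would also fail XHTML Strict and are not covered. It assumes well-formed, un-namespaced XML, and the names and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

def has_bare_text(element):
    """True if the element has non-whitespace text that is not inside a child."""
    if element.text and element.text.strip():
        return True
    # Text that follows a child element is stored on that child's tail.
    return any(child.tail and child.tail.strip() for child in element)

def find_bare_blockquotes(root):
    """Return the ids of <blockquote> elements that hold text directly."""
    return [bq.get("id") for bq in root.iter("blockquote") if has_bare_text(bq)]

# Hypothetical fragment with one correct and two incorrect blockquotes.
sample = """<div1>
<blockquote id="a">Bare text</blockquote>
<blockquote id="b"><p>Wrapped text</p></blockquote>
<blockquote id="c"><p>Wrapped text</p> trailing text</blockquote>
</div1>"""

bare = find_bare_blockquotes(ET.fromstring(sample))
print(bare)
```

On the sample, blockquotes a and c are flagged and b is not.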

In a couple of places, <ul> tags are direct children of <ul> tags, without an intervening <li> (likewise for <ol>)

--- ol > ol, ul > ul ---
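
A sketch of how such list nesting could be found: since XHTML Strict only allows <li> as a child of <ul> or <ol>, the check below flags any list that is a direct child of another list, including mixed ul > ol cases. It assumes well-formed, un-namespaced XML; the names and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = {"ul", "ol"}

def find_misnested_lists(root):
    """Return (parent_tag, child_tag) for every list directly inside a list."""
    hits = []
    for parent in root.iter():
        if parent.tag in LIST_TAGS:
            for child in parent:  # direct children only
                if child.tag in LIST_TAGS:
                    hits.append((parent.tag, child.tag))
    return hits

# Hypothetical fragment: the first list is nested wrongly, the second correctly.
sample = """<div1>
<ul><li>one</li><ul><li>nested without li</li></ul></ul>
<ol><li>two<ol><li>nested inside li</li></ol></li></ol>
</div1>"""

misnested = find_misnested_lists(ET.fromstring(sample))
print(misnested)
```

Only the first list in the sample is flagged; the second is fine because the inner <ol> sits inside an <li>.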

-- There are link errors on this page (sorry, I didn't write the targets down) --

General thml.html.xsl issues (may be ThML issues, depending on one's perspective)
- lots of useless (and non-compliant) xmlns attributes because namespaces are not made explicit
- many non-XHTML attributes are included (liberal use of <xsl:copy-of select="@*"/>)
- <center> element has been deprecated [I just transformed these into appropriate tags, so I don't have records of where these are]
- width attribute has been deprecated [I just transformed these into appropriate tags, so I don't have records of where these are]
- p > p structures are generated (e.g., ccel/whyte/pray)

Further nested notes by AdamB924
