Adding Page Breaks to .xml Documents (Easy to Learn!)

An easy way to start learning XML is to add ‘page break’ tags to books.

This is a relatively easy task, and it normally doesn’t take too long, although that depends on the length of the book. An average-size book can usually be completed in an hour or two.

Any book without page numbers in the .pdf format is a candidate to have pb tags added (page numbers generally appear in maroon-colored boxes at the top left of the page).

Step 1: Determine book title (any book that does not have page numbers in maroon boxes in the .pdf format of the book). I’ve also listed books we’d specifically like to have completed in the Volunteers forum.
Step 2: Obtain the .xml file of this book. Generally speaking, this is the ThML file listed under “other formats” for ALL of our books on CCEL. Email me if you have questions or if you’d like me to send you the .xml file as an email attachment.
Step 3: Open up the page images of the book you are working on from the CCEL web site. The page images are also listed under ‘other formats’ for most CCEL books. We do have page images for MOST of our books, but in some cases we don’t. If we don’t have the page images, you will need to choose a different book, locate a copy of the book yourself, or find page images of the book from another source (you might try Google Books).
Step 4: Open the .xml file in MS Word or another similar program and arrange your computer screen so that you can see the .xml file AND the page images (unless you are using an actual book).
Step 5: Add <pb n="i"/> or <pb n="1"/> before the ‘text’ of the first page. Whether you use ‘i’ or 1 is determined by the page image that you have open to the first page of the book, typically a title page.
Step 6: Go to the next page image of the book. I usually then use the search feature of MS Word to search for the first 2 or 3 words at the top of this next page. Directly before these words, insert the next page break tag, say <pb n="2"/>. Repeat this process, inserting <pb n="3"/> and so on, all the way to the end of the book.
Additional notes. Sometimes a page will ‘start’ in the middle of a tag or command. Say, for example, a page ends and begins in italics. My experience is that placing page break tags in the middle of other ‘tags’ doesn’t work, so I always place the page break before or after the other command. Email me if you have questions. There are all sorts of little quirks that you might encounter, but we can fix these. For the most part, it’s pretty straightforward.
Step 7: When you are finished, save the file as a text file using the “Save As” feature of MS Word. If you just save it, it will be saved as a Word document, and we may lose some of the formatting.
Step 8: Email me the .xml file.
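For volunteers who are comfortable with a little scripting, the search-and-insert routine in Step 6 can also be sketched in Python. This is only a rough illustration, not a tool CCEL provides: the function names (insert_page_break, page_numbers, find_gaps) and the sample text are made up for this example, and it assumes the first few words of each page appear in the .xml file exactly as typed. It does not handle pages that start in the middle of another tag, so those still need to be placed by hand.

```python
import re


def insert_page_break(text, first_words, page, start=0):
    """Insert <pb n="page"/> directly before first_words.

    Searches from `start` onward and returns (new_text, next_start), where
    next_start is the position just after the inserted tag.
    """
    pos = text.find(first_words, start)
    if pos == -1:
        # Nothing found: the words may be split by a tag or spelled differently.
        raise ValueError(f"could not find {first_words!r}; place this tag by hand")
    tag = f'<pb n="{page}"/>'
    return text[:pos] + tag + text[pos:], pos + len(tag)


def page_numbers(text):
    """Return the n="..." values of every <pb/> tag, in document order."""
    return re.findall(r'<pb\s+n="([^"]*)"\s*/>', text)


def find_gaps(numbers):
    """Return (previous, current) pairs where arabic page numbers skip or repeat."""
    arabic = [int(n) for n in numbers if n.isdecimal()]
    return [(a, b) for a, b in zip(arabic, arabic[1:]) if b != a + 1]


# Hypothetical sample text standing in for a real .xml file.
sample = "<p>Title Page</p><p>Chapter one begins here. It was a dark night.</p>"
sample, pos = insert_page_break(sample, "Title Page", "i")
sample, pos = insert_page_break(sample, "It was a", 1, pos)
print(sample)
print(page_numbers(sample))
```

Passing the returned position into the next call keeps the search moving forward through the file, which mirrors working through the page images in order. Running find_gaps on the finished list of page numbers is one way to spot a skipped or repeated tag before emailing the file.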
Thanks for helping with this!

The gentle soft word is heard over the shout.
Robert Loutzenhiser, Senior Moderator