Adding Page Break Tags Task Description
- Choose a book (any book that does not have page numbers in maroon boxes in the .pdf format of the book). I’ve also listed books we’d specifically like to have completed in the Volunteers forum.
- Obtain the .xml file of this book. Generally speaking, this is the ThML file listed under “other formats” for ALL of our books on CCEL. Email us if you would like me to send you the .xml file as an email attachment, or if you have other questions.
- Open up the page images of the book on which you are working from the CCEL Web site. The page images are also listed under ‘other formats’ for most CCEL books. We do have page images for MOST of our books, but in some cases we don’t. If we don’t have the page images, you will need to choose a different book, locate a copy of the book yourself, or find page images of the book from another source (you might try Google Books).
- Open the .xml file in MS Word or a similar program and arrange your computer screen so that you can see the .xml file AND the page images (unless you are using an actual book).
- Insert the first page break tag, <pb n="1"/>, before the ‘text’ of the first page. The value (‘1’ or ‘ii’, for example) is determined by the page image that you have open to the first page of the book, typically a title page.
- Go to the next page image of the book. I usually then use the search feature of MS Word to search for the first 2 or 3 words at the top of this next page. Directly before these words, insert the next page break tag, say <pb n="2"/>. Repeat this process, inserting <pb n="3"/> and so on, all the way to the end of the book.
- Sometimes a page will ‘start’ in the middle of a tag or command. Say, for example, one page ends and the next begins inside the same italicized passage. My experience is that placing page break tags in the middle of other ‘tags’ doesn’t work, so I always place the page break before or after the other command. Email me if you have questions. There are all sorts of little quirks that you might encounter, but we can fix these. For the most part, it’s pretty straightforward.
- When you are finished, save the file as a text file using the “Save As” feature of MS Word. If you just save it, it will be saved as a Word document, and we may lose some of the formatting.
- Email me the .xml file.
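If you are comfortable with a little scripting, the search-and-insert steps above, plus a quick numbering check before you email the file, could be sketched in Python roughly as follows. This is only an illustration and not a CCEL tool: the function names (insert_page_break, page_break_labels, arabic_gaps) and the sample text are made up for the example. It only handles the simple case where the first words of a page come directly after an opening tag such as <i>. A page that starts in the middle of an italicized passage still needs to be placed by hand.

```python
import re

# Hypothetical helper names for illustration; not part of any CCEL tool.

# Matches a page break tag and captures the value of n.
PB_TAG = re.compile(r'<pb\s+n="([^"]*)"\s*/>')

# Matches one opening tag (e.g. <i> or <p>) sitting directly before the
# insertion point. Tags containing "/" (closing tags, self-closing tags,
# attribute values with slashes) are deliberately not matched.
OPENING_TAG_BEFORE = re.compile(r'<[A-Za-z][^<>/]*>\s*$')


def insert_page_break(text, first_words, label, start=0):
    """Insert <pb n="label"/> before the first occurrence of first_words
    found at or after position start.

    Returns (new_text, position just after the inserted tag).
    Raises ValueError if the words are not found.
    """
    pos = text.find(first_words, start)
    if pos == -1:
        raise ValueError(f"could not find {first_words!r}")
    # If the page starts right after an opening tag, step back so the
    # page break lands before that tag instead of inside it.
    while True:
        match = OPENING_TAG_BEFORE.search(text[:pos])
        if match is None:
            break
        pos = match.start()
    tag = f'<pb n="{label}"/>'
    return text[:pos] + tag + text[pos:], pos + len(tag)


def page_break_labels(text):
    """Return the n="..." values of all page break tags, in document order."""
    return PB_TAG.findall(text)


def arabic_gaps(labels):
    """Return (previous, current) pairs where consecutive Arabic page
    numbers do not increase by one. Roman-numeral labels are skipped."""
    numbers = [int(x) for x in labels if x.isdigit()]
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b != a + 1]


# Made-up sample text standing in for a book's .xml file.
sample = '<p><i>Title</i> page text.</p>\n<p><i>Second</i> page begins here.</p>'
text, pos = insert_page_break(sample, "Title", "1")
text, pos = insert_page_break(text, "Second", "2", start=pos)
print(text)
print(arabic_gaps(page_break_labels(text)))
```

Passing the returned position back in as start keeps each search from matching words on pages you have already tagged. You still need to compare every result against the page images, since the same 2 or 3 words can appear more than once in a book.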
Thanks for helping with this!