Book Installation: Introductory Thoughts by Charles Bowen
Charles Bowen is a volunteer who has installed hundreds and hundreds of books for CCEL. He certainly has more experience and knowledge than I do in this process, so I sent him an email asking him several questions. What follows is his response to these questions. This is not meant to be a complete instructional manual, but it might be useful to our volunteers in several ways. First, it might answer questions that volunteers have. Second, it might give enough of a feel for the process of book installation to give volunteers an idea whether or not book installations are something they would like to try. Third, this post represents an area where others can post questions and answers about book installations as they arise. (KTV)
Laptop computer: using Adobe Reader or Adobe Acrobat to view PDF files of the book page images.
Desktop computer: using Microsoft FrontPage. This program allows me to view simultaneously the
a. html version of the document I am working on (which is basically a text file with the html markups for paragraphs, span groupings for Latin and Greek language, and scripture references) and
b. a plain file view (with symbols indicating markup codes) in which I can view the text as I am editing in a larger font size (e.g. 18pt) and double spaced for easier reading. I find standard text files are hard on the eyes. It is also easier to edit if one uses margins in the area of 25% on each side, thus reducing the line length being scanned as you go along.
c. When the document is completed I copy the html document into an xml format to make sure it is well-formed. Then I can view this document on the Internet to do a last check for appearance and any gross errors missed.
d. On occasion, I use OmniPage Pro 16 together with a scanner to acquire a missing page. If need be, this .PNG image can be edited with Jasc Paint Shop Pro 9.
1. I use macros which can be inserted in the text:
a. <span lang="LA"> </span>
b. <span lang="GR" class="Greek"> </span>
c. <scripRef passage=" "> </scripRef>
d. <note> </note>
2. The Tip sheet on special characters (http://home.earthlink.net/~awinkelried/keyboard_shortcuts.html) is useful as an occasional reference for special characters.
3. A crib sheet of the Book names of the Bible is useful occasionally as a reminder. I originally downloaded this sheet from (http://www.ccel.org/ss/bookID.xml) which at that time was a simple listing of the book abbreviations without all the intervening info presently at this site.
4. For Hebrew characters in the text, I use a crib sheet of Hebrew characters arranged by general shape for ease in finding and entering. See the .PNG page at the end of this page.
5. Greek characters are usually entered by toggling the keyboard to the Greek alphabet. The diacritical marks are entered manually as I go along: ἄ is α + ALT + 0787 + ALT + 0769.
a. ALT + 0787 is ̓
b. ALT + 0788 is ̔
c. ALT + 0769 is ́
d. ALT + 0768 is ̀
e. ALT + 8125 is ᾽
f. ῖ is ALT + 8150 using the Greek keyboard.
g. ῦ is ALT + 8166
h. ῆ is ALT + 8134
i. Other Greek characters can be found on the site for Greek and Coptic (http://unicode.org/charts/PDF/U0370.pdf) and Greek Extended (http://unicode.org/charts/PDF/U1F00.pdf)
j. Some unusual Greek characters may be found on Greek Letter Combinations (http://www.fordham.edu/halsall/ikon/greekligs.html); Greek Abbreviations (http://www.fordham.edu/halsall/ikon/greekabb.html)
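The ALT codes in item 5 are the decimal values of Unicode code points, so the same combinations can be checked in a short script. The following is only a sketch in Python: the helper name compose and the use of NFC normalization are assumptions added for illustration, not part of the workflow described above.

```python
import unicodedata

# Decimal codes from the list above, as typed with ALT on the numeric keypad.
SMOOTH_BREATHING = chr(787)  # U+0313, combining comma above
ROUGH_BREATHING = chr(788)   # U+0314, combining reversed comma above
ACUTE = chr(769)             # U+0301, combining acute accent
GRAVE = chr(768)             # U+0300, combining grave accent


def compose(base, *marks):
    """Join a base letter with combining marks and normalize to NFC.

    NFC folds the sequence into a single precomposed character
    whenever Unicode defines one. (Hypothetical helper.)
    """
    return unicodedata.normalize("NFC", base + "".join(marks))


# alpha + ALT 0787 + ALT 0769, as in the example in item 5
alpha = compose("α", SMOOTH_BREATHING, ACUTE)
print(alpha, hex(ord(alpha)))  # ἄ 0x1f04

# Precomposed letters entered directly by their decimal code
print(chr(8150), chr(8166), chr(8134))  # ῖ ῦ ῆ
```

Whether a document should store the precomposed character or the base letter plus combining marks depends on the fonts and tools in use; normalizing is one way to make the two forms compare equal.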
To check that the document is well-formed, I use Microsoft FrontPage, which allows an edit of an xml document. It detects invalid codes, e.g. the lack of a slash in a closing tag, missing paragraph endings, or failure to close a division. The same kind of edit can also be done using Microsoft Development Environment [Design], which is included in Microsoft Visual Studio .NET. I use this program only occasionally.
SPELL-CHECKING. After completion of a document, the spell-check routine is run. (As an aside, I also keep the spell check active while I am editing to help with quick detection of a problem.) The spell check will not catch various errors in punctuation, an incorrect space before or after a parenthesis, etc.
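For volunteers without FrontPage or Visual Studio, a well-formedness check of this kind can be sketched with Python's standard XML parser. This is a minimal illustration, not the procedure described above: the function name check_well_formed and the div1 element in the samples are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if xml_text parses, else (False, parser message).

    Hypothetical helper: it only tests well-formedness (tags closed and
    properly nested); it does not validate against any schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# Sample markup; <div1> is an assumed element name for illustration.
good = '<div1><p>See <scripRef passage="John 3:16">John 3:16</scripRef>.</p></div1>'
bad = "<div1><p>A paragraph that is never closed</div1>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

The parser's message includes the line and column where the problem was detected, which is usually close to the missing closing tag.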
Sometimes the OCR interprets the letter I as 1 or ! as an I. The only way to detect these is by a search and replace.
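A search of this kind can be approximated with a regular expression that lists suspicious lines for manual review. The sketch below is Python; the patterns are guesses at typical confusions (a digit 1 inside a word, a lone 1 before a lowercase word, a capital I glued to the end of a lowercase word) and will produce false positives, so it only finds candidates and replaces nothing.

```python
import re

# Assumed patterns for likely I / 1 / ! confusions; adjust to the book at hand.
SUSPECT = re.compile(
    r"(?<=[A-Za-z])1"     # digit 1 right after a letter, e.g. "w1ll"
    r"|1(?=[A-Za-z])"     # digit 1 right before a letter, e.g. "1n"
    r"|\b1(?=\s+[a-z])"   # lone 1 before a lowercase word, e.g. "1 think"
    r"|(?<=[a-z])I\b"     # capital I ending a lowercase word, e.g. "manI"
)


def find_ocr_suspects(text):
    """Return (line_number, line) pairs that contain a suspect pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if SUSPECT.search(line):
            hits.append((number, line))
    return hits


sample = "1 think it is so.\nChapter 1.\nHe w1ll come.\nBehold the manI\nI am here."
for number, line in find_ocr_suspects(sample):
    print(number, line)
```

On the sample text this flags lines 1, 3, and 4 and leaves "Chapter 1." and "I am here." alone.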
DEALING WITH BADLY GARBLED TEXT FILES IN A DOCUMENT:
Frequently another version of the corrupted page may be found in another edition of the book in Google Books Full View (http://books.google.com/books?um=1&q=&btnG=Search+Books) or the Web Archive (http://www.archive.org/advancedsearch.php).
PAGE IMAGES FOR A BOOK.
Often the books I work with are taken from the Web Archive. The page images are derived from the .PDF copy of the particular book. The .PDF file can be edited with Adobe Acrobat Professional: deleting unwanted pages, renumbering the page images, and creating a PNG file of the images.
I take books less frequently from the Google Book files because there is a higher frequency of these page images being incomplete or sometimes badly damaged. Google files need to be run through an OCR program to get a text file, unless you are willing to copy and download the text files that Google now provides along with their .PDF files.
DEALING WITH OLD DOCUMENTS WITH ORTHOGRAPHY WHICH NEEDS TO BE MODERNIZED
The only way to deal with this is by find and replace -- usually you will find letter combinations which can be easily changed in this way, but experience in recognizing how OCR programs misread text helps.
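A batch of find-and-replace passes can be sketched as a small script. The example below is Python and purely illustrative: the replacement table (long s and two ligatures) is an assumption chosen for the example, not a list taken from any particular book, and each text needs its own table.

```python
# Assumed replacement pairs for illustration only; build the real list
# from the letter combinations actually seen in the book being edited.
REPLACEMENTS = [
    ("ſ", "s"),    # long s
    ("ﬁ", "fi"),   # fi ligature
    ("ﬂ", "fl"),   # fl ligature
]


def modernize(text, replacements=REPLACEMENTS):
    """Apply each find-and-replace pair in order.

    Returns the new text and a count of how often each pattern occurred,
    so unexpected totals can be reviewed by hand. (Hypothetical helper.)
    """
    counts = {}
    for old, new in replacements:
        counts[old] = text.count(old)
        text = text.replace(old, new)
    return text, counts


modern, counts = modernize("Bleſſed is the man that ﬁndeth wiſdom")
print(modern)   # Blessed is the man that findeth wisdom
print(counts)
```

Reporting the counts is a deliberate choice: a pattern that matches far more or far less often than expected is a hint that the replacement is catching the wrong thing.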