
Ten Music Lessons for Authors of Text Databases

by Urs APP


Abstract

This article (which is also published in the Electronic Bodhidharma No. 3) takes up some issues related to the "recording" of text and emphasizes the importance of, among other things, distinguishing between master data and user data.


The era of digital recording and reproduction has barely begun, yet it has taken the world by storm. In particular, two forms of digital recording and reproduction have broken all records: digital text and digital music. Digital text is created ("recorded") by typing into a computer, and it can be reproduced in various ways: on a computer screen, through a printer, by telephone line, on a floppy or hard disk, on a CD-ROM disk, etc. Today, there are an estimated 300 million personal computers in use all over the world, and the market for software that helps create and handle digital text is booming.

Digital music recording is even younger than digital text input: the first such recordings were made at the beginning of the eighties, and digital music reproduction machines (CD players) became available just about ten years ago. However, the digital music boom may be one of the greatest information revolutions in history: within ten years, well over a billion CD players have been sold, and readers may figure out the total number of pressed CDs by looking at their own collection and adding eight to nine zeroes.

The recording and handling of sound may seem quite different from that of text. However, there are enough similarities to make one stop and think. Indeed, I think that creators of large text databases had better study the music business and learn a few lessons.

  1. Lesson number one: standardization pays off. Take any CD from anywhere, say Japan, put it into your player, and listen to the music. Now insert a computer disk from Japan into your computer and see what happens. The message is clear: we need computers and system software that can handle any language. What we have at present is, in musical terms, like a CD player which takes only disks from certain countries and plays only music by certain composers. Unicode, and even more so ISO 10646, will be a first step in the right direction: standard computer codes for all languages of the world. Then we will need computers that read Unicode data, system software that doesn't choke on it, and application programs that handle Unicode data. Still a long way to go.

  2. Lesson number two: ease of use pays off. Look what happened to all the LPs in just ten years! Just press a numbered button and relax: gone are the anxious moments of watching a sharp needle hovering over a scratchable plastic surface. The functions are simple: play, pause, stop, repeat, and so on. Players with too many functions do not sell well, and one should be able to use a player without first having to read a book. Computers are catching up, particularly Apple computers and now Windows systems in Apple's wake. But hardware and software manufacturers still have a long way to go before we can dispense with the user's manuals.

  3. Lesson number three: quality pays off. What music lovers want is good music, well recorded and well reproduced. They insist on having the whole audible range of frequencies covered. Compare that to the standard Japanese, Korean, or Chinese character codes, which make it impossible to input and reproduce a certain percentage of the characters in any classical text and distort the rest in various ways (for example by ignoring variant forms of characters).

  4. Lesson number four: professional work pays off. One does not get good quality by chance but by employing a variety of specialists who know their trade, giving them good equipment, and making them work together. In contrast to the database business, the tasks in the music business are well defined: composer, performing artist, musical director, sound engineer (Tonmeister), recording engineers, music editors, mass production specialists, marketing people, distributors, etc. As a matter of course, the recording engineer records, the editor edits, etc. In database projects it is still common to have recording engineers (computer experts) meddle in matters of data content, scope, and structure. Often, computer experts make decisions about databases whose language they cannot even read. Data production is in general still an amateur business, pursued with little planning and foresight. For example, Sony had musical engineers edit the data for the Kōjien dictionary on CD-ROM. They had no idea how to deal with text data. The result: if you look for "Japan" and "religion" in a combined search of this database, you get only five entries, of which three have no connection to the topic. One of them is "Italy" (because Italy was an ally of "Japan" in the Second World War and is a center of the Catholic "religion").

  5. Lesson number five: Distinguishing between master recordings and consumer products pays off. For the master recordings, music companies use equipment of the highest quality, regardless of the consumer format. The master recordings are designed to be used for many decades or even centuries, and their quality should be so high that they can be used in a variety of formats in the present and future. For example, good master tapes allowed these companies to transfer many old recordings to CDs, a medium they never even dreamed of at the time of those recordings. In contrast, text data producers often think only of the hardware and software environment they are currently using. It is as if a sound engineer reasoned: "Most users use Walkmans, therefore I make the master tape on a Walkman." In the case of Far Eastern text data, many data producers use the present insufficient character codes, mix full and simplified forms, and perform various other tricks which will render much of the text data useless in a few years or decades. Master data of Chinese text, for example, must be automatically convertible to present national character codes as well as to Unicode and future, even larger codes. Furthermore, they must be convertible for use on any hardware and software platform, present and future.

  6. Lesson number six: Careful planning pays off. Musical recordings are carefully planned, from the edition of the score used in performing to the final distribution. Distinguishing clearly between the various production stages (planning, negotiating, recording, editing, mastering, end product pressing, packaging, distributing, etc.) is beneficial. In text database production, planning is often haphazard and shortsighted, recording (input) is confused with editing, master data are mixed up with the end product, data are produced in formats few users can handle, etc.

  7. Lesson number seven: Recording and editing are two different things. Music companies take pride in recording exactly what is played, and every detail of it; they go to great lengths to ensure that their recording equipment meets the highest quality standards. It is not the job of the recording engineer to filter out information; that may later be done during editing, which is performed by another specialist versed in both musical and technical matters. Applied to the input of Chinese text data, this means: people who input text should not have to decide whether a character they see in the printed original corresponds to a slightly differently shaped (or simplified) character on screen. They should input what is printed, and one must make sure that they can do so. If decisions are needed, they should be made by a data editor who is versed in both scholarly and technical matters. Leaving editing decisions to recording personnel or technicians is a sure way to produce bad text data at the master level.

  8. Lesson number eight: Good data editing is crucial, but one must hold on to the originals. Music companies adopt various measures to ensure that the initial recording tapes are of the highest quality and easy to edit. Using multi-channel recording and various other strategies, they strive to give the master tape editor the greatest possible amount of error-free information in a flexible format. It is up to the editor to decide what information he then makes use of in creating the master tape, but the original recording tapes are always stored and can on demand be mixed and edited anew. In terms of text data, I know of cases where huge amounts of master data were edited and "corrected" by an expert without leaving any record of what was changed. If another expert does not agree with these changes, he or she may have to compare every word of the printed original with the edited data, a task which no machine can do and which could easily take several years. In terms of Chinese text data, the basic input data must be thoroughly corrected in order to correspond to the printed original. All differences from the printed text must thereafter be thoroughly documented, and at important junctures of the editing process complete data backups must be made and kept for future reference.

  9. Lesson number nine: Good documentation and consideration of the user's needs at the planning and production stage pay off. Music CDs contain useful tags for beginnings of movements etc., allowing the user direct access to the pieces of music he or she wants to hear. The documentation delivered with the CD should inform the user about crucial aspects of the recording: composer, score, performers, instruments, recording data, names of responsible engineers, etc. Text databases should also be delivered with detailed printed information, and the data should be well structured and easy to use. For example, the user should be able to search only the parts of the disk he or she is interested in and to limit the scope of a search; all too often, the search programs allow only "all or nothing" approaches.

  10. Lesson number ten: New technology is not a threat but a boon. Digital reproduction of music does not obliterate concerts; on the contrary, the concert business could hardly survive without recordings. Similarly, digital publications do not threaten book publishing: quite the contrary, as several surveys prove. People who use text databases for searching want to own the printed versions, too; they can mark them up, write in the margins, and want to see the context of the words that they can now find so easily and rapidly. Music recording protection schemes (such as DAT copy blocking mechanisms, encrypted digital broadcasts, etc.) have for the most part failed. Music companies found out quickly that reasonably priced and high quality CDs are the best weapon against unauthorized copying.
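
The requirement in lesson five, that master data be automatically convertible to present national character codes as well as to Unicode, can be illustrated with a minimal sketch. It assumes (this is an assumption, not something the article prescribes) that the master text is held as Unicode and that user data are derived from it with Python's standard codecs. The function name derive_user_data and the sample text are hypothetical.

```python
# Sketch only: the master text stays in Unicode; user data are derived from it.
# derive_user_data and the sample text are hypothetical, not from the article.

def derive_user_data(master, codec):
    """Encode master text into one consumer character code.

    Returns (encoded_bytes, missing), where missing lists the
    (position, character) pairs the target code cannot represent.
    Each such character is replaced by b"?" in the output.
    Assumes a stateless codec (e.g. big5, gb2312, shift_jis, utf-8).
    """
    encoded = bytearray()
    missing = []
    for position, char in enumerate(master):
        try:
            encoded += char.encode(codec)
        except UnicodeEncodeError:
            # The consumer code lacks this character: record it, do not hide it.
            missing.append((position, char))
            encoded += b"?"
    return bytes(encoded), missing


if __name__ == "__main__":
    # Sample master data: three full-form characters given as code points.
    master_text = "\u7121\u9580\u95dc"
    for codec in ("big5", "gb2312", "utf-8"):
        data, missing = derive_user_data(master_text, codec)
        lost = ", ".join(f"U+{ord(ch):04X}" for _, ch in missing) or "none"
        print(f"{codec}: {len(data)} bytes, unrepresentable: {lost}")
```

The design choice here mirrors the master tape: the master text is never altered to suit a consumer format, and every character that a given national code cannot carry is reported rather than silently dropped or substituted.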

Author: Urs APP
Last updated: 95/05/03