The Taipei meeting of the Electronic Buddhist Text Initiative
Impressions by Michel Mohr
Photo of the meeting with Master Hsing Yun of Fo Kuang shan (front row center).
Front row left: Prof. Lewis Lancaster (U.C. Berkeley), Ven. Jonglim (Haeinsa).
Front row right: Ven. Aggasamy (Burma), Prof. Urs App (Hanazono U.).
The 1996 meeting of the Electronic Buddhist Text Initiative took place in Taipei between April 5 and April 9, with wide participation from five continents. What follows consists of short reports on the presentations and outlines of some of the issues that were discussed. Since these are impressions rather than a comprehensive report, the author's interests play an important role.
The meeting was perfectly organized thanks to the support of the Fo Kuang Shan Foundation for Buddhist Culture and Education, with the warm backing of the whole Fo Kuang Shan community. Participants could find no words to thank the members of this community for their meticulous attention to detail, wonderful food, and unyielding enthusiasm. The overall success of this meeting was also the fruit of the commitment of the chairman, Professor Lewis Lancaster, and the EBTI coordinators who closely collaborated with our Taiwanese hosts. E-mail or fax addresses of participants are listed on the EBTI Alaska site.
The tight daily schedule of the conference, every day from 9:00 A.M. to 9:00 P.M., involved the discussion of six main topics, presented from different standpoints, areas of specialization, or geographical areas:
1. Dictionary Projects
Session I "Fo Kuang Shan : Chinese Dictionary Project", Ven. Hui Ray and Dr. Li-Pei Chen (Taiwan)
This session presented one aspect of the accomplishments of the Tripitaka Computerization Department of the Fo Kuang Shan Foundation for Buddhist Culture & Education. The 22,608 entries of the dictionary published by the same foundation in 1988 will be available on disk, with an impressive search engine. The demo-version presented needs Chinese Windows 95, but the wish list of the authors includes making these data available on other platforms or at least beyond this specific version of Windows. Future improvements should include the possibility to generate documents in HTML format.
At this stage, Sanskrit diacritics included in the dictionary are displayed following conventions proper to the Fo Kuang Shan dictionary's electronic text, and appear in the presented version without the actual diacritic marks. One might suggest further enhancement of this huge work by enriching the electronic text with more references to other materials which contain similar information or from which some entries derive, and in particular the page numbers of the Taisho edition of the Canon. At this point, the strategy used for dealing with missing characters is limited to a specific environment on a specific platform. The portability of the data could therefore be greatly improved by the adoption of standardized approaches proposed by other EBTI participants.
Session II "Buddhist Dictionary Project: Web-based CJK-English Buddhist Dictionary Project", Prof. Charles Muller, Toyo Gakuen University (Japan)
In his presentation, Charles Muller discussed his motivations for undertaking this project of a dictionary providing access to Buddhist technical terms in Chinese-Japanese-Korean. The present version of this dictionary is available on Dr. Muller's Web site.
The author, who has undertaken this work alone, recently added value to his dictionary by supplying some SGML (Standard Generalized Markup Language) and HTML (HyperText Markup Language) tagging of its entries. Further broadening of this project will depend on the collaboration of other researchers or organizations. Individual contributions to this project, which aims at "turning into a fully 'open' work," are not only welcomed, but will also be acknowledged in each entry provided. Here also, scholars would certainly welcome additional information on the sources used for each entry of the dictionary.
Session III "Tibetan Dictionary Project", Leigh Brasington (Nepal-U.S.A)
This was the first public demonstration of an electronic Tibetan-English dictionary which will soon be available on the market. A fast search engine written in C makes the 81,950 entries of this tool readily available for Windows users. Although some difficulties remain when sorting the Tibetan composite alphabet, this contribution by a professional programmer demonstrates that the challenge of one of the most complex scripts can be met with success when specialists are sufficiently motivated.
One of the important issues linked with dictionary projects is that of fonts. Usable standards are still lacking, despite efforts in this direction by the proponents of Unicode, which has not yet been implemented as a visible set of fonts. With the boom in data distribution over the Web, short-term choices have to be made, and scholars cannot wait for commercial companies to produce adequate software and hardware. In this regard, the promise of Java was highlighted several times by EBTI participants. The ability for the provider of information to define fonts seems especially promising.
2. Library Resources and Services
***Unscheduled Session***
An addendum to the program was made at the last minute, with a presentation by Mr. Gunner Mikkelsen, from the East Asian Studies Department of Aarhus University (Denmark), "An Electronic Corpus of Manichaean Texts."
He reported that the universities of Leuven (B), Lund (S) and Warwick (GB) have undertaken three main projects, which include: 1. The constitution of an Electronic Library of Manichaean Texts (with wordlists); 2. A dictionary of Manichaean terms and concepts (based upon the wordlists in 1); 3. Corpus Fontium Manichaeorum (CFM), a collection of printed editions of the Manichaean texts providing these texts in their original fonts, standard transcriptions, translations, commentaries, and word indices.
Those who have dealt with Central Asia know the deep links existing between some Manichaean and Buddhist texts. The building of the electronic library and the CFM implies retrieval of texts in several different languages, which are organized in eight sections (Latin, Greek, Coptic, Syriac, Arabic, Middle Iranian, Chinese, and Old Turkish). In passing, the Middle Iranian section includes Middle Persian proper, Parthian and Sogdian. As in the case of Buddhism, research coping with such a diversity of languages and geographical areas necessarily involves a comparative dimension. This project emerged in 1987, at the first International Conference on Manichaean Texts held in Lund (Sweden). Presently, more than 90% of all Manichaean data have been input. The project was confronted with several difficulties, such as adapting diacritics for Middle Iranian texts, but it is reported as progressing smoothly. The Corpus Fontium Manichaeorum is published by Brepols (B), which has a long tradition of publishing collections of religious texts, like the Corpus Christianorum. This project will result in the production of about 60 volumes over 25 years, including critical texts, translations and commentaries.
Further information on this project can be obtained from Mr. Gunner Mikkelsen (OSTGM@hum.aau.dk), or by looking at the Homepage of the International Association of Manichaean Studies.
(N.B.: because of this unscheduled session, Session IV has been moved after Session VIII)
Session V "The International Dunhuang Project at the British Library", Dr. Susan Whitfield, The British Library (England)
This session presented a wide-ranging project involving the collaboration of several countries. The scattering of Dunhuang manuscripts in different libraries has always been a major obstacle for researchers. While the British Library owns some 28,000 items coming from Dunhuang, most of them Buddhist canonical texts, the total number of Dunhuang texts and fragments kept in world collections is estimated at over 100,000 items. The collapse of the former Soviet Union has finally allowed for extensive cooperation with the Library of Saint Petersburg. A first meeting between representatives of libraries owning Dunhuang materials took place in October 1993, laying the basis for the International Dunhuang Project, which is now becoming widely known. Details can be obtained directly from the Internet Site of the Dunhuang Project.
This project is extremely promising also in the sense that it aims at overcoming the dichotomy between those who look at manuscripts as items to be preserved (conservators) and the researchers who look at them as texts to read and interpret. The existing database, designed on the Macintosh using the 4th Dimension relational database, showed how bibliographical information can be combined with graphic images. High resolution images should allow researchers to access the original texts without endangering the fragile papers on which they are recorded. Deciphering of these texts and full annotation of their content will at a later stage be supplemented by scholars belonging to different areas of specialization. Due credit will be given to all those who provide information. The purpose of the project implies availability of data on a cross-platform basis, but here again fonts cause problems.
3. On-line and Network Data
Session VI "Coombs Computing Unit - Electronic Buddhist Archives and the Buddhist Studies WWW Virtual Library projects", Dr. T. M. Ciolek, Australian National University (Australia)
This presentation was given by the Resources Administrator of the world's largest on-line collection of Asian studies data. During his talk, Dr. Ciolek revealed some of the strategies he applied in building and administering his site. He emphasized in particular: 1. The importance of taking the initiative; 2. The necessity of starting with small projects and gradual addition of new modules; 3. Control of integrity and quality of data provided; 4. The need for cooperation and collaboration of concerned individuals.
In his philosophical envisioning of the continuously evolving, fluid mass of information carried on the Web, Dr. Ciolek noticed with humor that "we have to live with a mess" and that "we can only minimize the trouble." The Coombs Computing Unit is a team of seven persons who assume the task of maintaining an impressive number of databases, mailing lists, and the Electronic Buddhist Archives (FTP site). Lessons learned from this work between 1991 and 1996 were presented, including notes on publishing scholarly information on networks and practical recommendations for writing pointers in WWW documents. The handout is available on the WWW. This presentation gave invaluable advice for providing electronic information aimed at both a general and a scholarly public while fighting against the tendency toward "multi-media mediocrity."
Session VII "Global Interactive Cyber Sangha", Gary Ray, CyberSangha On-line (USA)
Mr. Ray's presentation provided a glimpse of the world of Buddhist practitioners in California. Some surprising data were revealed. For instance, believe it or not, according to Gary Ray, 80% of E-mail in the world originates in California. Looking at the past, present and future of the Net, the editor of "Cybersangha. The Buddhist Alternative Journal" noticed tendencies that have emerged since 1992 when there were only about 200 or 300 people using this medium to communicate. According to Mr. Ray, the recent propensity to migrate from electronic bulletin boards to Web sites has for instance had the consequence of slowing down the production of new information. People concentrate on embellishing their information by adding "nice images." One of the main issues for networkers seems now to be the "noise problem," in other words meaningless polemics. According to Gary Ray, this phenomenon shows that the wired community has reached "critical mass" and has grown out of control. Reports of people doing interviews with teachers through E-mail and the statement that "we are but 'nodes'" led to some discussion about the possible drawbacks of this kind of "disembodiment."
Session VIII Prof. Heng Ching Shih, Center for Buddhist Studies at National Taiwan University (Taiwan)
This very impressive WWW site at National Taiwan University was presented for the first time to a general audience. Focusing on Buddhist resources, this is one of the rare Taiwanese sites available in English for those who cannot display Big5 codes or do not read Chinese.
Among the most salient features that were demonstrated, the searchable database of bibliographical information was eye-catching, already including over 40,000 references to both Chinese and Western publications. The fact that it lists contributions to periodicals, with a summary and key-words makes this Center one of the pioneering institutions to release this type of data. This will also benefit researchers tired of being exploited by commercial databases. Original Chinese sources are also provided, with over 200 sutras. The educational purpose of this site is underlined by its corner for learning Sanskrit and Pali on-line. Pronunciation is available when clicking on audio data, while diacritics are displayed using invisible graphics.
Session IV "Buddhist Databases and Library Services", Prof. John Lehman, University of Alaska (USA), and Mr. Howie X. Lan, Instructional Technology Program, University of California, Berkeley
This presentation by computer professionals focused on the latest possibilities offered to overcome present technological limitations. It started from the empirical observation that "people are never patient enough to wait for the right solution." Nevertheless, the fact that most fonts, database programs, and search engines are still incompatible no longer appears inevitable. From the perspective of what computer futurologists call "long-term" (roughly 3 years), the possibility of getting different systems to share a common interface is starting to emerge. The fast dissemination of browsers, such as Netscape, allows the optimistic expectation that some kind of viewer will soon enable us to overcome localized system peculiarities. In short, this type of middleware, which might be Netscape version nnn or Java, could "sit in between different systems, and make them work together." This type of software is already effective in libraries. To be prepared for such a shift, the essential requirement for the providers of information is to separate the data from their systems. Integration is going to be the word, and platform-independent information its correlate. The current model is still based on the distribution of CD-ROMs, but the network model has to develop. The authors of this presentation concluded by suggesting that data publishers consider using Java.
4. Canonic and Text Databases
Section I Chinese
Session IX "Koryo Canon Project of Hae-in Monastery", Ven. Chongnim, Mr. O Tai He, Daejanggyong Research Institute, Prof. Lee Kyoo Kap, Chungnam National University (Korea)
Rather than speaking of a mere project, the representatives of the Koryo Canon Project presented their first CD-ROM that had been released shortly before the EBTI meeting. This is the first electronic version of a Buddhist Canon in Chinese characters. A lot of care has been taken in reproducing the Hae-in sa woodblock edition in vertical format, designing nice fonts which look close to the original, and keeping some variant character forms. At this stage, however, the Tripitaka Koreana is not searchable, because its contents go far beyond the limits of Korean national codes and each page has the form of one file. It appears thus to serve presently as a work of art, preserving this national treasure and making it available to a wide audience. Scholars, of course, will be eagerly waiting for a searchable release, which should also be made available in other codes than the Korean one. One of the foremost applications of having a full searchable set of the Tripitaka Koreana would be to enable comparison with other Chinese and Japanese versions of the same texts, which is one of the goals expressed in the handout written by Ven. Jonglim. The endeavor to make data available "to be used in all the world" deserves to be supported. The EBTI also encourages the team working with the Daejanggyong Research Institute to consider collaborating more actively with colleagues outside Korea.
Session X "Three Phases of Markup and Division of Labor", Prof. Urs App, International Research Institute for Zen Buddhism, Hanazono University, Kyoto (Japan)
During the last session of the day, Urs App presented first a demonstration of virtual reality using Quicktime VR applied to Japanese Zen monastery gardens. The central part of his presentation was devoted to illustrating what he calls "the real world of electronic text markup." Instead of expecting the scholar to do everything, Dr. App proposed a form of labor sharing. Tagging of Buddhist text data could thus take place in three distinct phases: 1. Basic structural markup (data administrator or scholar using an ordinary word processing program); 2. Content markup by scholars using a customizable tagging and reference toolset (such as the Hypercard tool Dr. App demonstrated); and 3. Overall SGML markup by data specialists using dedicated SGML editors.
This division of labor adopts a flexible approach to markup, letting each group work in environments and with tools appropriate for the task at hand. Scholars need to do real markup in order to come up with a sensible "Buddhist Document Type Definition" (DTD), not the other way around -- and they need tools to do this without becoming computer experts. The demonstration showed how a Hypercard tool can serve scholars to apply such a flexible strategy, allowing for easy customization and gradual addition of new tags depending on content and context.
This demonstration was complemented by a presentation of the KanjiBase approach for dealing with missing Chinese characters, displayed by Christian Wittern on an IBM in the Windows environment.
Section II Pali
Session XI "BUDSIR V, Pali Text Project", Prof. Supachai Tangwongsan, Dr. Damras Wongsawang, Mahidol University Computing Center (Thailand)
The second section dealing with Canonic databases presented the latest version of the Pali Canon, which is already available for purchase through the American Academy of Religion or at Mahidol University. Release of BUDSIR version 5 running on Windows is planned within a few months; the first version was completed in 1988. The latest improvements include a Thai translation of the texts, and the possibility to display the scriptures in Romanized Pali, Devanagari alphabet, or simultaneously in its three different renderings. A sound feature has also been added, making it possible to hear the reading of the text while looking at its contents. The speed and accuracy of the search are remarkable, but scholars reiterated their additional wish to get the references to the existing edition of the Canon by the Pali Text Society inserted in the data. The achievements of this pioneering work and its improvements over a long period of time will certainly benefit other projects. Finally, the commitment by Prof. Supachai Tangwongsan to make these resources Unicode compatible, and eventually to release them on the Net if enough financial support is found, was warmly applauded.
Session XII "Vipassana Research Institute: Pali Tipitaka Project", Prof. Ravindra Panth, Director, Mr. Frank Snow (India)
The Vipassana Research Institute (V.R.I.) has been committed, since its creation in 1985, to publishing and translating Pali materials in order to facilitate access to those sources. Its fundamental purpose being "to conduct research related to Vipassana," it has maintained close relations with the Burmese tradition, seeking to revitalize the Buddhist tradition in India. This undertaking has been inspired by the work of S. N. Goenka. Computer input began with the objective of "publishing the entire Pali Canon and allied commentarial literature, based on the Chattha Sangayana version in Myanmese (Burmese) script." Today, a large series of texts in Devanagari and Roman script has already been published. The final corpus will also include unpublished palm leaf manuscripts, such as different Jataka collections. A CD-ROM release of these data is planned, which will include the Pali Text Society references. The input phase of this project is reportedly almost completed.
(***Session XIII "Sri Lankan Pali Project", Ven. Dhammavihari of Sri Lanka, was cancelled due to visa difficulties***)
Session XIV "Sixth Council Pali Canon", Ven. Aggasami Tranminh Tai (Myanmar/Burma), Mr. Son Tu (USA)
In the same area of Pali texts, the Burmese Buddhist authorities, in collaboration with practicing communities abroad, have decided to input the Canon that was agreed upon during the Sixth Council, held in 1954 in Yangon (Rangoon). This project aims simultaneously at preserving the Burmese tradition and at "making these texts accessible to people who cannot read Burmese characters." The targeted audience will be first monks and Pali scholars, then practicing lay-people. This project involves input of 118 volumes, 50 of which have already been typed. The end of this first phase is planned for March 1997. The search engine used to retrieve data will be based on the Delphi package, also employed in the BUDSIR project. Since this project is supported by donations, it will need support from various areas, including contribution of hardware, which is still largely insufficient in Burma.
To this point, some of the issues that started emerging can be summarized by two remarks. First, proper markup of data in the early phase of a project can later spare lots of trouble. Second, releasing data in the form of CD-ROMs or on the Internet takes its full meaning when these data start to be related to other information.
Section III Sanskrit
Session XV "University of California Sanskrit Project", Prof. Lewis Lancaster, Mr. Yao-ming Tsai, University of California, Berkeley (USA)
Problems encountered when handling Sanskrit materials are slightly different from those of other languages, in the sense that today no Buddhist practicing community uses Sanskrit. The consequence of this situation is that scholars have to assume the input of texts without external support. The study of Gilgit manuscripts, the oldest remaining texts, involves different scripts besides Devanagari. Some of the technical issues related to this peculiarity still have to be resolved. The strategy presented for teaching and learning various scripts makes it possible to become familiar with a new alphabet in its context, without having the eye go from the character chart to the text. Scanned images of the original texts have been pasted as graphics into Word documents. After deciphering these texts, alphabet charts are generated, which present both the handwritten character and its Romanized equivalent. Available solutions for the Sanskrit diacritics are still far from satisfactory, and this presentation emphasized the need to reject commercial applications that use hardware-locks for their products. Some 20,000 pages of Sanskrit manuscripts have already been input. Gilgit manuscripts are often in poor condition and this work will facilitate the verification of some of Prof. Dutt's early assumptions. Software demonstrations were presented both on the Macintosh and the IBM platforms.
Session XVI "Electronic Prospects for Preservation of Buryat-Mongolian Buddhist Written Heritage", Dr. Tom Rabdonov (Buryat Republic), Prof. David Blundell, National Taiwan University (Taiwan)
This session revealed the existence of a huge library of Buddhist manuscripts, mainly in Tibetan and Mongolian, kept in Ulan-Ude. The Buryat Institute of Social Sciences has been able to preserve one of the richest collections, despite the years of "atheistic propaganda." It includes several thousands of volumes, mainly dating from the beginning of the eighteenth century to the first half of the twentieth century. To preserve these materials and promote their availability for researchers, a working group of scholars and computer specialists has been created. Their project, called "Survival and Researching of the Mongolian Spiritual Heritage" is born, but urgently needs funding. An exhibition of some of the Buryat treasures is in preparation at the Museum of National History in Taipei. Other planned activities of the Buryat Institute of Social Sciences include the organization between June 21-33, 1996, of a symposium on "Central Asian Shamanism: philosophical, historic, cultural, religious and ecological aspects," in Ulan-Ude. Correspondence regarding this symposium should be addressed to:
Organizing Committee Russia, 670042, Buryatia, Ulan-Ude, Sakhyanova str. 6, BION
E-mail: burnc/bion@ulan.rosmail.com
Fax: 7-301-22-63244 box 057
Tel: 36625, 33354, 30603, 32251
Section IV Tibetan
Session XVII "Automatic Optical Recognition of Tibetan Script with Computer", Prof. Masami Kojima, Tohoku Institute of Technology (Japan)
This session presented the latest results obtained by scanning Tibetan printed texts. The hardware used is presently centered on the Japanese NEC 9801 series environment, with an Epson GT-6000 scanner. The OCR algorithms that have been developed allow a high rate of recognition, by setting aside four groups of characters having similar shapes. The scanning results after segmentation eliminate the "noise" (like dust on the paper), identified by its small number of dots. Technical aspects of the "thinning" used in the OCR process were briefly discussed, as was the need to test this kind of technology with prints that are not perfectly neat and regular.
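The noise-elimination step described above can be illustrated with a minimal sketch. It assumes a binarized page held as a grid of 0s and 1s and treats any connected group of dots smaller than a threshold as noise. The function name `remove_small_components`, the `min_dots` parameter, and the 8-connectivity choice are illustrative assumptions, not details of Prof. Kojima's system.

```python
from collections import deque


def remove_small_components(image, min_dots=3):
    """Return a copy of a binary image (rows of 0/1) in which every
    8-connected component with fewer than min_dots dots is erased."""
    rows = len(image)
    cols = len(image[0]) if image else 0
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [row[:] for row in image]

    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search gathers one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components with too few dots are treated as dust and erased.
            if len(component) < min_dots:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned


# Hypothetical sample: one 2x2 stroke fragment and two isolated specks.
page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
cleaned_page = remove_small_components(page, min_dots=3)
```

In this sample the two isolated specks are erased while the four-dot fragment survives. A real threshold would depend on scan resolution and on the size of the smallest legitimate marks in the script.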
Session XVIII Asian Classics Input Project, Mr. Robert Chilton (USA)
The presentation of the approach followed by the Asian Classics Input Project (ACIP) clearly showed the broad nature of its implications. The choice of using low-cost technology, mainly second-hand computers equipped only with floppy drives, has made it possible to carry out the input work locally. Collaboration with the Sera monastery in India, housing almost 3000 monks, has led to data being provided in exchange for support for the monastic university. This type of collaboration also benefited the monks, since their work for the ACIP familiarizes them with the Koti keyboard, a skill that can later be applied to other ventures. Thus, monks also get something from their collaboration other than small wages. The ACIP will soon produce its third release of a CD-ROM. Its policy of distributing its CDs for nominal cost to a wide public was applauded. One of the issues discussed at the end of the presentation was the appropriateness of releasing "raw data" that do not even include headers with version numbers. Some participants pointed out that within the context of the rising flood of data, this could lead to a nightmare for bibliographers and users alike.
Session XIX "Otani University Tibetan Project", Prof. Yoshiro Imaeda, Prof. Seiki Miyashita, Otani University (Japan)
Prof. Imaeda first retraced some of the events since the early days of his collaboration with Marcelle Lalou and Jacques Bacot in Paris, when electric typewriters were still a dream. The development of computers subsequently allowed the creation, in 1975, of the first Tibetan input program. Twenty years later, in 1995, Otani University released its Tibetan Language Kit for the Macintosh. Hope for improvement in the computer environment available for Tibetan is linked with an old enigma. This enigma, which still puzzles Tibetan linguists as it did in the time of Marcelle Lalou and other pioneers, comes from the existence of unknown scripts in some Tibetan manuscripts. Unidentified syllables, which are not transliterations of other known languages, have been found in different texts, leading to the hypothesis of a so-called "Nam language", which would have been used in Tibetan pre-Buddhist religious texts. Otani University, which has one of the world's largest collections of Tibetan texts, is presently collaborating with the ACIP for the cataloguing of Tibetan texts. It is hoped that a broad range of specialists will join in trying to face the challenge of the unknown Tibetan language.
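To make the concern about unversioned "raw data" concrete, here is a minimal sketch of what a release header could look like and how a user's program might read it back. The header layout, the field names, and the functions `add_header` and `read_version` are hypothetical illustrations and do not describe any format used by the ACIP.

```python
def add_header(text, title, version, date):
    """Prepend a small comment-style header to a plain-text release.
    The layout (lines starting with '# ') is a hypothetical convention."""
    header = [
        f"# title: {title}",
        f"# version: {version}",
        f"# date: {date}",
        "",  # blank line separates the header from the text itself
    ]
    return "\n".join(header) + "\n" + text


def read_version(data):
    """Return the version recorded in the header, or None if there is none."""
    for line in data.splitlines():
        if not line.startswith("#"):
            break  # the header has ended, so stop scanning
        if line.startswith("# version:"):
            return line.split(":", 1)[1].strip()
    return None


# Hypothetical example values.
release = add_header("first line of the input text\n", "Sample text", "1.2", "1996-04-09")
```

With such a header, two copies of the same text circulating on different disks or sites can at least be told apart, which is the bibliographers' worry raised in the discussion.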
Section V Vietnamese
Session XX "Vietnamese Buddhist Canon", Ven. Hanh-Tuan (Vietnam)
While many Buddhist traditions plan to preserve and disseminate their heritage in digital form, the case of Vietnam is particular in many respects. The Chinese Buddhist Canon that was introduced to Vietnam was the Northern Sung Tripitaka, but today few people outside the Buddhist clergy and scholars know classical Chinese. Now that all Buddhist congregations have gathered under the banner of the Unified Buddhist Church of Vietnam, agreement on the compilation of a new Canon has become somewhat easier. The objective of this project is twofold. It aims first at creating a Canon in the Vietnamese language, which will later also be released in electronic form, probably on CD-ROM. The lay Vietnamese people are the targeted audience, and completion of the 200 volumes of this collection will probably take several decades.
5. Teaching and Research
Session XXI "Huntington Archives: Buddhist Art", Ms. Janice Glowski, Ohio State University (USA)
During more than 25 years of field research, Susan and John Huntington have produced photographic archives that include about 300,000 color slides and black-and-white pictures. Discovery that the Nepalese government did not have any archives led to systematic surveys of Nepali Buddhist and Hindu sites. Up to 95% of these sites have now been photographed, which makes this collection an invaluable resource for the study of Nepali art, history, and religion. The pictures of the archives are scanned and classified in a 4th-Dimension database on Macintosh that also serves for teaching art history to university students. An introductory CD-ROM and a Web site provide them with graphic data and information that help them to understand how "art is text" and stimulate their curiosity for learning more about the images they browse. These archives are also related to John Huntington's work for the Encyclopedia of Buddhist Iconography. The Archive is open by appointment to scholars and students.
Session XXII "Lotus Sutra Project", Prof. Jamie Hubbard, Smith College (USA)
This presentation focused on the pedagogical use of multimedia for teaching an undergraduate course. The idea of this project is to produce a CD-ROM that enables students to approach Buddhist history and thought by looking at a single Mahayana scripture (the Lotus Sutra) through its evolution from India to Japan. An AAR panel organized in 1992 by Peter Gregory led Prof. Hubbard to conceive this project of a specific syllabus, which would somewhat relieve the burden of teachers who have to handle photocopies. However, the actual work required by such a project appears considerable. The prototype partial version of what will become the CD-ROM is written in HTML and was displayed using Netscape. In passing, readers of this page might be interested in consulting Prof. Hubbard's article related to buddhology and computer technology: "Upping the Ante: budstud@millenium.en.edu," Journal of the International Association of Buddhist Studies, Vol. 18 No. 2, 1995, pp. 309-322.
Session XXIII "Pacific Culture and Religion Project State", Prof. Thomas Price, California State University-Humboldt (USA)
The author presented the first version of his "one-man project" aimed at elementary and secondary school students in the California public schools. Images of Japan were taken and collected to build this multimedia application designed with Macromedia Director on a Macintosh. One startling result of testing this software with American primary school students was the realization that their interest in Haiku poems was sometimes greater than that in a flood of pictures. The next release of Director 5.0 should also allow development of this educational software on the Windows platform.
6. Coding and Markup
Session XXIV "Markup of Chinese Buddhist Texts", Prof. Ching-chun Hsieh, Mr. Derming Juang, Academia Sinica (Taiwan)
This presentation was divided into two parts, one dealing with a type of markup in non-SGML format, and the other concerning strategies that can be adopted to deal with missing characters. The first part very clearly underlined the characteristics of markup, defined as "a process that formalizes the contents of a text into a settled form." Although markup designed to be interpreted by machines is recent, humans have been using tags for centuries. Since there are several different systems that allow markup, the coexistence of these systems is crucial. Prof. Hsieh emphasized the necessity for the system to have enough flexibility to accommodate the user's ability to understand (process) the text. The purpose of the markup is to establish links between knowledge structures and the documents.
The example presented applied this approach to the short Heart Sutra, with a tagging program written in Visual Basic. It served to show the Academia Sinica's approach, which is a tree-based mathematical approach, in contrast to the formal-language approach adopted in SGML. The similarity between the two markup systems is, however, conspicuous. Prof. Hsieh acknowledged that in some regards SGML is more flexible than the Academia Sinica's system, but the consistency of the system makes a conversion feasible.
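To illustrate why a consistent tree structure makes conversion to SGML-style tags feasible, here is a minimal sketch in Python. It is not the Academia Sinica's system or its Visual Basic tagging program: the `Node` class, the tag names (`text`, `title`, `body`, `line`) and the sample content are all assumptions made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """One node of a hypothetical markup tree: a tag plus ordered children."""
    tag: str
    children: List[Union["Node", str]] = field(default_factory=list)


def to_sgml(node: Node) -> str:
    """Serialize the tree depth-first into nested SGML-like tags.

    Text children are copied as-is; escaping of '<' or '&' is left out
    to keep the sketch short.
    """
    parts = []
    for child in node.children:
        parts.append(to_sgml(child) if isinstance(child, Node) else child)
    return f"<{node.tag}>{''.join(parts)}</{node.tag}>"


# Hypothetical sample tree; the tag names are illustrative only.
sutra = Node("text", [
    Node("title", ["Heart Sutra"]),
    Node("body", [
        Node("line", ["Form is emptiness"]),
        Node("line", ["emptiness is form"]),
    ]),
])

print(to_sgml(sutra))
```

Because every node has exactly one tag and an ordered list of children, each node maps onto one pair of opening and closing tags, which is the sense in which a consistent tree can be converted mechanically.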
The second part of this presentation dealt with missing Chinese characters. The statement that "no one really knows how many Chinese characters exist" brought to the forefront one of the most problematic issues, the encoding of this language. An early survey by the Academia Sinica had collected 74,000 characters (without calligraphic forms), a number indicating that it is almost impossible to settle once and for all on a complete set of characters. To address this problem, the Academia Sinica is currently building a glyph database that allows characters to be decomposed into the different elements that constitute them. This is but one of the many results of research conducted at the Academia Sinica, which is doubtlessly the most advanced institution in the field of computer science applied to Chinese documents. The list of texts that have been input by this institution already includes many Chinese classics and many texts from the Chinese Buddhist Canon. Remarks that these resources "will hopefully be available on the Web" were enthusiastically received.
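The idea of decomposing characters into their constituent elements can be sketched as follows. This is only an illustration under stated assumptions, not the Academia Sinica's glyph database: the four-entry table, the function names, and the "+" notation for describing a character are all hypothetical.

```python
from collections import defaultdict

# Tiny illustrative table: character -> list of its components.
# A real glyph database would hold tens of thousands of entries.
DECOMPOSITIONS = {
    "明": ["日", "月"],
    "林": ["木", "木"],
    "森": ["木", "木", "木"],
    "好": ["女", "子"],
}


def build_component_index(decomp):
    """Invert the table so characters can be looked up by component."""
    index = defaultdict(set)
    for char, parts in decomp.items():
        for part in parts:
            index[part].add(char)
    return index


def describe(char, decomp):
    """Return a component description such as '日+月', or None if unknown."""
    parts = decomp.get(char)
    return "+".join(parts) if parts else None


index = build_component_index(DECOMPOSITIONS)
print(sorted(index["木"]))            # characters that contain 木
print(describe("明", DECOMPOSITIONS))  # component description of 明
```

A character absent from a given character set could then still be referred to by its component description, which is one way such a database helps with missing characters.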
Session XXV "Coding and Kanji Issues: Minimal Markup and More -- Some Requirements for Public Texts", Christian Wittern, Goettingen University (Germany)
Discussing the markup of Chinese texts, Christian Wittern underlined the step forward represented by the emergence of standards like SGML or TEI (Text Encoding Initiative). Drawing on these standards, he emphasized the necessity of detailed headers and of encoding the peculiar structural features of Chinese texts. The "minimal" SGML markup he proposed does not require too much effort but greatly facilitates the processing and exchange of information. In contrast, content markup is a middle or long term endeavor. However, concerning the contents of texts, Mr. Wittern proposed some conventions that could be adopted to identify texts (ID-values for texts), persons or places. The example displayed showed the application of this markup to the mapping of a Chinese Chan text, implemented in SGML and displayed with Panorama by SoftQuad. Discussing the issue of implicit tagging versus explicit tagging, Christian Wittern mentioned an extension to SGML called HyTime, which makes it possible to compile read-only data. As a complement to the KanjiBase approach to missing Chinese characters, Christian Wittern proposed a simple set of entity references to be used in data exchange, to allow HTML-like description of diacritics. This list will be available on his Web site.
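As a rough illustration of how entity references can carry diacritics through plain-ASCII data exchange, the following Python sketch expands HTML-like entities into the corresponding letters. The entity names in the table are assumptions borrowed from common SGML and HTML practice; they are not taken from Mr. Wittern's list, and the function name is hypothetical.

```python
import re

# Assumed entity names (following common SGML/HTML usage), mapped to
# the letters they stand for. Not Mr. Wittern's actual list.
ENTITIES = {
    "amacr": "\u0101",   # a with macron
    "imacr": "\u012b",   # i with macron
    "umacr": "\u016b",   # u with macron
    "sacute": "\u015b",  # s with acute
    "ntilde": "\u00f1",  # n with tilde
}

ENTITY_RE = re.compile(r"&([A-Za-z]+);")


def expand_entities(text):
    """Replace known entity references; leave unknown ones untouched."""
    def replace(match):
        return ENTITIES.get(match.group(1), match.group(0))
    return ENTITY_RE.sub(replace, text)


print(expand_entities("Praj&ntilde;&amacr;p&amacr;ramit&amacr;"))
```

Leaving unknown references untouched, rather than deleting them, keeps the exchanged text lossless: a recipient with a fuller table can still resolve them later.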
EBTI Business Meeting
During the formal business meeting of the Electronic Buddhist Text Initiative, Prof. Lewis Lancaster and Prof. Ching-chun Hsieh were elected as co-chairs until the fall of 1997.
The EBTI coordinators were confirmed:
- Americas: John Lehman, University of Alaska Fairbanks
- Asia: Urs App, Hanazono University, Kyoto, Japan
- Europe: Christian Wittern, University of Göttingen, Germany.
Various issues concerning online communication were discussed. John Lehman agreed to continue administering the EBTI home page. Additional information is found in the EBTI section of the Electronic Bodhidharma WWW site.