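Fundamental Research on the Kanseki Repository (漢籍リポジトリの基礎的研究)
Overview of academic year 2016
- 2016-04-26 Aims of the research group
- 2016-05-10 Current problems with www.kanripo.org
- 2016-05-24 Collaborative editing of multiple versions of a text
- 2016-07-26 Report on the DH2016 international digital humanities conference and preparations for the Mandoku workshop
- 2016-09-26 Features needed for collecting fragments of lost texts (逸文)
- 2016-10-11 Franco Moretti: on Distant Reading and Graphs, Maps, Trees
- 2016-10-25 Review of the research grant application
- 2016-11-22 Finding quotations (1): placing similar documents close together
- 2016-12-13 Finding quotations (2): bigrams, transition probabilities, and pointwise mutual information
- 2017-01-10 Adding further entries to the Kanseki Repository catalog
- 2017-01-24 Handling annotations by individuals and groups in the Kanseki Repository
Overview of academic year 2017
- 2017-04-26 Summary of the previous year and future tasks
- 2017-05-09 TLS and the Kanseki Repository: initial considerations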
- 2017-06-13 citfind on www.kanripo.org
- 2017-06-27 citfind (part 2): quotations from the 論語 (Analects)
- Today's materials:
- 2017-07-11 Heatmap of the 論語 (Analects)
- 2017-10-10 Trends in digital humanities
- DH 2017 Montreal (Aug 8-11, 2017)
- Abstracts, Schedule
- Reeve, Jonathan Pearce; Terlunen, Milan; Eckert, Sierra: Frequently Cited Passages Across Time: New Methods for Studying the Critical Reception of Texts (poster, pdf)
- Tang, Muh-Chyun; Chen, Kuang-hua: A cross-language co-word network comparison of Buddhist literature in Digital Library and Museum of Buddhist study (SP, pdf)
- Chao, Anne Shen; Li, Qiwei: A New and Improved Method to Text-Mining in Chinese: Closer Language Segmentation in Detecting the Shifting Meaning of Patriotism (poster, pdf)
- IABS 2017 Toronto (Aug. 20-25, 2017)
- Programme
- Section 12 Information Technologies in Buddhist Studies
- McCrabb, Ian (University of Sydney): READ Workbench – A Collaborative Corpus Development Framework (cf: http://sydney.edu.au/arts/research/read/about/index.shtml)
- Nagasaki, Kiyonori (International Institute for Digital Humanities): Possibilities of SAT Taishōzō Image DB through IIIF
- Panel 39: Radich, Michael (Victoria University of Wellington): New Computer-Assisted Techniques for Assessing Internal Evidence of Questions of Ascription in Chinese Buddhist Canonical Texts (cf: https://github.com/ajenhl/tacl)
- Zenodo
- JADH 2017 Kyoto (Sep. 11/12, 2017)
- Programme (Abstracts PDF)
- Keynote: Donald Sturgeon Collaboration at scale: emerging infrastructures for digital scholarship
- Poster DH research and teaching with digital library APIs
- Digital Sinology
- Jenjou Hung: CBETA Research Platform: A Digital Research Environment for Studying Chinese Buddhist Literature in the New Era (cf: http://cbetaonline.dila.edu.tw/en/)
- 2017-10-24
- The Distant Reading of Religious Texts: A “Big Data” Approach to Mind-Body Concepts in Early China: (https://doi.org/10.1093/jaarel/lfw090)
- 2017-11-14
- 2017-11-28
- Donald Sturgeon: Unsupervised identification of text reuse in early Chinese literature (https://doi.org/10.1093/llc/fqx024)
- 2017-12-12
- Hàndiǎn (汉典古籍) - CA corpus as published
- Hàndiǎn 6 paragraphs per text
- AP News 88/90 for comparison:
- 2018-01-09
Kanseki test corpus:
- https://github.com/cwittern/kansekitm
- Notebook
- Hoffman et al., Online Learning for LDA, 2010 (basis for the gensim LDA algorithm).
- Landauer, Dumais: A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge (Psychological Review 1997:104:2, 211-240)
- 2018-01-23
- https://github.com/cwittern/kansekitm
- Notebooks
- Evaluation of topic model parameters:
- Repeated runs: visualization(pdf) analysis
- Tokenization: visualization(pdf) analysis
- Document length: visualization(pdf) analysis
- Number of topics: visualization(pdf) analysis
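Overview of academic year 2018
2018-04-24 汉学数位基础建设研讨会 (Conference on a Digital Foundation for Sinology), held March 14-16 at the Harvard Center Shanghai
2018-05-08 Summary of the previous year, current state of the Kanseki Repository, and plans for this year
2018-05-22 Ways of accessing the Kanseki Repository
- Kanseki Repository API: http://www.kanripo.org/api
- Accessing the Kanseki Repository with a Python module: https://github.com/mandoku/pykanripo
2018-06-12 Report on a visit to Taiwan
2018-06-26 TRCSS lecture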
- 祝平次 (國立清華大學 中文系): The Digital Humanities in Taiwan: Past, Present and Future
2018-10-09
- JADH 2018
- Preconference workshop: Word Vector Applications for DH
- Keynotes
- Proceedings PDF (31MB)
- TEI 2018
- Kanseki Repository
2018-11-13
- GSD Global Smart Data
- 洪武正韻 @ 韻典網
- Kanseki Repository roadmap
- Catalog
- XML: a TEI version of Kanripo?
- Annotation? Paul Schacht: Annotation
2018-11-27
2018-12-11
2019-01-08
2019-01-22 IFLA-LRM/TEI Bibliography modeling
Overview of academic year 2019
2019-04-23 Summary of the previous year, current state of the Kanseki Repository, and upcoming plans
2019-05-14 Sentence piece
- SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing (Kudo&Richardson 2018, PDF)
- Sentencepiece: a tokenizer for neural language processing (in Japanese)
- Byte pair encoding (Sennrich et al., Neural Machine Translation of Rare Words with Subword Units, PDF)
- Unigram model (T.Kudo, Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates)
- sentencepiece (Code)
2019-05-28 More sentence piece
2019-06-11 Sentence piece network analysis
- Description
- Summary tables
- Visualization of network : SVG PNG
- Network community detection
- Emmons et al. (2016) Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
- Blondel et al. (2008) Fast unfolding of communities in large networks
2019-06-25 Sentencepiece vs ngram
2019-07-09 Analysis of the text corpus with sentencepiece
2019-10-08 Toward an update of the Kanseki Repository
2019-11-12 Considering an XML format for the Kanseki Repository
- Reference: formats other than TEI and XML
- XML & TEI
2019-11-26 CTS & Perseus
- CTS & Perseus
- Cataloging for a Billion Word Library of Greek and Latin
- The road to Perseus 5
- CITE Architecture
- The Canonical Text Services URN specification
- Applying the Canonical Text Services Model to the Coptic SCRIPTORIUM
- CapiTainS Software suite and guidelines for Citable Texts
- An ontology for Linked Ancient World Data
- (Reference: The FRBR Bomb: Or How I Learned to Love the Catalog (2008))
2020-01-14 Text Alignment Network
2020-01-28 XML format for commentaries
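- Survey of existing data:
Overview of academic year 2020
2020-05-12
- Plans for this year
- New version of 《漢學文典》 (commonly known as TLS): https://hxwd.org
2020-05-26
- Toward the next-generation Kanseki Repository: https://github.com/kanripox
- Example: 說苑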
- CollateX – Software for Collating Textual Sources
- 『十八史略』=> KR2b0041
2020-06-09
- Next-generation Kanseki Repository (in development): https://github.com/kanripox
- Example: KR3a0007: 說苑
- Demo (PDF)
- Review of existing data:
2020-06-23 Textual Communities & implementation of standoff markup
- References:
- Textual Communities
- Peter Robinson
- On texts, hierarchies, XML, JSON etc. (Humanist Discussion list): 117, 121, 123, 125
- Creating and implementing an ontology of documents and texts
- Towards a Theory of Digital Editions
- The Concept of the Work in the Digital Age (published version)
- XStandoff (Sekimo Generic Format)
- MultiX: an XML based formalism to encode multi-structured documents
- SPEEDy
- The Codex - an Atlas of Relations
- Example: KX2a0001 史記-
2020-07-14 The concept of work in digital texts
2020-10-13 Details of KanripoX format
- 台灣中央研究院 數位人文研究平台 (Academia Sinica, Taiwan: Digital Humanities Research Platform)
- Manifest file for KanripoX:
- Examples:
- Laozi 老子:
- Github / Manifest
- Editions:
- CH8x3004 馬王堆漢墓帛書‧老子甲本 (漢學文典)
- CH8x3005 馬王堆漢墓帛書‧老子甲本卷後古佚書 (漢學文典)
- CH8x3006 馬王堆漢墓帛書‧老子乙本卷前古佚書 (漢學文典)
- CH8x3007 馬王堆漢墓帛書‧老子乙本 (漢學文典)
- KR5c0045 道德真經-戦國-老子 【正統道藏・涵芬樓版】
- KR5c0045 道德真經-戦國-老子 【正統道藏・三家本】
- KR5c0046 道德經古本篇–傅奕 【正統道藏・涵芬樓版】
- KR5c0046 道德經古本篇–傅奕 【正統道藏・三家本】
- KR5c0065 道德真經註(一)-漢後期-河上公【正統道藏・涵芬樓版】
- KR5c0065 道德真經註(一)-漢後期-河上公【四部叢刊】
- KR5c0073 道德真經註(二)-魏-王弼 【正統道藏・涵芬樓版】
- KR5c0073 道德真經註(二)-魏-王弼 【正統道藏・三家本】
- KR5c0057 老子 (漢學文典)
- KR5c0073 道德真經註-魏-王弼 (漢學文典)
- 寒山詩 (from 《御定全唐詩》, juan 806)
- Manifest: 寒山詩
- Text: https://www.kanripo.org/text/KR4h0140/806
2020-10-27 KanripoX development
- TEITOK
- TLS-漢學文典 (new version) test
- Laozi 老子 - Github / Manifest
2020-11-24
- Japanese Buddhist Manuscripts (Gaétan Rappo)
2020-12-08 KanripoX development
- KanripoX file format schema files:
- Manifest: tagset documentation (Odd file)
- Token: tagset documentation (Odd file)
- Nexus: tagset documentation (Odd file)
- Example (latest version):
- Laozi 老子 - Github / Manifest
- Laozi on HXWD
2021-01-12 Updates to KanripoX files, report
- KanripoX schema file:
- KRX tagset documentation
- KRX odd file
- Report: Next steps for the Kanseki Repository
2021-01-26 Report
- KanripoX schema file:
- KRX updated tagset documentation
- KRX odd file (updated)
- Report: Next steps for the Kanseki Repository (updated)