An update to recent changes on this site.
A large update to vocabulary, now with a total of over 140,000 entries and over 1,300 named entities separately in the data/named_modern_entities.txt file. Most of the new vocabulary is from CC-CEDICT with some cross checking in various sources. Also added the 千字文 Thousand Character Classic.
A large update to vocabulary, now with a total of over 110,000 entries and over 700 named entities separately in the data/named_modern_entities.txt file.
A large update to vocabulary, now with a total of over 100,000 entries.
A new text tokenizer was introduced that scans both left to right and right to left, then compares the two results and selects the one with the fewest tokens. The method is similar in principle to the previous one: it is a greedy method that scans a chunk of text, using the dictionary to select the set of tokens that gives the fewest tokens. Previously, this was done with left to right scanning only. Since the method is greedy, it is not always optimal. Wrong tokenizations are rare, but they can be confusing for readers when they occur. For example, consider the phrase
武裝的士兵 an armed soldier
The left to right tokenizer will select
The right to left tokenizer will select
The right to left tokenizer is correct, but it takes a human to figure that out. In this case both scans give the same number of tokens, so the choice is ambiguous and the left to right result is returned. This is part of an ongoing effort to improve the accuracy of the tokenizer. More work is required to help users in ambiguous cases like this.
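The two-way scan described above can be illustrated with a minimal Python sketch. This is not the site's actual implementation: the function names, the `max_len` limit, the longest-match-first strategy, and the five-entry toy dictionary are all assumptions made for illustration. The real tokenizer uses the full dictionary and may differ in detail.

```python
def scan_left_to_right(text, dictionary, max_len=4):
    """Greedy scan from the start, taking the longest dictionary match each time."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is the fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def scan_right_to_left(text, dictionary, max_len=4):
    """Greedy scan from the end, taking the longest dictionary match each time."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                j -= size
                break
    tokens.reverse()  # tokens were collected back to front
    return tokens


def tokenize(text, dictionary, max_len=4):
    """Run both scans and keep the one with fewer tokens; ties go to left to right."""
    ltr = scan_left_to_right(text, dictionary, max_len)
    rtl = scan_right_to_left(text, dictionary, max_len)
    return rtl if len(rtl) < len(ltr) else ltr


# Hypothetical toy dictionary, just large enough to show the ambiguity.
TOY_DICTIONARY = {"武裝", "的士", "士兵", "的", "兵"}

print(scan_left_to_right("武裝的士兵", TOY_DICTIONARY))  # ['武裝', '的士', '兵']
print(scan_right_to_left("武裝的士兵", TOY_DICTIONARY))  # ['武裝', '的', '士兵']
print(tokenize("武裝的士兵", TOY_DICTIONARY))            # ['武裝', '的士', '兵']
```

With this toy dictionary both scans produce three tokens, so the token count cannot separate them and the left to right result is kept, even though the right to left result reads correctly.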
The vocabulary has been updated with new words, corrections, and references. Please let me know if you see any problems by emailing firstname.lastname@example.org.
2020-01-04: Added "contained in" with translation to headword pages.
2019-12-25: Added term splitting. Updated vocabulary.
2019-08-18: Added text Dream of the Red Chamber 紅樓夢. Updated vocabulary.
2019-08-10: Added page to Search for Chinese idioms. Updated vocabulary.
2019-07-07: Changed result presentation on the main page from tabular to list form for multi-word expressions.
2019-06-22: Changed user interface to Material Design Web, migrating from the previously used Material Design Lite, to be more mobile friendly. Vocabulary update. Added https.
2019-06-09: The Chinese Notes Chinese-English dictionary now contains over 2,000 idioms. Fixed a problem with the word frequency analysis for the corpus: analysis/corpus_analysis.html. The summary word frequency analysis was broken for some time and is now restored. Updated the vocabulary for the embedded Chinese-English dictionary. New number of headwords is 90,300.
2018-09-03: Added highlighted snippets to full text search.
2018-08-11: Added a new feature to search text in document bodies within a specific collection.
2018-07-31: Added a new feature to search text in document bodies. Status is experimental for a period of testing.
2018-07-12: Added a large number of texts to build out a digital library for classical and historic texts including the thirteen classics, twenty-four histories, collections of poetry, and other texts, totalling over 39 million characters. See Texts for details.
2018-04-2: Did a major overhaul of the user interface, using Material Design. Besides the nicer look and feel, it includes dialogs for each Chinese word in texts as well as mouse over. This makes looking up word details much easier than following links to word definitions, because it lets you keep your place when reading a text, and it is much more mobile friendly. Also added new texts for The Book of Songs 《詩經》, The Classic of Tea 《茶經》, and Book of Documents 《尚書》.
2016-01-04: Changed word detail page to combine all senses in a single HTML page.
2015-12-23: Added many texts to the text collection, including The Analects of Confucius 《論語》, The Book of Rites 《禮記》, Zhuangzi 《莊子》, Records of the Grand Historian 《史記》, Shuo Wen Jie Zi 《說文解字》, and Commentaries on the Four Books by Zhu Xi 朱熹《四書章句集注》.
2015-09-25: Added Er Ya 《爾雅》, an early Chinese dictionary.
2015-09-07: Menu reorganization. Added Texts menu, merged Classics under Texts and Reference, merged Culture under Reference. Updated the Introduction to Literary Chinese pages.
2015-09-07: Started maintaining this page.