
A tool that aims at converting Taiwanese Hokkien sentences written in Chinese characters to phonetic words.


IepIweidieng/common-tl


Common TL

Prerequisite

  • Python >= 3.8

Usage examples

Convert a sentence written in Chinese characters to Common TL

A dictionary composed of tab-separated values is required, see Dictionary Preparation.

>>> import ch2rm, ctl_dict
>>> dict_ = ctl_dict.DictSrc().add_dict_src('dict_example/Ch2TwRoman.txt', (ctl_dict.TL, ctl_dict.Word, ctl_dict.ETC)).create_dict()
>>> sentence = ch2rm.chinese_to_roman('你是誰?', dict_)
>>> print(sentence)
[[('dl', 'i2')], [('sc', 'i7')], [('tsc', 'ia5')]]
>>> print(' '.join('-'.join(''.join(syll) for syll in word) for word in sentence))
dli2 sci7 tscia5

Convert a sentence written in TL to Common TL

Punctuation and capitalization are ignored.

>>> import ch2rm
>>> sentence = ch2rm.phonetic_to_tl("Thài-khong pîng-iú, lín hó! Lín tsia̍h-pá bē? ū-îng, tō-lâi gún-tsia tsē--ooh.")
>>> print(sentence)
[[('th', 'ai3'), ('kh', 'oong1')], [('p', 'iong5'), ('', 'iu2')], [('dl', 'in2')], [('h', 'o2')], [('dl', 'in2')], [('tsc', 'iah8'), ('p', 'a2')], [('b', 'e7')], [('', 'u7'), ('', 'iong5')], [('t', 'o7'), ('dl', 'ai5')], [('g', 'un2'), ('tsc', 'ia1')], [('ts', 'e7'), ('', 'ooh0')]]
>>> print(' '.join('-'.join(''.join(syll) for syll in word) for word in sentence))
thai3-khoong1 piong5-iu2 dlin2 ho2 dlin2 tsciah8-pa2 be7 u7-iong5 to7-dlai5 gun2-tscia1 tse7-ooh0
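The joining expression used in the examples above can be wrapped in a small helper. The following is a minimal sketch; `format_pair_sentence` is a hypothetical name and not part of `ch2rm`, and the input is the pair-form sentence copied from the first example.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (initial, final)


def format_pair_sentence(sentence: List[List[Pair]]) -> str:
    """Join syllables of a word with '-' and words with ' ' (hypothetical helper)."""
    return " ".join(
        "-".join(initial + final for initial, final in word)
        for word in sentence
    )


sentence = [[("dl", "i2")], [("sc", "i7")], [("tsc", "ia5")]]
print(format_pair_sentence(sentence))  # dli2 sci7 tscia5
```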

Convert a string-form sentence in Bopomofo (Zhuyin) and TL into IPA or Common TL

A string-form sentence is a Python list of string-form words. A string-form word is a Python list of phonetic syllables. (Explained below)

Custom function definitions:

import ch2rm
import ctl_dict

def phonetic_sentence_to_ipa(sentence, phonetic=None):
    # Convert each string-form word into a pair-form IPA word.
    return [ch2rm.phonetic_word_to_ipa(word, phonetic=phonetic) for word in sentence]

def phonetic_sentence_to_ctl(sentence, phonetic=None):
    # Convert each string-form word into a pair-form CTL word.
    return [ch2rm.phonetic_word_to_tl(word, phonetic=phonetic) for word in sentence]

Execution (continued):

>>> TL = ctl_dict.TL
>>> Zhuyin = ctl_dict.Zhuyin
>>> sentence = [["ㄍㄢˇ", "ㄒㄧㄝˋ"], ["ㄍㄜˋ", "ㄨㄟˋ"], [TL("lâng"), TL("kheh")]]
>>> #            感謝                 各位               人客
>>> phonetic_sentence_to_ipa(sentence, phonetic=Zhuyin)
[[('k', 'an03'), ('ɕ', 'je04')], [('k', 'ɤ04'), ('', 'wei04')], [('ᵈl', 'aŋ5'), ('kʰ', 'eʔ4')]]
>>> phonetic_sentence_to_ctl(sentence, phonetic=Zhuyin)
[[('k', 'an03'), ('s', 'ie04')], [('k', 'o04'), ('', 'uei04')], [('dl', 'ang5'), ('kh', 'eh4')]]

Term Definitions/Concepts Used in This Project

Terms/Concepts about Phonetic Notation Systems

Common TL (CTL) (as a phonetic transcription system)

A phonetic transcription system based on modified Taiwanese Romanization System (TL).

CTL uses only ASCII symbols and can be used to transcribe some Chinese languages and dialects other than Taiwanese Hokkien.

Currently applicable Chinese languages and dialects:

  • Taiwanese Hokkien
    • Chiang-chiu and Chôan-chiu dialects are applicable.
  • Standard Mandarin
  • Taiwanese Hakka
    • Sixian, Hailu, Dabu, Raoping, Zhao'an, and Southern Sixian dialects are applicable.

The phonetic notations commonly used for these languages and dialects can be converted into CTL and back without losing any information other than the following:

  • The citation tone of neutral tone syllables is lost for TL.
  • The TL o vs. oo distinction is lost for variants of Taiwanese Hokkien where TL o is pronounced as [o], if the Southern transcription variant of CTL is used.

The strict form of CTL can also spell out non-phonemic features, so that every grapheme at the same position within a syllable represents the same range of speech sounds across all applicable languages and dialects. This is the form used by this project.

CTL has two transcription variants as listed in the following table:

CTL Variant | [?] | [o]~[?] | /ɔ/
Southern | o | oo | oo
Northern | or | o | oo

International Phonetic Alphabet (IPA)

A phonetic transcription system based on the Latin script.

IPA uses non-ASCII symbols and diacritics to transcribe speech sounds.

In this project, a modified form of IPA is used, where the tone is represented by a number instead of IPA tone letters.

CTL can be converted into the modified form of IPA and be converted back without losing any information.

Syllable Component Definition

  • Initial: Initial
  • Final: Medial, nucleus, coda, and tone number

Reference for Typical Linguistic Syllable Component Definitions

Syllable forms (forms to represent a syllable)

  • String form: A str or an instance of _Phonetic (a subclass of UserString; see below)
  • Pair form: An initial-final pair (see below)

Initial-final pair (pair-form syllable)

A pair consisting of an initial string and a final string in phonetic notation, which together represent a syllable, i.e., (initial, final).

An initial-final pair can also be one of …

  • IPA pair, if the initial and final are both in IPA (phonemic) notation, except that the tone is represented by a number instead of IPA tone letters. (typing hint: phonetic.phonetic.IpaPair)
  • CTL pair, if the initial and final are both in CTL. (typing hint: phonetic.phonetic.CtlPair)

Example:

Word | IPA | IPA, pair form | TL, pair form
phing (Taiwanese Hokkien, Kaohsiung) | /p?i????/ | ('p?', 'i??1') | ('ph', 'ing1')

Word forms

  • Ordinarily written text
  • Phonetic string: A single string of the phonetic notation of the whole word.
  • String form: A list of string-form syllables of the word (typing hint: ch2rm.PhoneticSylList)
  • Pair form: A list of pair-form syllables of the word (typing hints: ch2rm.IpaWord & ch2rm.CtlWord)

Example:

Word | IPA | IPA, string form | IPA, pair form
拼音 phing-im (Taiwanese Hokkien, Kaohsiung) | /p?i????.im??/ | ['p?i??1', 'im1'] | [('p?', 'i??1'), ('', 'im1')]

Sentence forms

  • Ordinarily written text
  • Segmented written text: A list of ordinarily written words of the sentence
  • Phonetic string: A single string of the phonetic notation of the whole sentence.
  • String form: A list of string-form words of the sentence
  • Pair form: A list of pair-form words of the sentence

In the string form, each string-form word can have its own type (str or one of the subclasses of _Phonetic; see below).

Example:

Sentence | Words | TL, string form | TL, pair form
咱人生來自由 Lán-lâng senn--lai tsū-iû (Taiwanese Hokkien, Kaohsiung) | 咱人 lán-lâng, 生來 senn--lai, 自由 tsū-iû | [['lan2', 'lang5'], ['senn1', 'lai0'], ['tsu7', 'iu5']] | [[('l', 'an2'), ('l', 'ang5')], [('s', 'enn1'), ('l', 'ai0')], [('ts', 'u7'), ('', 'iu5')]]
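The relation between the string form and the pair form can be illustrated by splitting each numeric-tone TL syllable into an initial and a final. This is only a sketch under stated assumptions: `TL_INITIALS`, `split_tl_syllable`, and `string_form_to_pair_form` are hypothetical names, the initial list is the ordinary TL consonant inventory, and the project's own converters do considerably more (dialects, IPA, strict CTL spelling).

```python
from typing import List, Tuple

# TL initials, longest first so that 'tsh' is tried before 'ts' (assumed inventory).
TL_INITIALS = sorted(
    ["p", "ph", "b", "m", "t", "th", "n", "l", "k", "kh", "g", "ng",
     "h", "ts", "tsh", "s", "j"],
    key=len, reverse=True,
)


def split_tl_syllable(syll: str) -> Tuple[str, str]:
    """Split a numeric-tone TL syllable into (initial, final)."""
    for initial in TL_INITIALS:
        if not syll.startswith(initial):
            continue
        final = syll[len(initial):]
        body = final.rstrip("0123456789")
        # The final needs a nucleus: a vowel, or a syllabic 'm' / 'ng'.
        if body and (body[0] in "aeiou" or body in ("m", "ng")):
            return initial, final
    return "", syll  # zero initial


def string_form_to_pair_form(sentence: List[List[str]]) -> List[List[Tuple[str, str]]]:
    """Map a string-form sentence to a pair-form sentence."""
    return [[split_tl_syllable(s) for s in word] for word in sentence]


print(string_form_to_pair_form([["lan2", "lang5"], ["senn1", "lai0"], ["tsu7", "iu5"]]))
```

Applied to the string-form sentence in the example, this yields the same pairs as the "TL, pair form" column.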

Comparison of Text Forms

Form | Sentence | Word | Syllable
Ordinarily written text | Sentence text & segmented text | Word text | (none)
Phonetic string | Phonetic sentence (string) | Phonetic word (string) | Phonetic syllable
String form (* of string(s)) | String-form sentence (list of …) | String-form word (list of …) | Phonetic syllable
Pair form (* of pair(s)) | Pair-form sentence (list of …) | Pair-form word (list of …) | Initial-final pair

Note that:

  • There are no ordinarily written syllable texts. One-syllable texts are treated as one-word text.
  • There are no lists of phonetic word strings. However, a list of ordinarily written word texts can result from word segmentation.

Terms/Concepts about the Dictionary

Dictionary

typing hint: ctl_dict.CtlDict

A word-to-phonetic-notation dictionary built from any number of dictionary text files

Dictionary text file

A file composed of tab-separated values.

Each line in the file should be separated into $n$ fields by $n - 1$ tabs and should contain at least a Word field and a _Phonetic field.

(pre-)processing (for dictionary text files)

  • Lines with XML escapes are ignored.
  • Fullwidth spaces are replaced with halfwidth spaces, and then successive spaces are reduced into a single space.
  • Characters in parentheses are removed along with the parentheses.
  • Latin phonetic letters are normalized to NFD form.
  • Words whose syllable count is inconsistent with the phonetic notation are ignored.
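The preprocessing steps listed above can be sketched for a single line as follows. This is not the project's implementation: `preprocess_line` and both regular expressions are assumptions made for illustration, the field order (phonetic notation first, then word) follows the `(TL, Word, ETC)` format used in the usage example, and the syllable count is taken to be one space-separated syllable per character of the word.

```python
import re
import unicodedata
from typing import Optional, Tuple

XML_ESCAPE = re.compile(r"&#?\w+;")                # e.g. &amp; or &#1234; (assumed pattern)
PARENTHESIZED = re.compile(r"[（(][^（()）]*[)）]")  # ASCII and fullwidth parentheses


def preprocess_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (word, phonetic) for a usable line, or None if the line is ignored."""
    if XML_ESCAPE.search(line):
        return None                                 # lines with XML escapes are ignored
    line = line.rstrip("\n").replace("\u3000", " ")  # fullwidth -> halfwidth spaces
    line = re.sub(r" {2,}", " ", line)              # collapse successive spaces
    line = PARENTHESIZED.sub("", line)              # drop parenthesized characters
    fields = line.split("\t")
    if len(fields) < 2:
        return None
    phonetic = unicodedata.normalize("NFD", fields[0])
    word = fields[1]
    if len(word) != len(phonetic.split()):
        return None                                 # syllable count mismatch
    return word, phonetic
```

For instance, `preprocess_line("li2 ho2\t你好\tnote\n")` keeps the entry, while a line whose word has two characters but only one syllable is dropped.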

Dictionary format specification

typing hint: ctl_dict.Format

A tuple of dictionary format tokens used for specifying the fields of a dictionary text file.

Dictionary format token

One of a set of subclasses of UserString.

Currently supported format tokens (only those not prefixed with _ should be used):

  • Word: Field for the word
  • _Phonetic: Field for the word in phonetic notation (syllables are separated by spaces)
    Subclasses:
    • _RomanPhonetic: Phonetic notation systems using Latin alphabet
      • TaiwaneseRomanization (TL)
      • TaiwaneseHakkaRomanization (THRS)
    • Zhuyin
  • ETC: Ignored field

Dictionary text file specification item

typing hint: ctl_dict.SrcItem

Either a string of the path to a dictionary text file or a tuple with such a string and dictionary format specification.

Dictionary text file specification

typing hint: ctl_dict.SrcSpec

Either a list of dictionary text file specification items (typing hint: ctl_dict.SrcList) or a single such item.

Comparison of Dictionary Text File Specification Components

Component | typing hint (ctl_dict.*) | typing hint (partially expanded)
Dictionary format token | | Union[str, Type[UserString]]
Dictionary format specification | Format | Sequence[Union[str, Type[UserString]]]
Dictionary text file specification item | SrcItem | Union[str, Tuple[str, Format]]
Dictionary text file specification | SrcSpec | Union[List[SrcItem], SrcItem]
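The shapes in the table above can be flattened into a uniform list of (path, format) entries. The sketch below only illustrates the type structure; `normalize_src_spec` is a hypothetical helper, format tokens are stood in for by plain strings, and `None` marks an item given without a format (how the library treats that case is not described here).

```python
from typing import List, Optional, Sequence, Tuple, Union

Format = Sequence[str]                       # stand-in for ctl_dict.Format
SrcItem = Union[str, Tuple[str, Format]]
SrcSpec = Union[List[SrcItem], SrcItem]


def normalize_src_spec(spec: SrcSpec) -> List[Tuple[str, Optional[Tuple[str, ...]]]]:
    """Turn a single item or a list of items into [(path, format-or-None), ...]."""
    items = spec if isinstance(spec, list) else [spec]
    result = []
    for item in items:
        if isinstance(item, str):            # bare path, no format given
            result.append((item, None))
        else:                                # (path, format) tuple
            path, format_ = item
            result.append((path, tuple(format_)))
    return result


print(normalize_src_spec(("dict_example/Ch2TwRoman.txt", ("TL", "Word", "ETC"))))
```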

Text Conversion Functions

These functions perform word segmentation and pronunciation querying and thus require the use of a dictionary, see Dictionary Preparation.

Word level (ordinarily written)

ch2rm.chinese_word_to_phonetic(word: str, dict_: ctl_dict.CtlDict) -> ctl_dict.DictPronounCandList

Return a list of the possible phonetic notations of the ordinarily written word in the dictionary dict_

Sentence level (ordinarily written)

ch2rm.chinese_to_roman(sentence: str, dict_: ctl_dict.CtlDict, dialects: Lang = ch2rm.lang()) -> List[CtlWord]

Segment the ordinarily written sentence into words, and then convert the result into a pair-form CTL sentence

Phonetic Notation Conversion Functions

These functions only perform conversions between phonetic notations and thus do not require the use of dictionaries.

Syllable level (string form)

phonetic.*.*_syllable_to_ipa(syll: Str, dialect: Optional[str] = ?, variant: Optional[str] = ?) -> IpaPair

Convert a syllable string syll written in the phonetic notation system * into an IPA pair

E.g.,

  • phonetic.tl.tl_syllable_to_ipa(tl_: Str, dialect: Optional[str] = 'chiang', variant: Optional[str] = 'southern') -> phonetic.IpaPair
    • Convert a Taiwanese Hokkien syllable tl_ written in TL into an IPA pair
  • phonetic.zhuyin.zhuyin_syllable_to_ipa(zhuyin: Str, dialect: Optional[str] = None, variant: Optional[str] = None) -> IpaPair
    • Convert a Standard Mandarin syllable zhuyin written in Zhuyin/Bopomofo into an IPA pair

Syllable level (pair form)

phonetic.common_tl.ipa_pair_to_tl_pair(ipa_pair: IpaPair, dialect: Optional[str] = None, variant: Optional[str] = 'southern') -> CtlPair

Convert an IPA pair into a CTL pair

Word level

ch2rm.phonetic_word_to_ipa(phonetic_word: PhoneticSylList, dialects: Lang = ch2rm.lang(), phonetic: Optional[Type[ctl_dict._Phonetic]] = None) -> IpaWord

Convert a string-form phonetic word into a pair-form IPA word

The phonetic notation system of each syllable is specified by its type (one of the subclasses of _Phonetic) or phonetic if its type is str.
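The rule above (the syllable's own type wins, and `phonetic` is only a fallback for plain `str`) can be sketched as follows. The classes here are local stand-ins for `ctl_dict._Phonetic` and its subclasses, and `notation_of` is a hypothetical helper; what the library does when neither is available is not specified, so the sketch simply returns `None`.

```python
from collections import UserString
from typing import Optional, Type, Union


class Phonetic(UserString):
    """Stand-in for ctl_dict._Phonetic."""


class TL(Phonetic):
    """Stand-in for ctl_dict.TL."""


class Zhuyin(Phonetic):
    """Stand-in for ctl_dict.Zhuyin."""


def notation_of(
    syll: Union[str, Phonetic],
    phonetic: Optional[Type[Phonetic]] = None,
) -> Optional[Type[Phonetic]]:
    """Pick the notation system of one syllable."""
    if isinstance(syll, Phonetic):
        return type(syll)      # typed syllables carry their own notation
    return phonetic            # plain str falls back to the argument


word = ["ㄍㄢˇ", TL("kheh")]
print([notation_of(s, phonetic=Zhuyin).__name__ for s in word])  # ['Zhuyin', 'TL']
```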

ch2rm.ipa_pair_to_tl(ipa_pair: IpaWord, *args, **kwargs) -> CtlWord

Convert a pair-form IPA word into a pair-form CTL word

Its non-positional arguments are forwarded to phonetic.common_tl.ipa_pair_to_tl_pair().

ch2rm.phonetic_word_to_tl(phonetic_word: PhoneticSylList, dialects: Lang = ch2rm.lang(), phonetic: Optional[Type[ctl_dict._Phonetic]] = None) -> CtlWord

Convert a string-form word into a pair-form CTL word

This function is the combination of the previous two functions.

Sentence level (phonetic string)

ch2rm.phonetic_to_tl(sentence: str, dialects: Lang = ch2rm.lang(), phonetic: Type[ctl_dict._Phonetic] = ctl_dict.TL) -> List[CtlWord]

Segment the TL-like phonetic sentence into words, and then convert the result into a pair-form CTL sentence
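As a rough illustration of what segmenting a TL sentence involves, the sketch below turns diacritic TL text into a string-form sentence with numeric tones, ignoring punctuation and capitalization. It is not the library's algorithm: both function names are hypothetical, the tone-mark table is the standard TL diacritic assignment, and the result is numeric TL rather than CTL (the library additionally respells syllables, e.g. lín becomes dlin2 in the usage example).

```python
import unicodedata
from typing import List

# Combining diacritic -> TL tone number.
TONE_MARKS = {
    "\u0301": "2", "\u0300": "3", "\u0302": "5", "\u030c": "6",
    "\u0304": "7", "\u030d": "8", "\u030b": "9",
}


def tl_syllable_to_numeric(syll: str, neutral: bool = False) -> str:
    """Convert one diacritic TL syllable to letters plus a tone number."""
    tone = None
    letters = []
    for ch in unicodedata.normalize("NFD", syll.lower()):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        elif ch.isalpha():               # drops punctuation and other marks
            letters.append(ch)
    base = "".join(letters)
    if not base:
        return ""
    if neutral:
        tone = "0"                       # syllable after '--' is neutral
    elif tone is None:
        tone = "4" if base.endswith(("p", "t", "k", "h")) else "1"
    return base + tone


def tl_sentence_to_string_form(sentence: str) -> List[List[str]]:
    """Split on spaces into words and on hyphens into syllables."""
    words = []
    for token in sentence.split():
        sylls = []
        for part in token.replace("--", "-=").split("-"):
            neutral = part.startswith("=")
            syll = tl_syllable_to_numeric(part.lstrip("="), neutral)
            if syll:
                sylls.append(syll)
        if sylls:
            words.append(sylls)
    return words


print(tl_sentence_to_string_form("Lín hó! Tsē--ooh."))
```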

Comparison of Conversion Functions

From ↓ \ To → | Phonetic * (string-form) | IPA * (pair-form) | CTL * (pair-form)
Ordinarily written | (chinese_word_to_phonetic()) | | chinese_to_roman()
Phonetic string | | | phonetic_to_tl()
Phonetic * (string-form) | | phonetic_word_to_ipa(), *_syllable_to_ipa() | phonetic_word_to_tl()
IPA * (pair-form) | | | ipa_pair_to_tl(), ipa_pair_to_tl_pair()

Note that:

  • chinese_word_to_phonetic() returns a list of string-form words, which resembles a string-form sentence except that every syllable is guaranteed to be of a subtype of _Phonetic. The return value is treated as word-level and has the typing hint ctl_dict.DictPronounCandList.
  • There are no other conversion functions which directly convert texts from a level (sentence/word/syllable) to another.

Dialect-specifying arguments

Except for chinese_word_to_phonetic(), all sentence-level and word-level conversion functions accept an optional argument dialects (typing hint: Lang) for specifying the (sub-)dialect for each applicable language/dialect. Its default value is equivalent to:

ch2rm.lang(
    hokkien=ch2rm.lang_opt(dialect='chiang', variant='southern'),
    mandarin=ch2rm.lang_opt(dialect=None, variant=None),
    hakka=ch2rm.lang_opt(dialect='sixian', variant=None),
    common_tl=ch2rm.lang_opt(dialect=None, variant='southern'))

All arguments of ch2rm.lang() and ch2rm.lang_opt() are optional. All arguments of ch2rm.lang_opt() can be positional.

Expected combinations of arguments to ch2rm.lang() are listed in the following table:

Keyword | Language/Dialect | (dialect, variant) | Valid *
hokkien | Taiwanese Hokkien | ('chiang', *), ('choan', *) | 'southern', 'northern'
mandarin | Standard Mandarin | (None, None) |
hakka | Taiwanese Hakka | ('sixian', None), ('hailu', None), ('dabu', None), ('raoping', 'hsinchu'), ('raoping', 'zhuolan'), ('zhao_an', None), ('southern_sixian', None) |
common_tl | (For choosing CTL variant) | (None, *) | 'southern', 'northern'

For choosing CTL variant, if the argument common_tl is omitted, the argument hokkien is used if provided.
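The fallback described above can be sketched with local stand-ins. `LangOpt` and `resolve_ctl_variant` are hypothetical names (not `ch2rm.lang_opt` or any library function), and the `'southern'` default is taken from the default value of `dialects` shown earlier.

```python
from typing import NamedTuple, Optional


class LangOpt(NamedTuple):
    """Stand-in for the (dialect, variant) option of one language."""
    dialect: Optional[str] = None
    variant: Optional[str] = None


def resolve_ctl_variant(
    hokkien: Optional[LangOpt] = None,
    common_tl: Optional[LangOpt] = None,
    default: str = "southern",
) -> str:
    """Choose the CTL variant: common_tl first, then hokkien, then the default."""
    if common_tl is not None:
        chosen = common_tl
    elif hokkien is not None:
        chosen = hokkien          # common_tl omitted: fall back to hokkien
    else:
        return default
    return chosen.variant if chosen.variant is not None else default


print(resolve_ctl_variant(hokkien=LangOpt("choan", "northern")))  # northern
```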

For syllable-level conversion functions, the arguments dialect and variant can be used for specifying the (sub-)dialect. See the above table for expected combinations of dialect and variant.

Word Segmentation Functions

These functions require the use of a dictionary, see Dictionary Preparation.

String level

ctl_segment.split_chinese_word(sentence: str, dict_: ctl_dict.CtlDict) -> List[str]

Segment the ordinarily written sentence into words. Currently, the forward maximum matching algorithm is used.

Before the segmentation, the spaces within sentence are reduced to 1 space between TL-like words and removed otherwise.
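Forward maximum matching scans the sentence from left to right and, at each position, takes the longest dictionary word that starts there. A minimal generic sketch follows; `forward_max_match` is a hypothetical function over a plain set of words, it does not use `CtlDict`, it omits the space handling described above, and emitting unknown characters as one-character words is an assumption.

```python
from typing import Iterable, List


def forward_max_match(sentence: str, words: Iterable[str]) -> List[str]:
    """Greedy left-to-right segmentation using the longest matching word."""
    vocab = set(words)
    max_len = max([1] + [len(w) for w in vocab])
    result = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocab:
                result.append(piece)
                i += size
                break
    return result


print(forward_max_match("你是誰", {"你", "是", "是誰"}))  # ['你', '是誰']
```

Because the match is greedy, `"abcd"` with the words `{"ab", "abc", "cd"}` is split into `"abc"` and `"d"`, not `"ab"` and `"cd"`.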

File level

ctl_segment.split_file(path: str, dict_: ctl_dict.CtlDict) -> None

Perform word segmentation for the first line of the .trn file at path in-place

A backup file is created; its path is the original path with _bk appended.
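The backup-then-rewrite behaviour can be sketched as below. `segment_first_line_in_place` is a hypothetical helper, the segmenter is passed in as a plain callable, and joining the segmented words with single spaces and reading the file as UTF-8 are assumptions about the output format.

```python
import shutil
from typing import Callable, List


def segment_first_line_in_place(path: str, segment: Callable[[str], List[str]]) -> None:
    """Back up `path` to `path + '_bk'`, then rewrite its first line segmented."""
    shutil.copyfile(path, path + "_bk")           # backup before modifying
    with open(path, encoding="utf-8") as f:
        lines = f.read().split("\n")
    lines[0] = " ".join(segment(lines[0]))        # only the first line changes
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
```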

Directory level

ctl_segment.split_for_each_file(path: str, dict_: ctl_dict.CtlDict) -> None

Perform word segmentation in-place for the first line of all .trn files under path and all of its (direct or indirect) subdirectories

It calls ctl_segment.split_file() and thus also creates backup files.
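The recursive traversal can be sketched with `os.walk`. `iter_trn_files` is a hypothetical helper that only lists the files; it does not call into `ctl_segment`.

```python
import os
from typing import Iterator


def iter_trn_files(root: str) -> Iterator[str]:
    """Yield every .trn file under `root`, including nested subdirectories."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".trn"):
                yield os.path.join(dirpath, name)
```

Each yielded path could then be passed to a per-file routine such as `ctl_segment.split_file(path, dict_)`.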

Dictionary Functions

These functions are used to create a CtlDict object.

Module functions

ctl_dict.create_dict(path_list: SrcList, *args, **kwargs) -> CtlDict

Create a dictionary using the given dictionary text file specification

A ctl_dict.DictSrc object is created internally.

Dictionary Builder Methods

These are methods of the class ctl_dict.DictSrc.

DictSrc.add_dict_src(self, path: str, format_: Format)

Add the given dictionary text file with given format specification into the dictionary text file specification

DictSrc.reset_dict_src(self)

Clear the dictionary text file specification

DictSrc.set_dict_src(self, path_list: SrcSpec)

Specify the dictionary text file specification

DictSrc.create_dict(self, reprocess: bool = False, recreate_dump: bool = False) -> CtlDict

Create a dictionary. All the specified dictionary text files are loaded and dumped separately, then combined in order and dumped again. By default, already dumped data are loaded if they exist.
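The load-if-dumped behaviour is a common caching pattern and can be sketched generically. Everything here is an assumption made for illustration: `load_or_build` is a hypothetical helper, `pickle` is used as the dump format, and the `rebuild` flag only loosely mirrors `reprocess` / `recreate_dump`; the library's actual dump format and file layout are not described in this document.

```python
import os
import pickle
from typing import Any, Callable


def load_or_build(dump_path: str, build: Callable[[], Any], rebuild: bool = False) -> Any:
    """Load a previously dumped object, or build it and dump it for next time."""
    if not rebuild and os.path.exists(dump_path):
        with open(dump_path, "rb") as f:
            return pickle.load(f)          # reuse the existing dump
    data = build()
    with open(dump_path, "wb") as f:
        pickle.dump(data, f)               # refresh the dump
    return data
```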

Dictionary Preparation

A dictionary is required only when text segmentation or pronunciation querying is needed.

The dictionary text files under dict_example/ are minimal examples that are merely sufficient to demonstrate these functionalities.

For practical use, it is viable to use the dictionary data from the online dictionaries compiled by the Ministry of Education of Taiwan. The already released dictionary data should be used.

Get Released Dictionary Data

Among these online dictionaries, the following ones are recommended to use and their released data are available:

Convert Dictionary Data

To allow these dictionary data files to be read into a CtlDict object, these files should be converted into tab-separated values.

There are many tools available for such conversion, which are not covered here.

The converted dictionary text files should be able to be loaded into a CtlDict object by using suitable dictionary text file specifications.
