Yu Zhang, Wenxiang Guo, Changhao Pan, Dongyu Yao, Zhiyuan Zhu, Ziyue Jiang, Yuhan Wang, Tao Jin, Zhou Zhao | Zhejiang University
PyTorch implementation of TCSinger 2 (ACL 2025): Customizable Multilingual Zero-shot Singing Voice Synthesis.
Visit our demo page for audio samples.
- 2025.07: We released the code of TCSinger 2!
- 2025.07: We released the code of STARS!
- 2025.05: TCSinger 2 is accepted by ACL 2025!
- We present TCSinger 2, a multi-task multilingual zero-shot SVS model with style transfer and style control based on various prompts.
- We introduce the Blurred Boundary Content Encoder for robust modeling and smooth transitions of phoneme and note boundaries.
- We design the Custom Audio Encoder using contrastive learning to extract styles from various prompts, while the Flow-based Custom Transformer with Cus-MOE and F0 enhances synthesis quality and style modeling.
- Experimental results show that TCSinger 2 outperforms baseline models in subjective and objective metrics across multiple tasks: zero-shot style transfer, multi-level style control, cross-lingual style transfer, and speech-to-singing style transfer.
We provide an example of how you can train your own model and infer with TCSinger 2.
To try on your own dataset, clone this repo on your local machine with NVIDIA GPU + CUDA cuDNN and follow the instructions below.
A suitable conda environment named `tcsinger2` can be created and activated with:
conda create -n tcsinger2 python=3.10
conda activate tcsinger2
conda install --yes --file requirements.txt
By default, this implementation uses as many GPUs in parallel as returned by `torch.cuda.device_count()`. You can specify which GPUs to use by setting the `CUDA_DEVICES_AVAILABLE` environment variable before running the training module.
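A quick way to check how many GPUs a run will parallelize over, and what the restriction looks like if set from Python rather than the shell. This is only a sketch: it assumes the training entry point reads `CUDA_DEVICES_AVAILABLE` at startup, as described above, and the `"0,1"` selection is a made-up example.

```python
import os
import torch

# By default, training parallelizes over this many visible GPUs.
print(torch.cuda.device_count())

# Illustrative: restrict the run to GPUs 0 and 1. Normally you would
# export this in the shell before launching the training module.
os.environ["CUDA_DEVICES_AVAILABLE"] = "0,1"
```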
- Collect your own singing dataset (e.g., GTSinger), and feel free to add extra data annotated with alignment tools such as STARS.
- Place `metadata.json` (fields: `ph`, `word`, `item_name`, `ph_durs`, `wav_fn`, `singer`, `ep_pitches`, `ep_notedurs`, `ep_types`, `emotion`, `singing_method`, `technique`) and `phone_set.json` (the complete phoneme list) in the desired folder, and update the paths in `preprocess/preprocess.py`. (A reference `metadata.json` is provided in GTSinger; an illustrative entry is also sketched below.) Write the `singer` attribute as a description specifying the performer's gender and vocal range, and write the `technique` attribute either as a concise listing of skills or as a natural-language account that conveys their sequential order.
- Extract F0 for each `.wav` and save it as `*_f0.npy`, e.g., with RMVPE (a sketch follows these steps).
- Download HiFi-GAN as the vocoder into `useful_ckpts/hifigan` and FLAN-T5 into `useful_ckpts/flan-t5-large`.
- Preprocess the dataset:
export PYTHONPATH=.
python preprocess/preprocess.py
Tip: You may also convert your dataset directly to a `.csv` instead of using `metadata.json`.
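For reference, here is a hedged sketch of what a single `metadata.json` entry might look like. The field names follow the list above, but every value is illustrative rather than taken from GTSinger:

```python
# One illustrative metadata entry; all values below are made up.
entry = {
    "item_name": "singer1#song1#0000",            # hypothetical unique utterance ID
    "wav_fn": "data/raw/singer1/song1/0000.wav",  # hypothetical path to the audio file
    "ph": ["t", "a", "o"],                        # phoneme sequence
    "word": ["tao"],                              # word sequence
    "ph_durs": [0.12, 0.30, 0.25],                # per-phoneme durations (seconds, assumed)
    "ep_pitches": [62, 64, 64],                   # note pitches (assumed MIDI numbers)
    "ep_notedurs": [0.42, 0.25, 0.25],            # note durations
    "ep_types": [1, 1, 2],                        # note types (e.g., slur markers, assumed)
    "singer": "a female singer with a soprano vocal range",  # free-text description
    "emotion": "happy",
    "singing_method": "bel canto",
    "technique": "mixed voice, then vibrato",     # skill list or natural-language account
}
```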
- Compute mel-spectrograms:
python preprocess/mel_spec_48k.py --tsv_path data/new/data.tsv --num_gpus 1 --max_duration 20
- Post-process:
python preprocess/postprocess_data.py
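As promised above, a minimal sketch of the F0-extraction step. It uses `librosa.pyin` as a stand-in extractor (the steps above suggest RMVPE), and the file path is hypothetical:

```python
import numpy as np
import librosa

# Hypothetical input file; in practice, loop over every wav_fn in metadata.json.
wav_fn = "data/raw/singer1/song1/0000.wav"

# Load at the file's native sampling rate.
y, sr = librosa.load(wav_fn, sr=None)

# pyin returns frame-wise F0 in Hz, with NaN for unvoiced frames.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C7"),
    sr=sr,
)
f0 = np.nan_to_num(f0)  # zero out unvoiced frames

# Save next to the wav as *_f0.npy.
np.save(wav_fn.replace(".wav", "_f0.npy"), f0)
```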
- Train the VAE module and duration predictor:
python main.py --base configs/ae_singing.yaml -t --gpus 0,1,2,3,4,5,6,7
- Train the main TCSinger 2 model:
python main.py --base configs/tcsinger2.yaml -t --gpus 0,1,2,3,4,5,6,7
Notes
- Adjust the compression ratio in the config files (and related scripts).
- Change the padding length in the dataloader as needed.
- To train the Custom Audio Encoder, format data as in `ldm/data/joinaudiodataset_con.py`, set the trained VAE path in `ae_con.yaml`, and proceed with training.
python scripts/test_sing.py
Replace the checkpoint path and the CFG (classifier-free guidance) coefficient as required. For speech inputs, modify the VAE accordingly.
This implementation uses parts of the code from the following GitHub repos, as described in our code: Make-An-Audio-3, TCSinger, and Lumina-T2X.
If you find this code useful in your research, please cite our work:
@article{zhang2025tcsinger,
title={TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis},
author={Zhang, Yu and Guo, Wenxiang and Pan, Changhao and Yao, Dongyu and Zhu, Zhiyuan and Jiang, Ziyue and Wang, Yuhan and Jin, Tao and Zhao, Zhou},
journal={arXiv preprint arXiv:2505.14910},
year={2025}
}
Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's singing without their consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you may be in violation of copyright laws.