MasterThesis

Source code for my Master's thesis experiments.


SEGAN: Speech Enhancement Generative Adversarial Network

Thesis (in Turkish)

https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=fl0Kw4p1rmMDotyKRdYv1NC_jHlQf4_EkB366lPjbYSgMgBkdDEloOymzKUxe2_A

Presentation (in Turkish)

https://www.youtube.com/watch?v=UMyHcdOsduU

Pretrained Model

The old SEGAN generator weights are released and can be downloaded from this link. Make sure you place the file in the ckpt_segan directory so that it works with the matching train.opts config file in that folder. The script run_segan_clean.sh will read the checkpoint from that directory, as it is configured to be used with this file.

Introduction to Old SEGAN

Old SEGAN uses skip connections between the narrower (encoder) and expander (decoder) layers to ensure that signal detail is not lost during training. WaveGAN suffers from this issue when used for denoising, although it was not designed with that intention. The loss function is LSGAN. A minimal sketch of the skip-connection idea follows the figure below.

[Figure Old_SEGAN_G: the Old SEGAN generator architecture]
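To make the skip-connection idea concrete, here is a minimal, illustrative PyTorch sketch, not the repository's actual model; the layer sizes, kernel settings, and class name are assumptions. An encoder ("narrower") feature map is added back into the matching decoder ("expander") layer:

import torch
import torch.nn as nn

class TinySkipG(nn.Module):
    def __init__(self):
        super().__init__()
        # "narrower" (encoder): each layer halves the time axis
        self.enc1 = nn.Conv1d(1, 16, kernel_size=31, stride=2, padding=15)
        self.enc2 = nn.Conv1d(16, 32, kernel_size=31, stride=2, padding=15)
        # "expander" (decoder): mirrors the encoder
        self.dec2 = nn.ConvTranspose1d(32, 16, kernel_size=32, stride=2, padding=15)
        self.dec1 = nn.ConvTranspose1d(16, 1, kernel_size=32, stride=2, padding=15)
        self.act = nn.PReLU()

    def forward(self, x):
        e1 = self.act(self.enc1(x))
        e2 = self.act(self.enc2(e1))
        d2 = self.act(self.dec2(e2))
        d2 = d2 + e1              # skip connection: encoder -> decoder
        return torch.tanh(self.dec1(d2))

g = TinySkipG()
wav = torch.randn(1, 1, 16384)    # one chunk of samples, illustrative length
print(g(wav).shape)               # torch.Size([1, 1, 16384])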

Introduction to New SEGAN

New SEGAN is an improved version of SEGAN [1]. It keeps the skip connections between the narrower and expander layers, and adds residual skip connections to the upper layers of the expander. There are two methods to combine skip connections, 'summation' and 'concatenate': the narrower and expander skip connections are combined and connected to the upper expander layer. A sketch of both methods is given after the figure below.

The loss function is LSGAN, and it does not change with the choice of skip-connection method.
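For reference, a minimal sketch of the least-squares (LSGAN) objectives assumed here, with target 1 for real pairs and 0 for fake pairs; the function names are illustrative:

import torch
import torch.nn.functional as F

def d_lsgan_loss(d_real, d_fake):
    # D pushes real pairs toward 1 and fake (enhanced) pairs toward 0
    return F.mse_loss(d_real, torch.ones_like(d_real)) + \
           F.mse_loss(d_fake, torch.zeros_like(d_fake))

def g_lsgan_loss(d_fake):
    # G tries to make D score its output as real (target 1)
    return F.mse_loss(d_fake, torch.ones_like(d_fake))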

[Figure New_SEGAN_G: the New SEGAN generator architecture]
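An illustrative sketch of the two combination methods named above, under the usual assumption that 'summation' keeps the channel count while 'concatenate' doubles it, so the next expander layer must accept twice as many input channels:

import torch

def combine_skip(dec_feat, enc_feat, method='summation'):
    if method == 'summation':
        return dec_feat + enc_feat                     # channels: C -> C
    elif method == 'concatenate':
        return torch.cat([dec_feat, enc_feat], dim=1)  # channels: C -> 2C
    raise ValueError(method)

dec = torch.randn(1, 16, 8192)
enc = torch.randn(1, 16, 8192)
print(combine_skip(dec, enc, 'summation').shape)    # [1, 16, 8192]
print(combine_skip(dec, enc, 'concatenate').shape)  # [1, 32, 8192]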

Introduction to the scripts

Three models are ready to train and use for wav2wav speech enhancement; utterances are denoised with the generator network (G).
Note: choose the batch size according to the available memory.

To train this model, run the following command:

python train.py --save_path ckpt_segan+ --batch_size 300 \
		--clean_trainset data/clean_trainset \
		--noisy_trainset data/noisy_trainset \
		--cache_dir data/cache

Read run_new_segan_train.sh for more guidance. This will use the default parameters to structure both G and D, but they can be tuned with many options. For example, one can use --d_pretrained_ckpt and/or --g_pretrained_ckpt to specify a pre-trained checkpoint from which to fine-tune some characteristics of the enhancement system, such as language, as in [2].

Cleaning files is done by specifying the generator weights checkpoint, its config file from training, and appropriate paths for the input and output files. Use the soundfile wav writer backend (recommended) by specifying the --soundfile flag:

python clean.py --g_pretrained_ckpt ckpt_segan+/<weights_ckpt_for_G> \
		--cfg_file ckpt_segan+/train.opts --synthesis_path enhanced_results \
		--test_files data/noisy_testset --soundfile

Read run_new_segan_clean.sh for more guidance.
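To sanity-check an enhanced file afterwards, the soundfile package used by the --soundfile backend can read it back; the filename below is illustrative, based on the --synthesis_path above:

import soundfile as sf

wav, rate = sf.read('enhanced_results/example.wav')  # illustrative filename
print(rate, wav.shape, wav.min(), wav.max())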

There is a WSEGAN, which stands for the dewhispering SEGAN [3]. This system is activated (rather than vanilla SEGAN) by specifying the --wsegan flag. Additionally, the --misalign_pair flag adds another fake pair (an augmentation) to the adversarial loss, signaling that content changes between the input and output of G are undesirable; this improved the results in [3].
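A hedged sketch of the idea behind --misalign_pair, not the repository's exact code: besides the usual fake pair (noisy, G(noisy)), D also sees a content-misaligned pair built by shuffling the clean targets within the batch, and is trained to call it fake:

import torch

def misaligned_pair(noisy, clean):
    # roll the clean batch by one so each noisy input is paired with
    # the clean signal of a *different* utterance
    mismatched = torch.roll(clean, shifts=1, dims=0)
    return torch.cat([noisy, mismatched], dim=1)  # stack as D's input channels

noisy = torch.randn(4, 1, 16384)
clean = torch.randn(4, 1, 16384)
print(misaligned_pair(noisy, clean).shape)  # [4, 2, 16384]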

There is also an AEWSEGAN, which disables the discriminator for faster training and a more lightweight solution. This system is activated (rather than vanilla SEGAN) by specifying the --aewsegan flag. The --misalign_pair flag works here as described above for WSEGAN.
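A minimal sketch of the autoencoder (AE) idea, assuming that with the discriminator disabled G is trained with a plain reconstruction loss only; the function name and loss choice are illustrative:

import torch
import torch.nn.functional as F

def ae_step(g, opt, noisy, clean):
    # one optimization step for G with an L1 reconstruction loss;
    # no discriminator and no adversarial term are involved
    opt.zero_grad()
    loss = F.l1_loss(g(noisy), clean)
    loss.backward()
    opt.step()
    return loss.item()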

References:

  1. SEGAN: Speech Enhancement Generative Adversarial Network (Pascual et al., 2017)
  2. Language and Noise Transfer in Speech Enhancement GAN (Pascual et al., 2018)
  3. Whispered-to-voiced Alaryngeal Speech Conversion with GANs (Pascual et al., 2018)
  4. SEGAN LinkedIn slides
  5. Dataset

Cite

Thanks to:

@article{pascual2017segan,
  title={SEGAN: Speech Enhancement Generative Adversarial Network},
  author={Pascual, Santiago and Bonafonte, Antonio and Serr{\`a}, Joan},
  journal={arXiv preprint arXiv:1703.09452},
  year={2017}
}
Thanks to JetBrains for providing a free open-source license for their products during this research. It is greatly appreciated.

Notes

Future Plans