https://arxiv.org/abs/2404.04645 HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks. Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While developing TTS architectures that train and test on the same set of speakers has seen significant improvements, out-of-domain spea.. Contribution: Dynamic Ad..
Study
ICASSP 2022 https://arxiv.org/abs/2110.03857 Introduction: In research, the text content of training and test data are often highly similar and in the same text domain. For many real-world applications, TTS systems need to deal with text input with arbitrary content across a wide range of domains. Increasing the amount of data from specific target speakers is costly or impractical → use data from “non-target” speakers..
Interspeech 2022 https://arxiv.org/abs/2110.05798 Contribution: present transfer learning methods and guidelines for finetuning single-speaker TTS models for a new voice; evaluate and provide a detailed analysis with varying amount of data; demonstrate that transfer learning can substantially reduce the training time and amount of data needed for synthesizing a new voice; open-source framework, provide a ..
https://arxiv.org/abs/2406.09569 Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time. We introduce Speech ReaLLM, a new ASR architecture that marries "decoder-only" ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. This is the first "decoder-only" ASR architecture designed to handle continuous audio wit..
How to install
Install:
brew install openjdk@17
PATH setup
Note: if you use bash_profile rather than zshrc, replace ~/.zshrc with ~/.bash_profile.
brew info openjdk@17 # check what was installed
echo 'export PATH="/opt/homebrew/opt/openjdk@17/bin:$PATH"' >> ~/.zshrc
export CPPFLAGS="-I/opt/homebrew/opt/openjdk@17/include" # just in case a compiler needs it
source ~/.zshrc
java -version
Before the PATH was set, the Java home pointed to version 14; after applying the change with the source command, it correctly resolves to 17.
References..
Abstract: Proposes a resolution-connected generator and resolution-wise discriminators. In addition, a discrete wavelet transform is used inside the discriminators to reproduce high-frequency components accurately. Fre-GAN differs from ground-truth audio by only about 0.03 in MOS. 1. Introduction: Autoregressive models perform well but have slow inference. Flow-based vocoders were proposed to address this structural limitation. Although they generate natural waveforms in real time, in parallel the noise sequence into raw wavefor..
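To make the role of the discrete wavelet transform more concrete, here is a minimal sketch of a one-level DWT used as a lossless downsampling step. It assumes the Haar wavelet purely for illustration (the excerpt above does not say which wavelet is used), and the function names `haar_dwt` and `haar_idwt` are hypothetical, not taken from any Fre-GAN code.

```python
import math


def haar_dwt(signal):
    """One-level Haar DWT: split a signal into low- and high-frequency halves."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Low band: scaled sums of neighbouring samples (a smoothed, half-rate signal).
    low = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    # High band: scaled differences, which keep the detail that averaging discards.
    high = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: rebuild the original samples from both bands."""
    scale = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(low, high):
        out.append((a + d) * scale)
        out.append((a - d) * scale)
    return out


# A purely alternating signal is all "high frequency": the low band is zero,
# so plain pairwise averaging would lose it, while the high band keeps it.
low, high = haar_dwt([1.0, -1.0, 1.0, -1.0])
restored = haar_idwt(low, high)
```

Because both bands are kept, the half-rate representation can be inverted exactly, which is the property that distinguishes it from average pooling.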
paper link: https://arxiv.org/abs/2003.08934 NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-con.. Abstract: input ..
Description: Rectified Adam is an optimizer for updating weights and a variant of Adam. It aims to fix Adam's bad local optima convergence problem (reaching a local optimum too early, so that almost no further learning happens). By multiplying Adam's update by a rectification term (one that keeps the variance consistent), it addresses the bad local optima problem that can occur early in training and improves training stability. Usage: optimizer = RAdam(model.parameters(), lr=learning_rate, betas=(0.9, 0.999), weight_d..
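As a rough illustration of the rectification term described above, the sketch below computes it in plain Python from the formula in the RAdam paper. The function name `radam_rectification` and the `threshold` parameter are hypothetical; the default of 4 follows the paper's tractability condition as I understand it, and library implementations may use a different cutoff.

```python
import math


def radam_rectification(step, beta2=0.999, threshold=4.0):
    """Return the rectification term r_t, or None while the variance is intractable.

    step is the 1-based update count. When None is returned, RAdam falls back
    to an update without the adaptive (second-moment) scaling.
    """
    # Maximum length of the approximated simple moving average.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    # Length of the approximated SMA at this step.
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= threshold:
        return None
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )


# The term is undefined for the first few steps, small early on, and approaches 1.
for step in (1, 10, 1000, 100000):
    print(step, radam_rectification(step))
```

The term starts small and grows toward 1, so the adaptive learning rate is damped during the first updates, when the second-moment estimate is based on very few samples.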