I actually first tried to install Kubeflow in mid-January. It didn't work, so I tried again in early February with a fresh mindset. Looking back, I think everything fell apart because I lacked Docker permissions and couldn't pull the images. Anyway, it finally worked, so I'm writing it up right away.
I ran this on Ubuntu 22.04 LTS, with Kubernetes 1.32 and minikube.
1. install mysql, install docker
2. install kubernetes
https://www.whatwant.com/entry/Kubeflow-in-Kubernetes
https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/ Install and Set U..
Study
https://arxiv.org/abs/1609.03499
WaveNet: A Generative Model for Raw Audio
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that ..
v1 2016, v2 2016
Introduction
Joint probabilities, pixel/wor..
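The autoregressive description in the abstract above corresponds to the standard chain-rule factorization of the joint probability of a waveform. As a brief sketch, with $x_t$ denoting the $t$-th audio sample and $T$ the number of samples (notation chosen here for illustration):

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

Each factor is the predictive distribution for one sample given all previous ones.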
https://arxiv.org/abs/2006.04558
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Non-autoregressive text to speech (TTS) models such as FastSpeech can synthesize speech significantly faster than previous autoregressive models with comparable quality. The training of FastSpeech model relies on an autoregressive teacher model for duratio..
v1 2020, v8 2022
ICLR 2021
Introduction
Existing..
https://arxiv.org/abs/2404.04645
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While developing TTS architectures that train and test on the same set of speakers has seen significant improvements, out-of-domain spea..
Contribution
Dynamic Ad..
ICASSP 2022
https://arxiv.org/abs/2110.03857
Introduction
In research, the text content of training and test data are often highly similar and in the same text domain. For many real-world applications, TTS systems need to deal with text input with arbitrary content across a wide range of domains.
Increasing the amount of data from specific target speakers: costly or impractical → using data from "non-target" speakers..
Interspeech 2022
https://arxiv.org/abs/2110.05798
Contribution
present transfer learning methods and guidelines for finetuning single-speaker TTS models for a new voice
evaluate and provide a detailed analysis with varying amount of data
demonstrate that transfer learning can substantially reduce the training time and amount of data needed for synthesizing a new voice
open-source framework, provide a ..
https://arxiv.org/abs/2406.09569
Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time
We introduce Speech ReaLLM, a new ASR architecture that marries "decoder-only" ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. This is the first "decoder-only" ASR architecture designed to handle continuous audio wit..
Installation method
Install
brew install openjdk@17
Path setup
Note: if you use bash_profile rather than zshrc, replace ~/.zshrc with ~/.bash_profile.
brew info openjdk@17 # check the installation details
echo 'export PATH="/opt/homebrew/opt/openjdk@17/bin:$PATH"' >> ~/.zshrc
export CPPFLAGS="-I/opt/homebrew/opt/openjdk@17/include" # just in case, for compilers
source ~/.zshrc
java -version
Before the path setup the Java home path pointed to 14; after applying the change with source and checking again, it correctly resolves to 17.
References..
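As a side note on the path setup step above: appending the export line more than once, or sourcing the file repeatedly, stacks duplicate entries onto PATH. A minimal sketch of one way to avoid that, assuming the same Homebrew directory as above; the helper name prepend_path is hypothetical and not part of Homebrew or the JDK:

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already present.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already on PATH, leave it unchanged
    *) PATH="$1:$PATH" ;;     # otherwise put it first so it takes precedence
  esac
}

# Directory taken from the path setup step above.
prepend_path "/opt/homebrew/opt/openjdk@17/bin"
export PATH
```

Placing this in ~/.zshrc (or ~/.bash_profile) instead of the plain export line keeps PATH unchanged on repeated `source` runs.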