Notes organized with ChatGPT.
Practical pipeline: PyTorch model → TorchScript (PyTorch-only inference) → ONNX (framework-neutral format) → TensorRT (fastest inference).
Recommendation by situation: PyTorch → TorchScript → ONNX vs. PyTorch → ONNX?
I only summarized the parts I was interested in..
https://arxiv.org/pdf/2303.03926
A conditional language modeling task with neural codec codes: an AR language model generates the audio codec codes of the first EnCodec quantizer from paired phoneme sequences, and then an NAR model generates the codes of the remaining quantizers in parallel. A multilingual autoregressive codec LM and a multilingual non-autoregressive codec LM generate the acoustic tokens at different levels of detail. acoustic quantizer, VALL-E ..
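As a structural illustration of the two-stage decoding described above — token-by-token AR generation for the first quantizer level, then one parallel pass per remaining level — here is a deterministic toy. The predictors are stand-in functions; nothing here is the real model:

```python
# Toy of the VALL-E-style two-stage decode (structure only, stand-in predictors).

def ar_decode(num_frames, predict_next):
    """Stage 1: first-quantizer codes, generated one token at a time."""
    codes = []
    for _ in range(num_frames):
        codes.append(predict_next(codes))  # conditioned on everything so far
    return codes

def nar_decode(base_codes, num_levels, predict_level):
    """Stage 2: remaining quantizer levels, each level in one parallel pass."""
    levels = [base_codes]
    for lvl in range(1, num_levels):
        levels.append(predict_level(levels, lvl))  # whole sequence at once
    return levels

# Deterministic stand-ins so the control flow is visible:
first = ar_decode(4, lambda prev: len(prev))                      # [0, 1, 2, 3]
all_levels = nar_decode(first, 3, lambda lv, k: [c + k for c in lv[0]])
print(all_levels)  # [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
```

The asymmetry is the point: stage 1 loops over time steps, stage 2 loops only over quantizer levels.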
I actually first tried installing Kubeflow in mid-January. It didn't work, so I started over with a fresh mindset in early February. Looking back, I think everything fell apart because my user lacked Docker permissions and couldn't pull images.. Anyway, this time it properly succeeded, so I'm writing it up right away haha.
This was done on Ubuntu 22.04 LTS, with Kubernetes 1.32 and minikube.
1. Install MySQL, install Docker
2. Install Kubernetes
https://www.whatwant.com/entry/Kubeflow-in-Kubernetes
https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/ Install and Set U..
https://arxiv.org/abs/1609.03499
WaveNet: A Generative Model for Raw Audio — "This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that.." (arxiv.org)
v1 2016, v2 2016
Introduction: modeling joint probabilities over pixel/wor..
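The autoregressive modeling the introduction refers to is the chain-rule factorization WaveNet applies to a raw waveform x = (x_1, …, x_T):

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

Each sample is predicted from all previous samples, which is why generation is inherently sequential.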
https://arxiv.org/abs/2006.04558
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech — "Non-autoregressive text to speech (TTS) models such as FastSpeech can synthesize speech significantly faster than previous autoregressive models with comparable quality. The training of FastSpeech model relies on an autoregressive teacher model for duratio.." (arxiv.org)
v1 2020, v8 2022, ICLR 2021
Introduction: the existing..
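One concrete mechanism behind the duration modeling the abstract mentions is the FastSpeech-style length regulator: each phoneme-level hidden state is repeated by its predicted integer duration so the sequence reaches mel-frame length. A pure-Python toy, with strings standing in for hidden vectors:

```python
# Toy length regulator: upsample phoneme-rate states to frame rate by
# repeating each state `duration` times. Real models do this on tensors.

def length_regulate(phoneme_states, durations):
    """Expand each phoneme state by its integer duration (in frames)."""
    frames = []
    for state, d in zip(phoneme_states, durations):
        frames.extend([state] * d)
    return frames

# three phonemes with durations 2, 1, 3 -> six output frames
print(length_regulate(["a", "b", "c"], [2, 1, 3]))
# ['a', 'a', 'b', 'c', 'c', 'c']
```

In FastSpeech the durations come from a teacher model; FastSpeech 2 instead trains the duration predictor on externally extracted ground-truth durations.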
https://arxiv.org/abs/2404.04645
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks — "Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While developing TTS architectures that train and test on the same set of speakers has seen significant improvements, out-of-domain spea.." (arxiv.org)
Contribution: Dynamic Ad..
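A minimal sketch of the hypernetwork idea behind HyperTTS: instead of storing a fixed adapter per speaker, a shared network emits the adapter weights from a speaker embedding. All shapes and names here are illustrative assumptions, not the paper's architecture:

```python
# Toy hypernetwork: map a speaker embedding to the weights of an adapter
# layer, applied with a residual connection. Shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_spk = 8, 4

# hypernetwork parameters, shared across all speakers
W_h = rng.normal(size=(d_spk, d_model * d_model))

def adapter_weights(speaker_emb):
    """Generate a (d_model, d_model) adapter matrix from a speaker embedding."""
    return (speaker_emb @ W_h).reshape(d_model, d_model)

def adapt(hidden, speaker_emb):
    """Apply the speaker-conditioned adapter with a residual connection."""
    return hidden + hidden @ adapter_weights(speaker_emb)

h = rng.normal(size=(3, d_model))        # 3 frames of hidden states
out = adapt(h, rng.normal(size=d_spk))   # speaker-adapted states
print(out.shape)  # (3, 8)
```

The parameter-efficiency argument: only the small hypernetwork is trained, yet every speaker effectively gets its own adapter weights.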
ICASSP 2022
https://arxiv.org/abs/2110.03857
Introduction: In research, the text content of training and test data are often highly similar and in the same text domain. For many real-world applications, TTS systems need to deal with text input with arbitrary content across a wide range of domains. Collecting more data from the specific target speakers is costly or impractical → use data from "non-target" speakers..
Interspeech 2022
https://arxiv.org/abs/2110.05798
Contributions:
- present transfer learning methods and guidelines for finetuning single-speaker TTS models for a new voice
- evaluate and provide a detailed analysis with varying amounts of data
- demonstrate that transfer learning can substantially reduce the training time and the amount of data needed for synthesizing a new voice
- open-source framework, provide a ..