speech reallm real-time streaming speech recognition with multimodal llms by teaching the flow of time

https://arxiv.org/abs/2406.09569
Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time
We introduce Speech ReaLLM, a new ASR architecture that marries "decoder-only" ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. This is the first "decoder-only" ASR architecture designed to handle continuous audio wit..
내공얌냠
Posts tagged 'speech reallm real-time streaming speech recognition with multimodal llms by teaching the flow of time'