SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Information
- Type:
- misc
- Authors:
- Dong Zhang and Shimin Li and Xin Zhang and Jun Zhan and Pengyu Wang and Yaqian Zhou and Xipeng Qiu
- Relevance:
- Medium
- Reference:
- zhang2023speechgpt
- DOI:
- Keywords:
- URL:
- https://arxiv.org/abs/2305.11000
- Publication date:
- 05/2023
- Summary:
- Speech system based on GPT (spoken input and output)
- Abstract:
- Multi-modal large language models are regarded as a crucial step towards Artificial General Intelligence (AGI) and have garnered significant interest with the emergence of ChatGPT. However, current speech-language models typically adopt the cascade paradigm, preventing inter-modal knowledge transfer. In this paper, we propose SpeechGPT, a large language model with intrinsic cross-modal conversational abilities, capable of perceiving and generating multi-modal content. With discrete speech representations, we first construct SpeechInstruct, a large-scale cross-modal speech instruction dataset. Additionally, we employ a three-stage training strategy that includes modality-adaptation pre-training, cross-modal instruction fine-tuning, and chain-of-modality instruction fine-tuning. The experimental results demonstrate that SpeechGPT has an impressive capacity to follow multi-modal human instructions and highlight the potential of handling multiple modalities with one model. Demos are shown at https://0nutation.github.io/SpeechGPT.github.io/.
- Pdf:
- PDF link
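The abstract mentions two concrete ingredients: discrete speech representations placed in the same token space as text, and chain-of-modality prompting that walks from speech input to text to speech output. Below is a minimal Python sketch of how such a prompt could be assembled. The unit-token spelling (`<sosp>`, `<eosp>`, `<123>`), the stage labels, and the template are illustrative assumptions, not SpeechGPT's exact format.

```python
# Hedged sketch of "discrete speech representations" plus a
# "chain-of-modality" prompt, assuming hypothetical token names.

from typing import List


def units_to_tokens(unit_ids: List[int]) -> str:
    """Render discrete speech units (e.g., k-means cluster IDs over
    self-supervised speech features) as pseudo-text tokens that an
    LLM vocabulary could be extended with."""
    return "<sosp>" + "".join(f"<{u}>" for u in unit_ids) + "<eosp>"


def chain_of_modality_prompt(unit_ids: List[int]) -> str:
    """Build a prompt that steps through modalities in order:
    speech in -> text transcript -> text answer -> speech units out."""
    return (
        f"[Human]: {units_to_tokens(unit_ids)}\n"
        "[Transcript]: \n"   # stage 1: model transcribes the input
        "[Text answer]: \n"  # stage 2: model answers in text
        "[Speech answer]: "  # stage 3: model emits unit tokens to vocode
    )


if __name__ == "__main__":
    fake_units = [412, 87, 87, 903, 15]  # stand-in for real unit IDs
    print(chain_of_modality_prompt(fake_units))
```

The point of the staged template is that each step conditions on the previous one, which is presumably what lets a single decoder transfer knowledge across modalities instead of delegating to a cascade of separate ASR/LLM/TTS systems.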