1. R. Agerri, I. San Vicente, J. Ander Campos, A. Barrena, X. Saralegi, A. Soroa, E. Agirre. “Give your Text Representation Models some Love: the Case for Basque.” In Proceedings of the 12th LREC Conference, 4781-4788. 2020.
  2. A. Aghajanyan et al. “Muppet: Massive multi-task representations with prefinetuning”. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 5799–5811. 2021.
  3. N. Ahmed, W. Muntasir. “The de-democratization of ai: Deep learning and the compute divide in artificial intelligence research”. arXiv preprint arXiv:2010.15581. 2020.
  4. V. Aribandi, Y. Tay, T. Schuster, J. Rao, H. Steven Zheng, S. Vaibhav Mehta, H. Zhuang, V. Q. Tran, D. Bahri, J. Ni, et al. “ExT5: Towards extreme multi-task scaling for transfer learning”. arXiv preprint arXiv:2111.10952. 2021.
  5. J. Armengol-Estapé, C. Pio Carrino, C. Rodriguez-Penagos, O. de Gibert Bonet, C. Armentano-Oller, A. González-Agirre, M. Melero, M. Villegas. “Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? A Comprehensive Assessment for Catalan.” In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 4933-4946. 2021.
  6. A. Baevski, Y. Zhou, A. Mohamed, M. Auli. “Wav2vec 2.0: A framework for self-supervised learning of speech representations”. Advances in Neural Information Processing Systems, 33. 2020.
  7. D. Bahdanau, K. Cho, Y. Bengio. “Neural machine translation by jointly learning to align and translate”. arXiv preprint arXiv:1409.0473. 2014.
  8. T. B. Brown et al. “Language models are few-shot learners”. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS). 2020.
  9. J. Cañete et al. “Spanish pre-trained BERT model and evaluation data.” In Proceedings of the PML4DC Workshop (ICLR 2020). 2020.
  10. E. Casanova, J. Weber, C. D. Shulby, A. Candido Jr, E. Gölge, M. A. Ponti. “Yourtts: Towards zero-shot multi-speaker tts and zero-shot voice conversion for everyone.” In International Conference on Machine Learning, pp. 2709-2720. PMLR, 2022.
  11. C. Chien, J. Lin, C. Huang, P. Hsu, H. Lee. “Investigating on incorporating pretrained and learnable speaker representations for multi-speaker multi-style text-to-speech.” In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8588-8592. IEEE, 2021.
  12. A. Chowdhery et al. “PaLM: Scaling language modeling with pathways”. arXiv preprint arXiv:2204.02311. 2022.
  13. S. Clinchant, K. Woo Jung, V. Nikoulina. “On the use of BERT for neural machine translation”. In Proceedings of NLG. 2019.
  14. R. Collobert et al. “Natural language processing (almost) from scratch”. Journal of Machine Learning Research, 12:2493–2537, 2011.
  15. A. Conneau et al. “Unsupervised Cross-lingual Representation Learning at Scale.” In Proceedings of the 58th Annual Meeting of ACL, pp. 8440-8451. 2020.
  16. A. M. Dai, Q. V. Le. “Semi-supervised sequence learning”. Annual Conference of Neural Information Processing Systems (NeurIPS 2015): 3079–3087. 2015.
  17. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova. “BERT: Pre-training of deep bidirectional transformers for language understanding”. In Proceedings of NAACL 2019 Conference: Human Language Technologies, 4171–4186. 2019.
  18. S. Doddapaneni et al. “A primer on pre-trained multilingual language models.” arXiv preprint arXiv:2107.00676. 2021.
  19. L. Dong, S. Xu, B. Xu. “Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition”. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5884–5888. IEEE. 2018.
  20. A. Elu, G. Azkune, O. Lopez de Lacalle, I. Arganda-Carreras, A. Soroa, E. Agirre. “Inferring spatial relations from textual descriptions of images”. Pattern Recognition, 113:107847, 2021.
  21. A. Ettinger. “What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models”. Trans. Association for Computational Linguistics, 8:34–48. 2020.
  22. R. Futrell et al. “Neural language models as psycholinguistic subjects: Representations of syntactic state”. In Proc. 2019 NAACL Conference: Human Language Technologies, 1, 32–42. 2019.
  23. S. Gehrmann et al. “The GEM benchmark: Natural language generation, its evaluation and metrics”. In Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021): 96–120. ACL, 2021.
  24. S. Gutiérrez-Fandiño et al. “Maria: Spanish language models”. Procesamiento del Lenguaje Natural, 68(0):39–60. 2022.
  25. A. Gulati, J. Qin, C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, R. Pang. “Conformer: Convolution-augmented Transformer for Speech Recognition”. In Proc. Interspeech 2020, pages 5036–5040. 2020.
  26. K. Gulordava et al. “Green Recurrent Networks Dream Hierarchically”. Proceedings of the 2018 Conference of the NAACL: Human Language Technologies: 1195–1205. 2018.
  27. X. Han, Z. Zhang, N. Ding, Y. Gu, X. Liu, Y. Huo, J. Qiu et al. “Pre-trained models: Past, present and future.” AI Open. 2021.
  28. P. He, X. Liu, J. Gao, W. Chen. “DeBERTa: Decoding-enhanced BERT with disentangled attention”. arXiv preprint arXiv:2006.03654. 2020.
  29. J. Howard, S. Ruder. “Universal language model fine-tuning for text classification”. In Proceedings of the 56th Annual Meeting of the ACL, pp. 328–339, Melbourne. 2018.
  30. J. Hu, S. Ruder, A. Siddhant, G. Neubig, O. Firat, M. Johnson. “XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalisation”. In International Conference on Machine Learning, pp. 4411-4421. PMLR, 2020.
  31. S. Karita, N. Chen, T. Hayashi, T. Hori, H. Inaguma, Z. Jiang, M. Someki, N. E. Yalta Soplin, R. Yamamoto, X. Wang, et al. “A comparative study on Transformer vs RNN in speech applications”. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 449–456. IEEE. 2019.
  32. D. Kiela et al. “Dynabench: Rethinking benchmarking in NLP”. arXiv preprint arXiv:2104.14337. 2021.
  33. J. Kim, J. Kong, J. Son. “Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech.” In International Conference on Machine Learning, pp. 5530-5540. PMLR, 2021.
  34. A. Łańcucki. “Fastpitch: Parallel text-to-speech with pitch prediction.” In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6588-6592. IEEE, 2021.
  35. T. Le Scao et al. “BLOOM: A 176B-parameter open-access multilingual language model”. arXiv preprint arXiv:2211.05100. 2022.
  36. J. Li, T. Tang, W. Xin Zhao, J.-R. Wen. “Pretrained language model for text generation: A survey”. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21): 4492–4499. 2021.
  37. Y. Liang et al. “Xglue: A new benchmark dataset for cross-lingual pre-training, understanding and generation”. arXiv, abs/2004.01401. 2020.
  38. X. V. Lin et al. “Few-shot Learning with Multilingual Generative Language Models”. arXiv preprint arXiv:2112.10668. 2022.
  39. T. Linzen, E. Dupoux, Y. Goldberg. “Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies”. Transactions of the Association for Computational Linguistics, 4:521–535. 2016.
  40. J. Liu, Y. Chen, K. Liu, J. Zhao. “Neural cross-lingual event detection with minimal parallel resources”. In Proceedings of the 2019 EMNLP Conference: 738–748. 2019.
  41. P. Liu et al. “Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing.” arXiv preprint arXiv:2107.13586. 2021.
  42. Y. Liu et al. “Multilingual denoising pre-training for neural machine translation”. Transactions of the Association for Computational Linguistics, 8:726–742, 2020.
  43. T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, A. Joulin. “Advances in pre-training distributed word representations”. In Proceedings of the 11th International Conference on Language Resources and Evaluation. ELRA, 2018.
  44. T. Mikolov, K. Chen, G. Corrado, J. Dean. “Efficient estimation of word representations in vector space”. arXiv preprint arXiv:1301.3781, 2013.
  45. B. Min, H. Ross, E. Sulem, A. Pouran Ben Veyseh, T. Huu Nguyen, O. Sainz, E. Agirre, I. Heinz, D. Roth. “Recent advances in natural language processing via large pretrained language models: A survey”. arXiv preprint arXiv:2111.01243, 2021a.
  46. S. Min, M. Lewis, L. Zettlemoyer, H. Hajishirzi. “Metaicl: Learning to learn in context”. arXiv preprint arXiv:2110.15943, 2021b.
  47. J. Novikova, O. Dusek, A. Cercas Curry, V. Rieser. “Why we need new evaluation metrics for NLG”. In Proceedings of EMNLP 2017, 2241–2252. 2017.
  48. S. Park et al. “KLUE: Korean language understanding evaluation”. Proc. of Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). 2020.
  49. J. Pennington, R. Socher, C. Manning. “GloVe: Global vectors for word representation”. In Proceedings of EMNLP 2014: 1532–1543. 2014.
  50. M. E. Peters et al. “Deep contextualized word representations”. In Proceedings of the 2018 Conference of the NAACL: Human Language Technologies, 2227–2237. 2018.
  51. J. Pfeiffer, I. Vulić, I. Gurevych, S. Ruder. “MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer.” In Proc.2020 EMNLP Conference, 7654-7673. 2020.
  52. A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever. “Language models are unsupervised multitask learners”. Technical report, OpenAI. 2019.
  53. A. Radford, K. Narasimhan, T. Salimans, I. Sutskever. “Improving language understanding by generative pre-training”. Technical report, OpenAI. 2018.
  54. A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, I. Sutskever. “Robust Speech Recognition via Large-Scale Weak Supervision”. arXiv preprint arXiv:2212.04356, 2022.
  55. C. Raffel et al. “Exploring the limits of transfer learning with a unified text-to-text transformer”. Journal of Machine Learning Research, 21:1–67, 2020.
  56. Y. Ren, C. Hu, X. Tan, T. Qin, S. Zhao, Z. Zhao, T. Liu. “FastSpeech 2: Fast and high-quality end-to-end text to speech.” arXiv preprint arXiv:2006.04558, 2020.
  57. C. Rodriguez-Penagos et al. “The catalan language club”. arXiv preprint arXiv:2112.01894. 2021.
  58. A. Rogers, O. Kovaleva, A. Rumshisky. “A primer in BERTology: What we know about how BERT works”. Transactions of the Association for Computational Linguistics, 8:842–866, 2020.
  59. V. Sanh et al. “Multitask prompted training enables zero-shot task generalization”. arXiv preprint arXiv:2110.08207. 2021.
  60. V. Sanh, L. Debut, J. Chaumond, T. Wolf. “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter.” In 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing – NeurIPS. 2019.
  61. T. Schick, H. Schütze. “Exploiting cloze questions for few shot text classification and natural language inference.” arXiv preprint arXiv:2001.07676. 2020.
  62. H. Seelawi et al. “ALUE: Arabic language understanding evaluation”. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pp. 173–184, 2021.
  63. J. Shen, R. Pang, R. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen et al. “Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions.” In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018.
  64. A. Torfi et al. “Natural language processing advancements by deep learning: A survey”. arXiv preprint arXiv:2003.01200, 2020.
  65. C. Tran, S. Bhosale, J. Cross, P. Koehn, S. Edunov, A. Fan. “Facebook AI’s WMT21 news translation task submission”. In Proc. of WMT, 2021.
  66. J. Turian, L.-A. Ratinov, Y. Bengio. “Word representations: A simple and general method for semi-supervised learning”. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 384–394, Uppsala, Sweden, 2010.
  67. G. Urbizu, I. San Vicente, X. Saralegi, R. Agerri, A. Soroa. “BasqueGLUE: A Natural Language Understanding Benchmark for Basque”. In Proceedings of LREC Conference, pp. 1603-1612. 2022.
  68. C. van der Lee, A. Gatt, E. van Miltenburg, E. Krahmer. “Human evaluation of automatically generated text: Current trends and best practice guidelines”. Computer Speech & Language, 67:101151. 2021.
  69. D. Ustalov, A. Panchenko, C. Biemann. “Watset: Automatic Induction of Synsets from a Graph of Synonyms.” In Proceedings 55th ACL Conference, 1579-1590. 2017.
  70. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin. “Attention is all you need”. Annual Conference on Neural Information Processing Systems (NeurIPS 2017), 5998–6008, 2017.
  71. A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, S. Bowman. “GLUE: A multi-task benchmark and analysis platform for natural language understanding”. arXiv preprint arXiv:1804.07461. 2018.
  72. A. Wang, et al. “Superglue: A stickier benchmark for general-purpose language understanding systems”. Advances in neural information processing systems. 2019.
  73. C. Wang, S. Chen, Y. Wu, Z. Zhang, L. Zhou, S. Liu, Z. Chen et al. “Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.” arXiv preprint arXiv:2301.02111. 2023.
  74. J. Wei et al. “Finetuned language models are zero-shot learners”. arXiv preprint arXiv:2109.01652, 2021.
  75. J. Wei et al. “Emergent abilities of large language models”. arXiv preprint arXiv:2206.07682. 2022.
  76. B. Wilie et al. “IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding”. In Proceedings AACL, pp. 843–857. 2020.
  77. G. I. Winata, A. Madotto, X. Lin, R. Liu, J. Yosinski, P. Fung. “Language Models are Few-shot Multilingual Learners”. In Proc. of the 1st Workshop on Multilingual Representation Learning. 2021.
  78. T. Wolf et al. “Transformers: State-of-the-art natural language processing”. In Proc. 2020 EMNLP Conference: System Demonstrations, pp. 38–45. 2020.
  79. L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, C. Raffel. “mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer.” In Proceedings of the 2021 NAACL Conference: Human Language Technologies, 483-498. 2021.
  80. Q. Ye, B. Yuchen Lin, X. Ren. “CrossFit: A few-shot learning challenge for cross-task generalization in NLP”. In Proceedings EMNLP 2021, 7163–7189. 2021.
  81. A. Zeyer, P. Bahar, K. Irie, R. Schlüter, H. Ney. “A comparison of transformer and LSTM encoder decoder models for ASR”. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 8–15. IEEE. 2019.
  82. M. Zhao, H. Schütze. “Discrete and Soft Prompting for Multilingual Models”. In Proceedings of EMNLP 2021. 2021.
  83. J. Zhu, Y. Xia, L. Wu, D. He, T. Qin, W. Zhou, H. Li, T. Liu. “Incorporating BERT into neural machine translation”. In Int. Conference on Learning Representations. 2020.