Sound

Authors and titles for recent submissions

[ total of 44 entries: 1-25 | 26-44 ]

Thu, 2 May 2024

[1] arXiv:2405.00603: Title: Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation

Authors: Yimin Deng, Jianzong Wang, Xulong Zhang, Ning Cheng, Jing Xiao

Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[2] arXiv:2405.00307: Title: Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition

Authors: Dongyuan Li, Ying Zhang, Yusong Wang, Funakoshi Kataro, Manabu Okumura

Comments: Accepted by Journal of Natural Language Processing. arXiv admin note: text overlap with arXiv:2310.00283

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[3] arXiv:2405.00248: Title: Who is Authentic Speaker

Authors: Qiang Huang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[4] arXiv:2405.00233: Title: SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Authors: Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

Comments: Demo and code: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[5] arXiv:2405.00384 (cross-list from cs.CV): Title: Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol

Authors: Konstantinos Apostolidis, Jakob Abesser, Luca Cuccovillo, Vasileios Mezaris

Comments: Accepted for publication, 3rd ACM Int. Workshop on Multimedia AI against Disinformation (MAD'24) at ACM ICMR'24, June 10, 2024, Phuket, Thailand. This is the "accepted version"

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[6] arXiv:2405.00367 (cross-list from cs.IR): Title: Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation

Authors: Yoori Oh, Yoseob Han, Kyogu Lee

Comments: Accepted at SIGIR 2024 short paper track

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 1 May 2024

[7] arXiv:2404.19441: Title: ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers

Authors: Yuzhe Gu, Enmao Diao

Comments: Preprint

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[8] arXiv:2404.19214: Title: EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

Authors: Jianzong Wang, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[9] arXiv:2404.19212: Title: EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning

Authors: Ziqi Liang, Jianzong Wang, Xulong Zhang, Yong Zhang, Ning Cheng, Jing Xiao

Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[10] arXiv:2404.19187: Title: CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition

Authors: Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao

Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[11] arXiv:2404.19723 (cross-list from eess.AS): Title: Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech

Authors: Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

[12] arXiv:2404.19622 (cross-list from cs.HC): Title: Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis

Authors: Shivam Mehta, Anna Deichler, Jim O'Regan, Birger Moëll, Jonas Beskow, Gustav Eje Henter, Simon Alexanderson

Comments: 13+1 pages, 2 figures, accepted at the Human Motion Generation workshop (HuMoGen) at CVPR 2024

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[13] arXiv:2404.19615 (cross-list from cs.CV): Title: SemiPL: A Semi-supervised Method for Event Sound Source Localization

Authors: Yue Li, Baiqiao Yin, Jinfu Liu, Jiajun Wen, Jiaying Lin, Mengyuan Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[14] arXiv:2404.19375 (cross-list from eess.AS): Title: Deep low-latency joint speech transmission and enhancement over a gaussian channel

Authors: Mohammad Bokaei, Jesper Jensen, Simon Doclo, Jan Østergaard

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Tue, 30 Apr 2024 (showing first 11 of 12 entries)

[15] arXiv:2404.18791: Title: Certification of Speaker Recognition Models to Additive Perturbations

Authors: Dmitrii Korzh, Elvir Karimov, Mikhail Pautov, Oleg Y. Rogov, Ivan Oseledets

Comments: 9 pages, 9 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[16] arXiv:2404.18514: Title: A Systematic Evaluation of Adversarial Attacks against Speech Emotion Recognition Models

Authors: Nicolas Facchinetti, Federico Simonetta, Stavros Ntalampiras

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[17] arXiv:2404.18355: Title: Pièces de viole des Cinq Livres and their statistical signatures: the musical work of Marin Marais and Jordi Savall

Authors: Igor Lugo, Martha G. Alatriste-Contreras

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Applications (stat.AP)

[18] arXiv:2404.18094: Title: USAT: A Universal Speaker-Adaptive Text-to-Speech Approach

Authors: Wenbin Wang, Yang Song, Sanjay Jha

Comments: 15 pages, 13 figures. Copyright has been transferred to IEEE

Journal-ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

[19] arXiv:2404.18081: Title: ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[20] arXiv:2404.18002: Title: Towards Privacy-Preserving Audio Classification Systems

Authors: Bhawana Chhaglani, Jeremy Gummeson, Prashant Shenoy

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[21] arXiv:2404.17983: Title: TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality

Authors: Tiantian Feng, Xuan Shi, Rahul Gupta, Shrikanth S. Narayanan

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

[22] arXiv:2404.17821: Title: An automatic mixing speech enhancement system for multi-track audio

Authors: Xiaojing Liu, Angeliki Mourgela, Hongwei Ai, Joshua D. Reiss

Comments: 5 pages

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[23] arXiv:2404.17806: Title: T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Authors: Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

Comments: Preprint submitted to IEEE MLSP 2024

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[24] arXiv:2404.17721: Title: An RFP dataset for Real, Fake, and Partially fake audio detection

Authors: Abdulazeez AlAli, George Theodorakopoulos

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)

[25] arXiv:2404.17608: Title: Synthesizing Audio from Silent Video using Sequence to Sequence Modeling

Authors: Hugo Garrido-Lestache Belinchon, Helina Mulugeta, Adam Haile

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
