PROGRAM



Full Program


Session Room Chair
Overview Session 1 Meeting Room 1
Date Time Title Speaker
4-Dec 16:20-16:40 A Decade of Progress in Sound Event Localization and Detection: Transforming Environmental Sound Analysis for Real-World Impact Woon-Seng Gan, Nanyang Technological University
16:40-17:00 Exploring the Forward-Forward Algorithm: A Novel Learning Approach Waleed H. Abdulla, The University of Auckland
17:00-17:20 Eye-gaze-based Human-Intention Detection Kosin Chamnongthai, King Mongkut's University of Technology Thonburi
17:20-17:40 From GPT Evolution to Enterprise Deployment: Key Trends in Generative AI Jing-Ming Guo, National Taiwan University of Science and Technology
17:40-18:00 An Overview of Online Distributed Kernel Methods for Supervised and Unsupervised Learning Anthony Kuh, University of Hawaii


Session Room Chair
Overview Session 2 Meeting Room 8
Date Time Title Speaker
5-Dec 10:20-10:40 An AI-based Diagnostic-aid for Epileptic Electroencephalography Toshihisa Tanaka, Tokyo University of Agriculture and Technology
10:40-11:00 Machine Learning for Analytics Architecture: AI to Design AI Video Chris Gwo Giun Lee, National Cheng Kung University
11:00-11:20 Compression of Large AI Models Weisi Lin, Nanyang Technological University
11:20-11:40 Introduction to Multi-Camera Systems and 3D Quality Assessment Sanghoon Lee, Yonsei University
11:40-12:00 Highlight of New Image Generative Models and Applications to Image Manipulations Wan-Chi Siu, Hong Kong Polytechnic University & St. Francis University

Session Room Chair
Overview Session 3 Merged Room (Room 10 + 11)
Date Time Title Speaker
6-Dec 9:00-9:20 Overview of Source Camera Identification Techniques Bonnie N. F. Law, The Hong Kong Polytechnic University
9:20-9:40 Recent Advances in Complete Quality Preserving Data Hiding KokSheik Wong, Monash University Malaysia
9:40-10:00 Real or Fake? Frontiers of Countering Fake Media in the Age of Infodemics Isao Echizen, National Institute of Informatics
10:00-10:20 User Preference Modeling and Analysis in Choice Problems H. Vicky Zhao, Tsinghua University


Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
04-12-2024 11:00-11:20 SRC-gAudio: Sampling-Rate-Controlled Audio Generation Li, Chenxing*; Xu, Manjie; Yu, Dong
11:20-11:40 Scale-invariant Online Voice Activity Detection under Various Environments Takeda, Ryu*; Komatani, Kazunori
11:40-12:00 Sound Quality Improvement in Visual Microphone by Emphasizing Focused Area Based on Focal Rate Nakano, Hayata*; Geng, Yuting; Iwai, Kenta; Nishiura, Takanobu
12:00-12:20 Deep-Learning-Based Speech Enhancement with Rough-Focused Optical Laser Microphone by Reconstructing Complex Spectrum Nakano, Yuki*; Geng, Yuting; Iwai, Kenta; Nishiura, Takanobu

Session Room Chair
Biomedical Signal Processing and Systems Room 2 -
Date Time Title Authors
04-12-2024 11:00-11:20 Bluemarble: Bridging Latent Uncertainty in Articulatory-to-Speech Synthesis with a Learned Codebook Um, Seyun*; Kim, Miseul; Kim, Doyeon; Kang, Hong-Goo
11:20-11:40 Iterative Demographic Attentional Feature Fusion-based CNN and Transformer Network for Accurate Cuffless Blood Pressure Estimation Tang, Liwen; Zheng, Dingchang; Chen, Fei*
11:40-12:00 Sampling Pattern Augmentation to Enhance Deep Learning-based Image Reconstruction of MRI Yamato, Kazuki*; Ito, Satoshi
12:00-12:20 Data Augmentation and Assessment for Enhanced Ovarian Tumor Classification Pham, Loan Thi*; Pham, Gia-Minh; Nguyen, Tien-Dat; Le, Hung Van; Pham, Chi-Mai; Le, Thi Lan; Vu, Duy-Hai; Vu, Hai; Tran, Thanh-Hai

Session Room Chair
Machine Learning and Data Analytics Room 3 -
Date Time Title Authors
04-12-2024 11:00-11:20 GMA: Green Multi-Modal Alignment for Image-Text Retrieval Yang, Tsung-Shan*; Wang, Yun-Cheng; Wei, Chengwei; You, Suya; Kuo, C.-C. Jay
11:20-11:40 Improving Semi-Supervised Object Detection by ROI-Enhanced Contrastive Learning Huang, Teng-Kuan; Yeh, Mei-Chen*
11:40-12:00 Real-time Segmentation of Coronary Artery Calcification Using Spatial Attention and Parallel Convolution Asakawa, Tetsuya*; Hashimoto, Masashi; Miyaji, Takeshi; Shimizu, Kazuki; Nomura, Kei; Aono, Masaki
12:00-12:20 ViP-CBM: Reducing Parameters in Concept Bottleneck Models by Visual-Projected Embeddings Qi, Ji; Wang, Huisheng; Zhao, H. Vicky*

Session Room Chair
Machine Learning and Data Analytics Room 4 -
Date Time Title Authors
04-12-2024 11:00-11:20 Psychological Driving Style Estimation from GPS Sensor Data Alone Horimoto, Hiroto; Kimura, Ryusei; Tanaka, Takahiro; Okada, Shogo*
11:20-11:40 Adversarial Augmentation and Adaptation for Speech Recognition Chien, Jen-Tzung*; Sun, Wei-Yu
11:40-12:00 Empathetic Response Generation via Regularized Q-Learning Chien, Jen-Tzung*; Wu, Yi-Chien
12:00-12:20 Continual Learning with Self-Organizing Maps: A Novel Group-Based Unsupervised Sequential Training Approach Hirani, Gaurav R*; Wang, Kevin I-Kai; Abdulla, Waleed

Session Room Chair
Machine Learning and Data Analytics Room 5 -
Date Time Title Authors
04-12-2024 11:00-11:20 YOLO for High Resolution Images without Retraining Minami, Daisuke*; Nishikawa, Kiyoshi
11:20-11:40 Noise-Robust Estimation of Early-part Room Impulse Responses based on Physics-Informed Neural Network with Dynamic Pulling Method Kurata, Ken*; Sato, Gen; Tsunokuni, Izumi; Ikeda, Yusuke
11:40-12:00 A Multi-Domain Camera Model Identification Feature Restoration Network to Counter AI Compression Attacks Jinkai, Zhang*
12:00-12:20 Deep Learning-based Intraoperative Video Analysis for Cataract Surgery Instrument Identification Guo, Zhe*; Chan, Yuk Hee; Law, Ngai Fong

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
04-12-2024 11:00-11:20 GSBIQA: Green Saliency-guided Blind Image Quality Assessment Method Mei, Zhanxuan*; Wang, Yun-Cheng; Kuo, C.-C. Jay
11:20-11:40 AFSDet: Video Small Object Detection Based on Adaptive Focused Slicing Huang, Kangjian; Yang, Yan*; Jiang, Yongquan; Zhang, Xiaobo; Li, Zhuyi Angelina
11:40-12:00 Dual Motion Attention and Enhanced Knowledge Distillation for Video Frame Interpolation Zhang, Deng yong*; Lou, Runqi; Chen, Jiaxin; Liao, Xin; Yang, Gaobo; Ding, Xiangling
12:00-12:20 EavaNet: Enhancing Emotional Facial Expressions in 3D Avatars through Speech-Driven Animation Um, Seyun*; Lee, YongJu; Ko, WooSeok; Zhou, Yuan; Lee, Sangyoun; Kang, Hong-Goo

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
04-12-2024 11:00-11:20 On the Importance of Time and Pitch Relativity for Transformer-based Symbolic Music Generation Inaba, Tatsuro*; Yoshii, Kazuyoshi; Nakamura, Eita
11:20-11:40 Optimal Investment With Incomplete Information and Herd Effect Wang, Huisheng; Liu, Mingxiao; Qi, Ji; Zhao, H. Vicky*
11:40-12:00 YOLO-DC: Enhancing object detection with deformable convolutions and contextual mechanism Zhang, Deng yong*; Xu, Chuanzhen; Chen, Jiaxin; Liao, Xin
12:00-12:20 One-step Spectral Estimation for Euclidean Distance Matrix Approximation Li, Yicheng*; Sun, Xinghua

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
04-12-2024 11:00-11:20 SDNet: Noise-Robust Bandwidth Extension under Flexible Sampling Rates Yang, Junkang*; Liu, Hongqing; Gan, Lu; Zhou, Yi; Li, Xing; Jia, Jie; Yao, Jinzhuo
11:20-11:40 GLASS: Investigating Global and Local context Awareness in Speech Separation Ho, Kuan-Hsun*; Yu, En-Lun; Hung, Jeih-weih; Huang, Shih-Chieh; Chen, Berlin
11:40-12:00 Low-resource Language Adaptation with Ensemble of PEFT Approaches Kwok, Chin Yuen*; Li, Sheng; Yip, Jia Qi; Chng, Eng Siong
12:00-12:20 Diverse Time-Frequency Attention Neural Network for Acoustic Echo Cancellation Yao, Jinzhuo*; Liu, Hongqing; Zhou, Yi; Gan, Lu; Yang, Junkang

Session Room Chair
Speech and Language Processing Room 9 -
Date Time Title Authors
04-12-2024 11:00-11:20 LDMSE: Low Computational Cost Generative Diffusion Model for Speech Enhancement Nishi, Yuki*; Iwano, Koji; Shinoda, Koichi
11:20-11:40 MTFNet: Multi-Scale Transformer Framework for Robust Emotion Monitoring in Group Learning Settings Zhang, Yi*
11:40-12:00 Target Speaker Extraction Method by Emphasizing the Active Speech with an Additional Enhancer Yang, Xue; Bao, Changchun*; Zhang, Xu; Chen, Xianhong

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
04-12-2024 14:00-14:20 A Study on Multimodal Fusion and Layer Adapter in Emotion Recognition Shi, Xiaohan*; Gao, Yuan; He, Jiajun; Mi, Jinyi; Li, Xingfeng; Toda, Tomoki
14:20-14:40 Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation Wang, Xianrui*; Zhang, Shiqi; He, Bo; Makino, Shoji; Chen, Jingdong
14:40-15:00 Enhancing Neural Speech Embeddings for Generative Speech Models Kim, Doyeon*; Song, Yanjue; Madhu, Nilesh; Kang, Hong-Goo
15:00-15:20 Design of Spectrogram-Consistency Regularization Term Dependent on Observation in Independent Low-Rank Matrix Analysis for Blind Source Separation Kojima, Takaaki*; Takamune, Norihiro; Kitamura, Daichi; Saruwatari, Hiroshi
15:20-15:40 On Joint Dereverberation and Single Moving Source Separation with Online Source Steering Zhang, Yiting*; Mo, Kaien; Ueda, Tetsuya; Yang, Yichen; Makino, Shoji
15:40-16:00 New Perspectives and Insights on Distortionless Microphone Array Beamforming Zhang, Fan*; Benesty, Jacob; Pan, Chao; Chen, Jingdong

Session Room Chair
Biomedical Signal Processing and Systems Room 2 -
Date Time Title Authors
04-12-2024 14:00-14:20 Postoperative Delirium Prediction Based on Preoperative Electrocardiogram and Electroencephalogram Mito, Shogo; Miyajima, Miho; Tomioka, Hirofumi; Sato, Hitomi; Takeuchi, Takashi; Muto, Hitoshi; Kabasawa, Yuji; Harada, Hiroyuki; Eguchi, Kana; Kato, Shota; Kano, Manabu*
14:20-14:40 A method for classification NEO–FFI answers fabricated and advantageous due to psychological bias using brainwave specific brain activity networks Ashikawa, Yuto*; Ito, Takashi; Ishizu, Syohei; Kurihara, Yosuke
14:40-15:00 Effect of White Noise on Working Memory Using Event-Related Potentials Lee, Seung-won; Lee, Jun-Seok; Hwang, Han-Jeong*
15:00-15:20 Automated prediction of loudness growth curve using EEG signals Tiwari, Nitya*
15:20-15:40 Separation of Cardiopulmonary Sound Signals for Classification of Respiratory Diseases Zheng, Ruxin*
15:40-16:00 Performance Improvement of Single Plane-Wave Imaging Using U-Net and Discrete Wavelet Transform Shidara, Hiromi*; Miura, Kanta; Ishii, Takuro; Ito, Koichi; Aoki, Takafumi; Saijo, Yoshifumi; Ohmiya, Jun

Session Room Chair
Multimedia Security and Forensics Room 5 -
Date Time Title Authors
04-12-2024 14:00-14:20 Compressed Deepfake Video Detection Based on 3D Spatiotemporal Trajectories Chen, Zongmei; Liao, Xin*; Wu, Xiaoshuai; Chen, Yanxiang
14:20-14:40 A Document Presentation Attack Detection Scheme with Optical Flow under a Flashlight Chen, Changsheng*; Chen, Wenyu; Chen, Ximin; Li, Haodong
14:40-15:00 Robust Image Watermarking Scheme under Halftone Distortion with Surrogate Model Chen, Changsheng*; Li, Xijin
15:00-15:20 Physical Domain Adversarial Attacks Against Source Printer Image Attribution Purnekar, Nischay*; Tondi, Benedetta; Barni, Mauro
15:20-15:40 A Diffusion-Based Approach for Restoring Face-swapped Images Niu, Yuanchen; Li, Yuanman*; Zhang, Guijia; Li, Xia
15:40-16:00 AI-generated image detectors are surprisingly easy to mislead... for now Lyu, Zihang*; Xiao, Jun; Zhang, Cong; Lam, Kin-Man

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
04-12-2024 14:00-14:20 Green Video Camouflaged Object Detection Wang, Xinyu*; Chen, Hong-Shuo; Zhou, Zhiruo; You, Suya; Madni, Azad; Kuo, C.-C. Jay
14:20-14:40 A Survey on Objective Quality Assessment of Omnidirectional Images Sui, Xiangjie*; Wang, Shiqi; Fang, Yuming
14:40-15:00 Enhancing YOLOv7 with GLF-Trans for Precision in Small Object Detection Yoshikawa, Naohito*; Ikehara, Masaaki
15:00-15:20 Ablation Study to Derive a Computationally Efficient Deep Learning-Based Super-Resolution Approach Jamil, Asfa*; Artusi, Alessandro
15:20-15:40 Adaptive Spatial Re-sampling Method for Video Coding for Machines An, Eunbin; Kim, Ayoung; Jung, Soon Heung; Choo, Hyon-Gon; Seo, Kwang-Deok*
15:40-16:00 Rotation Invariant Spatio-Spectral Total Variation for Hyperspectral Image Denoising Takemoto, Shingo*; Ono, Shunsuke

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
04-12-2024 14:00-14:20 Multi-Channel Fusion Human Activity Recognition Algorithm Based on Millimeter-Wave Radar Zhu, Junda*; Guo, Shisheng; Tang, Longzhen; Guolong, Cui
14:20-14:40 Optimizing Computational Efficiency: In-Memory Computing with Dynamic Switching Huang, Chao-Ting*; Tsai, Kun-Lin
14:40-15:00 Modeling and Analysis of the Interaction between Opinions and Actions among Heterogeneous Agents Zhang, Hangjing; Zhao, H. Vicky*
15:00-15:20 Adaptive Subspace Clustering for Matrix Completion Wada, Takuto*; Sasaki, Ryohei; Konishi, Katsumi
15:20-15:40 A High-Isolation Sub-6 GHz In-Band Full-Duplex Communication System Shi, Chengzhe*; Pan, Wensheng; Ma, Wanzhi; Liu, Ying; Xu, Qiang; Zhang, Zhiya; Shao, Shihai
15:40-16:00 Generalized Graph Signal Sampling under Subspace Priors by Difference-of-Convex Minimization Yamashita, Keitaro*; Naganuma, Kazuki; Ono, Shunsuke

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
04-12-2024 14:00-14:20 GE2E-AC: Generalized End-to-End Loss Training for Accent Classification Watanabe, Chihiro*; Kameoka, Hirokazu
14:20-14:40 Efficient Feature Selection for Word Embedding Dimension Reduction Xue, Jintang*; Wang, Yun-Cheng; Wei, Chengwei; Kuo, C.-C. Jay
14:40-15:00 Fine-Grained Quantitative Emotion Editing for Speech Generation Inoue, Sho*; Zhou, Kun; Wang, Shuai; Li, Haizhou
15:00-15:20 Improving Speaker Consistency in Speech-to-Speech Translation Using Speaker Retention Unit-to-Mel Techniques Zhou, Rui*
15:20-15:40 Speech Separation using Neural Audio Codecs with Embedding Loss Yip, Jia Qi*; Kwok, Chin Yuen; Ma, Bin; Chng, Eng Siong
15:40-16:00 Speech Synthesis from IPA Sequences through EMA Data Maruyama, Koki*; Sawada, Shun; Ohmura, Hidefumi; Katsurada, Kouichi

Session Room Chair
Speech and Language Processing Room 9 -
Date Time Title Authors
04-12-2024 14:00-14:20 BEES: A New Acoustic Task for Blended Emotion Estimation in Speech Li, Xingfeng*; Shi, Xiaohan; Si, Yuke; Zhang, Zilong; Cui, Feifei; Li, Yongwei; Liu, Yang; Unoki, Masashi; Akagi, Masato
14:20-14:40 Is Corpus Truth for Human Perception?: Quality Assessment of Voice Response Timing in Conversational Corpus through Timing Replacement Yoshikawa, Sadahiro*; Ishii, Ryo; Okada, Shogo
14:40-15:00 Enhancing Branchformer with Dynamic Branch Merging Module for Code-Switching Speech Recognition Hu, Hong-Jie*; Chen, Chia-Ping
15:00-15:20 Optimizing Multi-Speaker Speech Recognition with Online Decoding and Data Augmentation Strategies Peng, Yizhou*; Chng, Eng Siong
15:20-15:40 Adapting OpenAI's Whisper for Speech Recognition on Code-Switch Mandarin-English SEAME and ASRU2019 Datasets Yang, Yuhang; Peng, Yizhou*; Huang, Hao; Chng, Eng Siong; Zhong, Xionghu

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
04-12-2024 16:20-16:40 A Low-Complexity Adaptive Beamformer for Joint Reverberation and Noise Suppression Zhang, Fan*; Pan, Chao; Chen, Jingdong; Benesty, Jacob
16:40-17:00 Multichannel Speech Enhancement Using Complex-Valued Graph Convolutional Networks and Triple-Path Attentive Recurrent Networks Shen, Xingyu; Zhu, Wei-Ping*
17:00-17:20 Anomalous Machine Sound Detection Based on Time Domain Gammatone Spectrogram Feature and IDNN Model Hafiz, Primanda Adyatma*; Mawalim, Candy Olivia; Puji Lestari, Dessi; Sakti, Sakriani; Unoki, Masashi
17:20-17:40 Unsupervised Anomalous Sound Detection Using Timbral and Human Voice Disorder-Related Acoustic Features Akbar Hashemi Rafsanjani, Malik*; Mawalim, Candy Olivia; Lestari, Dessi Puji; Sakti, Sakriani; Unoki, Masashi
17:40-18:00 Real-Time Monophonic Dual-Pitch Extraction Model Tran, Ngoc-Son; Hsieh, Pei-Chin; Shen, Yih-Liang*; Chu, Yen-Hsun; Chi, Tai-Shih

Session Room Chair
Biomedical Signal Processing and Systems Room 2 -
Date Time Title Authors
04-12-2024 16:20-16:40 Predictive Analysis of Driver Drowsiness Progression: Multi-Level Drowsiness Classification Using Physiological Signals Dachoponchai, Natchira; Wongsawat, Yodchanan; Arnin, Jetsada*
16:40-17:00 Feature Extraction for Machine Learning-based Sleep Stage Classification Using PPG-Derived Parameters and Skin Temperature Buaruk, Suphachok; Thanaviratananich, Sikawat; Treesuthacheep, Peerasit; Deepaisarn, Somrudee*
17:00-17:20 Parameterizing Hierarchical Particle Filters with Concept Drift for Time-varying Parameter Estimation Murphy, Joshua*; Rosato, Conor; Millard, Andrew; Maskell, Simon
17:20-17:40 Pop Noise Detection Using Group Delay Cepstral Coefficients Shah, Arth Juhul*; Patil, Hemant
17:40-18:00 Novel Estimators for the Number of Susceptible Individuals in SIR Models of Infectious Epidemics van Wyk, Anton; McDonald, Andre M*; Rubin, David; Zhang, FangFang

Session Room Chair
Multimedia Security and Forensics Room 5 -
Date Time Title Authors
04-12-2024 16:20-16:40 A Study on Variable Embedding Locations of Reversible Spectral Speech Watermarking Huang, Xuping*; Ito, Akinori
16:40-17:00 Normalizing Flows-Based Latent Variable Rearrangement for Generative Image Steganography Wu, Sifan*; Dong, Li
17:00-17:20 Detecting Spoof Voices in Asian Non-Native Speech: An Indonesian and Thai Case Study Adila, Aulia*; Mawalim, Candy Olivia; Unoki, Masashi
17:20-17:40 Privacy-Preserving Anomaly Detection in Bitstream Video based on Gaussian Mixture Model Chen, Yike; Song, Yuru; Zheng, Peijia*; Du, Yusong; Luo, Weiqi
17:40-18:00 Source Attribution for Images Generated by Diffusion-Based Text-to-Image Models: Exploring the Forensics Approach Jiang, Xinqi; Tian, Jinyu*

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
04-12-2024 16:20-16:40 Hyperspectral Unmixing With Row-Sparsity Enhancement: A Difference-of-Convex Approach Naganuma, Kazuki*; Ono, Shunsuke
16:40-17:00 How Accurate Can Large Vision Language Model Perform for Images with Compression Degradation? Fang, Xiaohan*; Chen, Peilin; Wang, Meng; Wang, Shiqi
17:00-17:20 Enhanced RefineDNet for Single Image Dehazing Ren, Jingyu*
17:20-17:40 Tsnake: A Time-Embedded Recurrent Contour-Based Instance Segmentation Model Hsu, Chen-Jui; Ding, Jian-Jiun*; Shih, Chun-Jen

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
04-12-2024 16:20-16:40 Affine Combination of General Adaptive Filters Jin, Danqi*; Chen, Yitong; Chen, Jie; Huang, Gongping
16:40-17:00 An Annealing-Inspired Gradient-Descent Based Suboptimal Solver for Combinatorial Problems Shu Ping, Chang; Lee, Cheng-Che; Lee, Hsin-Jung; Kuan, Chieh-Hsiung; Young, Jason Gemsun; Yao, Chia-Yu; Ding, Jian-Jiun*
17:00-17:20 A Solution For Anomaly Detection of Red Beans In A Product Processing Line Nguyen, Duc Hai; Do, Hiep Trong; Nguyen, Hoang-Linh-Phuong; Nguyen, Quoc-Khanh; Tran, Duc-Tan; Bui, Tien Son Tien; Nguyen, VanToi*
17:20-17:40 A Novel kind of WVD Associated with the Linear Canonical Transform Peng, Jia-Yin; Chen, Jian-Yi; Li, Bing-Zhao*
17:40-18:00 A Discrete-Valued Signal Estimation by Nonconvex Enhancement of SOAV with cLiGME Model Shoji, Satoshi*; Yata, Wataru; Kume, Keita; Yamada, Isao

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
04-12-2024 16:20-16:40 Frequency & Channel Attention Network for Small Footprint Noisy Spoken Keyword Spotting Lin, Yuanxi*; Gapanyuk, Yuriy E
16:40-17:00 Long Audio File Speaker Diarization with Feasible End-to-End Models Huang, Kai-Wei*; Chen, Chia-Ping
17:00-17:20 Analysis of Various Self-Supervised Learning Models for Automatic Pronunciation Assessment Lee, Haeyoung*; Kim, Sunhee; Chung, Minhwa
17:20-17:40 Band-Split Inter-SubNet: Band-Split with Subband Interaction for Monaural Speech Enhancement Pan, Yen-Chou; Shen, Yih-Liang*; Liao, Yuan-Fu; Chi, Tai-Shih
17:40-18:00 Speech Dereverberation with Deconvolution Regularized by Denoising Hu, Haonan; Yang, Ziye; Chen, Jie*; Zhang, Lijun

Session Room Chair
Speech and Language Processing Room 9 -
Date Time Title Authors
04-12-2024 16:20-16:40 Domain Adaptation by Alternating Learning of Acoustic and Linguistic Information for Japanese Deaf and Hard-of-Hearing People Takahashi, Kaito*; Wakabayashi, Yukoh; Ohta, Kengo; Kobayashi, Akio; Kitaoka, Norihide
16:40-17:00 Speech emotion recognition based on crossmodal transformer and attention weight correction Terui, Ryusei*; Yamada, Takeshi
17:00-17:20 Unsupervised Discovery of Non-Categorical L2 Error Patterns Using Wav2Vec2.0 Code Vectors Hong, Eunsoo*; Kim, Sunhee; Chung, Minhwa
17:20-17:40 An Effective Contextualized Automatic Speech Recognition Approach Leveraging Self-Supervised Phoneme Features Pai, Li-Ting*; Wang, Yi-Cheng; Yan, Bi-Cheng; Wang, Hsin-Wei; Lu, Jia-Liang; Lin, Chi-Han; Xu, Juan-Wei; Chen, Berlin
17:40-18:00 COIN-AT-PVAD: A Conditional Intermediate Attention PVAD Yu, En-Lun*; Ruei-Xian, Chang; Hung, Jeih-weih; Huang, Shih-Chieh; Chen, Berlin

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
05-12-2024 10:20-10:40 Wind Noise Reduction with Orthogonal Polynomial Expansion Du, Li*; Zhang, Lijun
10:40-11:00 Few-Shot Open-Set Keyword Spotting with Multi-Stage Training Li, LoYa*; Lo, Tien-Hong; Hung, Jeih-weih; Huang, Shih-Chieh; Chen, Berlin
11:00-11:20 Self-Supervised Augmented Diffusion Model for Anomalous Sound Detection Yin, Jiawei; Gao, Yu*; Zhang, Wenbin; Zhang, Mingjun
11:20-11:40 Murmur Separation and Classification from Heart Sound Using Constrained Singular Spectrum Analysis and Wavelet Transform Qi, Yuanyang*; Sanei, Saeid
11:40-12:00 A Non-Intrusive Speech Quality Assessment Model using Whisper and Multi-Head Attention Lin, Guojian; Tsao, Yu; Chen, Fei*

Session Room Chair
Emerging Technologies and Applications Of Image Processing And Computer Vision Room 3 -
Date Time Title Authors
05-12-2024 10:20-10:40 Confidence-Aware Learning for Person Re-identification with Noisy Labels Kim, Duhyun*; Sim, Jae-Young
10:40-11:00 Test-Time Optimization for Post-Processing of Compressed Videos Kim, Hongil; Han, Changwoo; Kim, Donghyun; Lim, Sung-Chang; Jung, Seung-Won*
11:00-11:20 Lifelong Person Re-Identification with Backward-Compatibility Oh, Minyoung; Sim, Jae-Young*
11:20-11:40 Enhancing Semiconductor X-RAY Images: A Framework Combining Denoising and Super-Resolution Modules With a Novel Dataset Shim, Jae Hoon*; Kim, Min Woo; Lee, Sang Hwa; Cho, Nam Ik
11:40-12:00 Monocular Depth Estimation for Autonomous Driving Based on Instance Clustering Guidance Kim, Dahyun*; Jin, Dongkwon; Kim, Chang-Su

Session Room Chair
Advanced Topics on Sound Event and Scene Analysis Room 4 -
Date Time Title Authors
05-12-2024 10:20-10:40 Multi-Modal Video Summarization Based on Two-Stage Fusion of Audio, Visual, and Recognized Text Information Yang, Zekun*; He, Jiajun; Toda, Tomoki
10:40-11:00 Prediction-error-based Adaptive SpecAugment for Fine-tuning the Masked Model on Audio Classification Tasks Zhang, Xiao*; Xing, Haoran; Song, Mingxue; Takeuchi, Daiki; Harada, Noboru; Makino, Shoji
11:00-11:20 Synchronization of Signals with Sampling Rate Offset and Missing Data Using Dynamic Programming Matching Takeuchi, Hayato*; Ono, Nobutaka
11:20-11:40 LEAD Dataset: How Can Labels for Sound Event Detection Vary Depending on Annotators? Koga, Naoki; Bando, Yoshiaki; Imoto, Keisuke*
11:40-12:00 SSL-based Chewing and Swallowing Detection Using Multiple Skin-contact Microphones Tsukagoshi, Toshihiro*; Koiwai, Kazuhiro; Nishida, Masafumi; Nishimura, Masafumi

Session Room Chair
Recent Advances in Multimedia Enrichment and Security Room 5 -
Date Time Title Authors
05-12-2024 10:20-10:40 Enhancing Security Using Random Binary Weights in Privacy-Preserving Federated Learning Sawada, Hiroto*; Imaizumi, Shoko; Kiya, Hitoshi
10:40-11:00 Estimation of rotation angle and anisotropic scaling rate using pilot signals for watermarking Kawano, Rinka*; Kawamura, Masaki
11:00-11:20 On the Security of Bitstream-level JPEG Encryption with Restart Markers Hirose, Mare*; Imaizumi, Shoko; Kiya, Hitoshi
11:20-11:40 Improved Ultimate Link without Markers for Projective Transformation Yamadera, Keiji; Niimi, Michiharu*
11:40-12:00 Detection of Diffusion-Generated Images Using Sparse Coding Tanaka, Daishi; Niimi, Michiharu*

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
05-12-2024 10:20-10:40 Improved Architecture for High-resolution Piano Transcription to Efficiently Capture Acoustic Characteristics of Music Signals Mi, Jinyi*; Kim, Sehun; Toda, Tomoki
10:40-11:00 Ev3DGS: Event Enhanced 3D Gaussian Splatting from Blurry Images Huang, Junwu; Wan, Zhexiong; Lu, Zhicheng; Zhu, Juanjuan; He, Mingyi; Dai, Yuchao*
11:00-11:20 New Abnormal Behavior Detection for Patient Surveillance System Han, Yujin; Kim, Taewan*
11:20-11:40 Utilizing Cross Layer Attentions for Semantic Segmentation of Small Objects Lu, Chi-Hsuan; Chung, Yu-Hsien; Cho, Jung-Hui; Yu, Chih-Chang*
11:40-12:00 Music2Fail: Transfer Music to Failed Recorder Style Leong, Chon In*; Chung, I-Ling; Chao, Kin Fong; Wang, Jun-You; Yang, Yi-Hsuan; Jang, Roger

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
05-12-2024 10:20-10:40 U-Mamba-Net: A highly efficient Mamba-based U-net style network for noisy and reverberant speech separation Dang, Shaoxiang*; Matsumoto, Tetsuya; Takeuchi, Yoshinori; Kudo, Hiroaki
10:40-11:00 Graph Filter Transfer for Time-Varying Signal Estimation Between Two Networks Fukuhara, Tsutahiro*; Hara, Junya; Higashi, Hiroshi; Tanaka, Yuichi
11:00-11:20 Few-Shot Audio Classification Model for Detecting Classroom Interactions Using LaSO Features in Prototypical Networks Iqbal, Md Rashed*; Ritz, Christian; Yang, Jie
11:20-11:40 Subset Random Sampling of Finite Time-vertex Graph Signals Sheng, Hang; Shu, Qinji; Feng, Hui*; Hu, Bo
11:40-12:00 Dynamic Sensor Placement on Graphs Based on Graph Signal Sampling Theory Nomura, Saki*; Hara, Junya; Higashi, Hiroshi; Tanaka, Yuichi

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
05-12-2024 10:20-10:40 Can We Estimate Purchase Intention Based on Zero-shot Speech Emotion Recognition? Nagase, Ryotaro; Sumiyoshi, Takashi; Yamashita, Natsuo; Dohi, Kota; Kawaguchi, Yohei*
10:40-11:00 Assessment and Improvement of Customer Service Speech with Multiple Large Language Models Watanabe, So; Leow, Chee Siang*; Hoshino, Junichi; Utsuro, Takehito; Nishizaki, Hiromitsu
11:00-11:20 JAM: A Unified Neural Architecture for Joint Multi-granularity Pronunciation Assessment and Phone-level Mispronunciation Detection and Diagnosis Towards a Comprehensive CAPT System He, Yue-Yang*; Yan, Bi-Cheng; Lo, Tien-Hong; Lin, Meng-Shin; Hsu, Yung-Chang; Chen, Berlin
11:20-11:40 Data Augmentation Methods and Influence of Speech Recognition Performance for TED Talk's English to Japanese Speech Translation Masuda, Kento*; Yamamoto, Kazumasa; Nakagawa, Seiichi
11:40-12:00 Empower Typed Descriptions by Large Language Models for Speech Emotion Recognition Wu, Haibin; Chou, Huang-Cheng*; Chang, Kai-Wei; Goncalves, Lucas; Du, Jiawei; Jang, Jyh-Shing Roger; Lee, Chi-Chun; Lee, Hung-yi

Session Room Chair
Advanced Signal Processing for Information Collection and Data Analysis in Wireless Environmental Sensing Room 9 -
Date Time Title Authors
05-12-2024 10:20-10:40 Data-Driven Tuning for Weighted Least Square of BLE-AoA-based Indoor Localization Ohashi, Ginji; Ibi, Shinsuke*; Takahashi, Takumi; Iwai, Hisato
10:40-11:00 Observation of the terrestrial radio environment using the low earth orbit satellite constellation Obata, Takatoshi*; Takyu, Osamu; Inage, Kei; Fujii, Takeo; Yoshida, Kohei; Ariyoshi, Masayuki
11:00-11:20 Deep Unfolding Aided Parameter Optimization for Multi-task Diffusion LMS Algorithm Tong, Xiaoqing*; Hayashi, Kazunori
11:20-11:40 Reduced-dimensional MUSIC Algorithm for Frequency Diverse Array in MIMO Radar System Zhu, Beizuo*; Hayashi, Kazunori; Mori, Hiroki
11:40-12:00 Collection of Correlated Information from Superimposed Multiple Chirp Signals Aoyama, Koki*; Adachi, Koichi

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
05-12-2024 14:00-14:20 EEND-EM: End-to-End Neural Speaker Diarization with EM-Network Woo, Beom Jun*; Yoon, Ji Won; Han, Min Hyun; Moon, Chan Yeong; Kim, Nam Soo
14:20-14:40 Multi-Task Learning Approaches for Music Similarity Representation Learning Based on Individual Instrument Sounds Imamura, Takehiro*; Hashizume, Yuka; Toda, Tomoki
14:40-15:00 Personal Voice Activity Detection With Ultra-Short Reference Speech Xu, Longting; Zhang, Mingjun; Zhang, Wenbin; Wang, Tianyi; Yin, Jiawei; Gao, Yu*
15:00-15:20 An Investigation on the Speech Recovery from EEG Signals Using Transformer Mizuno, Tomoaki*; Kishida, Takuya; Yoshimura, Natsue; Nakashika, Toru

Session Room Chair
Audio Processing Room 2 -
Date Time Title Authors
05-12-2024 14:00-14:20 WavLM and Omni-Scale CNNs: Enhancing Boundary Detection in Partially Spoofed Audio Li, Menghan*; Huang, Zhihua
14:20-14:40 Semi-Supervised Far-Field Speaker Verification with Distance Metric Domain Adaptation Wang, Han*; He, Mingrui; Zhang, Mingjun; Xu, Longting
14:40-15:00 Non-Target Conversion Based Speech Steganography for Secure Speech Communication System Zhang, Mingjun; Feng, Yan; Gao, Yu; Xu, Longting*
15:00-15:20 Enhancing Acoustic Scene Classification with Layer-wise Fine-Tuning on the SSAST Model Hao, Shuting*; Saito, Daisuke; Minematsu, Nobuaki

Session Room Chair
High Performance Image and Video Processing and Applications Room 3 -
Date Time Title Authors
05-12-2024 14:00-14:20 Forward Prediction-Guided Cross-Partition Targeted Pruning for VVenC Tang, Jingyuan*; Sun, Songlin
14:20-14:40 Contrastive Learning Based Knowledge Distillation for Enhancing Defect Detection Guo, Jing-Ming; Yuan, Lun-Da; Huang, Cian*; Zeng, Yi-Chong
14:40-15:00 Screen Content Encoding Network Based on Deep Contextual Information Gong, Tianyu*; Zhang, Tao; Zhong, Ye; Zhang, Mengmeng; Bai, Huihui
15:00-15:20 A Coarse-to-Fine Change Detection Framework for Remote Sensing Sparse Cultivated Land Hu, Yuan*; Zhang, Yifan; Ma, Mingyang; Mei, Shaohui

Session Room Chair
New Frontiers in Biometric Authentication Room 4 -
Date Time Title Authors
05-12-2024 14:00-14:20 A Quasilinear-Time CVP Algorithm for Triangular Lattice Based Fuzzy Extractors and Fuzzy Signatures Takahashi, Kenta*; Nakamura, Wataru
14:20-14:40 Enhancing Remote Adversarial Patch Attacks on Face Detectors with Tiling and Scaling Okano, Masora*; Ito, Koichi; Nishigaki, Masakatsu; Ohki, Tetsushi
14:40-15:00 Multibiometrics Using a Single Face Image Ito, Koichi*; Tonosaki, Taito; Aoki, Takafumi; Ohki, Tetsushi; Nishigaki, Masakatsu
15:00-15:20 Multi-Observed Authentication: A secure and usable authentication based on multi-point observation of a single physical credential Hatakeyama, Wataru*; Nozaki, Shinnosuke; Serizawa, Ayumi; Yoshirira, Mizuho; Fujita, Masahiro; Yoshimura, Ayako; Ohki, Tetsushi; Nishigaki, Masakatsu

Session Room Chair
Recent Advances in Multimedia Enrichment and Security Room 5 -
Date Time Title Authors
05-12-2024 14:00-14:20 Generation of Target Speech with Speaker Individuality Based on Accent Conversion for English Pronunciation Learning Hamakawa, Rei; Niimi, Michiharu*
14:20-14:40 Proposal of Blind Extractable Additive Video Watermarking Method Harada, Nao*; Kawano, Rinka; Kawamura, Masaki
14:40-15:00 Transfer-Based Adversarial Attack Against Multimodal Models by Exploiting Perturbed Attention Region Disabato, Raffaele*; Maung Maung, April Pyone; Nguyen, Huy Hong; Echizen, Isao
15:00-15:20 A Permutation-based Reversible Data Hiding Method with Zero Visual Distortion Zhu, Wendi*; Wong, KokSheik; Kuribayashi, Minoru

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
05-12-2024 14:00-14:20 VietSing: A High-quality Vietnamese Singing Voice Corpus Vu, Minh Duc*; Wei, Zhou; Bhattarai, Binit; Teh, Kah Kuan; Dat, Tran Huy
14:20-14:40 Inertial Strengthened CLIP model for Zero-shot Multimodal Egocentric Activity Recognition He, Mingzhou; Wang, Haojie; Zhou, Shuchang; Wu, Qingbo*; Ngan, King Ngi; Meng, Fanman; Li, Hongliang
14:40-15:00 Optimization of the Intensity Aware Loss for Dynamic Facial Expression Recognition Lau, Davy Tec-Hinh; Ding, Jian-Jiun*; Muller, Guillaume
15:00-15:20 Dictionary Learning Based Two-stage Near-lossless Video Compression Zhang, Zuhai; Jia, Luheng*; Song, Li; Zhu, Shuyuan; Guo, Yuanfang; Jia, Kebin

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
05-12-2024 14:00-14:20 Dictionary Learning for Directed Graph Signals via Augmented GFT Naito, Tsubasa*; Ito, Ryuto; Tanaka, Yuichi; Muramatsu, Shogo
14:20-14:40 Robust Quantile Regression Under Unreliable Data Shoji, Yoshifumi*; Yukawa, Masahiro
14:40-15:00 Ensemble learning based head-related transfer function personalization using anthropometric features Shen, Yih-Liang*; Chi, Tai-Shih
15:00-15:20 Blind Estimation of Room Volume from Reverberant Speech Based on the Modulation Transfer Function Siripool, Nutchanon*; Kongprawechnon, Waree; Unoki, Masashi

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
05-12-2024 14:00-14:20 Disentangling Speaker Representations from Intuitive Prosodic Features for Speaker-Adaptative and Prosody-Controllable Speech Synthesis Pengyu, Cheng*
14:20-14:40 A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker’s Shadowings Geng, Haopeng*; Saito, Daisuke; Minematsu, Nobuaki
14:40-15:00 EADSum: Element-Aware Distillation for Enhancing Low-Resource Abstractive Summarization Lu, Jia-Liang*; Yan, Bi-Cheng; Wang, Yi-Cheng; Lo, Tien-Hong; Wang, Hsin-Wei; Pai, Li-Ting; Chen, Berlin
15:00-15:20 A Tiny Whisper-SER: Unifying Automatic Speech Recognition and Multi-label Speech Emotion Recognition Tasks Chou, Huang-Cheng*

Session Room Chair
Advancements in Biosignal Decoding and Neuromodulation for Human Function Enhancement Room 9 -
Date Time Title Authors
05-12-2024 14:00-14:20 Context-FFT: A Context Feed Forward Transformer Network for EEG-based Speech Envelope Decoding Chen, Ximin; Ding, Yuting; Yan, Nan; Chen, Changsheng; Chen, Fei*
14:20-14:40 Effect of Dynamic Binaural Beats on Concentration Enhancement LEE, Jun-Seok; Lee, Yun-Sung; Hwang, Han-Jeong*
14:40-15:00 EEG-based Evaluation of Enjoyment Emotion during cognitive-motor task Aoki, Haruna*; Zhang, Sinan; Ono, Yumie
15:00-15:20 Exploring Brain Connectivity Patterns and Cognitive Resilience in Aging: A Study with the LEMON Dataset KS, Kapeleshh*; Wei, Chen; Domer, Prince Aldrin; Ji, Hong

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
05-12-2024 16:40-17:00 Experimental Evaluation of Speech Enhancement for In-Car Environment Using Blind Source Separation and DNN-based Noise Suppression Takeuchi, Yutsuki*; Nakashima, Taishi; Ono, Nobutaka; Takazawa, Takashi; Shimanoe, Shuhei; Tsuchiya, Yoshinori
17:00-17:20 Auxiliary-Function-Based Steering Vector Estimation Method for Spatially Regularized Independent Low-Rank Matrix Analysis Hirata, Sota*; Takamune, Norihiro; Yamaoka, Kouei; Kitamura, Daichi; Saruwatari, Hiroshi; Takahashi, Yu; KONDO, Kazunobu
17:20-17:40 Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions Mi, Jinyi*; Shi, Xiaohan; Ma, Ding; He, Jiajun; Fujimura, Takuya; Toda, Tomoki
17:40-18:00 Data generation for speaker diarization by speaker transition information Ichikawa, Keigo*; Ueno, Sei; Lee, Akinobu

Session Room Chair
Audio Processing Room 2 -
Date Time Title Authors
05-12-2024 16:40-17:00 Generating Room Impulse Responses Using Neural Networks Trained with Weighted Combinations of Acoustic Parameter Loss Functions Ren, Hualin*; Ritz, Christian; Zhao, Jiahong; Zheng, Xiguang; Jang, Daeyoung
17:00-17:20 Audio Similarity Detection Malhotra, Siddharth; Mankad, Sapan H*
17:20-17:40 Towards a B-format Ambisonic Room Impulse Response Generator Using Conditional Generative Adversarial Network Ren, Hualin*; Ritz, Christian; Zhao, Jiahong; Zheng, Xiguang; Jang, Daeyoung
17:40-18:00 What to Refer and How? - Exploring Handling of Auxiliary Information in Target Speaker Extraction Hayashi, Tomohiro*; Ogino, Riku; Saijo, Kohei; Ogawa, Tetsuji

Session Room Chair
High Performance Image and Video Processing and Applications Room 3 -
Date Time Title Authors
05-12-2024 16:40-17:00 Efficient Adaptation for Real-World Omnidirectional Image Super-Resolution Yang, Cuixin*; Dong, Rongkang; Lam, Kin-Man
17:00-17:20 More Direct and stage-wise network for Face Super Resolution Horiguchi, Yohei*
17:20-17:40 Camera Focal Length Prediction for Neural Novel View Synthesis from Monocular Video Chakraborty, Dipanita*; Chiracharit, Werapon; Chamnongthai, Kosin; Okada, Minoru
17:40-18:00 Scene-Segmentation-Based Exposure Compensation for Tone Mapping of High Dynamic Range Scenes Kinoshita, Yuma*; Kiya, Hitoshi

Session Room Chair
Wireless Communications and Networking Room 4 -
Date Time Title Authors
05-12-2024 16:40-17:00 Combining PTS Technique with Polar Coding for OFDM Systems He, Ching-Huan; CHEN, HOUSHOU*; Zhang, Jia-Chun; Tseng, Chih-Kai
17:00-17:20 Blind Self-Interference Analog Canceller with Differential Delay for Backscatter Communications Nishikawa, Koichi; Ibi, Shinsuke*; Takahashi, Takumi; Iwai, Hisato
17:20-17:40 IoT-based Smart Attendance System using Face Recognition and Motion Detection Saadon, Umi Syamimi*; Lim, Chern Hong

Session Room Chair
Recent Advances in Multimedia Enrichment and Security Room 5 -
Date Time Title Authors
05-12-2024 16:40-17:00 Generation of Photo Slideshow with Song based on Closeness between Concept of Lyrics and That of Images Hashimoto, Mei; Niimi, Michiharu*
17:00-17:20 Disposable-key-based image encryption for collaborative learning of Vision Transformer Aso, Rei*; Shiota, Sayaka; Kiya, Hitoshi
17:20-17:40 Significance of Lower Frequency Regions for Audio Deepfake Detection Shah, Arth Juhul*; Patil, Hemant
17:40-18:00 EAViT: External Attention Vision Transformer for Audio Classification Iqbal, Aquib; Zim, Abid Hasan; Tonmoy, Md Asaduzzaman; Zhou, Limengnan; Malik, Asad*; Kuribayashi, Minoru

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
05-12-2024 16:40-17:00 A Two-Stage Method for 3D Architecture Wireframe Reconstruction from Airborne LiDAR Point Cloud Zhang, Jiahao; Liu, Qi*; Hui, Le; Dai, Yuchao
17:00-17:20 A Two-Stage Method for 3D Architecture Wireframe Reconstruction from Airborne LiDAR Point Cloud Zhang, Jiahao; Liu, Qi*; Hui, Le; Dai, Yuchao
17:20-17:40 Secure Moving Object Detection Transformer in Compressed Video with Feature Fusion Song, Yuru; Chen, Yike; Zheng, Peijia*; Du, Yusong; Luo, Weiqi
17:40-18:00 NeRF-FCM: Attention-based Feature Calibration Mechanisms for 3D Object Detection Using NeRF Goshu, Hana Lebeta*; Xiao, Jun; Chan, Kin-Chung; Zhang, Cong; Gemeda, Mulugeta Tegegn; Lam, Kin-Man

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
05-12-2024 16:40-17:00 Robust Adaptive Filtering Based on Adaptive Projected Subgradient Method: Moreau Enhancement of Distance Function Sawada, Daiki; Yukawa, Masahiro*
17:00-17:20 Significance of Entropy Based Features For Dysarthric Severity Level Classification Avula, Meghana*; Pusuluri, Aditya; Patil, Hemant
17:20-17:40 Incorporating Auditory Processing into Undergraduate Signal Processing Courses to Enhance Student Learning Nie, Kaibao*
17:40-18:00 A Real-Time Platform for Portable and Scalable Active Noise Mitigation for Construction Machinery Peksi, Santi; Gan, Woon Seng*; Lai, Chung Kwan; Lee, Yen Theng; Shi, Dongyuan; Lam, Bhan

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
05-12-2024 16:40-17:00 A Comparative Study on the Biases of Age, Gender, Dialects, and L2 speakers of Automatic Speech Recognition for Korean Language Na, Jonghwan; Park, Yeseul; Lee, Bowon*
17:00-17:20 NecoBERT: Self-Supervised Learning Model Trained by Masked Language Modeling on Rich Acoustic Features Derived from Neural Audio Codec Nakata, Wataru*; Saeki, Takaaki; Saito, Yuki; Takamichi, Shinnosuke; Saruwatari, Hiroshi
17:20-17:40 Targeted Representation with Information Disentanglement Encoding Networks in Tasks Nagawaki, Takumi*; Ikeda, Keisuke; Tamura, Satoshi; Chike, Kohei; Nagano, Hiroyuki; Nose, Masaki
17:40-18:00 PG-MDD: Prompt-Guided Mispronunciation Detection and Diagnosis Leveraging Articulatory Features Lin, Meng-Shin*; Yan, Bi-Cheng; Lo, Tien-Hong; Wang, Hsin-Wei; He, Yue-Yang; Chao, Wei-Cheng; Chen, Berlin

Session Room Chair
Advancements in Biosignal Decoding and Neuromodulation for Human Function Enhancement Room 9 -
Date Time Title Authors
05-12-2024 16:40-17:00 Effect of Phase-Locked Transcranial Alternating Current Stimulation on Vocal tremor WANG, JUNTING*; Koganemaru, Satoko; Shima, Atsushi; Cao, Yedi; Hirakawa, Kana; Iwagana, Ken; Suehiro, Atsushi; Maekawa, Keiko; Mima, Tatsuya; Ono, Yumie
17:00-17:20 Complex CNN incorporating Hilbert transform for steady-state visual evoked potential BCI Takata, Rintaro*; Washizawa, Yoshikazu
17:20-17:40 Electroencephalogram-Based Effective Features for Sustained Attention Assessment in Conversation Togashi, Masaya; Chanpornpakdi, Ingon; Tanaka, Toshihisa*

Session Room Chair
Signal Processing for Drone Audition & Recent Advances in Intelligent Signal Processing Room 1 -
Date Time Title Authors
06-12-2024 09:00-09:20 Relative Transfer Matrix for Drone Audition Applications: Source Enhancement Manamperi, Wageesha*; Abhayapala, Thushara
09:20-09:40 Beamforming informed independent low-rank matrix analysis for sound source enhancement in unmanned aerial vehicles Teh, Jin Xuan*; Takamune, Norihiro; Saruwatari, Hiroshi; Yen, Benjamin; Kingan, Michael; Hioka, Yusuke
09:40-10:00 SMoLnet-T: An Efficient Complex-spectral Mapping Speech Enhancement Approach with Frame-wise CNN and Spectral Combination Transformer for Drone Audition Tan, Zhi-Wei*; Khong, Andy W H
10:00-10:20 Integrating VGGSK and BEATs for Enhanced Sound Event Detection: A Semi-Supervised GRU-Based System with Weak Labels and Synthetic Soundscapes Chan, Po-Cheng*; Chen, Wei-Yu; Wang, Jia-Ching; Lu, Chung-li; Chuang, Hsiang Feng; Cheng, Yu-Han
10:20-10:40 Drone audition: implementation of an indoor multi-drone system for sound source tracking Yen, Benjamin*; Nakadai, Kazuhiro
10:40-11:00 Implementation of a Robot Operation System-based network for sound source localization using multiple drones Yamamoto, Takumi*; Hoshiba, Kotaro; Yen, Benjamin; Nakadai, Kazuhiro

Session Room Chair
Converging AI and Computer Vision: Innovations and Potential Room 2 -
Date Time Title Authors
06-12-2024 09:00-09:20 Hyperspectral Anomaly Detection Using Robust Principal Component Analysis with Autoencoding Adversarial Networks Emoto, Atsuya; Matsuoka, Ryo*
09:20-09:40 Optimising Neural Networks with Fine-Grained Forward-Forward Algorithm: A Novel Backpropagation-Free Training Algorithm Gong, James; Li, Bruce; Abdulla, Waleed*
09:40-10:00 Two-Way Malaysian Sign Language Communication System for Inclusive Education HII, Veron Zhen Liang; LO, Aaron Ken Kiat; LEE, Ida Pei Xin; ABUAN, ALEC VINCE GONZALES; Lee, Sue Han*; Then, Patrick HangHui
10:00-10:20 PRTGaussian: Efficient Relighting Using 3D Gaussians with Precomputed Radiance Transfer Zhang, Libo*; Han, Yuxuan; Lin, Wenbin; Ling, Jingwang; Xu, Feng

Session Room Chair
AI-Driven Innovations in Cybersecurity: Advanced Applications in Signal Processing, Multimedia Security, and Privacy Room 3 -
Date Time Title Authors
06-12-2024 09:00-09:20 ET-SSM: Linear-Time Encrypted Traffic Classification Method Based On Structured State Space Model Yanjun, Li*; Zhao, Xiangyu; Zhengpeng, Zha; Ling, Zhen-Hua
09:20-09:40 Toward Universal Detector for Synthesized Images by Estimating Generative AI Models Seo, Ryota*; Kuribayashi, Minoru; Ura, Akinobu; Mallet, Antoine; Cogranne, Rémi; Mazurczyk, Wojciech; Megías, David
09:40-10:00 Innovative Information Hiding in H.266/VVC Using Sub-Block Transform Technique Hau, Joan*; Tew, Yiqi; Tan, Li Peng
10:00-10:20 GGMDDC: An Audio Deepfake Detection Multilingual Dataset Purohit, Ravindrakumar M.*; Shah, Arth Juhul; Patil, Hemant

Session Room Chair
Embedded and Real-Time Systems for AI and Signal Processing Applications Room 4 -
Date Time Title Authors
06-12-2024 09:00-09:20 Accelerated Real-Time Local Maxima Detection in Video Streams Using FPGA Technology Nayazirly, Anindhita; Salomo, Yahwista*; Adiono, Trio; Syafalni, Infall; Sutisna, Nana; Mulyawan, Rahmat
09:20-09:40 A Configurable OFDM Baseband Processor for RF-UOWC System-on-Chip Adiono, Trio; Setiawan, Erwin*; Jonathan, Michael; Mulyawan, Rahmat; Sutisna, Nana; Syafalni, Infall; Popoola, Wasiu
09:40-10:00 Hammering Sound Inspection System Using HPSS and Gradient Boosting with a Wall-Climbing Robot Koyama, Nichika*
10:00-10:20 Implementation of Real Time Oscillometric Based Algorithm for Blood Pressure Measurement in Patient Monitor Adiono, Trio; Amadeus, Clarence*; Thomi, Teuku Rafifsyah; Sinaga, Sindy Novaria Cicilya

Session Room Chair
Selected Papers from APSIPA Workshop on Advanced Signal and Information Processing Room 5 -
Date Time Title Authors
06-12-2024 09:00-09:20 Automated Pseudo-Label Generation and Parallel Computing for Enhanced Few-Shot Medical Image Segmentation Do, Ha Thanh*; Nguyen Trong, Duc; Do, Tien-Dung
09:20-09:40 Enhanced Sparse Convolutional Detection Model for 3D Object Detection in Autonomous Vehicles Adapted to Traffic Conditions in Vietnam Do, Ha Thanh*; Dung, Vu Hoang; Nguyen, Kien Trung
09:40-10:00 Enhancing Cell Segmentation using Deep Learning Models by Custom Processing Techniques Do, Ha Thanh*; Nguyen, Van De; Dang Hoang, Minh Huong; Huy, Nguyễn Quang; Dinh Manh, Cuong
10:00-10:20 Marker-Aware Ovarian Tumor Segmentation from Ultrasound Images Bui, Hoang-Son*; Tran, Sy-Hoang; Nguyen, Thuy-Binh; Tran, Thanh-Hai; Vu, Hai; Lan, Le Thi

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
06-12-2024 09:00-09:20 ACE-Flow: Auto Color Encoding for Enhanced Low-Light Image Restoration Qiu, Jiachen; Zuo, Yushen; Lam, Kin-Man*
09:20-09:40 PBJDT: Point-Based Joint Detection-and-Tracking Lee, Zhen-Xun; Ding, Jian-Jiun*
09:40-10:00 Capturing Dynamic Identity Features for Speaker-Adaptive Visual Speech Recognition Kashiwagi, Sara*; Tanaka, Keitaro; Morishima, Shigeo
10:00-10:20 A Byte-based GPT-2 Model for Bit-flip JPEG Bitstream Restoration Qin, Hao; SUN, Haoran; Wang, Yi*

Session Room Chair
Acoustic Scene Analysis and Signal Enhancement Based on Advanced Signal Processing and Machine Learning Room 7 -
Date Time Title Authors
06-12-2024 09:00-09:20 Successive Speaker Relative Transfer Function Estimation Through Relative Transfer Matrix in Noisy Reverberant Environments Manamperi, Wageesha*; Abhayapala, Thushara
09:20-09:40 Heavy-tailed Distributions-Based Online Semi-blind Source Separation for Nonlinear Echo Cancellation Zhang, Liyuan*; Wang, Xianrui; Yang, Yichen; Ueda, Tetsuya; Makino, Shoji; Chen, Jingdong
09:40-10:00 A Single-Input Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments Zheng, Tianqin*; Pei, Hanchen; Pan, Ningning; Jin, Jilu; Huang, Gongping; Chen, Jingdong; Benesty, Jacob
10:00-10:20 Real-Time Noise Estimation for Lombard-Effect Speech Synthesis in Human-Avatar Dialogue Systems Ishikawa, Yuto*; Take, Osamu; Nakamura, Tomohiko; Takamune, Norihiro; Saito, Yuki; Takamichi, Shinnosuke; Saruwatari, Hiroshi

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
06-12-2024 09:00-09:20 EMO-Codec: An In-Depth Look at Emotion Preservation Capacity of Legacy and Neural Codec Models With Subjective and Objective Evaluations Ren, Wenze*; Lin, Yi-Cheng; Chou, Huang-Cheng; Wu, Haibin; Wu, Yi-Chiao; Lee, Hung-yi; Lee, Chi-Chun; Wang, Hsin-Min; Tsao, Yu
09:20-09:40 Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model Park, Joonyong*; Saito, Daisuke; Minematsu, Nobuaki
09:40-10:00 Investigating the Language Independence of Voice Activity Projection Models through Standardization of Speech Segmentation Labels Sato, Yuki*; Chiba, Yuya; Higashinaka, Ryuichiro
10:00-10:20 A Preliminary Study on Analysing Mandarin Tone Values of Romance L2 Mandarin Learners Li, Wu-Hao*; Liu, Te-hsin; CHIANG, Chen Yu

Session Room Chair
Signal Processing for Drone Audition & Recent Advances in Intelligent Signal Processing Room 1 -
Date Time Title Authors
06-12-2024 10:40-11:00 Drone audition: dataset and methods for ground surface material classification using drone noise in outdoor environment Yano, Tsubasa*; Yen, Benjamin; Nakadai, Kazuhiro
11:00-11:20 Seismic-ionospheric Precursor Prediction Using Deep Learning Pham, Tung Bach*; Chang, Pao-Chi; Wang, Jia-Ching
11:20-11:40 Swarm Active Audition System with Robots and Drones for a Search and Rescue Task Nakadai, Kazuhiro*; Kumon, Makoto; Sasaki, Yoko; Hoshiba, Kotaro; Yen, Benjamin

Session Room Chair
Converging AI and Computer Vision: Innovations and Potential Room 2 -
Date Time Title Authors
06-12-2024 10:40-11:00 RepViT Based Lightweight Architecture for Distracted Driving Detection Jian, Muwei*; Ling, Yukun
11:00-11:20 HSIC as Information Compression for Training Deep Neural Network Sofi, Roshan Birjais*; Wang, Kevin I-Kai; Abdulla, Waleed
11:20-11:40 Zero-Shot Learning for Haze Removal Using Fusion of Near-Infrared and Color Images Kato, Onhi*; Kubota, Akira
11:40-12:00 Color Enhancement for the Colorblind Using Color Correction Intensity Map and Pix2pix Image Conversion Komatsu, Shu*; Kubota, Akira

Session Room Chair
Multimedia Processing Systems in the AI Era Room 3 -
Date Time Title Authors
06-12-2024 10:40-11:00 Detecting Abnormal Machine Sounds Using An Ensemble Approach with Data Augmentation Techniques Chan, Po-Cheng*; Lu, Chung-li; Wang, Jia-Ching
11:00-11:20 Leveraging Semi-Supervised Learning with BEATs Feature Extraction and Bi-GRU Classification on Heterogeneous Datasets Chen, Wei-Yu; Lu, Chung-li; Chan, Po-Cheng*; Chuang, Hsiang Feng; Cheng, Yu-Han; Wang, Jia-Ching
11:20-11:40 Leveraging Attention Mechanisms for Breast Cancer Diagnosis Akumalla, Brahma Reddy*; Pham, Tung Bach; Zhuang, Yung-Yu; Prihasto, Bima; Chang, Pao-Chi; Wang, Jia-Ching
11:40-12:00 Enhanced Detection of Illegally Parked Vehicles Using YOLO and Good Feature to Track Methods Maftuh Alwafi, Fauzan; Mugi Pratama, Boby; Le, Phuong Thi; Prihasto, Bima*; Wang, Jia-Ching

Session Room Chair
Embedded and Real-Time Systems for AI and Signal Processing Applications Room 4 -
Date Time Title Authors
06-12-2024 10:40-11:00 Exploration Robot Based On YOLOv8 Algorithm Syafalni, Infall*; Winasta Sinisuka, Angelica; Kalam Amal Tauhid, Dwi; Ahmad, Farrel; Alif Putra Yasa, Muhammad; Alexander Wen, Steven; Setiawan, Erwin; Sutisna, Nana; Adiono, Trio
11:00-11:20 Optimizing Deep Q-Network for Shortest Path Computation of Mobile Robot Agents Sumarudin, A*; Sutisna, Nana; Syafalni, Infall; Riyanto Trilaksono, Bambang; Adiono, Trio
11:20-11:40 Leveraging IoT and Machine Learning for Efficient Rice Stock Monitoring and Prediction Sutisna, Nana*; Prawira Nugroho, Aditya; Jeffrey, Christopher; Ramadhana, Rizky; Mahendra, Ronggur; Jonathan, Michael; Syafalni, Infall; Adiono, Trio
11:40-12:00 Comparative Evaluation of Fine-Tuned Hybrid Transformer and Band-Split Recurrent Neural Networks for Music Source Separation Kalang Al Qalyubi, Ken; Ahmadi, Nur*; Puji Lestari, Dessi

Session Room Chair
Selected Papers from APSIPA Workshop on Advanced Signal and Information Processing Room 5 -
Date Time Title Authors
06-12-2024 10:40-11:00 Enhancing Shear Wave Propagation Analysis in Tissue with Directional Filtering of Reflected Waves Luong, Hai Quang*; Tran, Nghia Duc; Nguyen, Hiep; Sinh Cong, Lam; Tran, Duc-Tan
11:00-11:20 Structural Analysis of Asian and African Rice Panicles via Transfer Learning Dinh, Tran Hiep*
11:20-11:40 New approach for Alzheimer's disease classification using topographic maps and deep learning model Le, Quoc Anh*; Thinh, Nguyen hong
11:40-12:00 M-IRRA: A Multilingual Model for Text-based Person Search Tran, Phong Ngoc Hung; Phan, Thi-Hoai; Nguyen, Thuy-Binh; Do, Ngoc-Diep; Nguyễn, Quân Hồng; Tran, Thanh-Hai; Duong, Thanh Thi-Hien; Le, Thi Lan*

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
06-12-2024 10:40-11:00 GMNER-LF: Generative Multi-modal Named Entity Recognition Based on LLM with Information Fusion Hu, Huiyun*; Kong, Junda; Xiao, Bo; Wang, Fei; Ge, Yang; Sun, Hongzhi
11:00-11:20 WildPose: HRNet-based Lightweight and Efficient Wildlife Pose Estimation Bakana, Sibusiso R*; Zhang, Yongfei; Twala, Bhekisipho
11:20-11:40 A Multi-Perceptual Learning Network for Retina OCT Image Denoising and Classification Lam, Kin-Man*

Session Room Chair
Advanced Topics for Automatic Speakers Recognition Room 7 -
Date Time Title Authors
06-12-2024 10:40-11:00 JOSEPH: Phonetic-Aware Speaker Embedding for Far-Field Speaker Verification Jin, Zezhong*; Tu, Youzhi; Mak, Manwai
11:00-11:20 Vocal Tract Length Perturbation-based Pseudo-Speaker Augmentation Considering Speaker Variability for Speaker Verification Zou, Hengyi*; Shiota, Sayaka
11:20-11:40 Differences Between Singer and Speaker Verification: Training Singer Feature Representation Extractor Utilizing Singing Voice Characteristics Toma, Sayaka*; Ariga, Tomoki; Higuchi, Yosuke; Hayasaka, Ichiju; Shigyo, Rie; Ogawa, Tetsuji

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
06-12-2024 10:40-11:00 Peer Learning via Shared Speech Representation Prediction for Target Speech Separation Yang, Xusheng*; Zhao, Zifeng; Zou, Yuexian
11:00-11:20 Developing a Multilingual Spontaneous L2 Speech Corpus for Automated Proficiency Assessment Han, Seunghee*; Kim, Sunhee; Chung, Minhwa
11:20-11:40 Prediction of Negative User Reactions Towards System Responses During Attentive Listening Lala, Divesh*; Inoue, Koji; Kawahara, Tatsuya
11:40-12:00 Data Selection using Spoken Language Identification for Low-Resource and Zero-Resource Speech Recognition Chen, Jianan*; Chu, Chenhui; Li, Sheng; Kawahara, Tatsuya

Session Room Chair
Few-shot Vision, Language, and Multimedia Processing under LLMs Room 9 -
Date Time Title Authors
06-12-2024 10:40-11:00 A Noisy Context Optimization Approach for Chinese Spelling Correction Zhang, Guangwei; Xiong, Yongping; Li, Ruifan*
11:00-11:20 GVDIE: A Zero-Shot Generative Information Extraction Method for Visual Documents Based on Large Language Models Qi, Siyang*; Wang, Fei; Sun, Hongzhi; Ge, Yang; Xiao, Bo
11:20-11:40 META: Text Detoxification by leveraging METAmorphic Relations and Deep Learning Methods Choo, Alika*; Pal, Arghya; Rajanala, Sailaja; Sen, Arkendu
11:40-12:00 Visual semantic alignment network based on pre-trained ViT for few-shot image classification Zhang, Jiaming; Wu, Jijie; Li, Xiaoxu*

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
04-12-2024 11:00-11:20 SRC-gAudio: Sampling-Rate-Controlled Audio Generation Li, Chenxing*; Xu, Manjie; Yu, Dong
11:20-11:40 Scale-invariant Online Voice Activity Detection under Various Environments Takeda, Ryu*; Komatani, Kazunori
11:40-12:00 Sound Quality Improvement in Visual Microphone by Emphasizing Focused Area Based on Focal Rate Nakano, Hayata*; Geng, Yuting; Iwai, Kenta; Nishiura, Takanobu
12:00-12:20 Deep-Learning-Based Speech Enhancement with Rough-Focused Optical Laser Microphone by Reconstructing Complex Spectrum Nakano, Yuki*; Geng, Yuting; Iwai, Kenta; Nishiura, Takanobu

Session Room Chair
Biomedical Signal Processing and Systems Room 2 -
Date Time Title Authors
04-12-2024 11:00-11:20 Bluemarble: Bridging Latent Uncertainty in Articulatory-to-Speech Synthesis with a Learned Codebook Um, Seyun*; Kim, Miseul; Kim, Doyeon; Kang, Hong-Goo
11:20-11:40 Iterative Demographic Attentional Feature Fusion-based CNN and Transformer Network for Accurate Cuffless Blood Pressure Estimation Tang, Liwen; Zheng, Dingchang; Chen, Fei*
11:40-12:00 Sampling Pattern Augmentation to Enhance Deep Learning-based Image Reconstruction of MRI Yamato, Kazuki*; Ito, Satoshi
12:00-12:20 Data Augmentation and Assessment for Enhanced Ovarian Tumor Classification Pham, Loan Thi*; Pham, Gia-Minh; Nguyen, Tien-Dat; Le, Hung Van; Pham, Chi-Mai; Le, Thi Lan; Vu, Duy-Hai; Vu, Hai; Tran, Thanh-Hai

Session Room Chair
Machine Learning and Data Analytics Room 3 -
Date Time Title Authors
04-12-2024 11:00-11:20 GMA: Green Multi-Modal Alignment for Image-Text Retrieval Yang, Tsung-Shan*; Wang, Yun-Cheng; Wei, Chengwei; You, Suya; Kuo, C.-C. Jay
11:20-11:40 Improving Semi-Supervised Object Detection by ROI-Enhanced Contrastive Learning Huang, Teng-Kuan; Yeh, Mei-Chen*
11:40-12:00 Real-time Segmentation of Coronary Artery Calcification Using Spatial Attention and Parallel Convolution Asakawa, Tetsuya*; Hashimoto, Masashi; Miyaji, Takeshi; Shimizu, Kazuki; Nomura, Kei; Aono, Masaki
12:00-12:20 ViP-CBM: Reducing Parameters in Concept Bottleneck Models by Visual-Projected Embeddings Qi, Ji; Wang, Huisheng; Zhao, H. Vicky*

Session Room Chair
Machine Learning and Data Analytics Room 4 -
Date Time Title Authors
04-12-2024 11:00-11:20 Psychological Driving Style Estimation from GPS Sensor Data Alone Horimoto, Hiroto; Kimura, Ryusei; Tanaka, Takahiro; Okada, Shogo*
11:20-11:40 Adversarial Augmentation and Adaptation for Speech Recognition Chien, Jen-Tzung*; Sun, Wei-Yu
11:40-12:00 Empathetic Response Generation via Regularized Q-Learning Chien, Jen-Tzung*; Wu, Yi-Chien
12:00-12:20 Continual Learning with Self-Organizing Maps: A Novel Group-Based Unsupervised Sequential Training Approach Hirani, Gaurav R*; Wang, Kevin I-Kai; Abdulla, Waleed

Session Room Chair
Machine Learning and Data Analytics Room 5 -
Date Time Title Authors
04-12-2024 11:00-11:20 YOLO for High Resolution Images without Retraining Minami, Daisuke*; Nishikawa, Kiyoshi
11:20-11:40 Noise-Robust Estimation of Early-part Room Impulse Responses based on Physics-Informed Neural Network with Dynamic Pulling Method Kurata, Ken*; Sato, Gen; Tsunokuni, Izumi; Ikeda, Yusuke
11:40-12:00 A Multi-Domain Camera Model Identification Feature Restoration Network to Counter AI Compression Attacks Jinkai, Zhang*
12:00-12:20 Deep Learning-based Intraoperative Video Analysis for Cataract Surgery Instrument Identification Guo, Zhe*; Chan, Yuk Hee; Law, Ngai Fong

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
04-12-2024 11:00-11:20 GSBIQA: Green Saliency-guided Blind Image Quality Assessment Method Mei, Zhanxuan*; Wang, Yun-Cheng; Kuo, C.-C. Jay
11:20-11:40 AFSDet: Video Small Object Detection Based on Adaptive Focused Slicing Huang, Kangjian; Yang, Yan*; Jiang, Yongquan; Zhang, Xiaobo; Li, Zhuyi Angelina
11:40-12:00 Dual Motion Attention and Enhanced Knowledge Distillation for Video Frame Interpolation Zhang, Deng yong*; Lou, Runqi; Chen, Jiaxin; Liao, Xin; Yang, Gaobo; Ding, Xiangling
12:00-12:20 EavaNet: Enhancing Emotional Facial Expressions in 3D Avatars through Speech-Driven Animation Um, Seyun*; Lee, YongJu; Ko, WooSeok; Zhou, Yuan; Lee, Sangyoun; Kang, Hong-Goo

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
04-12-2024 11:00-11:20 On the Importance of Time and Pitch Relativity for Transformer-based Symbolic Music Generation Inaba, Tatsuro*; Yoshii, Kazuyoshi; Nakamura, Eita
11:20-11:40 Optimal Investment With Incomplete Information and Herd Effect Wang, Huisheng; Liu, Mingxiao; Qi, Ji; Zhao, H. Vicky*
11:40-12:00 YOLO-DC: Enhancing object detection with deformable convolutions and contextual mechanism Zhang, Deng yong*; Xu, Chuanzhen; Chen, Jiaxin; Liao, Xin
12:00-12:20 One-step Spectral Estimation for Euclidean Distance Matrix Approximation Li, Yicheng*; Sun, Xinghua

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
04-12-2024 11:00-11:20 SDNet: Noise-Robust Bandwidth Extension under Flexible Sampling Rates Yang, Junkang*; Liu, Hongqing; Gan, Lu; Zhou, Yi; Li, Xing; Jia, Jie; Yao, Jinzhuo
11:20-11:40 GLASS: Investigating Global and Local context Awareness in Speech Separation Ho, Kuan-Hsun*; Yu, En-Lun; Hung, Jeih-weih; Huang, Shih-Chieh; Chen, Berlin
11:40-12:00 Low-resource Language Adaptation with Ensemble of PEFT Approaches Kwok, Chin Yuen*; Li, Sheng; Yip, Jia Qi; Chng, Eng Siong
12:00-12:20 Diverse Time-Frequency Attention Neural Network for Acoustic Echo Cancellation Yao, Jinzhuo*; Liu, Hongqing; Zhou, Yi; Gan, Lu; Yang, Junkang

Session Room Chair
Speech and Language Processing Room 9 -
Date Time Title Authors
04-12-2024 11:00-11:20 LDMSE: Low Computational Cost Generative Diffusion Model for Speech Enhancement Nishi, Yuki*; Iwano, Koji; SHINODA, Koichi
11:20-11:40 MTFNet: Multi-Scale Transformer Framework for Robust Emotion Monitoring in Group Learning Settings Zhang, Yi*
11:40-12:00 Target Speaker Extraction Method by Emphasizing the Active Speech with an Additional Enhancer Yang, Xue; Bao, Changchun*; Zhang, Xu; Chen, Xianhong

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
04-12-2024 14:00-14:20 A Study on Multimodal Fusion and Layer Adapter in Emotion Recognition Shi, Xiaohan*; Gao, Yuan; He, Jiajun; Mi, Jinyi; LI, Xingfeng; Toda, Tomoki
14:20-14:40 Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation Wang, Xianrui*; Zhang, Shiqi; He, Bo; Makino, Shoji; Chen, Jingdong
14:40-15:00 Enhancing Neural Speech Embeddings for Generative Speech Models Kim, Doyeon*; Song, Yanjue; Madhu, Nilesh; Kang, Hong-Goo
15:00-15:20 Design of Spectrogram-Consistency Regularization Term Dependent on Observation in Independent Low-Rank Matrix Analysis for Blind Source Separation Kojima, Takaaki*; Takamune, Norihiro; Kitamura, Daichi; Saruwatari, Hiroshi
15:20-15:40 On Joint Dereverberation and Single Moving Source Separation with Online Source Steering Zhang, Yiting*; Mo, Kaien; Ueda, Tetsuya; Yang, Yichen; Makino, Shoji
15:40-16:00 New Perspectives and Insights on Distortionless Microphone Array Beamforming Zhang, Fan*; Benesty, Jacob; Pan, Chao; Chen, Jingdong

Session Room Chair
Biomedical Signal Processing and Systems Room 2 -
Date Time Title Authors
04-12-2024 14:00-14:20 Postoperative Delirium Prediction Based on Preoperative Electrocardiogram and Electroencephalogram Mito, Shogo; Miyajima, Miho; Tomioka, Hirofumi; Sato, Hitomi; Takeuchi, Takashi; Muto, Hitoshi; Kabasawa, Yuji; Harada, Hiroyuki; Eguchi, Kana; Kato, Shota; Kano, Manabu*
14:20-14:40 A method for classification NEO–FFI answers fabricated and advantageous due to psychological bias using brainwave specific brain activity networks ASHIKAWA, YUTO*; Ito, Takashi; Ishizu, Syohei; Kurihara, Yosuke
14:40-15:00 Effect of White Noise on Working Memory Using Event-Related Potentials Lee, Seung-won; LEE, Jun-Seok; Hwang, Han-Jeong*
15:00-15:20 Automated prediction of loudness growth curve using EEG signals Tiwari, Nitya*
15:20-15:40 Separation of Cardiopulmonary Sound Signals for Classification of Respiratory Diseases Zheng, Ruxin*
15:40-16:00 Performance Improvement of Single Plane-Wave Imaging Using U-Net and Discrete Wavelet Transform Shidara, Hiromi*; Miura, Kanta; Ishii, Takuro; Ito, Koichi; Aoki, Takafumi; Saijo, Yoshifumi; Ohmiya, Jun

Session Room Chair
Multimedia Security and Forensics Room 5 -
Date Time Title Authors
04-12-2024 14:00-14:20 Compressed Deepfake Video Detection Based on 3D Spatiotemporal Trajectories Chen, Zongmei; Liao, Xin*; Wu, Xiaoshuai; Chen, Yanxiang
14:20-14:40 A Document Presentation Attack Detection Scheme with Optical Flow under a Flashlight Chen, Changsheng*; Chen, Wenyu; Chen, Ximin; Li, Haodong
14:40-15:00 Robust Image Watermarking Scheme under Halftone Distortion with Surrogate Model Chen, Changsheng*; Li, Xijin
15:00-15:20 Physical Domain Adversarial Attacks Against Source Printer Image Attribution Purnekar, Nischay*; Tondi, Benedetta; Barni, Mauro
15:20-15:40 A Diffusion-Based Approach for Restoring Face-swapped Images Niu, Yuanchen; Li, Yuanman*; Zhang, Guijia; Li, Xia
15:40-16:00 AI-generated image detectors are surprisingly easy to mislead... for now Lyu, Zihang*; Xiao, Jun; Zhang, Cong; Lam, Kin-Man

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
04-12-2024 14:00-14:20 Green Video Camouflaged Object Detection Wang, Xinyu*; Chen, Hong-Shuo; Zhou, Zhiruo; You, Suya; Madni, Azad; Kuo, C.-C. Jay
14:20-14:40 A Survey on Objective Quality Assessment of Omnidirectional Images Sui, Xiangjie*; Wang, Shiqi; Fang, Yuming
14:40-15:00 Enhancing YOLOv7 with GLF-Trans for Precision in Small Object Detection Yoshikawa, Naohito*; Ikehara, Masaaki
15:00-15:20 Ablation Study to Derive a Computationally Efficient Deep Learning-Based Super-Resolution Approach Jamil, Asfa*; Artusi, Alessandro
15:20-15:40 Adaptive Spatial Re-sampling Method for Video Coding for Machines An, Eunbin; Kim, Ayoung; Jung, Soon Heung; Choo, Hyon-Gon; Seo, Kwang-Deok*
15:40-16:00 Rotation Invariant Spatio-Spectral Total Variation for Hyperspectral Image Denoising Takemoto, Shingo*; Ono, Shunsuke

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
04-12-2024 14:00-14:20 Multi-Channel Fusion Human Activity Recognition Algorithm Based on Millimeter-Wave Radar Zhu, Junda*; Guo, Shisheng; Tang, Longzhen; Guolong, Cui
14:20-14:40 Optimizing Computational Efficiency: In-Memory Computing with Dynamic Switching Huang, Chao-Ting*; Tsai, Kun-Lin
14:40-15:00 Modeling and Analysis of the Interaction between Opinions and Actions among Heterogeneous Agents Zhang, Hangjing; Zhao, H. Vicky*
15:00-15:20 Adaptive Subspace Clustering for Matrix Completion Wada, Takuto*; Sasaki, Ryohei; Konishi, Katsumi
15:20-15:40 A High-Isolation Sub-6 GHz In-Band Full-Duplex Communication System Shi, Chengzhe*; Pan, Wensheng; Ma, Wanzhi; Liu, Ying; Xu, Qiang; Zhang, Zhiya; Shao, Shihai
15:40-16:00 Generalized Graph Signal Sampling under Subspace Priors by Difference-of-Convex Minimization Yamashita, Keitaro*; Naganuma, Kazuki; Ono, Shunsuke

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
04-12-2024 14:00-14:20 GE2E-AC: Generalized End-to-End Loss Training for Accent Classification Watanabe, Chihiro*; Kameoka, Hirokazu
14:20-14:40 Efficient Feature Selection for Word Embedding Dimension Reduction Xue, Jintang*; Wang, Yun-Cheng; Wei, Chengwei; Kuo, C.-C. Jay
14:40-15:00 Fine-Grained Quantitative Emotion Editing for Speech Generation Inoue, Sho*; Zhou, Kun; Wang, Shuai; Li, Haizhou
15:00-15:20 Improving Speaker Consistency in Speech-to-Speech Translation Using Speaker Retention Unit-to-Mel Techniques Zhou, Rui*
15:20-15:40 Speech Separation using Neural Audio Codecs with Embedding Loss Yip, Jia Qi*; Kwok, Chin Yuen; Ma, Bin; Chng, Eng Siong
15:40-16:00 Speech Synthesis from IPA Sequences through EMA Data Maruyama, Koki*; Sawada, Shun; Ohmura, Hidefumi; Katsurada, Kouichi

Session Room Chair
Speech and Language Processing Room 9 -
Date Time Title Authors
04-12-2024 14:00-14:20 BEES: A New Acoustic Task for Blended Emotion Estimation in Speech LI, Xingfeng*; Shi, Xiaohan; Si, Yuke; Zhang, Zilong; Cui, Feifei; Li, Yongwei; Liu, Yang; Unoki, Masashi; Akagi, Masato
14:20-14:40 Is Corpus Truth for Human Perception?: Quality Assessment of Voice Response Timing in Conversational Corpus through Timing Replacement Yoshikawa, Sadahiro*; Ishii, Ryo; Okada, Shogo
14:40-15:00 Enhancing Branchformer with Dynamic Branch Merging Module for Code-Switching Speech Recognition Hu, Hong-Jie*; Chen, Chia-Ping
15:00-15:20 Optimizing Multi-Speaker Speech Recognition with Online Decoding and Data Augmentation Strategies Peng, Yizhou*; Chng, Eng Siong
15:20-15:40 Adapting OpenAI's Whisper for Speech Recognition on Code-Switch Mandarin-English SEAME and ASRU2019 Datasets Yang, Yuhang; Peng, Yizhou*; Huang, Hao; Chng, Eng Siong; Zhong, Xionghu

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
04-12-2024 16:20-16:40 A Low-Complexity Adaptive Beamformer for Joint Reverberation and Noise Suppression Zhang, Fan*; Pan, Chao; Chen, Jingdong; Benesty, Jacob
16:40-17:00 Multichannel Speech Enhancement Using Complex-Valued Graph Convolutional Networks and Triple-Path Attentive Recurrent Networks Shen, Xingyu; Zhu, Wei-Ping*
17:00-17:20 Anomalous Machine Sound Detection Based on Time Domain Gammatone Spectrogram Feature and IDNN Model Hafiz, Primanda Adyatma*; Mawalim, Candy Olivia; Puji Lestari, Dessi; Sakti, Sakriani; Unoki, Masashi
17:20-17:40 Unsupervised Anomalous Sound Detection Using Timbral and Human Voice Disorder-Related Acoustic Features Akbar Hashemi Rafsanjani, Malik*; Mawalim, Candy Olivia; Lestari, Dessi Puji; Sakti, Sakriani; Unoki, Masashi
17:40-18:00 Real-Time Monophonic Dual-Pitch Extraction Model Tran, Ngoc-Son; Hsieh, Pei-Chin; Shen, Yih-Liang*; Chu, Yen-Hsun; Chi, Tai-Shih

Session Room Chair
Biomedical Signal Processing and Systems Room 2 -
Date Time Title Authors
04-12-2024 16:20-16:40 Predictive Analysis of Driver Drowsiness Progression: Multi-Level Drowsiness Classification Using Physiological Signals Dachoponchai, Natchira; Wongsawat, Yodchanan; Arnin, Jetsada*
16:40-17:00 Feature Extraction for Machine Learning-based Sleep Stage Classification Using PPG-Derived Parameters and Skin Temperature Buaruk, Suphachok; Thanaviratananich, Sikawat; Treesuthacheep, Peerasit; Deepaisarn, Somrudee*
17:00-17:20 Parameterizing Hierarchical Particle Filters with Concept Drift for Time-varying Parameter Estimation Murphy, Joshua*; Rosato, Conor; Millard, Andrew; Maskell, Simon
17:20-17:40 Pop Noise Detection Using Group Delay Cepstral Coefficients Shah, Arth Juhul*; Patil, Hemant
17:40-18:00 Novel Estimators for the Number of Susceptible Individuals in SIR Models of Infectious Epidemics van Wyk, Anton; McDonald, Andre M*; Rubin, David; Zhang, FangFang

Session Room Chair
Multimedia Security and Forensics Room 5 -
Date Time Title Authors
04-12-2024 16:20-16:40 A Study on Variable Embedding Locations of Reversible Spectral Speech Watermarking HUANG, Xuping*; Ito, Akinori
16:40-17:00 Normalizing Flows-Based Latent Variable Rearrangement for Generative Image Steganography Wu, Sifan*; Dong, Li
17:00-17:20 Detecting Spoof Voices in Asian Non-Native Speech: An Indonesian and Thai Case Study Adila, Aulia*; Mawalim, Candy Olivia; Unoki, Masashi
17:20-17:40 Privacy-Preserving Anomaly Detection in Bitstream Video based on Gaussian Mixture Model Chen, Yike; Song, Yuru; Zheng, Peijia*; Du, Yusong; Luo, Weiqi
17:40-18:00 Source Attribution for Images Generated by Diffusion-Based Text-to-Image Models: Exploring the Forensics Approach Jiang, Xinqi; Tian, Jinyu*

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
04-12-2024 16:20-16:40 Hyperspectral Unmixing With Row-Sparsity Enhancement: A Difference-of-Convex Approach Naganuma, Kazuki*; Ono, Shunsuke
16:40-17:00 How Accurate Can Large Vision Language Model Perform for Images with Compression Degradation? Fang, Xiaohan*; CHEN, PEILIN; Wang, Meng; Wang, Shiqi
17:00-17:20 Enhanced RefineDNet for Single Image Dehazing Ren, Jingyu*
17:20-17:40 Tsnake: A Time-Embedded Recurrent Contour-Based Instance Segmentation Model Hsu, Chen-Jui; Ding, Jian-Jiun*; Shih, Chun-Jen

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
04-12-2024 16:20-16:40 Affine Combination of General Adaptive Filters Jin, Danqi*; Chen, Yitong; Chen, Jie; Huang, Gongping
16:40-17:00 An Annealing-Inspired Gradient-Descent Based Suboptimal Solver for Combinatorial Problems Shu Ping, Chang; Lee, Cheng-Che; Lee, Hsin-Jung; Kuan, Chieh-Hsiung; Young, Jason Gemsun; Yao, Chia-Yu; Ding, Jian-Jiun*
17:00-17:20 A Solution For Anomaly Detection of Red Beans In A Product Processing Line Nguyen, Duc Hai; Do, Hiep Trong; Nguyen, Hoang-Linh-Phuong; Nguyen, Quoc-Khanh; Tran, Duc-Tan; Bui, Tien Son Tien; Nguyen, VanToi*
17:20-17:40 A Novel kind of WVD Associated with the Linear Canonical Transform Peng, Jia-Yin; Chen, Jian-Yi; Li, Bing-Zhao*
17:40-18:00 A Discrete-Valued Signal Estimation by Nonconvex Enhancement of SOAV with cLiGME Model Shoji, Satoshi*; Yata, Wataru; Kume, Keita; Yamada, Isao

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
04-12-2024 16:20-16:40 Frequency & Channel Attention Network for Small Footprint Noisy Spoken Keyword Spotting Lin, Yuanxi*; Gapanyuk, Yuriy E
16:40-17:00 Long Audio File Speaker Diarization with Feasible End-to-End Models Huang, Kai-Wei*; Chen, Chia-Ping
17:00-17:20 Analysis of Various Self-Supervised Learning Models for Automatic Pronunciation Assessment Lee, Haeyoung*; Kim, Sunhee; Chung, Minhwa
17:20-17:40 Band-Split Inter-SubNet: Band-Split with Subband Interaction for Monaural Speech Enhancement Pan, Yen-Chou; Shen, Yih-Liang*; Liao, Yuan-Fu; Chi, Tai-Shih
17:40-18:00 Speech Dereverberation with Deconvolution Regularized by Denoising Hu, Haonan; Yang, Ziye; Chen, Jie*; Zhang, Lijun

Session Room Chair
Speech and Language Processing Room 9 -
Date Time Title Authors
04-12-2024 16:20-16:40 Domain Adaptation by Alternating Learning of Acoustic and Linguistic Information for Japanese Deaf and Hard-of-Hearing People Takahashi, Kaito*; Wakabayashi, Yukoh; Ohta, Kengo; Kobayashi, Akio; Kitaoka, Norihide
16:40-17:00 Speech emotion recognition based on crossmodal transformer and attention weight correction Terui, Ryusei*; Yamada, Takeshi
17:00-17:20 Unsupervised Discovery of Non-Categorical L2 Error Patterns Using Wav2Vec2.0 Code Vectors Hong, Eunsoo*; Kim, Sunhee; Chung, Minhwa
17:20-17:40 An Effective Contextualized Automatic Speech Recognition Approach Leveraging Self-Supervised Phoneme Features Pai, Li-Ting*; Wang, Yi-Cheng; Yan, Bi-Cheng; Wang, Hsin-Wei; Lu, Jia-Liang; Lin, Chi-Han; Xu, Juan-Wei; Chen, Berlin
17:40-18:00 COIN-AT-PVAD: A Conditional Intermediate Attention PVAD Yu, En-Lun*; Ruei-Xian, Chang; Hung, Jeih-weih; Huang, Shih-Chieh; Chen, Berlin

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
05-12-2024 10:20-10:40 Wind Noise Reduction with Orthogonal Polynomial Expansion Du, Li*; Zhang, Lijun
10:40-11:00 Few-Shot Open-Set Keyword Spotting with Multi-Stage Training Li, LoYa*; Lo, Tien-Hong; Hung, Jeih-weih; Huang, Shih-Chieh; Chen, Berlin
11:00-11:20 Self-Supervised Augmented Diffusion Model for Anomalous Sound Detection Yin, Jiawei; Gao, Yu*; Zhang, Wenbin; Zhang, Mingjun
11:20-11:40 Murmur Separation and Classification from Heart Sound Using Constrained Singular Spectrum Analysis and Wavelet Transform Qi, Yuanyang*; Sanei, Saeid
11:40-12:00 A Non-Intrusive Speech Quality Assessment Model using Whisper and Multi-Head Attention Lin, Guojian; Tsao, Yu; Chen, Fei*

Session Room Chair
Emerging Technologies and Applications of Image Processing and Computer Vision Room 3 -
Date Time Title Authors
05-12-2024 10:20-10:40 Confidence-Aware Learning for Person Re-identification with Noisy Labels Kim, Duhyun*; Sim, Jae-Young
10:40-11:00 Test-Time Optimization for Post-Processing of Compressed Videos Kim, Hongil; Han, Changwoo; Kim, Donghyun; Lim, Sung-Chang; Jung, Seung-Won*
11:00-11:20 Lifelong Person Re-Identification with Backward-Compatibility Oh, Minyoung; Sim, Jae-Young*
11:20-11:40 Enhancing Semiconductor X-RAY Images: A Framework Combining Denoising and Super-Resolution Modules With a Novel Dataset Shim, Jae Hoon*; Kim, Min Woo; Lee, Sang Hwa; Cho, Nam Ik
11:40-12:00 Monocular Depth Estimation for Autonomous Driving Based on Instance Clustering Guidance Kim, Dahyun*; Jin, Dongkwon; Kim, Chang-Su

Session Room Chair
Advanced Topics on Sound Event and Scene Analysis Room 4 -
Date Time Title Authors
05-12-2024 10:20-10:40 Multi-Modal Video Summarization Based on Two-Stage Fusion of Audio, Visual, and Recognized Text Information Yang, Zekun*; He, Jiajun; Toda, Tomoki
10:40-11:00 Prediction-error-based Adaptive SpecAugment for Fine-tuning the Masked Model on Audio Classification Tasks Zhang, Xiao*; XING, HAORAN; Song, Mingxue; Takeuchi, Daiki; Harada, Noboru; Makino, Shoji
11:00-11:20 Synchronization of Signals with Sampling Rate Offset and Missing Data Using Dynamic Programming Matching Takeuchi, Hayato*; Ono, Nobutaka
11:20-11:40 LEAD Dataset: How Can Labels for Sound Event Detection Vary Depending on Annotators? Koga, Naoki; Bando, Yoshiaki; Imoto, Keisuke*
11:40-12:00 SSL-based Chewing and Swallowing Detection Using Multiple Skin-contact Microphones Tsukagoshi, Toshihiro*; Koiwai, Kazuhiro; Nishida, Masafumi; Nishimura, Masafumi

Session Room Chair
Recent Advances in Multimedia Enrichment and Security Room 5 -
Date Time Title Authors
05-12-2024 10:20-10:40 Enhancing Security Using Random Binary Weights in Privacy-Preserving Federated Learning Sawada, Hiroto*; Imaizumi, Shoko; Kiya, Hitoshi
10:40-11:00 Estimation of rotation angle and anisotropic scaling rate using pilot signals for watermarking Kawano, Rinka*; Kawamura, Masaki
11:00-11:20 On the Security of Bitstream-level JPEG Encryption with Restart Markers Hirose, Mare*; Imaizumi, Shoko; Kiya, Hitoshi
11:20-11:40 Improved Ultimate Link without Markers for Projective Transformation Yamadera, Keiji; Niimi, Michiharu*
11:40-12:00 Detection of Diffusion-Generated Images Using Sparse Coding Tanaka, Daishi; Niimi, Michiharu*

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
05-12-2024 10:20-10:40 Improved Architecture for High-resolution Piano Transcription to Efficiently Capture Acoustic Characteristics of Music Signals Mi, Jinyi*; Kim, Sehun; Toda, Tomoki
10:40-11:00 Ev3DGS: Event Enhanced 3D Gaussian Splatting from Blurry Images Huang, Junwu; Wan, Zhexiong; Lu, Zhicheng; Zhu, Juanjuan; He, Mingyi; Dai, Yuchao*
11:00-11:20 New Abnormal Behavior Detection for Patient Surveillance System Han, Yujin; Kim, Taewan*
11:20-11:40 Utilizing Cross Layer Attentions for Semantic Segmentation of Small Objects Lu, Chi-Hsuan; Chung, Yu-Hsien; Cho, Jung-Hui; Yu, Chih-Chang*
11:40-12:00 Music2Fail: Transfer Music to Failed Recorder Style Leong, Chon In*; Chung, I-Ling; Chao, Kin Fong; Wang, Jun-You; Yang, Yi-Hsuan; Jang, Roger

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
05-12-2024 10:20-10:40 U-Mamba-Net: A highly efficient Mamba-based U-net style network for noisy and reverberant speech separation Dang, Shaoxiang*; Matsumoto, Tetsuya; Takeuchi, Yoshinori; Kudo, Hiroaki
10:40-11:00 Graph Filter Transfer for Time-Varying Signal Estimation Between Two Networks Fukuhara, Tsutahiro*; Hara, Junya; Higashi, Hiroshi; Tanaka, Yuichi
11:00-11:20 Few-Shot Audio Classification Model for Detecting Classroom Interactions Using LaSO Features in Prototypical Networks Iqbal, Md Rashed*; Ritz, Christian; Yang, Jie
11:20-11:40 Subset Random Sampling of Finite Time-vertex Graph Signals Sheng, Hang; Shu, Qinji; FENG, HUI*; Hu, Bo
11:40-12:00 Dynamic Sensor Placement on Graphs Based on Graph Signal Sampling Theory Nomura, Saki*; Hara, Junya; Higashi, Hiroshi; Tanaka, Yuichi

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
05-12-2024 10:20-10:40 Can We Estimate Purchase Intention Based on Zero-shot Speech Emotion Recognition? Nagase, Ryotaro; Sumiyoshi, Takashi; Yamashita, Natsuo; Dohi, Kota; Kawaguchi, Yohei*
10:40-11:00 Assessment and Improvement of Customer Service Speech with Multiple Large Language Models Watanabe, So; Leow, Chee Siang*; Hoshino, Junichi; Utsuro, Takehito; Nishizaki, Hiromitsu
11:00-11:20 JAM: A Unified Neural Architecture for Joint Multi-granularity Pronunciation Assessment and Phone-level Mispronunciation Detection and Diagnosis Towards a Comprehensive CAPT System He, Yue-Yang*; Yan, Bi-Cheng; Lo, Tien-Hong; Lin, Meng-Shin; Hsu, Yung-Chang; Chen, Berlin
11:20-11:40 Data Augmentation Methods and Influence of Speech Recognition Performance for TED Talk's English to Japanese Speech Translation Masuda, Kento*; Yamamoto, Kazumasa; Nakagawa, Seiichi
11:40-12:00 Empower Typed Descriptions by Large Language Models for Speech Emotion Recognition Wu, Haibin; Chou, Huang-Cheng*; Chang, Kai-Wei; Goncalves, Lucas; Du, Jiawei; Jang, Jyh-Shing Roger; Lee, Chi-Chun; Lee, Hung-yi

Session Room Chair
Advanced Signal Processing for Information Collection and Data Analysis in Wireless Environmental Sensing Room 9 -
Date Time Title Authors
05-12-2024 10:20-10:40 Data-Driven Tuning for Weighted Least Square of BLE-AoA-based Indoor Localization Ohashi, Ginji; Ibi, Shinsuke*; Takahashi, Takumi; Iwai, Hisato
10:40-11:00 Observation of the terrestrial radio environment using the low earth orbit satellite constellation Obata, Takatoshi*; Takyu, Osamu; Inage, Kei; Fujii, Takeo; Yoshida, Kohei; Ariyoshi, Masayuki
11:00-11:20 Deep Unfolding Aided Parameter Optimization for Multi-task Diffusion LMS Algorithm Tong, Xiaoqing*; Hayashi, Kazunori
11:20-11:40 Reduced-dimensional MUSIC Algorithm for Frequency Diverse Array in MIMO Radar System Zhu, Beizuo*; Hayashi, Kazunori; Mori, Hiroki
11:40-12:00 Collection of Correlated Information from Superimposed Multiple Chirp Signals Aoyama, Koki*; Adachi, Koichi

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
05-12-2024 14:00-14:20 EEND-EM: End-to-End Neural Speaker Diarization with EM-Network Woo, Beom Jun*; Yoon, Ji Won; Han, Min Hyun; Moon, Chan Yeong; Kim, Nam Soo
14:20-14:40 Multi-Task Learning Approaches for Music Similarity Representation Learning Based on Individual Instrument Sounds Imamura, Takehiro*; Hashizume, Yuka; Toda, Tomoki
14:40-15:00 Personal Voice Activity Detection With Ultra-Short Reference Speech Xu, Longting; Zhang, Mingjun; Zhang, Wenbin; Wang, Tianyi; Yin, Jiawei; Gao, Yu*
15:00-15:20 An Investigation on the Speech Recovery from EEG Signals Using Transformer Mizuno, Tomoaki*; Kishida, Takuya; Yoshimura, Natsue; Nakashika, Toru

Session Room Chair
Audio Processing Room 2 -
Date Time Title Authors
05-12-2024 14:00-14:20 WavLM and Omni-Scale CNNs: Enhancing Boundary Detection in Partially Spoofed Audio Li, Menghan*; Huang, Zhihua
14:20-14:40 Semi-Supervised Far-Field Speaker Verification with Distance Metric Domain Adaptation Wang, Han*; He, Mingrui; Zhang, Mingjun; Xu, Longting
14:40-15:00 Non-Target Conversion Based Speech Steganography for Secure Speech Communication System Zhang, Mingjun; Feng, Yan; Gao, Yu; Xu, Longting*
15:00-15:20 Enhancing Acoustic Scene Classification with Layer-wise Fine-Tuning on the SSAST Model Hao, Shuting*; Saito, Daisuke; Minematsu, Nobuaki

Session Room Chair
High Performance Image and Video Processing and Applications Room 3 -
Date Time Title Authors
05-12-2024 14:00-14:20 Forward Prediction-Guided Cross-Partition Targeted Pruning for VVenC Tang, Jingyuan*; Sun, Songlin
14:20-14:40 Contrastive Learning Based Knowledge Distillation for Enhancing Defect Detection Guo, Jing-Ming; Yuan, Lun-Da; HUANG, CIAN*; Zeng, Yi-Chong
14:40-15:00 Screen Content Encoding Network Based on Deep Contextual Information Gong, Tianyu*; Zhang, Tao; Zhong, Ye; Zhang, Mengmeng; Bai, Huihui
15:00-15:20 A Coarse-to-Fine Change Detection Framework for Remote Sensing Sparse Cultivated Land Hu, Yuan*; Zhang, Yifan; Ma, Mingyang; Mei, Shaohui

Session Room Chair
New Frontiers in Biometric Authentication Room 4 -
Date Time Title Authors
05-12-2024 14:00-14:20 A Quasilinear-Time CVP Algorithm for Triangular Lattice Based Fuzzy Extractors and Fuzzy Signatures Takahashi, Kenta*; Nakamura, Wataru
14:20-14:40 Enhancing Remote Adversarial Patch Attacks on Face Detectors with Tiling and Scaling Okano, Masora*; Ito, Koichi; Nishigaki, Masakatsu; Ohki, Tetsushi
14:40-15:00 Multibiometrics Using a Single Face Image Ito, Koichi*; Tonosaki, Taito; Aoki, Takafumi; Ohki, Tetsushi; Nishigaki, Masakatsu
15:00-15:20 Multi-Observed Authentication: A secure and usable authentication based on multi-point observation of a single physical credential Hatakeyama, Wataru*; Nozaki, Shinnosuke; Serizawa, Ayumi; Yoshirira, Mizuho; Fujita, Masahiro; Yoshimura, Ayako; Ohki, Tetsushi; Nishigaki, Masakatsu

Session Room Chair
Recent Advances in Multimedia Enrichment and Security Room 5 -
Date Time Title Authors
05-12-2024 14:00-14:20 Generation of Target Speech with Speaker Individuality Based on Accent Conversion for English Pronunciation Learning Hamakawa, Rei; Niimi, Michiharu*
14:20-14:40 Proposal of Blind Extractable Additive Video Watermarking Method Harada, Nao*; Kawano, Rinka; Kawamura, Masaki
14:40-15:00 Transfer-Based Adversarial Attack Against Multimodal Models by Exploiting Perturbed Attention Region Disabato, Raffaele*; Maung Maung, April Pyone; Nguyen, Huy Hong; Echizen, Isao
15:00-15:20 A Permutation-based Reversible Data Hiding Method with Zero Visual Distortion Zhu, Wendi*; Wong, KokSheik; Kuribayashi, Minoru

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
05-12-2024 14:00-14:20 VietSing: A High-quality Vietnamese Singing Voice Corpus Vu, Minh Duc*; Wei, Zhou; Bhattarai, Binit; Teh, Kah Kuan; Dat, Tran Huy
14:20-14:40 Inertial Strengthened CLIP model for Zero-shot Multimodal Egocentric Activity Recognition He, Mingzhou; Wang, Haojie; Zhou, Shuchang; Wu, Qingbo*; Ngan, King Ngi; Meng, Fanman; Li, Hongliang
14:40-15:00 Optimization of the Intensity Aware Loss for Dynamic Facial Expression Recognition Lau, Davy Tec-Hinh; Ding, Jian-Jiun*; Muller, Guillaume
15:00-15:20 Dictionary Learning Based Two-stage Near-lossless Video Compression Zhang, Zuhai; Jia, Luheng*; Song, Li; Zhu, Shuyuan; Guo, Yuanfang; Jia, Kebin

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
05-12-2024 14:00-14:20 Dictionary Learning for Directed Graph Signals via Augmented GFT Naito, Tsubasa*; Ito, Ryuto; Tanaka, Yuichi; Muramatsu, Shogo
14:20-14:40 Robust Quantile Regression Under Unreliable Data Shoji, Yoshifumi*; Yukawa, Masahiro
14:40-15:00 Ensemble learning based head-related transfer function personalization using anthropometric features Shen, Yih-Liang*; Chi, Tai-Shih
15:00-15:20 Blind Estimation of Room Volume from Reverberant Speech Based on the Modulation Transfer Function Siripool, Nutchanon*; Kongprawechnon, Waree; Unoki, Masashi

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
05-12-2024 14:00-14:20 Disentangling Speaker Representations from Intuitive Prosodic Features for Speaker-Adaptative and Prosody-Controllable Speech Synthesis Pengyu, Cheng*
14:20-14:40 A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker’s Shadowings Geng, Haopeng*; Saito, Daisuke; Minematsu, Nobuaki
14:40-15:00 EADSum: Element-Aware Distillation for Enhancing Low-Resource Abstractive Summarization Lu, Jia-Liang*; Yan, Bi-Cheng; Wang, Yi-Cheng; Lo, Tien-Hong; Wang, Hsin-Wei; Pai, Li-Ting; Chen, Berlin
15:00-15:20 A Tiny Whisper-SER: Unifying Automatic Speech Recognition and Multi-label Speech Emotion Recognition Tasks Chou, Huang-Cheng*

Session Room Chair
Advancements in Biosignal Decoding and Neuromodulation for Human Function Enhancement Room 9 -
Date Time Title Authors
05-12-2024 14:00-14:20 Context-FFT: A Context Feed Forward Transformer Network for EEG-based Speech Envelope Decoding Chen, Ximin; Ding, Yuting; Yan, Nan; Chen, Changsheng; Chen, Fei*
14:20-14:40 Effect of Dynamic Binaural Beats on Concentration Enhancement LEE, Jun-Seok; Lee, Yun-Sung; Hwang, Han-Jeong*
14:40-15:00 EEG-based Evaluation of Enjoyment Emotion during cognitive-motor task Aoki, Haruna*; Zhang, Sinan; Ono, Yumie
15:00-15:20 Exploring Brain Connectivity Patterns and Cognitive Resilience in Aging: A Study with the LEMON Dataset ks, Kapeleshh*; Wei, Chen; Domer, Prince Aldrin; Ji, Hong

Session Room Chair
Audio Processing Room 1 -
Date Time Title Authors
05-12-2024 16:40-17:00 Experimental Evaluation of Speech Enhancement for In-Car Environment Using Blind Source Separation and DNN-based Noise Suppression Takeuchi, Yutsuki*; Nakashima, Taishi; Ono, Nobutaka; Takazawa, Takashi; Shimanoe, Shuhei; Tsuchiya, Yoshinori
17:00-17:20 Auxiliary-Function-Based Steering Vector Estimation Method for Spatially Regularized Independent Low-Rank Matrix Analysis Hirata, Sota*; Takamune, Norihiro; Yamaoka, Kouei; Kitamura, Daichi; Saruwatari, Hiroshi; Takahashi, Yu; KONDO, Kazunobu
17:20-17:40 Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions Mi, Jinyi*; Shi, Xiaohan; Ma, Ding; He, Jiajun; Fujimura, Takuya; Toda, Tomoki
17:40-18:00 Data generation for speaker diarization by speaker transition information Ichikawa, Keigo*; Ueno, Sei; Lee, Akinobu

Session Room Chair
Audio Processing Room 2 -
Date Time Title Authors
05-12-2024 16:40-17:00 Generating Room Impulse Responses Using Neural Networks Trained with Weighted Combinations of Acoustic Parameter Loss Functions Ren, Hualin*; Ritz, Christian; Zhao, Jiahong; Zheng, Xiguang; Jang, Daeyoung
17:00-17:20 Audio Similarity Detection Malhotra, Siddharth; Mankad, Sapan H*
17:20-17:40 Towards a B-format Ambisonic Room Impulse Response Generator Using Conditional Generative Adversarial Network Ren, Hualin*; Ritz, Christian; Zhao, Jiahong; Zheng, Xiguang; Jang, Daeyoung
17:40-18:00 What to Refer and How? - Exploring Handling of Auxiliary Information in Target Speaker Extraction Hayashi, Tomohiro*; Ogino, Riku; Saijo, Kohei; Ogawa, Tetsuji

Session Room Chair
High Performance Image and Video Processing and Applications Room 3 -
Date Time Title Authors
05-12-2024 16:40-17:00 Efficient Adaptation for Real-World Omnidirectional Image Super-Resolution Yang, Cuixin*; Dong, Rongkang; Lam, Kin-Man
17:00-17:20 More Direct and stage-wise network for Face Super Resolution Horiguchi, Yohei*
17:20-17:40 Camera Focal Length Prediction for Neural Novel View Synthesis from Monocular Video Chakraborty, Dipanita*; Chiracharit, Werapon; Chamnongthai, Kosin; Okada, Minoru
17:40-18:00 Scene-Segmentation-Based Exposure Compensation for Tone Mapping of High Dynamic Range Scenes Kinoshita, Yuma*; Kiya, Hitoshi

Session Room Chair
Wireless Communications and Networking Room 4 -
Date Time Title Authors
05-12-2024 16:40-17:00 Combining PTS Technique with Polar Coding for OFDM Systems He, Ching-Huan; CHEN, HOUSHOU*; Zhang, Jia-Chun; Tseng, Chih-Kai
17:00-17:20 Blind Self-Interference Analog Canceller with Differential Delay for Backscatter Communications Nishikawa, Koichi; Ibi, Shinsuke*; Takahashi, Takumi; Iwai, Hisato
17:20-17:40 IoT-based Smart Attendance System using Face Recognition and Motion Detection Saadon, Umi Syamimi*; Lim, Chern Hong

Session Room Chair
Recent Advances in Multimedia Enrichment and Security Room 5 -
Date Time Title Authors
05-12-2024 16:40-17:00 Generation of Photo Slideshow with Song based on Closeness between Concept of Lyrics and That of Images Hashimoto, Mei; Niimi, Michiharu*
17:00-17:20 Disposable-key-based image encryption for collaborative learning of Vision Transformer Aso, Rei*; Shiota, Sayaka; Kiya, Hitoshi
17:20-17:40 Significance of Lower Frequency Regions for Audio Deepfake Detection Shah, Arth Juhul*; Patil, Hemant
17:40-18:00 EAViT: External Attention Vision Transformer for Audio Classification Iqbal, Aquib; Zim, Abid Hasan; Tonmoy, Md Asaduzzaman; Zhou, Limengnan; Malik, Asad*; Kuribayashi, Minoru

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
05-12-2024 16:40-17:00 A Two-Stage Method for 3D Architecture Wireframe Reconstruction from Airborne LiDAR Point Cloud Zhang, Jiahao; Liu, Qi*; Hui, Le; Dai, Yuchao
17:00-17:20 A Two-Stage Method for 3D Architecture Wireframe Reconstruction from Airborne LiDAR Point Cloud Zhang, Jiahao; Liu, Qi*; Hui, Le; Dai, Yuchao
17:20-17:40 Secure Moving Object Detection Transformer in Compressed Video with Feature Fusion Song, Yuru; Chen, Yike; Zheng, Peijia*; Du, Yusong; Luo, Weiqi
17:40-18:00 NeRF-FCM: Attention-based Feature Calibration Mechanisms for 3D Object Detection Using NeRF Goshu, Hana Lebeta*; Xiao, Jun; Chan, Kin-Chung; Zhang, Cong; Gemeda, Mulugeta Tegegn; Lam, Kin-Man

Session Room Chair
Signal and Information Processing & Systems Room 7 -
Date Time Title Authors
05-12-2024 16:40-17:00 Robust Adaptive Filtering Based on Adaptive Projected Subgradient Method: Moreau Enhancement of Distance Function Sawada, Daiki; Yukawa, Masahiro*
17:00-17:20 Significance of Entropy Based Features For Dysarthric Severity Level Classification Avula, Meghana*; Pusuluri, Aditya; Patil, Hemant
17:20-17:40 Incorporating Auditory Processing into Undergraduate Signal Processing Courses to Enhance Student Learning Nie, Kaibao*
17:40-18:00 A Real-Time Platform for Portable and Scalable Active Noise Mitigation for Construction Machinery Peksi, Santi; Gan, Woon Seng*; Lai, Chung Kwan; Lee, Yen Theng; Shi, Dongyuan; Lam, Bhan

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
05-12-2024 16:40-17:00 A Comparative Study on the Biases of Age, Gender, Dialects, and L2 speakers of Automatic Speech Recognition for Korean Language Na, Jonghwan; Park, Yeseul; Lee, Bowon*
17:00-17:20 NecoBERT: Self-Supervised Learning Model Trained by Masked Language Modeling on Rich Acoustic Features Derived from Neural Audio Codec Nakata, Wataru*; Saeki, Takaaki; Saito, Yuki; Takamichi, Shinnosuke; Saruwatari, Hiroshi
17:20-17:40 Targeted Representation with Information Disentanglement Encoding Networks in Tasks Nagawaki, Takumi*; Ikeda, Keisuke; Tamura, Satoshi; Chike, Kohei; Nagano, Hiroyuki; Nose, Masaki
17:40-18:00 PG-MDD: Prompt-Guided Mispronunciation Detection and Diagnosis Leveraging Articulatory Features Lin, Meng-Shin*; Yan, Bi-Cheng; Lo, Tien-Hong; Wang, Hsin-Wei; He, Yue-Yang; Chao, Wei-Cheng; Chen, Berlin

Session Room Chair
Advancements in Biosignal Decoding and Neuromodulation for Human Function Enhancement Room 9 -
Date Time Title Authors
05-12-2024 16:40-17:00 Effect of Phase-Locked Transcranial Alternating Current Stimulation on Vocal tremor WANG, JUNTING*; Koganemaru, Satoko; Shima, Atsushi; Cao, Yedi; Hirakawa, Kana; Iwagana, Ken; Suehiro, Atsushi; Maekawa, Keiko; Mima, Tatsuya; Ono, Yumie
17:00-17:20 Complex CNN incorporating Hilbert transform for steady-state visual evoked potential BCI Takata, Rintaro*; Washizawa, Yoshikazu
17:20-17:40 Electroencephalogram-Based Effective Features for Sustained Attention Assessment in Conversation Togashi, Masaya; Chanpornpakdi, Ingon; Tanaka, Toshihisa*

Session Room Chair
Signal Processing for Drone Audition & Recent Advances in Intelligent Signal Processing Room 1 -
Date Time Title Authors
06-12-2024 09:00-09:20 Relative Transfer Matrix for Drone Audition Applications: Source Enhancement Manamperi, Wageesha*; Abhayapala, Thushara
09:20-09:40 Beamforming informed independent low-rank matrix analysis for sound source enhancement in unmanned aerial vehicles Teh, Jin Xuan*; Takamune, Norihiro; Saruwatari, Hiroshi; Yen, Benjamin; Kingan, Michael; Hioka, Yusuke
09:40-10:00 SMoLnet-T: An Efficient Complex-spectral Mapping Speech Enhancement Approach with Frame-wise CNN and Spectral Combination Transformer for Drone Audition Tan, Zhi-Wei*; Khong, Andy W H
10:00-10:20 Integrating VGGSK and BEATs for Enhanced Sound Event Detection: A Semi-Supervised GRU-Based System with Weak Labels and Synthetic Soundscapes Chan, Po-Cheng*; Chen, Wei-Yu; Wang, Jia-Ching; Lu, Chung-li; Chuang, Hsiang Feng; Cheng, Yu-han
10:20-10:40 Drone audition: implementation of an indoor multi-drone system for sound source tracking Yen, Benjamin*; Nakadai, Kazuhiro
10:40-11:00 Implementation of a Robot Operation System-based network for sound source localization using multiple drones Yamamoto, Takumi*; Hoshiba, Kotaro; Yen, Benjamin; Nakadai, Kazuhiro

Session Room Chair
Converging AI and Computer Vision: Innovations and Potential Room 2 -
Date Time Title Authors
06-12-2024 09:00-09:20 Hyperspectral Anomaly Detection Using Robust Principal Component Analysis with Autoencoding Adversarial Networks Emoto, Atsuya; Matsuoka, Ryo*
09:20-09:40 Optimising Neural Networks with Fine-Grained Forward-Forward Algorithm: A Novel Backpropagation-Free Training Algorithm Gong, James; Li, Bruce; Abdulla, Waleed*
09:40-10:00 Two-Way Malaysian Sign Language Communication System for Inclusive Education HII, Veron Zhen Liang; LO, Aaron Ken Kiat; LEE, Ida Pei Xin; ABUAN, ALEC VINCE GONZALES; Lee, Sue Han*; Then, Patrick HangHui
10:00-10:20 PRTGaussian: Efficient Relighting Using 3D Gaussians with Precomputed Radiance Transfer Zhang, Libo*; Han, Yuxuan; Lin, Wenbin; Ling, Jingwang; Xu, Feng

Session Room Chair
AI-Driven Innovations in Cybersecurity: Advanced Applications in Signal Processing, Multimedia Security, and Privacy Room 3 -
Date Time Title Authors
06-12-2024 09:00-09:20 ET-SSM: Linear-Time Encrypted Traffic Classification Method Based On Structured State Space Model Yanjun, Li*; Zhao, Xiangyu; Zhengpeng, Zha; Ling, Zhen-Hua
09:20-09:40 Toward Universal Detector for Synthesized Images by Estimating Generative AI Models Seo, Ryota*; Kuribayashi, Minoru; Ura, Akinobu; Mallet, Antoine; Cogranne, Rémi; Mazurczyk, Wojciech; Megías, David
09:40-10:00 Innovative Information Hiding in H.266/VVC Using Sub-Block Transform Technique Hau, Joan*; Tew, Yiqi; Tan, Li Peng
10:00-10:20 GGMDDC: An Audio Deepfake Detection Multilingual Dataset Purohit, Ravindrakumar M.*; Shah, Arth Juhul; Patil, Hemant

Session Room Chair
Embedded and Real-Time Systems for AI and Signal Processing Applications Room 4 -
Date Time Title Authors
06-12-2024 09:00-09:20 Accelerated Real-Time Local Maxima Detection in Video Streams Using FPGA Technology Nayazirly, Anindhita; Salomo, Yahwista*; Adiono, Trio; Syafalni, Infall; Sutisna, Nana; Mulyawan, Rahmat
09:20-09:40 A Configurable OFDM Baseband Processor for RF-UOWC System-on-Chip Adiono, Trio; Setiawan, Erwin*; Jonathan, Michael; Mulyawan, Rahmat; Sutisna, Nana; Syafalni, Infall; Popoola, Wasiu
09:40-10:00 Hammering Sound Inspection System Using HPSS and Gradient Boosting with a Wall-Climbing Robot Koyama, Nichika*
10:00-10:20 Implementation of Real Time Oscillometric Based Algorithm for Blood Pressure Measurement in Patient Monitor Adiono, Trio; Amadeus, Clarence*; Thomi, Teuku Rafifsyah; Sinaga, Sindy Novaria Cicilya

Session Room Chair
Selected Papers from APSIPA Workshop on Advanced Signal and Information Processing Room 5 -
Date Time Title Authors
06-12-2024 09:00-09:20 Automated Pseudo-Label Generation and Parallel Computing for Enhanced Few-Shot Medical Image Segmentation Do, Ha Thanh*; Nguyen Trong, Duc; Do, Tien-Dung
09:20-09:40 Enhanced Sparse Convolutional Detection Model for 3D Object Detection in Autonomous Vehicles Adapted to Traffic Conditions in Vietnam Do, Ha Thanh*; Dung, Vu Hoang; Nguyen, Kien Trung
09:40-10:00 Enhancing Cell Segmentation using Deep Learning Models by Custom Processing Techniques Do, Ha Thanh*; Nguyen, Van De; Dang Hoang, Minh Huong; Huy, Nguyễn Quang; Dinh Manh, Cuong Initail
10:00-10:20 Marker-Aware Ovarian Tumor Segmentation from Ultrasound Images Bui, Hoang-Son*; Tran, Sy-Hoang; Nguyen, Thuy-Binh; Tran, Thanh-Hai; Vu, Hai; Lan, Le Thi

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
06-12-2024 09:00-09:20 ACE-Flow: Auto Color Encoding for Enhanced Low-Light Image Restoration Qiu, Jiachen; Zuo, Yushen; Lam, Kin-Man*
09:20-09:40 PBJDT: Point-Based Joint Detection-and-Tracking Lee, Zhen-Xun; Ding, Jian-Jiun*
09:40-10:00 Capturing Dynamic Identity Features for Speaker-Adaptive Visual Speech Recognition Kashiwagi, Sara*; Tanaka, Keitaro; Morishima, Shigeo
10:00-10:20 A Byte-based GPT-2 Model for Bit-flip JPEG Bitstream Restoration Qin, Hao; SUN, Haoran; Wang, Yi*

Session Room Chair
Acoustic Scene Analysis and Signal Enhancement Based on Advanced Signal Processing and Machine Learning Room 7 -
Date Time Title Authors
06-12-2024 09:00-09:20 Successive Speaker Relative Transfer Function Estimation Through Relative Transfer Matrix in Noisy Reverberant Environments Manamperi, Wageesha*; Abhayapala, Thushara
09:20-09:40 Heavy-tailed Distributions-Based Online Semi-blind Source Separation for Nonlinear Echo Cancellation Zhang, Liyuan*; Wang, Xianrui; Yang, Yichen; Ueda, Tetsuya; Makino, Shoji; Chen, Jingdong
09:40-10:00 A Single-Input Binaural-Output Perceptual Rendering Based Speech Separation Method in Noisy Environments zheng, tianqin*; Pei, Hanchen; Pan, Ningning; Jin, Jilu; Huang, Gongping; Chen, Jingdong; Benesty, Jacob
10:00-10:20 Real-Time Noise Estimation for Lombard-Effect Speech Synthesis in Human-Avatar Dialogue Systems Ishikawa, Yuto*; Take, Osamu; Nakamura, Tomohiko; Takamune, Norihiro; Saito, Yuki; Takamichi, Shinnosuke; Saruwatari, Hiroshi

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
06-12-2024 09:00-09:20 EMO-Codec: An In-Depth Look at Emotion Preservation Capacity of Legacy and Neural Codec Models With Subjective and Objective Evaluations Ren, Wenze*; Lin, Yi-Cheng; Chou, Huang-Cheng; Wu, Haibin; Wu, Yi-Chiao; Lee, Hung-yi; Lee, Chi-Chun; Wang, Hsin-Min; Tsao, Yu
09:20-09:40 Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model Park, Joonyong*; Saito, Daisuke; Minematsu, Nobuaki
09:40-10:00 Investigating the Language Independence of Voice Activity Projection Models through Standardization of Speech Segmentation Labels Sato, Yuki*; Chiba, Yuya; Higashinaka, Ryuichiro
10:00-10:20 A Preliminary Study on Analysing Mandarin Tone Values of Romance L2 Mandarin Learners Li, Wu-Hao*; Liu, Te-hsin; CHIANG, Chen Yu

Session Room Chair
Signal Processing for Drone Audition & Recent Advances in Intelligent Signal Processing Room 1 -
Date Time Title Authors
06-12-2024 10:40-11:00 Drone audition: dataset and methods for ground surface material classification using drone noise in outdoor environment Yano, Tsubasa*; Yen, Benjamin; Nakadai, Kazuhiro
11:00-11:20 Seismic-ionospheric Precursor Prediction Using Deep Learning Pham, Tung Bach*; Chang, Pao-Chi; Wang, Jia-Ching
11:20-11:40 Swarm Active Audition System with Robots and Drones for a Search and Rescue Task Nakadai, Kazuhiro*; Kumon, Makoto; Sasaki, Yoko; Hoshiba, Kotaro; Yen, Benjamin

Session Room Chair
Converging AI and Computer Vision: Innovations and Potential Room 2 -
Date Time Title Authors
06-12-2024 10:40-11:00 RepViT Based Lightweight Architecture for Distracted Driving Detection Jian, Muwei*; Ling, Yukun
11:00-11:20 HSIC as Information Compression for Training Deep Neural Network Sofi, Roshan Birjais*; Wang, Kevin I-Kai; Abdulla, Waleed
11:20-11:40 Zero-Shot Learning for Haze Removal Using Fusion of Near-Infrared and Color Images Kato, Onhi*; Kubota, Akira
11:40-12:00 Color Enhancement for the Colorblind Using Color Correction Intensity Map and Pix2pix Image Conversion Komatsu, Shu*; Kubota, Akira

Session Room Chair
Multimedia Processing Systems in the AI Era Room 3 -
Date Time Title Authors
06-12-2024 10:40-11:00 Detecting Abnormal Machine Sounds Using An Ensemble Approach with Data Augmentation Techniques Chan, Po-Cheng*; Lu, Chung-li; Wang, Jia-Ching
11:00-11:20 Leveraging Semi-Supervised Learning with BEATs Feature Extraction and Bi-GRU Classification on Heterogeneous Datasets Chen, Wei-Yu; Lu, Chung-li; Chan, Po-Cheng*; Chuang, Hsiang Feng; cheng, yu-han; Wang, Jia-Ching
11:20-11:40 Leveraging Attention Mechanisms for Breast Cancer Diagnosis akumalla, Brahma reddy*; Pham, Tung Bach; Zhuang, Yung-Yu; Prihasto, Bima; Chang, Pao-Chi; Wang, Jia-Ching
11:40-12:00 Enhanced Detection of Illegally Parked Vehicles Using YOLO and Good Feature to Track Methods Maftuh Alwafi, Fauzan; Mugi Pratama, Boby; Le, Phuong Thi; Prihasto, Bima*; Wang, Jia-Ching

Session Room Chair
Embedded and Real-Time Systems for AI and Signal Processing Applications Room 4 -
Date Time Title Authors
06-12-2024 10:40-11:00 Exploration Robot Based On YOLOv8 Algorithm Syafalni, Infall*; Winasta Sinisuka, Angelica; Kalam Amal Tauhid, Dwi; Ahmad, Farrel; Alif Putra Yasa, Muhammad; Alexander Wen, Steven; Setiawan, Erwin; Sutisna, Nana; Adiono, Trio
11:00-11:20 Optimizing Deep Q-Network for Shortest Path Computation of Mobile Robot Agents Sumarudin, A*; Sutisna, Nana; Syafalni, Infall; Riyanto Trilaksono, Bambang; Adiono, Trio
11:20-11:40 Leveraging IoT and Machine Learning for Efficient Rice Stock Monitoring and Prediction Sutisna, Nana*; Prawira Nugroho, Aditya; Jeffrey, Christopher; Ramadhana, Rizky; Mahendra, Ronggur; Jonathan, Michael; Syafalni, Infall; Adiono, Trio
11:40-12:00 Comparative Evaluation of Fine-Tuned Hybrid Transformer and Band-Split Recurrent Neural Networks for Music Source Separation Kalang Al Qalyubi, Ken; Ahmadi, Nur*; Puji Lestari, Dessi

Session Room Chair
Selected Papers from APSIPA Workshop on Advanced Signal and Information Processing Room 5 -
Date Time Title Authors
06-12-2024 10:40-11:00 Enhancing Shear Wave Propagation Analysis in Tissue with Directional Filtering of Reflected Waves Luong, Hai Quang*; Tran, Nghia Duc; Nguyen, Hiep; Sinh Cong, Lam; Tran, Duc-Tan
11:00-11:20 Structural Analysis of Asian and African Rice Panicles via Transfer Learning Dinh, Tran Hiep*
11:20-11:40 New approach for Alzheimer's disease classification using topographic maps and deep learning model Le, Quoc Anh*; Thinh, Nguyen hong
11:40-12:00 M-IRRA: A Multilingual Model for Text-based Person Search Tran, Phong Ngoc Hung; Phan, Thi-Hoai; Nguyen, Thuy-Binh; Do, Ngoc-Diep; Nguyễn, Quân Hồng; Tran, Thanh-Hai; Duong, Thanh Thi-Hien; Le, Thi Lan*

Session Room Chair
Image, Video, and Multimedia Room 6 -
Date Time Title Authors
06-12-2024 10:40-11:00 GMNER-LF: Generative Multi-modal Named Entity Recognition Based on LLM with Information Fusion Hu, Huiyun*; Kong, Junda; Xiao, Bo; Wang, Fei; Ge, Yang; Sun, Hongzhi
11:00-11:20 WildPose: HRNet-based Lightweight and Efficient Wildlife Pose Estimation BAKANA, SIBUSISO R*; Zhang, Yongfei; Twala, Bhekisipho
11:20-11:40 A Multi-Perceptual Learning Network for Retina OCT Image Denoising and Classification Lam, Kin-Man*

Session Room Chair
Advanced Topics for Automatic Speaker Recognition Room 7 -
Date Time Title Authors
06-12-2024 10:40-11:00 JOSEPH: Phonetic-Aware Speaker Embedding for Far-Field Speaker Verification JIN, Zezhong*; TU, Youzhi; Mak, Manwai
11:00-11:20 Vocal Tract Length Perturbation-based Pseudo-Speaker Augmentation Considering Speaker Variability for Speaker Verification Zou, Hengyi*; Shiota, Sayaka
11:20-11:40 Differences Between Singer and Speaker Verification: Training Singer Feature Representation Extractor Utilizing Singing Voice Characteristics Toma, Sayaka*; Ariga, Tomoki; Higuchi, Yosuke; Hayasaka, Ichiju; Shigyo, Rie; Ogawa, Tetsuji

Session Room Chair
Speech and Language Processing Room 8 -
Date Time Title Authors
06-12-2024 10:40-11:00 Peer Learning via Shared Speech Representation Prediction for Target Speech Separation Yang, Xusheng*; Zhao, Zifeng; Zou, Yuexian
11:00-11:20 Developing a Multilingual Spontaneous L2 Speech Corpus for Automated Proficiency Assessment Han, Seunghee*; Kim, Sunhee; Chung, Minhwa
11:20-11:40 Prediction of Negative User Reactions Towards System Responses During Attentive Listening Lala, Divesh*; Inoue, Koji; Kawahara, Tatsuya
11:40-12:00 Data Selection using Spoken Language Identification for Low-Resource and Zero-Resource Speech Recognition Chen, Jianan*; Chu, Chenhui; Li, Sheng; Kawahara, Tatsuya

Session Room Chair
Few-shot Vision, Language, and Multimedia Processing under LLMs Room 9 -
Date Time Title Authors
06-12-2024 10:40-11:00 A Noisy Context Optimization Approach for Chinese Spelling Correction Zhang, Guangwei; Xiong, Yongping; Li, Ruifan*
11:00-11:20 GVDIE: A Zero-Shot Generative Information Extraction Method for Visual Documents Based on Large Language Models Qi, Siyang*; Wang, Fei; Sun, Hongzhi; Ge, Yang; Xiao, Bo
11:20-11:40 META: Text Detoxification by leveraging METAmorphic Relations and Deep Learning Methods Choo, Alika*; Pal, Arghya; Rajanala, Sailaja; Sen, Arkendu
11:40-12:00 Visual semantic alignment network based on pre-trained ViT for few-shot image classification Zhang, Jiaming; Wu, Jijie; Li, Xiaoxu*

Session Room Chair
Poster Room 10 -
Date Time Title Authors
04-12-2024 11:00-12:20 Speech Depression Recognition from the Self-reference Effect Using LSTM with ResNet Lu, Xiaoyong*
11:00-12:20 Temporal-Spatial Correlation Analysis for Ship-Radiated Noise Based on Random Matrix Theory Feng, Qing*; Wu, Zhiqiang; Li, Xuebin; Shen, Heping; Liu, Shang; Tang, Min; Feng, Quansheng
11:00-12:20 Annotation-free Fine-tuning for Unsupervised Anomalous Sound Detection Guo, Kai*; Xie, Xiang; Zhang, Fengrun
11:00-12:20 Knowledge Augmented Attention Gating Embedding for Link Prediction Chen, Zewei; Shuhong, Chen; Li, Chen; Zheng, Xianwei*; He, Minfan; li, xutao
11:00-12:20 Detecting Coronary Artery Stenosis from Cardiac CT Images using 3D CNNs Aono, Masaki*
11:00-12:20 Effective Speech Data Augmentation Method To Improve Customer Service Representative Speech Recognition System Performance Bak, Huiyong*; Jeong, Changhyeon
11:00-12:20 Clock Reference Synchronization Techniques In Space Information Networks Liu, Lei*
11:00-12:20 LLM as decoder: Investigating Lattice-based Speech Recognition Hypotheses Error Correction Using LLM Li, Sheng*; Ko, Yuka; Ito, Akinori
11:00-12:20 A Two-Stage Wall Parameters Estimation Algorithm for MIMO Through-the-Wall Radar Li, Zhirun*; Guo, Shisheng; Chen, Jiahui; Zhu, Zhihao; Qiu, Chen; Guolong, Cui; Xiang, Yutao
11:00-12:20 Tiny Object Detection Enhancement for Large-Scale Remote Sensing Imagery Zhang, Tianwei*; Gao, Lianru
11:00-12:20 Robust Watermarking via Dual Guidance Zhang, Yuhang; Li, Yuanman*; Dong, Li; Li, Xia
11:00-12:20 Region Aware Framework for Constrained Image Splicing Detection and Localization Cao, Haokun; Li, Yuanman*; Li, Xia

Session Room Chair
Poster Room 10 -
Date Time Title Authors
04-12-2024 14:00-16:00 Handling Missing Data in Limited-View Photoacoustic Tomography Using Compressive Sensing Algorithm-Based Deep Learning John, Mary; Barhumi, Imad*
14:00-16:00 Keyword spotting for dialectal speech and Introduction of wav2vec2.0 Ariga, Tomohiro*; Minakawa, Reo; Itoh, Yoshiaki; Lee, Shi-wook; Kojima, Kazunori
14:00-16:00 LCMV-based Scan-and-Sum Beamforming for Region Source Extraction Yasue, Aoto*; Yen, Benjamin; Itoyama, Katsutoshi; Nakadai, Kazuhiro
14:00-16:00 Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising Fujita, Yoto*; Nugraha, Aditya Arie; Di Carlo, Diego; Bando, Yoshiaki; Fontaine, Mathieu; Yoshii, Kazuyoshi
14:00-16:00 Performance Evaluation of Acoustic Echo and Noise Canceller with Variable-Step-Size Shared-Error NLMS Algorithm under Double-Talk Conditions Iwai, Kenta*; Nishiura, Takanobu
14:00-16:00 Augmented sound-image perception using pre-virtual-leading ultrasounds based on precedence effect Imanaka, Ryota*; Geng, Yuting; Nakayama, Masato; Nishiura, Takanobu
14:00-16:00 Virtual multi-boosted amplitude modulation toward high-pressure audible sound with parametric array loudspeakers Ikezaki, Yoto*; Geng, Yuting; Nakayama, Masato; Nishiura, Takanobu
14:00-16:00 Analyzing House Music: Relations of Audio Features and Musical Structure Wulf, Justin Tomoya; Kitahara, Tetsuro*
14:00-16:00 Teager Energy Cepstral Coefficients for Spoken Language Identification Shah, Arth Juhul*; Yadav, Savita Hiralal; Patil, Hemant
14:00-16:00 Deep Speech Synthesis from Multimodal Articulatory Representations Wu, Peter*; Yu, Bohan; Scheck, Kevin; Black, Alan; Krishnapriyan, Aditi S; Chen, Irene Y; Schultz, Tanja; Watanabe, Shinji; Anumanchipalli, Gopala Krishna
14:00-16:00 A Parameter-free model for long-term concrete creep prediction Li, Conghui*; Lim, Chern Hong; Wang, Xin
14:00-16:00 Voice Liveness Detection Using Linear Frequency Residual Cepstral Coefficients Shah, Arth Juhul*; Mandaviya, Nandini; Patil, Hemant
14:00-16:00 An isolated Vietnamese Sign Language Recognition method using a fusion of Heatmap and Depth information based on Convolutional Neural Networks Nguyen, Phuoc Xuan; Nguyen, Thi-Huong; Tran, Duc-Tan; Bui, Tien Son Tien; Nguyen, VanToi*
14:00-16:00 GILED: Lesion Detection of Gastrointestinal Tract from Endoscopic Images and Medical Notes Hoang, Vu-An*; Tran, Minh-Hanh; Dao, Viet Hang; Tran, Thanh-Hai
14:00-16:00 Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label Yutani, Tsugumasa*

Session Room Chair
Poster Room 10 -
Date Time Title Authors
04-12-2024 16:20-18:00 Gamma-VAE: Speech representation based on VAE assuming gamma distribution for both latent variables and observation Imaichi, Nanako*; Nakashika, Toru
16:20-18:00 Does Brain Atlas Choice Matter? An Empirical Study in Alzheimer's Diagnosis Using FDG-PET Images Pham, MINH TUAN; Adel, Mouloud; Trung, Nguyen Linh*; Guedj, Eric
16:20-18:00 Transformer Attention Matrix Multiplication Design using 4x4 Systolic Arrays Afif, Muhammad Sayyid*; Syafalni, Infall; Sutisna, Nana; Adiono, Trio
16:20-18:00 Quefrency Approach to Audio Deepfake Detection Singhal, Kanishq; Goyal, Aditya; Gupta, Priyanka*
16:20-18:00 A Semi-Supervised Low-Light Image Enhancement with Color Guidance Wang, Yuxin*; Yang, Yang
16:20-18:00 Cloud Removal in Hyperspectral Satellite Images Using Low-rank Tensor Completion Vo, Chuong Hoang*; Truong, Mai Thanh Nhat; Lee, Chul
16:20-18:00 Block Refinement Learning for Improving Early Exit in Autoregressive ASR Kawata, Naotaka*; Orihashi, Shota; Suzuki, Satoshi; Tanaka, Tomohiro; Ihori, Mana; Makishima, Naoki; Yamane, Taiga; Masumura, Ryo
16:20-18:00 Color Guided Disease Segmentation for Plant Images Jang, Soyeon*; Kim, Jong-Ok
16:20-18:00 Performance Optimization in the Cascade of VAD and ASR Systems: A Study on Evaluation and Alignment Strategies Lin, Zhentao; Chen, Zihao*; Zeng, Bi; Chen, Leqi; Cai, Jia

Session Room Chair
Poster Room 10 -
Date Time Title Authors
05-12-2024 10:20-12:00 StylebookTTS: Zero-Shot Text-to-Speech Leveraging Unsupervised Style Representation Yoon, Juhwan*; Lim, Hyungseob; Cha, Hyeonjin; Kang, Hong-Goo
10:20-12:00 Generating Phonetic Transcriptions for Korean English L2 Learners Using Multiple Self-Supervised-Model-Based ASR Systems and ROVER Method Kim, Jong In*
10:20-12:00 Adaptive Time-Varying Graph Learning for Traffic Flow Data Based on Anomaly Moment Detection Shuhong, Chen; Chen, Zewei; Li, Chen; Zheng, Xianwei*; He, Minfan; li, xutao
10:20-12:00 Cuisine Image Synthesis with Improved Multiscale GANs Guided by CLIP Xia, Weiyi*; Fujita, Satoru
10:20-12:00 A Novel LLM-based Two-stage Summarization Approach for Long Dialogues yin, yuan jhe J*; Chen, Bo-Yu; Chen, Berlin
10:20-12:00 Impulse response transforming method to control distance perception based on direct-to-reverberant energy ratio Takahashi, Toru*; Nakayama, Masato
10:20-12:00 Data-Driven Sound Field Reproduction for Higher-Order Mode Matching Using a Circular Loudspeaker Array Kawase, Keiko*; Sato, Gen; Tsunokuni, Izumi; Ikeda, Yusuke
10:20-12:00 Layer-Wise Feature Distillation with Unsupervised Multi-Aspect Optimization for Improved Automatic Speech Assessment Wu, Chung-Wen*; Chen, Berlin
10:20-12:00 Sparse Blind Deconvolution and Demixing via Block Majorization-Minimization Chen, Mengting*; Zhao, Ziping

Session Room Chair
Poster Room 10 -
Date Time Title Authors
05-12-2024 14:00-15:20 An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition Lai, Songjiang*; Cheung, Tsun-Hin
14:00-15:20 Detecting Abnormal Machine Sounds Using An Ensemble Approach with Data Augmentation Techniques Chan, Po-Cheng*; Lu, Chung-li; Wang, Jia-Ching
14:00-15:20 Learning a Sequence of Cursive-Style Japanese Characters in Classical Literary Works Fujita, Satoru*; Oyama, Keizo
14:00-15:20 Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection Cheung, Tsun-Hin*
14:00-15:20 Development of Simple Algorithm to Detect and Filter Motion Artifact Noise in Non-Invasive Blood Pressure (NIBP) Measurement Adiono, Trio; Muhlis, Rd. Elviana La'salina; Amadeus, Clarence*; Sinaga, Sindy Novaria Cicilya
14:00-15:20 MYMV: A Music Video Generation System with User-preferred Interaction Lee, Kyungjune*; Jang, Mingyu; Huh, Jungwoo; Lee, Jeonghaeng; Choi, Seok Keun; Lee, Sanghoon
14:00-15:20 Text-guided Visual Prompt Tuning with Masked Images for Facial Expression Recognition Dong, Rongkang*; Yang, Cuixin; Lam, Kin-Man
14:00-15:20 Fine-Grained Privacy-Preserving Image Retrieval in Cloud Environment Liang, Jing; Wang, Libo; LI, PEIYA*
14:00-15:20 Measurement of Relative Transfer Function for Own Voice in Head-Mounted Microphone Array Kazama, Kyoka*; Nakashima, Taishi; Ono, Nobutaka
14:00-15:20 Enhancing Early Plant Disease Detection: 1D to 2D Spectral Transformations Mohd Hilmi Tan, Mas Ira Syafila*; Wong, Lai-Kuan; Loh, Yuen Peng; Pee, Chih-Yang

Session Room Chair
Poster Room 10 -
Date Time Title Authors
05-12-2024 16:40-18:00 KhmerFormer: Multi-Scale CNNs-Transformer with External Attention for Ancient Khmer Palm Leaf Isolated Glyph Classification Thuon, Nimol*; Du, Jun
16:40-18:00 DDPMVC: Non-parallel any-to-many voice conversion using diffusion encoder Hatakeyama, Ryuichi*; Okuda, Kohei; Nakashika, Toru
16:40-18:00 MGVul: a Multi-Granularity Detection Framework for Software Vulnerability Zhao, Xiangyu*; Yanjun, Li; Zha, Zhengpeng; Ling, Zhen-Hua
16:40-18:00 Dysarthria Severity Classification Using Phase Based Features of LP Residual Mannepalli, Rohini Sri*; Pusuluri, Aditya; Patil, Hemant
16:40-18:00 A Joint Graph Signal and Laplacian Denoising Network Inspired by Majorization-Minimization Zhang, Zepeng; Zhao, Ziping*
16:40-18:00 Comparative Analysis of Glottal and Vocal Tract Features in Dysarthria Geeta Sai Sahasra, Indukuri; Kadwasra, Swapna; Srivastava, Arushi*; Pusuluri, Aditya; Patil, Hemant
16:40-18:00 Contrast-Aware DCT for Image Enhancement with JPEG Compatible Coding Hayashi, Kohei*; Honda, Soichiro; Kamei, Hirokazu; Maeda, Yoshihiro; Fukushima, Norishige
16:40-18:00 Non-blind Deblurring Using Probabilistic Models and Spatial Adaptive Restoration Liao, Chun-Lin; Ding, Jian-Jiun*; Shih, Chun-Jen
16:40-18:00 Comparative Analysis of Voice Mimicry Attacks by High- and Low-Skilled Imitators on Speaker Verification Systems Iwano, Koji*; Komuro, Wakana; Gomi, Manami
16:40-18:00 Multi-band Satellite Image Analysis for Multi-label Classification Abdul Rauf, Sarah Shahmina; Mohd Hilmi Tan, Mas Ira Syafila; Loh, Yuen Peng*

Session Room Chair
Poster Room 10 -
Date Time Title Authors
06-12-2024 9:00-10:20 LoFLAT: Local Feature Matching using Focused Linear Attention Transformer Cao, Naijian; He, Renjie*; Dai, Yuchao; He, Mingyi
9:00-10:20 Inference Efficient Source Separation Using Input-dependent Convolutions Seki, Shogo*; Li, Li
9:00-10:20 High and Low Frequency Region Separation Method for Adaptive Image Expansion Luo, Shao-Yun; Chen, Kuei-Chen; Ding, Jian-Jiun*; Lee, Cheng-Che; Lee, Hsin-Jung
9:00-10:20 Unleashing Attributes-content Adaptation with Multi-color Spaces for Food Photo Aesthetic Assessment Hidayati, Shintami C*; Firdaus, Muhammad; Dianto, Riki; Sarworsri, Sarworsri
9:00-10:20 An Explainable Raman Spectral Classification Pipeline via NMF and SHAP: A Case Study of Pen Ink Colors Lapsatid, Pongpon; Deepaisarn, Somrudee*; Eiamchai, Pitak
9:00-10:20 Pressure Matching Using Data-Driven Estimation for Sound Fields and Transfer Functions Horikoshi, Koki*; Sato, Gen; Tsunokuni, Izumi; Ikeda, Yusuke
9:00-10:20 Acoustic model adaptation in noisy and reverberated scenarios using multi-task learned embeddings Raikar, Aditya; Soni, Meet; Panda, Ashish*; Kopparapu, Sunil Kumar
9:00-10:20 Generalized SpecAugment: Robust Online Augmentation Technique for End-to-End Automatic Speech Recognition Soni, Meet; Panda, Ashish*; Kopparapu, Sunil Kumar
9:00-10:20 ComplexFace: A Public Visible-Thermal Face Dataset with Real-Life Complexity He, Jiajin*; Dong, Chengxi; Cai, Yunqi; Wang, Dong

Session Room Chair
Poster Room 10 -
Date Time Title Authors
06-12-2024 10:40-12:00 PPHiFi-TTS: Phonetic Preserved High-Fidelity Text-to-Speech for Long-Term Speech Dependencies Purohit, Ravindrakumar M.*; Vaghera, Dharmendra; Shah, Arth Juhul; Patil, Hemant
10:40-12:00 Physics-Informed Neural Networks for Estimation of Scattered Sound Fields with Boundary Condition Onizawa, Ryosuke*; Sato, Gen; Tsunokuni, Izumi; Ikeda, Yusuke
10:40-12:00 Cross Lingual Speech Representation for Infant Cry Classification Chaudhari, Hiya*; Shah, Arth Juhul; Patil, Hemant
10:40-12:00 Data-Driven Physics-Informed Neural Network for Sound Field Estimation in Rooms of Arbitrary Size Sato, Gen*; Ikeda, Yusuke
10:40-12:00 GPGAN-VC: Enhancing Voice Conversion using Gradient Penalty Purohit, Ravindrakumar M.*; Vaghera, Dharmendra; Patil, Hemant
10:40-12:00 Improved Cassava Plant Disease Classification with Leaf Detection Chai, Ming Xuan; Fam, Yao Deng; Octaviano, Quinito Norman; Pee, Chih-Yang*; Wong, Lai-Kuan; Mohd Hilmi Tan, Mas Ira Syafila; See, John
10:40-12:00 A Study on Packet-Level Index Modulation Using Frequency Offsets within a LoRaWAN Channel ohta, mai*; Matsuura, Hiroki; Fujii, Takeo
10:40-12:00 Teager Energy Cepstral Coefficient for Audio Deepfake Detection Mahyavanshi, Ritik Pankaj*; Reddy, Mahesh; Shah, Arth Juhul; Patil, Hemant
10:40-12:00 Development and Evaluation of a Semi-autonomous Parallel Attentive Listening System Lala, Divesh*; Inoue, Koji; Kawai, Haruki; Pang, Zi Haur; Elmers, Mikey; Kawahara, Tatsuya
10:40-12:00 New approach on Smiling faces with Domain Transfer in Latent Space Siu, Wan-Chi*; DUAN, Mingfei; Hui, Chun Chuen
10:40-12:00 High-Quality Facial Pose Generation with Latent Space Processing Siu, Wan-Chi*; Cheng, Wing-Ho; Chan, H Anthony
10:40-12:00 Agent Attention Feature Reconstruction Network for Fine-Grained Few-Shot Image Classification Chang, Dongfei; Wu, Jijie; Li, Xiaoxu*

Session Room Chair
Tutorial Room 1 -
Date Time Title Speakers
3-Dec 09:30-11:30 [T01] EEG Signal Processing and Machine Learning Saeid (Saeed) Sanei
13:00-15:00 [T03] Human-Centric RF Sensing: Pose Estimation, ECG Monitoring and Self-Supervised Learning Yan Chen, Dongheng Zhang, Zhi Lu
15:30-17:30 [T04] Emerging Topics for Speech Synthesis: Versatility and Efficiency Yuki Saito, Shinnosuke Takamichi, Wataru Nakata

Session Room Chair
Tutorial Room 2 -
Date Time Title Speakers
3-Dec 09:30-11:30 [T02] From Statistical to Causal Inferences for Time-Series and Tabular Data Pavel Loskot



More details can be found at Tutorial

Session Room Chair
Winter School - Mingyi He, Yuan Wu, Yuanman Li
Date Time Title Authors
3-Dec 13:00-14:00 Overview of Neural Network AI Mingyi He
14:00-15:00 Hopfield Neural Network Fundamental for Machine Learning Mingyi He
15:30-16:30 Deep Learning for Image Forensics Bonnie Law
16:30-17:30 Generative Modeling and Learning for Conversational AI Jen-Tzung Chien



More details can be found at Winter School

Session Room Chair
Keynote - -
Date Time Title Speaker
4-Dec 09:40-10:40 Rate-Distortion Optimization in Video/Image Compression: From Temporal Dependency Formulation to Learning-based Modeling Zhu Ce
5-Dec 09:00-10:00 Learning from Unreliable Sources via Crowdsourcing Georgios Giannakis
15:40-16:40 AI and Cognitive Health Helen Meng

More details can be found at Keynote

Session Room Co-Chairs
Women's Forum Room 1 Mingyi He, Bonnie Law
Date Time Title Speakers
5-Dec 12:20-12:40 Engineering Her Future, Engineering Our Future Helen Meng
12:40-13:00 My working life as a woman in Engineering Sansanee Auephanwiriyakul
13:00-13:20 A few suggestions for our young women professionals Hong (Vicky) Zhao

More details can be found at Women's Forum


Session Room Chair
Industrial Forum Room 4 Chris Gwo Giun Lee
Date Time Title Speaker
4-Dec 14:00-14:35 Research, Clinical, and Business Challenges of AI and Machine Learning Applications in Medicine – A Case Study in Metastatic Cancer and Infectious Diseases Detections by Microscopic Imaging. Yusen Eason Lin
14:35-15:10 Smart Rings: Pioneering Biomedical Technologies for Transformative Healthcare Applications Hao Wu
15:10-15:45 An Industry Perspective: Video Analytics meets Generative AI Jianquan Liu
15:45-16:00 Panel Discussion: AI Frontiers: From Cloud to Edge and Biomedical -

More details can be found at Industrial Forum