CAMFusion: Context-Aware Multi-Modal Fusion Framework for Detecting Sarcasm and Humor Integrating Video and Textual Cues
Blog Article
Detecting sarcasm and humor in Bangla videos requires a deep understanding of both linguistic and cultural contexts. Existing methodologies often fall short because they rely on single-modal data, which limits their ability to capture the intricate interplay between visual and textual cues. To address these challenges, we propose a new multi-modal framework designed explicitly for Bangla sarcasm and humor detection. As part of this framework, we develop a novel dataset of diverse video clips annotated across humor, sarcasm, and normal categories. This dataset is vital for training and evaluating models tailored to the Bangla language.
Our approach introduces the Context-Aware Multi-modal Fusion Framework (CAMFusion), which integrates visual features extracted by an Efficient Fusion Time-Distributed MobileNetV2 (0.985M parameters) with textual features processed through Bidirectional LSTM and GRU layers, achieving an accuracy of 90%. Additionally, we developed the Bangla Text Extraction Algorithm, which improves text extraction from complex video frames so that critical contextual information is captured. Generalizability tests conducted on multiple Bangla multi-modal datasets demonstrate substantial performance improvements, underscoring the robustness and adaptability of our model. These findings highlight the effectiveness of our system in detecting sarcasm and humor in Bangla videos, paving the way for advancements in computational linguistics and cultural understanding.
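To make the data flow concrete, the following is a minimal sketch of how per-frame visual features and a text feature vector might be combined into a three-way prediction. It is not the authors' implementation: the fusion operator (plain concatenation), the mean pooling over frames, the single linear layer, and all sizes and names (`mean_pool`, `fuse_and_classify`, `VISUAL_DIM`, `TEXT_DIM`) are assumptions for illustration, and random numbers stand in for the outputs of the MobileNetV2 and Bidirectional LSTM/GRU encoders.

```python
import math
import random

CLASSES = ("humor", "sarcasm", "normal")  # categories named in the dataset description


def mean_pool(frame_features):
    """Average per-frame feature vectors over time into one clip-level vector."""
    n_frames = len(frame_features)
    return [sum(column) / n_frames for column in zip(*frame_features)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(frame_features, text_features, weights, bias):
    """Concatenate pooled visual and text features, then apply a linear layer and softmax."""
    # Assumed fusion step: simple concatenation of the two modality vectors.
    fused = mean_pool(frame_features) + list(text_features)
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return dict(zip(CLASSES, softmax(logits)))


# Hypothetical sizes; random stand-ins for encoder outputs and trained weights.
rng = random.Random(0)
N_FRAMES, VISUAL_DIM, TEXT_DIM = 8, 4, 3
frames = [[rng.random() for _ in range(VISUAL_DIM)] for _ in range(N_FRAMES)]
text = [rng.random() for _ in range(TEXT_DIM)]
weights = [[rng.uniform(-1, 1) for _ in range(VISUAL_DIM + TEXT_DIM)] for _ in CLASSES]
bias = [0.0] * len(CLASSES)

probs = fuse_and_classify(frames, text, weights, bias)
print(max(probs, key=probs.get), probs)
```

In a real system the random vectors would be replaced by learned encoder outputs, and the fusion and classification layers would be trained jointly on the annotated clips.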