Group Meeting Talk on Mamba-Based Works
Date:
The report was presented by me, focusing on "The Origins and Related Work of the Transition from SSM to Mamba." The main content includes:

1. Necessity of SSM: why the State Space Model (SSM) is still needed, emphasizing that the same SSM can be computed as a convolution (CNN view) for training or as a recurrence (RNN view) for inference, and introducing the concept of time-varying systems.
2. Definition of SSM: SSM is the formulation on which Mamba is built: a sequence model that maps inputs to outputs through a latent state, which makes it effective at handling sequential data.
3. From SSM to S4 to Mamba: the efficiency of the S4 model and Mamba's capability for linear-time sequence modeling, and how both differ from attention mechanisms.
4. Mamba-based work: a survey of projects built on Mamba, comparing how they handle patches, flatten inputs into 1-D sequences, apply positional encoding, and structure their frameworks.
5. Complexity and computational efficiency: the difference in computational complexity between Transformers (quadratic in sequence length) and Mamba (linear), which gives Mamba an advantage when processing large images.

You can visit PPT here.
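To make the CNN/RNN duality mentioned in point 1 concrete, here is a minimal sketch (my own illustration, not code from the talk) of a discrete linear state-space model evaluated two ways: stepped through time like an RNN, and applied as a causal convolution with a precomputed kernel like a CNN. The matrix names A, B, C follow the standard SSM formulation; the specific sizes and random values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 8                         # state size, sequence length
A = rng.normal(size=(N, N)) * 0.3   # state transition (scaled for stability)
B = rng.normal(size=(N, 1))         # input projection
C = rng.normal(size=(1, N))         # output projection
u = rng.normal(size=L)              # 1-D input sequence

def ssm_recurrent(u):
    """RNN view: step the hidden state x_k = A x_{k-1} + B u_k."""
    x = np.zeros((N, 1))
    ys = []
    for uk in u:
        x = A @ x + B * uk
        ys.append((C @ x).item())   # y_k = C x_k
    return np.array(ys)

def ssm_convolutional(u):
    """CNN view: materialize the kernel K = (CB, CAB, CA^2B, ...)."""
    K = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item()
                  for k in range(L)])
    return np.convolve(u, K)[:L]    # causal convolution

y_rnn = ssm_recurrent(u)
y_cnn = ssm_convolutional(u)
assert np.allclose(y_rnn, y_cnn)    # both views compute the same outputs
```

The recurrence is cheap at inference time (constant state per step), while the convolutional form lets training parallelize over the whole sequence, which is the duality the talk highlights.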
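Point 5 can be illustrated with a hypothetical back-of-the-envelope cost count (my own sketch, not from the talk): self-attention scales quadratically in the number of tokens, while a Mamba-style scan scales linearly, so the gap widens as image size grows. The 16x16 patch size used below is an assumed example.

```python
def attention_cost(num_tokens: int) -> int:
    # Pairwise token interactions: O(n^2)
    return num_tokens * num_tokens

def mamba_cost(num_tokens: int) -> int:
    # One recurrent scan over the sequence: O(n)
    return num_tokens

# With 16x16 patches, a 224x224 image yields 14*14 = 196 tokens,
# while a 1024x1024 image yields 64*64 = 4096 tokens.
for side in (224, 1024):
    n = (side // 16) ** 2
    ratio = attention_cost(n) / mamba_cost(n)
    print(f"{side}x{side}: {n} tokens, attention/mamba cost ratio ~{ratio:.0f}x")
```

The ratio itself equals the token count, which is why the quadratic term dominates for large images and why Mamba's linear complexity is attractive there.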