Talks and presentations

Talk at group meeting about Egocentric Video/Image Quality Assessment

December 26, 2024

Talk, Internal group meeting, Dalian, China

This study report, presented by Li Zhizhen, focuses on the aesthetic assessment of images and first-person (egocentric) videos. A summary of the main content:

- Introduction: the significance of aesthetic assessment and the research background.
- From traditional methods to deep learning: traditional hand-crafted aesthetic features (average brightness, brightness contrast, global edge distribution, etc.) are compared with deep learning methods, highlighting the advantages of deep learning for image aesthetic assessment (a toy sketch of these features follows this summary).
- Datasets for image aesthetic assessment: datasets such as AADB and AVA are introduced and compared in terms of rater IDs, the proportion of real photographs, and attribute labels.
- Images versus first-person videos: an analysis of the similarities and differences in assessing the aesthetics of the two, pointing out what makes first-person video unique.
- The Ego4D dataset and an assessment strategy for first-person videos: the characteristics of Ego4D, including its geographic coverage and diversity of activities, along with its five benchmark challenges, such as episodic memory and forecasting.
- Conclusion and future outlook: the findings are summarized and future research directions in aesthetic assessment are anticipated.

Overall, the report explores the integration of traditional methods and deep learning in aesthetic assessment for both images and videos, emphasizing the importance of datasets and the potential for future research. You can visit PPT here.
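As a toy illustration of the hand-crafted features named in the second bullet, here is a minimal Python sketch (my own, not from the talk; the function name and the Sobel-based edge measure are assumptions) that scores an image by average brightness, brightness contrast, and global edge density:

```python
# Minimal sketch of classic hand-crafted aesthetic features:
# average brightness, brightness contrast, global edge density.
# Illustrative only; not the exact features used in the talk.
import numpy as np
from PIL import Image

def aesthetic_features(path: str) -> dict:
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64) / 255.0
    h, w = gray.shape

    avg_brightness = gray.mean()       # global mean luminance
    brightness_contrast = gray.std()   # spread of luminance values

    # Global edge distribution via simple Sobel gradients.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    pad = np.pad(gray, 1, mode="edge")
    gx = sum(kx[i, j] * pad[i:i + h, j:j + w]
             for i in range(3) for j in range(3))
    gy = sum(ky[i, j] * pad[i:i + h, j:j + w]
             for i in range(3) for j in range(3))
    edge_density = (np.hypot(gx, gy) > 0.5).mean()  # fraction of edge pixels

    return {"avg_brightness": avg_brightness,
            "brightness_contrast": brightness_contrast,
            "edge_density": edge_density}
```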

Talk at group meeting about Mamba-based works

December 04, 2024

Talk, Internal group meeting, Dalian, China

I presented this report, which focuses on "The Origins of, and Work Related to, the Transition from SSM to Mamba." The main content:

1. Why SSM is needed: a discussion of why the State Space Model (SSM) is still worth studying, emphasizing its dual nature (it can be trained as a convolution and unrolled as a recurrence, i.e., a CNN view converted into an RNN view) and introducing the concept of time-varying systems.
2. Definition of SSM: the state-space formulation on which Mamba is built, highlighting its effectiveness for sequential data (a minimal sketch of the recurrence follows this list).
3. From SSM to S4 to Mamba: the efficiency of the S4 model and Mamba's linear-time sequence modeling, and how both differ from attention mechanisms.
4. Mamba-based work: a survey of Mamba-related projects, comparing how they handle patches, 1-D sequence ordering, positional encoding, and framework structure.
5. Complexity and computational efficiency: an analysis of the computational complexity of Transformers versus Mamba, emphasizing Mamba's advantage on large images, since its cost scales linearly rather than quadratically with sequence length.

You can visit PPT here.
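To make the SSM recurrence in point 2 concrete, here is a minimal NumPy sketch (my own illustration, not from the slides) of a discretized linear state-space model, x_k = Ā x_{k-1} + B̄ u_k, y_k = C x_k, which is the kind of scan that S4/Mamba-style models evaluate in time linear in the sequence length:

```python
# Minimal sketch of a discretized linear state-space model (SSM):
#   x_k = A_bar @ x_{k-1} + B_bar * u_k
#   y_k = C @ x_k
# Illustrative only; real S4/Mamba use structured A and learned
# (in Mamba, input-dependent) parameters plus a parallel scan.
import numpy as np

def ssm_scan(A_bar, B_bar, C, u):
    """Run the SSM recurrence over a 1-D input sequence u."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:                      # one step per token: linear time
        x = A_bar @ x + B_bar * u_k    # state update
        ys.append(C @ x)               # readout
    return np.array(ys)

rng = np.random.default_rng(0)
n = 4                                   # state dimension
A_bar = np.eye(n) * 0.9                 # stable discretized transition
B_bar = rng.normal(size=n)
C = rng.normal(size=n)
print(ssm_scan(A_bar, B_bar, C, rng.normal(size=16)).shape)  # (16,)
```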

Talk at group meeting about Segment Anything Model

June 28, 2024

Talk, Internal group meeting, Dalian, China

The meeting discussed the Segment Anything Model (SAM), a groundbreaking project built around the vast SA-1B data engine, which contains 11 million images and over 1 billion masks. Once the image embedding has been computed, the lightweight prompt encoder and mask decoder run locally in real time, at roughly 50 milliseconds per prompt on a CPU. The system supports visual prompt interaction, and its data engine covers three annotation stages: assisted-manual, semi-automatic (similar to Label Studio), and fully automatic. For an ambiguous prompt the model outputs three masks, corresponding to the whole object, a part, and a subpart. The architecture comprises an image encoder (ViT-H), a prompt encoder (using CLIP's text encoder for text prompts), and a mask decoder (similar in spirit to MaskFormer) that applies attention between prompt tokens and the image embedding. You can visit PPT here.
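For reference, this is roughly how the interactive prompting loop looks with the official `segment_anything` package; a minimal sketch, assuming a downloaded ViT-H checkpoint and an example image (both file paths are placeholders):

```python
# Sketch of prompting SAM with a single foreground point using the
# official `segment_anything` package.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # heavy ViT-H embedding, computed once per image

# One positive click; multimask_output=True returns the whole /
# part / subpart candidates mentioned above.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
print(masks.shape, scores)  # (3, H, W) boolean masks with quality scores
```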

Talk at group meeting about video stabilization

September 01, 2023

Talk, Internal group meeting, Dalian, China

This presentation primarily discusses research and methods related to video stabilization. The main points:

- Introduction to video stabilization: the definitions of video stabilization and anti-shake technology and their significance. Techniques are classified into Electronic Image Stabilization (EIS), Optical Image Stabilization (OIS), and Accelerometer-based Stabilization (AIS), each with its own advantages and disadvantages. The AIS pipeline consists of motion estimation, motion smoothing, and stabilized frame generation (see the sketch after this list).
- Application fields: video stabilization is widely used across many domains, for example the OIS processing in the iPhone 14 Pro.
- Recent notable works: significant results from recent years, including 3D video stabilization, depth-camera stabilization, and online video stabilization.
- Core ideas of MeshFlow and real-time video stitching: MeshFlow targets minimum-latency online video stabilization; real-time video stitching computes affine transformations and accumulated transformations across multiple cameras.
- Core idea of FuSta.

You can visit PPT here.
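As a concrete illustration of the motion estimation → smoothing → generation pipeline in the first bullet, here is a minimal OpenCV sketch (my own, not from the slides) of classic 2D stabilization: track features, fit a per-frame similarity transform, smooth the accumulated trajectory with a moving average, and re-warp each frame:

```python
# Minimal sketch of the classic 2D stabilization pipeline:
# motion estimation -> trajectory smoothing -> stabilized generation.
# Illustrative only (moving-average smoothing, partial affine model).
import cv2
import numpy as np

def stabilize(frames, radius=15):
    # 1) Motion estimation: per-frame (dx, dy, dtheta) from tracked features.
    transforms = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts = cv2.goodFeaturesToTrack(prev, maxCorners=200,
                                      qualityLevel=0.01, minDistance=30)
        nxt, st, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
        m, _ = cv2.estimateAffinePartial2D(pts[st == 1], nxt[st == 1])
        transforms.append([m[0, 2], m[1, 2], np.arctan2(m[1, 0], m[0, 0])])
        prev = gray
    transforms = np.array(transforms)

    # 2) Smoothing: moving average over the accumulated camera trajectory.
    trajectory = np.cumsum(transforms, axis=0)
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    smoothed = np.stack(
        [np.convolve(np.pad(trajectory[:, i], radius, mode="edge"),
                     kernel, mode="valid") for i in range(3)], axis=1)
    transforms += smoothed - trajectory

    # 3) Stabilized generation: warp each frame with the corrected motion.
    h, w = frames[0].shape[:2]
    out = [frames[0]]
    for frame, (dx, dy, da) in zip(frames[1:], transforms):
        m = np.array([[np.cos(da), -np.sin(da), dx],
                      [np.sin(da),  np.cos(da), dy]])
        out.append(cv2.warpAffine(frame, m, (w, h)))
    return out
```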

Talk at group meeting about ChatVideo

July 13, 2023

Talk, Internal group meeting, Dalian, China

The report, presented by Yifei Cao on July 13, 2023, discusses why Visual ChatGPT is unsuitable for video tasks and presents the key ideas of ChatVideo. It covers the relevant components, OmniTracker and OmniVL, and focuses on the construction of the tracklet database, whose fields are: ID (primary key), Category (the tracklet's category), Appearance (the tracklet's instance appearance), Motion (the tracklet's motion), Trajectory (the tracklet's trajectory), and Audio (applicable only to tracklets that span the complete video). You can visit PPT here.
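To make the tracklet-centric database concrete, here is a minimal sketch of the Tracklets table described above, using SQLite; the field names follow the talk, while the column types and the JSON trajectory format are my own assumptions:

```python
# Minimal sketch of the tracklet-centric database described above.
# Field names follow the talk; types and storage format (SQLite with
# JSON/text payloads) are illustrative assumptions.
import sqlite3

conn = sqlite3.connect("tracklets.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS tracklets (
    id         INTEGER PRIMARY KEY,  -- tracklet ID
    category   TEXT NOT NULL,        -- e.g. 'person', 'dog'
    appearance TEXT,                 -- description of the instance
    motion     TEXT,                 -- description of its motion
    trajectory TEXT,                 -- per-frame boxes, e.g. JSON
    audio      TEXT                  -- only for the full-video tracklet
)
""")
conn.execute(
    "INSERT INTO tracklets (category, appearance, motion, trajectory) "
    "VALUES (?, ?, ?, ?)",
    ("person", "red jacket", "walking left to right",
     '[{"frame": 0, "box": [10, 20, 50, 80]}]'),
)
conn.commit()
```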