Meta Unveils Segment Anything Model 2 (SAM 2)

In a significant leap forward for computer vision technology, Meta has introduced Segment Anything Model 2 (SAM 2), the next generation of its groundbreaking Segment Anything Model (SAM) for images. SAM 2 not only retains the strengths of its predecessor, such as promptable segmentation, zero-shot generalization, and fast inference, but also extends these capabilities to video. This makes SAM 2 the first unified model for real-time, promptable object segmentation in images and videos.

SAM 2 is designed to identify which pixels belong to a target object in an image or video, so it can segment any object and follow it consistently across all frames of a video in real time. This breakthrough technology has the potential to unlock new possibilities in video editing and generation, as well as enable innovative experiences in mixed reality.
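
To make the idea of promptable segmentation concrete, here is a toy sketch in Python. It is not SAM 2 and does not use Meta's code or APIs: the actual model is a learned neural network, whereas this stand-in uses a simple flood fill. The function names (segment_from_click, mask_centroid, track, make_frame) and the tiny synthetic frames are assumptions made up for illustration. The sketch only shows the shape of the task: a click prompt goes in, a per-pixel mask comes out, and the object is followed from frame to frame.

```python
from collections import deque


def segment_from_click(image, click):
    """Toy stand-in for a promptable segmenter (NOT SAM 2).

    Returns a boolean mask marking the 4-connected region of pixels that
    share the value of the clicked pixel.
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = click
    target = image[r0][c0]
    mask = [[False] * cols for _ in range(rows)]
    mask[r0][c0] = True
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not mask[nr][nc] and image[nr][nc] == target:
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


def mask_centroid(mask):
    """Return the rounded (row, col) centre of the pixels set in the mask."""
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, on in enumerate(row) if on]
    mean_r = sum(r for r, _ in pixels) / len(pixels)
    mean_c = sum(c for _, c in pixels) / len(pixels)
    return round(mean_r), round(mean_c)


def track(frames, click):
    """Follow the clicked object by re-prompting each frame at the previous
    mask's centre. This naive heuristic assumes the object moves only
    slightly between frames; it is an illustration, not SAM 2's method."""
    masks = []
    for frame in frames:
        mask = segment_from_click(frame, click)
        masks.append(mask)
        click = mask_centroid(mask)
    return masks


def make_frame(left):
    """Hypothetical 4x6 frame: a 3x3 object (value 1) on background 0,
    with its left edge at column `left`."""
    return [[1 if r < 3 and left <= c < left + 3 else 0 for c in range(6)]
            for r in range(4)]


# The object shifts one column to the right in each of three frames.
frames = [make_frame(left) for left in range(3)]
masks = track(frames, click=(1, 1))
for mask in masks:
    print("\n".join("".join("#" if on else "." for on in row) for row in mask))
    print()
```

A single click on the first frame is enough here because each later frame is prompted automatically from the previous mask. That mirrors, in a very simplified way, the workflow described above: prompt once, then follow the object through the video.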

The development of SAM 2 was driven by the need for a model that could handle the complexities of video segmentation, where objects can move fast, change in appearance, and be concealed by other objects or parts of the scene. To achieve this, Meta built a data engine that improves the model and data via user interaction, resulting in the collection of the largest video segmentation dataset to date.

SAM 2 is available today under the Apache 2.0 license, allowing anyone to use it to build their own experiences. The release includes image prediction APIs that closely resemble those of the original SAM for image use cases, and a video predictor with APIs for promptable segmentation and tracking in videos.

The release of SAM 2 is a testament to Meta’s commitment to open science and the advancement of computer vision technology. By making this powerful tool widely available, Meta is enabling researchers and developers around the world to explore new capabilities and use cases, driving innovation in the field of computer vision.

In conclusion, SAM 2 represents a significant step forward in the field of computer vision, bringing promptable segmentation to both images and videos in a single model. With its real-time performance and open availability, SAM 2 is poised to change the way we interact with and analyze visual data.