Vision foundation models (VFMs), such as the Segment Anything Model (SAM), enable zero-shot or interactive segmentation of visual content and have therefore been rapidly adopted across a variety of visual scenes.