> The utility of optical flow (i.e., the instantaneous velocity of pixels [16]) toward this goal has long been obvious, yet it has remained challenging to upgrade flows into long-range tracks.
This sentence from the paper makes me feel a little bad that I don't understand why this goal is obvious. I am not tracking why we are tracking pixels.
Is this basically a competing technology with YOLO[1] or SAM[2]?
Object segmentation and tracking is such a natural and 'automatic' part of our visual perception that it's difficult to intuit how challenging it is to do with software.
[1]: https://en.m.wikipedia.org/wiki/You_Only_Look_Once
[2]: https://ai.meta.com/sam2/
Edit: added annotations, should've done that initially