This paper tackles stereo matching in the particularly difficult domain of laparoscopic surgery, where the classical stereo assumptions break down: tissue is textureless or specular, illumination changes rapidly with camera motion, and the scene contains thin instruments with sharp depth discontinuities. The authors use a transformer-based matching module that aggregates context across the whole image, trained on a new laparoscopic stereo dataset, and it produces noticeably better depth estimates than CNN-based stereo networks designed for driving scenes.
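The core idea of attention-based matching can be illustrated with a toy sketch: compute cross-attention between left and right features along each epipolar line (assuming rectified images), then read out disparity with a soft-argmax. This is a minimal illustration of the general technique, not the paper's actual architecture; the function name, feature shapes, and single-scanline scope are all my own simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_argmax_disparity(feat_left, feat_right, temperature=1.0):
    """Toy per-scanline cross-attention matching with a soft-argmax readout.

    feat_left, feat_right: (H, W, C) feature maps from rectified images,
    so correspondences lie on the same row. Returns an (H, W) disparity map.
    """
    H, W, C = feat_left.shape
    positions = np.arange(W, dtype=float)
    disp = np.zeros((H, W))
    for y in range(H):
        # attention scores between every left/right feature pair on row y
        scores = feat_left[y] @ feat_right[y].T / (np.sqrt(C) * temperature)
        attn = softmax(scores, axis=-1)      # (W, W) matching distribution
        matched_x = attn @ positions         # expected right x-coordinate
        disp[y] = positions - matched_x      # disparity = x_left - x_right
    return disp
```

A soft-argmax keeps the whole pipeline differentiable, which is why attention-style matchers typically use it instead of a hard argmax over the cost volume.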
