In this paper we propose a method for estimating geometry, lighting and albedo from a single image of an uncontrolled outdoor scene. To do so, we combine state-of-the-art deep-learning-based methods for single-image depth estimation and inverse rendering. The depth estimate provides coarse geometry that is refined using the inverse-rendered surface normal estimates. Combined with the inverse-rendered albedo map, this provides a model that can be used for novel view synthesis with both viewpoint and lighting changes. We show that, on uncontrolled outdoor images, our approach yields geometry that is qualitatively superior to that of the depth estimation network alone, and that the resulting models can be re-illuminated without artefacts.
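To make the refinement step concrete, the sketch below is a minimal illustration (not the authors' implementation) of refining a coarse depth map so that its finite-difference gradients agree with the gradients implied by a normal map, posed as a sparse linear least-squares problem under an orthographic camera assumption. The function name `refine_depth`, the data-term weight `lam`, and the forward-difference stencil are all illustrative choices.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def refine_depth(depth, normals, lam=0.1):
    """Refine a coarse depth map using surface normals (illustrative sketch).

    depth:   (H, W) coarse depth, e.g. from a depth estimation network.
    normals: (H, W, 3) unit surface normals, e.g. from inverse rendering.
    lam:     weight tying the solution to the coarse depth estimate.
    """
    H, W = depth.shape
    nz = np.clip(normals[..., 2], 1e-3, None)           # guard near-grazing normals
    p = -normals[..., 0] / nz                           # target dz/dx per pixel
    q = -normals[..., 1] / nz                           # target dz/dy per pixel

    def fwd_diff(k):
        # (k-1) x k forward-difference matrix, no wrap-around at the boundary.
        return sp.eye(k - 1, k, k=1) - sp.eye(k - 1, k)

    # Gradient operators on the row-major raveled depth image.
    Dx = sp.kron(sp.eye(H), fwd_diff(W), format="csr")  # horizontal differences
    Dy = sp.kron(fwd_diff(H), sp.eye(W), format="csr")  # vertical differences

    # Stack data term and the two gradient-matching terms into one system.
    A = sp.vstack([np.sqrt(lam) * sp.eye(H * W, format="csr"), Dx, Dy])
    b = np.concatenate([np.sqrt(lam) * depth.ravel(),
                        p[:, :-1].ravel(),              # rows matched by Dx
                        q[:-1, :].ravel()])             # rows matched by Dy
    z = spla.lsqr(A, b)[0]
    return z.reshape(H, W)

# Example: recover a noisy inclined plane from its true normals (toy data).
H, W = 64, 64
true = np.fromfunction(lambda i, j: 0.02 * j, (H, W))
noisy = true + 0.05 * np.random.randn(H, W)
normals = np.zeros((H, W, 3))
normals[...] = [-0.02, 0.0, 1.0]
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
refined = refine_depth(noisy, normals)
```

In this formulation the data term preserves the coarse scale supplied by the depth network, while the gradient terms transfer fine surface detail from the normal estimates, which matches the division of labour the abstract describes.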
Note: the published version of the paper contains some errors in the equations. The author preprint linked here corrects these errors and is consistent with the implementation in the linked GitHub repository.