Refining 3D Human Texture Estimation from a Single Image

Bilkent University

RefineTex estimates high-quality 3D human texture from low-resolution images

Abstract

Estimating 3D human texture from a single image is essential in graphics and vision. It requires learning a mapping from input images of humans in diverse poses to the parametric uv space, while plausibly hallucinating the invisible parts. To achieve high-quality 3D human texture estimation, we propose a framework that adaptively samples the input with a deformable convolution whose offsets are learned by a deep neural network. Both the offsets and the deformable convolution are deeply supervised.
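To illustrate the sampling step, here is a minimal sketch of the core operation of a deformable convolution: bilinearly sampling a feature map at grid positions shifted by per-pixel learned offsets. This is a simplified, single-channel illustration in numpy, not the paper's implementation (which the page does not detail); the function name and shapes are our own assumptions.

```python
import numpy as np

def deformable_sample(feat, offsets):
    """Bilinearly sample `feat` at grid positions shifted by `offsets`,
    the core operation inside a deformable convolution.
    feat:    (H, W) feature map
    offsets: (H, W, 2) per-pixel (dy, dx) offsets, e.g. predicted by a network
    Returns an (H, W) map of features sampled at the offset locations.
    """
    H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Shifted (fractional) sampling coordinates, clamped to the image.
    py = np.clip(ys + offsets[..., 0], 0, H - 1)
    px = np.clip(xs + offsets[..., 1], 0, W - 1)
    # Integer corners and interpolation weights for bilinear sampling.
    y0, x0 = np.floor(py).astype(int), np.floor(px).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = py - y0, px - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])
```

With zero offsets this reduces to an identity lookup; learned non-zero offsets let the network pull features from the body-part locations that actually correspond to each uv coordinate.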

Additionally, we describe a novel cycle consistency loss that improves view generalization. We further propose to train our framework with an uncertainty-based pixel-level image reconstruction loss, which enhances color fidelity.

We compare our method against the state-of-the-art approaches and show significant qualitative and quantitative improvements.

Method


The overall framework. We introduce a deformable convolution-based refinement module whose offsets are learned via an attention-based deep network (Xu et al., 2021). This framework handles the challenge of mapping unaligned, spatially diverse input images to fixed parametric uv coordinates.

Results

RefineTex predictions compared with prior methods. Each example shows, left to right: Input, HPBTT, RSTG, Texformer, Ours.

Related Links

In the paper, we compared our results mainly with the following ones:

Texformer. We would also like to thank the authors for open-sourcing their code; we built our codebase on their released implementation.

We additionally compared our work with HPBTT and RSTG.

Acknowledgement

A. Dundar was supported by a Marie Skłodowska-Curie Individual Fellowship.
