Recent methods have shown that real images can be inverted into StyleGAN's latent space, where the semantically rich representations of a well-trained GAN enable a wide range of edits. However, extensive research has also shown that image inversion is challenging due to the trade-off between high-fidelity reconstruction and editability.
In this paper, we tackle an even more difficult task: inverting erased images into a GAN's latent space for realistic inpainting and editing. Furthermore, by augmenting the inverted latent codes with different latent samples, we achieve diverse inpaintings. Specifically, we propose to learn an encoder and a mixing network that combine encoded features from erased images with StyleGAN's mapped features from random samples. To encourage the mixing network to utilize both inputs, we train the networks on generated data via a novel set-up. We also utilize higher-rate features to prevent color inconsistencies between the inpainted and unerased parts.
We run extensive experiments comparing our method with state-of-the-art inversion and inpainting methods. Quantitative metrics and visual comparisons show significant improvements.
The first-stage framework consists of a trainable image encoder and mixing network, together with StyleGAN's frozen mapping and generator networks. Our encoder takes an erased image and a binary mask and embeds the image into StyleGAN's latent space.
We also sample a latent code via the mapping network to introduce stochasticity. The mixing network combines the information available in the erased image, coming from the encoder, with the missing content, coming from the mapping network.
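To illustrate this fusion step, here is a minimal PyTorch sketch of one possible mixing network. The MLP architecture, the W+ latent shape of `(num_layers, latent_dim)`, and all names are assumptions made for exposition, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MixingNetwork(nn.Module):
    """Illustrative mixer (assumed architecture): fuses the encoder's W+
    code, which carries the known regions, with a mapped W+ code from a
    random sample, which fills in the missing regions."""
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * latent_dim, latent_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, w_enc: torch.Tensor, w_rand: torch.Tensor) -> torch.Tensor:
        # w_enc, w_rand: (batch, num_layers, latent_dim) W+ codes.
        return self.mlp(torch.cat([w_enc, w_rand], dim=-1))
```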
The mixed latent representations are fed to the generator through its adaptive instance normalization layers to produce the fake image. In a final step, the input image and the fake image are blended according to the mask.
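Putting the pieces together, the sketch below traces the full first-stage forward pass, including the final mask-based blend. The module interfaces (`encoder`, `mapping`, `generator`), the latent shapes, and the mask convention (1 = known pixel, 0 = erased) are assumptions, not the paper's exact implementation.

```python
import torch

def first_stage_forward(encoder, mapping, mixing, generator, image, mask,
                        num_layers: int = 18, latent_dim: int = 512):
    """image: (B, 3, H, W); mask: (B, 1, H, W) with 1 = known, 0 = erased.
    encoder/mapping/generator stand in for the trained encoder and the
    frozen StyleGAN mapping and synthesis networks (interfaces assumed)."""
    erased = image * mask                                  # remove masked pixels
    w_enc = encoder(torch.cat([erased, mask], dim=1))      # -> (B, L, D) W+ code
    z = torch.randn(image.size(0), latent_dim, device=image.device)
    w_rand = mapping(z).unsqueeze(1).expand(-1, num_layers, -1)  # broadcast to W+
    w_mix = mixing(w_enc, w_rand)        # e.g. the MixingNetwork sketched above
    fake = generator(w_mix)              # styles injected via the (Ada)IN layers
    # Final step: keep the unerased pixels, take generated content elsewhere.
    return mask * image + (1.0 - mask) * fake
```

Sampling several `z` codes for the same erased input is what yields the diverse inpaintings discussed above.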
We compare our inpainting results with state-of-the-art models below and show the diverse inpainting and editing results of our model on the FFHQ dataset; the last row shows inpainting results on the AFHQ dataset.