CaPa: Carve-n-Paint Synthesis
for Efficient 4K Textured Mesh Generation

The paper is released! Stay tuned for the online demo!

Hwan Heo, Jangyeong Kim, Seongyeong Lee, Jeong A Wi, Junyoung Choi, Sangjun Ahn*
Graphics AI Lab, NCSOFT Research

Paper | GitHub
Online Demo (coming soon)
TL;DR: We propose CaPa, a novel method for generating a high-quality 4K textured mesh in under 30 seconds,
providing 3D assets ready for commercial applications such as games, movies, and VR/AR.

Abstract

The synthesis of high-quality 3D assets from textual or visual inputs has become a central objective in modern generative modeling. Despite the proliferation of 3D generation algorithms, they frequently grapple with challenges such as multi-view inconsistency, slow generation times, low fidelity, and surface reconstruction problems. While some studies have addressed a subset of these issues, a comprehensive solution remains elusive. In this paper, we introduce CaPa, a carve-and-paint framework that generates high-fidelity 3D assets efficiently. CaPa employs a two-stage process, decoupling geometry generation from texture synthesis. Initially, a 3D latent diffusion model generates geometry guided by multi-view inputs, ensuring structural consistency across perspectives. Subsequently, leveraging a novel, model-agnostic Spatially Decoupled Attention, the framework synthesizes high-resolution textures (up to 4K) for a given geometry. Furthermore, we propose a 3D-aware occlusion inpainting algorithm that fills untextured regions, yielding cohesive results across the entire model. This pipeline generates high-quality 3D assets in less than 30 seconds, providing ready-to-use outputs for commercial applications. Experimental results demonstrate that CaPa excels in both texture fidelity and geometric stability, establishing a new standard for practical, scalable 3D asset generation.


Methodology


Overview


Pipeline of CaPa

  1. Geometry Generation: First, we generate geometry (a polygonal mesh) using a 3D latent diffusion model. Using the 3D latent space learned by ShapeVAE, we train a 3D latent diffusion model that generates geometry guided by multi-view images from a multi-view diffusion model, ensuring alignment between the generated shape and texture (a minimal sketch of this stage follows the list).
  2. Texture Generation: Second, we render four orthogonal views of the mesh, which serve as inputs for texture generation. To produce a high-quality texture while preventing the Janus problem, we design a novel, model-agnostic spatially decoupled attention (sketched after this list):
    • This mechanism ensures that each spatial region independently attends to its corresponding view, preserving view-specific details and enhancing multi-view consistency.
    • Its model-agnostic nature allows integration with any diffusion model, enabling extraordinary texture quality powered by SDXL and thus outperforming other 3D generation and texturing methods, which are typically limited to SD 1.5.
  3. Final Output: A high-quality textured mesh is obtained through back projection and a 3D-aware occlusion inpainting algorithm (see the final sketch below). The entire 3D asset generation process completes in less than 30 seconds using a fully feed-forward approach.
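
For readers who want a concrete picture of the carve stage, here is a minimal, hypothetical sketch of sampling a ShapeVAE latent with classifier-free guidance from multi-view image embeddings. The names `shape_vae`, `denoiser`, and `mv_encoder` are placeholders, not CaPa's actual API.

```python
# Hypothetical sketch of the carve stage: denoising a ShapeVAE latent,
# guided by multi-view image embeddings via classifier-free guidance.
# All model objects below are placeholders, not CaPa's released API.
import torch

@torch.no_grad()
def carve(mv_images, mv_encoder, denoiser, shape_vae, steps=50, cfg_scale=7.5):
    cond = mv_encoder(mv_images)                 # multi-view guidance signal
    z = torch.randn(1, shape_vae.latent_dim)     # start from Gaussian noise
    for t in torch.linspace(1.0, 1.0 / steps, steps):
        eps_cond = denoiser(z, t, cond)          # conditional noise estimate
        eps_uncond = denoiser(z, t, None)        # unconditional estimate
        eps = eps_uncond + cfg_scale * (eps_cond - eps_uncond)
        z = z - (1.0 / steps) * eps              # crude Euler denoising step
    return shape_vae.decode(z)                   # latent -> polygonal mesh
```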
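
The spatially decoupled attention can be pictured as a block-diagonal attention mask over view-grouped tokens. The sketch below is a simplified reading of the mechanism, assuming the tokens of the four views are concatenated along the sequence dimension; it is not the paper's implementation.

```python
# Simplified sketch: each view's queries attend only to keys of the same
# view, so view-specific details cannot leak across views (Janus problem).
import torch

def spatially_decoupled_attention(q, k, v, num_views=4):
    """q, k, v: (batch, tokens, dim), tokens concatenated view-by-view."""
    b, n, d = q.shape
    view_id = torch.arange(n, device=q.device) // (n // num_views)
    mask = view_id[:, None] == view_id[None, :]    # block-diagonal (n, n)
    attn = (q @ k.transpose(-2, -1)) / d ** 0.5    # scaled dot-product scores
    attn = attn.masked_fill(~mask, float("-inf"))  # cut cross-view attention
    return attn.softmax(dim=-1) @ v
```

Because the mask only touches the attention weights, the same idea can plug into the attention layers of any diffusion backbone, which is what makes the mechanism model-agnostic.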
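
Finally, a hedged sketch of the back-projection step: for each texel, pick the orthogonal view that sees its surface point most frontally, and flag texels no view sees for the 3D-aware occlusion inpainting. The tensor layout, camera directions, and depth-test visibility input are assumptions for illustration.

```python
# Assumed inputs: per-texel unit normals, per-view visibility from a
# rasterizer's depth test, and colors sampled from the four view images.
import numpy as np

# Camera forward directions of the four orthogonal views (an assumption).
VIEW_DIRS = np.array([[0, 0, -1], [0, 0, 1], [-1, 0, 0], [1, 0, 0]], float)

def back_project(normals, visible, view_colors):
    """normals: (T, 3); visible: (4, T) bool; view_colors: (4, T, 3).
    Returns (texture (T, 3), mask of texels left for inpainting (T,))."""
    frontality = -(VIEW_DIRS @ normals.T)        # (4, T) cosine to each camera
    frontality[~visible] = -np.inf               # occluded views: never chosen
    best = frontality.argmax(axis=0)             # most frontal view per texel
    texture = view_colors[best, np.arange(normals.shape[0])]
    needs_inpaint = ~visible.any(axis=0)         # no view sees these texels
    return texture, needs_inpaint
```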


Comparison: Image to 3D Asset Generation


Input | Ours (~30 sec) | Unique3D (~2 min) | SF3D (~10 sec)


We compare CaPa with state-of-the-art image-to-3D methods. All assets are converted to polygonal meshes using each method's official code. CaPa significantly outperforms the baselines in both geometric stability and visual fidelity, especially in the back and side views.


Scalability & Adaptability


PBR-Aware 3D Asset Generation


Texture Editing

Original | Edited with text prompt ("orange sofa, orange pulp")


Citation


BibTeX

@article{heo2025capa,
  title = {CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation},
  author = {Hwan Heo and Jangyeong Kim and Seongyeong Lee and Jeong A Wi and Junyoung Choi and Sangjun Ahn},
  journal = {arXiv preprint arXiv:2501.09433},
  year = {2025},
}


Related Project


Texture Copilot