CaPa: Carve-n-Paint Synthesis
for Efficient 4K Textured Mesh Generation

Hwan Heo, Jangyeong Kim, Seongyeong Lee, Jeong A Wi, Junyoung Choi, Sangjun Ahn
Graphics AI Lab, NCSOFT Research

TL;DR: We propose CaPa, a novel method that generates a high-quality 4K textured mesh in under 30 seconds, providing 3D assets ready for commercial applications such as games, movies, and VR/AR.

Teaser Video


Pipeline Overview

CaPa Pipeline

Abstract

The synthesis of high-quality 3D assets from textual or visual inputs has become a central objective in modern generative modeling. Despite the proliferation of 3D generation algorithms, they frequently grapple with challenges such as multi-view inconsistency, slow generation times, low fidelity, and surface reconstruction problems. While some studies have addressed some of these issues, a comprehensive solution remains elusive. In this paper, we introduce CaPa, a carve-and-paint framework that generates high-fidelity 3D assets efficiently. CaPa employs a two-stage process, decoupling geometry generation from texture synthesis. Initially, a 3D latent diffusion model generates geometry guided by multi-view inputs, ensuring structural consistency across perspectives. Subsequently, leveraging a novel, model-agnostic Spatially Decoupled Attention, the framework synthesizes high-resolution textures (up to 4K) for a given geometry. Furthermore, we propose a 3D-aware occlusion inpainting algorithm that fills untextured regions, resulting in cohesive results across the entire model. This pipeline generates high-quality 3D assets in less than 30 seconds, providing ready-to-use outputs for commercial applications. Experimental results demonstrate that CaPa excels in both texture fidelity and geometric stability, establishing a new standard for practical, scalable 3D asset generation.
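
To make the notion of "untextured regions" concrete, the following is a minimal sketch of why a fixed set of views leaves parts of a mesh without texture. It is not CaPa's occlusion inpainting algorithm: the function name `untextured_face_mask`, the threshold `min_cos`, and the four-view camera layout are all assumptions made for illustration, and the test only checks whether a face points toward some camera (self-occlusion is ignored).

```python
import numpy as np


def untextured_face_mask(face_normals, view_dirs, min_cos=0.2):
    """Return a boolean mask marking faces that no view observes well.

    face_normals: (F, 3) unit normals, one per mesh face.
    view_dirs:    (V, 3) unit vectors pointing from the object toward each camera.
    min_cos:      hypothetical threshold on the cosine between a face normal
                  and the direction to a camera; grazing views count as unseen.
    """
    normals = np.asarray(face_normals, dtype=float)
    views = np.asarray(view_dirs, dtype=float)
    # cos[f, v] is the cosine of the angle between face f's normal
    # and the direction toward camera v.
    cos = normals @ views.T
    covered = (cos > min_cos).any(axis=1)
    return ~covered


# Toy geometry: the six axis-aligned face normals of a cube.
cube_normals = np.array(
    [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]],
    dtype=float,
)
# Assumed camera layout: four horizontal views (front, back, left, right).
four_views = np.array(
    [[0, 0, 1], [0, 0, -1], [-1, 0, 0], [1, 0, 0]],
    dtype=float,
)

mask = untextured_face_mask(cube_normals, four_views)
print(mask)  # top (+y) and bottom (-y) faces are flagged as untextured
```

In this toy setup the top and bottom faces are never seen by the horizontal cameras, so they are the faces a subsequent inpainting step would need to fill.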

Examples

Columns: Input, Ours (~30 sec), Unique3D (~2 min), SF3D (~10 sec)

We compare CaPa with state-of-the-art image-to-3D methods. All assets are converted to polygonal meshes using each method's official code. CaPa significantly outperforms the other methods in both geometric stability and visual fidelity, especially in the back and side views.


Scalability & Adaptability


PBR-aware 3D Asset Generation


Texture Editing

Columns: Original, Edited with text prompt ("orange sofa, orange pulp")


Citation


BibTeX

@article{heo2025capa,
  title = {CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation},
  author = {Hwan Heo and Jangyeong Kim and Seongyeong Lee and Jeong A Wi and Junyoung Choi and Sangjun Ahn},
  journal = {arXiv preprint arXiv:2501.09433},
  year = {2025},
}