ImaginateAR

AI‑Assisted In‑Situ Authoring in Augmented Reality

Accepted to UIST 2025

Jaewook Lee1*, Filippo Aleotti2*, Diego Mazala2, Guillermo Garcia-Hernando2, Sara Vicente2, Oliver Johnston2, Isabel Kraus-Liang2, Jakub Powierza2, Donghoon Shin1, Jon E. Froehlich1, Gabriel Brostow2,3, Jessica Van Brummelen2

1University of Washington    2Niantic Spatial, Inc.    3University College London
*Equal contribution

A system overview and user flow diagram. On the left, a person is scanning a real-world environment, which ImaginateAR uses to perform scene understanding offline. On the right, a user arrives at the same location and localizes against the real world environment. They can then speak to ImaginateAR’s AI agents and make manual modifications, before exploring the generated AR scene.

Abstract


While augmented reality (AR) enables new ways to play, tell stories, and explore ideas rooted in the physical world, authoring personalized AR content remains difficult for non-experts, often requiring professional tools and time. Prior systems have explored AI-driven XR design but typically rely on manually defined VR environments and fixed asset libraries, limiting creative flexibility and real-world relevance. We introduce ImaginateAR, the first mobile tool for AI-assisted AR authoring to combine offline scene understanding, fast 3D asset generation, and LLMs—enabling users to create outdoor scenes through natural language interaction. For example, saying “a dragon enjoying a campfire” (P7) prompts the system to generate and arrange relevant assets, which can then be refined manually. Our technical evaluation shows that our custom pipelines produce more accurate outdoor scene graphs and generate 3D meshes faster than prior methods. A three-part user study (N=20) revealed preferred roles for AI, how users create in freeform use, and design implications for future AR authoring tools. ImaginateAR takes a step toward empowering anyone to create AR experiences anywhere—simply by speaking their imagination.


Video



Example Creations


We first showcase AR scenes authored using ImaginateAR. These include six scenes created by the research team as a proof by demonstration, as well as 24 scenes authored by participants in our user study (N=20 + 4 pilot) during a free-form authoring phase.


A proof by demonstration showing screenshots of ImaginateAR with various AR assets in real-world scenes, such as a colorful ‘silly hat’ on a real-world, metallic pig statue.
Six example creations from our technical evaluation, situated in a park, schoolyard, playground, shopping center, and backyard. Each scene was first generated with AI tools, then refined with light manual adjustments to reflect typical ImaginateAR use. Some are whimsical (A, F), while others are educational (C, E) or playful (B, D).

A collage of 24 AR scenes created by participants during Part 2 of the user study. Examples include ‘a dog enjoying a campfire’, ‘animal kingdom with cat, horse, and fox’, and ‘a dancing T-Rex on grass’.
AR experiences created by participants (N=20 + 4 pilot) while interacting with the ImaginateAR prototype. Users were encouraged to create freely without limitations. 'PP' denotes pilot participants and 'P' study participants.

System Overview


ImaginateAR integrates three technical innovations: (1) outdoor scene understanding, which uses enhanced OpenMask3D with GPT‑4o semantic labeling and HDBSCAN clustering to build structured scene graphs; (2) fast 3D mesh generation via GPT‑4o prompt expansion, reference image synthesis, segmentation (DIS), and mesh lifting (InstantMesh); and (3) LLM‑driven speech interaction, enabling users to place and refine assets through natural spoken commands in real time.
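To illustrate the speech-interaction loop, below is a minimal Python sketch of how a transcribed voice command might be mapped to a structured scene edit with an LLM. It assumes an OpenAI-style chat API; the JSON action schema and the interpret_command helper are illustrative assumptions, not ImaginateAR's actual implementation.

# Hypothetical sketch: turning a transcribed voice command into a structured
# scene edit via an LLM. The action schema and helper name are illustrative,
# not the system's actual interface.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You convert a user's spoken AR request into a JSON action. "
    'Respond with {"action": "add" | "move" | "delete", '
    '"asset_prompt": <text prompt for the asset>, '
    '"anchor_label": <label of an existing scene object>}.'
)

def interpret_command(transcript, scene_labels):
    """Ask the LLM to ground a spoken command against known scene-graph labels."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Scene objects: {scene_labels}. Command: {transcript}"},
        ],
    )
    return json.loads(response.choices[0].message.content)

# Example: interpret_command("put a dragon next to the campfire", ["campfire", "grass", "tree"])
# might return {"action": "add", "asset_prompt": "a dragon", "anchor_label": "campfire"}.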


Diagram of the 3D scene understanding pipeline. A point cloud becomes an initial set of masks through 3D mask prediction, which becomes a semantic point cloud through filtering and classification, and is then clustered into the final set of masks. The result is a set of 3D bounding boxes corresponding to the objects' masks.
Diagram of the 3D scene understanding pipeline. Given an input point cloud, we first estimate 3D masks. Next, we assign a semantic label to each mask using a VLM and propagate the label to all points within the mask, producing a semantic point cloud. We then cluster nearby points with the same label to infer the final set of 3D masks, from which we extract 3D bounding boxes. For visualization, we show only the bounding boxes, not the underlying masks. The Pavement box is enclosed within the Road box and is therefore not visible.
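As a rough illustration of the clustering step described above, the following Python sketch groups points that share a semantic label with HDBSCAN and derives an axis-aligned bounding box per cluster. The parameter values and the boxes_from_semantic_points helper are assumptions for illustration, not the authors' code.

# Minimal sketch (not the authors' code): points sharing a semantic label are
# grouped spatially with HDBSCAN, and each cluster yields a 3D bounding box.
import numpy as np
import hdbscan

def boxes_from_semantic_points(points, labels, min_cluster_size=50):
    """points: (N, 3) xyz coordinates; labels: (N,) semantic label per point."""
    boxes = []
    for label in np.unique(labels):
        pts = points[labels == label]
        if len(pts) < min_cluster_size:
            continue
        cluster_ids = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(pts)
        for cid in np.unique(cluster_ids):
            if cid == -1:  # HDBSCAN marks noise points with -1
                continue
            cluster = pts[cluster_ids == cid]
            boxes.append({
                "label": label,
                "min_corner": cluster.min(axis=0),
                "max_corner": cluster.max(axis=0),
            })
    return boxes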

Results of the 3D scene understanding module on the Vase, House, and Garden 3D environments. Real-world features such as flowers, sidewalks, trees, and grass are represented as 3D bounding boxes.
Results of the 3D scene understanding module. For each of the three scans—Vase, House, and Garden—we visualize the input point cloud (left) and the final set of labeled 3D bounding boxes inferred by our scene understanding pipeline (right). We also report the total time (in minutes) required to estimate the scene graph for each scan. Note that some bounding boxes may be enclosed within others and may therefore be occluded.

Example of 3D asset generation. A simple prompt, 'a Roman statue', is prompt-boosted with visual examples to include beneficial words like 'detailed'. This goes through text-to-image generation, then image-to-3D, to produce the final generated mesh.
Example of 3D asset generation. Given a user prompt, we first apply prompt boosting, then use Dall-E 2 to generate a consistent image by editing the center region of a white canvas. The image is then lifted to 3D using InstantMesh. The 'Bad' example (right) illustrates a failure case because it would produce a partial 3D object (i.e., only the dragon’s head). Prompt boosting helps avoid such incomplete generations.
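For readers curious how the image-synthesis step could be set up, here is a hedged Python sketch that renders a boosted prompt into the transparent center of a white canvas via the OpenAI image-edit endpoint, which encourages a single, fully visible object. The file names, canvas sizes, and boosting phrasing are assumptions; segmentation and lifting to 3D with InstantMesh are omitted.

# Hedged sketch of the reference-image step (assumed parameters, not the
# authors' code). A boosted prompt is painted into the transparent center of a
# white canvas via the OpenAI image-edit endpoint, favoring a single complete
# object. Segmentation (DIS) and lifting to 3D (InstantMesh) are omitted.
from openai import OpenAI
from PIL import Image

client = OpenAI()

def make_canvas_and_mask(size=1024, hole=768):
    """White canvas plus a mask whose transparent center marks the edit region."""
    Image.new("RGBA", (size, size), (255, 255, 255, 255)).save("canvas.png")
    mask = Image.new("RGBA", (size, size), (255, 255, 255, 255))
    off = (size - hole) // 2
    mask.paste((0, 0, 0, 0), (off, off, off + hole, off + hole))
    mask.save("mask.png")

def generate_reference_image(user_prompt):
    boosted = f"{user_prompt}, detailed, full body, centered, plain white background"
    result = client.images.edit(
        image=open("canvas.png", "rb"),
        mask=open("mask.png", "rb"),
        prompt=boosted,
        n=1,
        size="1024x1024",
    )
    return result.data[0].url  # image is then segmented and lifted to a 3D mesh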

User Study & Evaluation


We evaluated ImaginateAR through a technical assessment and a three-part user study (N=20) in a public park. Our scene understanding pipeline outperformed the base OpenMask3D model and ablated variants, while our asset generation pipeline matched state-of-the-art quality with a faster, sub-minute runtime. The user study included: a comparison task across three authoring modes—manual, AI-assisted, and AI-decided—to explore control vs. automation (Part 1); a free-form phase where participants designed their own AR experiences (Part 2); and a co-design session reflecting on AI’s role and envisioning future features (Part 3). Overall, participants enjoyed creating diverse AR scenes and favored a hybrid approach—using AI for rapid, creative generation while retaining manual control for customization. Examples of both research team–created and user-authored scenes appear earlier on this page. For more details, please see our paper.


A comparison of bounding boxes from the ground truth, OpenMask3D, and our method. OpenMask3D produces far more bounding boxes than either the ground truth or our method.
From left to right: bounding boxes from the ground truth, OpenMask3D, and our proposed method. OpenMask3D predicts a large number of masks, resulting in excessive bounding boxes that over-represent the same scene objects. In contrast, our method produces fewer, more accurate boxes. (Box colors are arbitrary and can be ignored.)

UI layout of ImaginateAR. Users can access different features through on-screen buttons, such as a lightbulb icon for brainstorming and a mic icon for speaking to an AI agent.
Screen captures of ImaginateAR's mobile interface showing the UI layout and functionality. Users can access manual, AI-assisted, and AI-decided modes across different features through buttons on the screen.

Resources


Paper

Paper thumbnail

Supplemental

Supplemental thumbnail

BibTeX

If you find this work useful for your research, please cite:

@inproceedings{lee2025imaginatear,
  author = {Lee, Jaewook and Aleotti, Filippo and Mazala, Diego and Garcia-Hernando, Guillermo and Vicente, Sara and Johnston, Oliver James and Kraus-Liang, Isabel and Powierza, Jakub and Shin, Donghoon and Froehlich, Jon E. and Brostow, Gabriel and Van Brummelen, Jessica},
  title = {ImaginateAR: AI-Assisted In-Situ Authoring in Augmented Reality},
  year = {2025},
  isbn = {9798400720376},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3746059.3747635},
  doi = {10.1145/3746059.3747635},
  booktitle = {Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology},
  location = {Busan, Republic of Korea},
  series = {UIST '25},
}

Acknowledgements


We thank the ImaginateAR research team and study participants. ImaginateAR builds on earlier work by many of the same authors (e.g., CoCreatAR). Specific funding and acknowledgements are referenced in the full preprint.

© This webpage was inspired by this template.