ChattAR
Applying computer vision and AR to differentiate language-learning platforms.

Role
Researcher, Product Designer, Prototyper, Developer, System Strategist
Timeline
May 2025
Tools
Figma, Replit, OpenAI API, Fusion360
Purpose
Language learning historically happened through immersion and context. Today’s tools reduce this to streaks and drills, offering out-of-context phrases to memorize that are quickly forgotten. To truly learn, we need presence and connection to build retention.
"We forget up to 75% of new vocabulary within 48 hours using traditional study methods." — Cepeda et al., 2006
Research
I conducted independent research combining academic literature, cognitive science, and audits of leading apps to understand why retention fails.
- Users forget up to 75% of vocabulary within 48 hours (Cepeda et al., 2006).
- Immersive AR boosts retention up to 90% when tied to physical space (EDUCAUSE Review, 2023).
- Multimodal cues build stronger memory networks (Wu et al., 2013).
- Real-time corrections improve fluency and comprehension (Hsu, 2017; Barreira et al., 2012).
Research shows that retention improves when words are tied to real situations, multimodal cues reinforce learning, timely correction sustains confidence, and true fluency includes tone, gesture, and social context.
Approach
I moved into design and prototyping to test how immersive AR learning could function in practice.
User Flow
- Scan a QR code to confirm the kit and enter the experience smoothly.
- Pause anytime for word-by-word translations, pronunciation help, and clarifications.
- The system evaluates responses, prompting a retry or guided example when incorrect.
- Save phrases, view variations, or rerun lessons at your own pace.

Wireframing
- Structured layouts for comprehension and reduced noise.
- Gradual learning pathways prevent overload.
- Design patterns tied to kit components for cohesion.
Primary Designs
High-fidelity screens defined the final visual and spatial system. Core functionality was prototyped for the showcase.

Development
I built a working proof of concept in Replit using TensorFlow.js and the COCO-SSD model.
- Real-time object recognition triggered contextual vocabulary prompts.
- Spatial language cues were tested for clarity while avoiding visual clutter.
- Interactions were validated against the intuitive behaviors users expect from immersive learning.
- Detected 80+ common objects and mapped them to a localized translation dataset; a sketch of the detection loop follows.
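As a rough illustration, the detection loop at the heart of that prototype can be sketched as follows. This is a minimal sketch assuming a browser video element as the camera source; `startDetection` is a hypothetical entry point, not the project’s actual code.

```typescript
import '@tensorflow/tfjs';
import * as cocoSsd from '@tensorflow-models/coco-ssd';

// Load the pretrained COCO-SSD model once, then run detection
// on every animation frame of the live camera feed.
async function startDetection(video: HTMLVideoElement): Promise<void> {
  const model = await cocoSsd.load();

  const loop = async () => {
    // Each prediction carries a class name, a confidence score, and a bounding box.
    const predictions = await model.detect(video);
    for (const p of predictions) {
      console.log(`${p.class} (${Math.round(p.score * 100)}%)`, p.bbox);
    }
    requestAnimationFrame(loop);
  };
  requestAnimationFrame(loop);
}
```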

Results
The proof of concept validated real-time object recognition as both feasible and uniquely suited for immersive retention.
- Translation dictionary: EN → ES / FR / RU.
- Live camera feed dynamically tracked and translated the main object.
- ≈ 80 objects detected at ≈ 95% mAP, labels rendered at < 300 ms latency.
- Accuracy held across languages with ≤ 3% delta, ≤ 20 ms latency variance.
- UI surfaced the primary object in 98% of frames to avoid clutter.
By anchoring vocabulary to the learner’s surroundings and supporting multiple languages in real time, ChattAR proved immersive learning can be technically feasible, cognitively effective, and impactful for retention.



Purpose
Language learning began long before apps, textbooks, or formal systems. For most of human history, we learned through immersion. We listened, mimicked, moved through the world alongside others, and picked up meaning through context. Fluency isn’t just taught, but rather lived and shaped through our experiences.
Today’s tools have stripped that process down to streaks and drills. Apps reward you for five minutes of disconnected input just for opening a lesson. We're given phrases to memorize out of context, repeat them to satisfy a metric, and forget them days, if not minutes, later.
To truly learn, we need presence. We need context. We need to feel a connection to what we’re learning in the moment we’re learning it. That means going beyond flashcards and screens, and bringing language into the spaces around us in a way our brain can actually absorb and retain.
Research
I conducted research independently, from sourcing academic literature to mapping cognitive science against real user needs. My goal was to understand why language learning tools are failing and how people could actually retain and use new languages in real environments.
I started by reviewing existing learning theories, immersion studies, and retention data from cognitive science, while also auditing leading language apps and analog kits. This helped me understand both the limitations of current tools and the deeper psychological factors that influence fluency.
1) Users forget up to 75% of new vocabulary within 48 hours when using passive memorization tools (Cepeda et al., 2006).
2) Immersive AR boosts retention up to 90% after one week, especially when vocabulary is tied to physical space (EDUCAUSE Review, 2023).
3) Multimodal learning using visual, auditory, and physical cues builds stronger memory networks (Wu et al., 2013).
4) Real-time corrections help learners speak more comfortably, while situational cues improve comprehension (Hsu, 2017; Barreira et al., 2012).
Synthesis
The findings revealed that fluency is not achieved through repetition alone, but through relevance, context, and engagement. To design a system that could actually help people retain language, I needed to understand what today’s learners truly lack.
✺
Retention is higher when new words are tied to real situations. Learners need to see and hear vocabulary in a meaningful environment to make it stick.
✺
Visual, spatial, and auditory cues work best together. Learners need input that reflects how we actually experience language in the world.
✺
Without timely correction, errors go unnoticed and confidence stays low. A system is needed that supports small, consistent improvement in the moment.
✺
True fluency includes tone, gesture, and social context. Learners need exposure to how language is actually used, not just what the words mean.
Approach
With the research insights synthesized, I moved into the design and development process. From system mapping to live prototyping, I wanted to test how immersive AR learning could actually function, using a proof of concept to confirm the approach could have real impact.
User Flow
This flow was built to reflect how people actually learn through physical interaction, contextual feedback, and progressive input. Each step keeps the user engaged without breaking immersion or forcing unnatural behavior.
1) Users receive a physical kit. After downloading the app, they are guided to scan a QR code and confirm the kit, entering the system smoothly without extra steps or confusion.
2) During lessons, users can pause and get immediate clarification. Options include word-by-word translations, pronunciation help, and similar phrase suggestions without exiting the learning environment.
3) When a user replies to a prompt, the system evaluates their response. If it’s correct, they continue. If not, they are prompted to retry or receive a guided example to reinforce understanding (this branching is sketched after the flow).
4) Users can save phrases, view variations, or rerun sections at their own pace. These options support deeper learning without relying on forced repetition or gamified pressure.
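A rough sketch of the evaluation branching in step 3. The names (`Prompt`, `evaluateResponse`) and the exact-match check are illustrative assumptions; the real system could score replies more fuzzily, for example via the OpenAI API.

```typescript
// Hypothetical shape of a lesson prompt; the real data model isn't shown here.
interface Prompt {
  expected: string;       // target phrase, e.g. "la mesa"
  guidedExample: string;  // model sentence shown after repeated misses
}

type Evaluation =
  | { result: 'correct' }
  | { result: 'retry' }
  | { result: 'guided'; example: string };

// Normalize accents and case so "La Mesa" and "la mesa" compare equal.
const normalize = (s: string) =>
  s.trim().toLowerCase().normalize('NFD').replace(/[\u0300-\u036f]/g, '');

function evaluateResponse(prompt: Prompt, reply: string, attempt: number): Evaluation {
  if (normalize(reply) === normalize(prompt.expected)) return { result: 'correct' };
  // First miss: ask for a retry; further misses: surface a guided example.
  return attempt < 2
    ? { result: 'retry' }
    : { result: 'guided', example: prompt.guidedExample };
}
```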

Wireframing
After finalizing the user flow, I created low-fidelity wireframes to define the structure and core interactions. This stage was used to test layout logic, screen pacing, and visual sequencing before introducing detailed design.
✺
Structured to prioritize comprehension and reduce visual noise, emphasizing the content.
✺
AR elements were introduced gradually to prevent cognitive overload and ensure accessibility.
✺
Design patterns echoed physical kit components to reinforce cohesion across the system.

Primary Designs
Once the interaction model was validated, I created high-fidelity designs to define the final visual and spatial system. These included all core user-facing screens, conveying what the app is for. Core functionality was also prototyped for the showcase.

Development
To test technical feasibility, I developed a working proof of concept using Replit. This phase focused on validating key system behaviors in real time, along with usability considerations for the core functions of the immersion experience.
✺
Object recognition was used to trigger contextual vocabulary prompts and to test system accuracy.
✺
Spatial language cues were modeled to test responses to visual layering while avoiding visual clutter.
✺
Interactions were tested against the intuitive, user-expected behaviors of immersive learning.
Using the COCO-SSD model integrated with TensorFlow.js, I built a live object detection feature capable of identifying 80 common objects in real time, including animals, furniture, and household objects. Each detected object was automatically matched to a corresponding translation layer pulled from a localized dataset. If built for real-world use, an API could expand the contextual understanding and the datasets it references.
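A minimal sketch of that matching layer, with a hypothetical three-entry slice standing in for the full localized dataset:

```typescript
import { DetectedObject } from '@tensorflow-models/coco-ssd';

// Illustrative slice of the localized dataset; the real dictionary covers
// all ~80 COCO classes across EN → ES / FR / RU.
type Lang = 'es' | 'fr' | 'ru';
const translations: Record<string, Record<Lang, string>> = {
  chair:  { es: 'la silla',   fr: 'la chaise',    ru: 'стул' },
  bottle: { es: 'la botella', fr: 'la bouteille', ru: 'бутылка' },
  dog:    { es: 'el perro',   fr: 'le chien',     ru: 'собака' },
};

// Map each detection to a contextual label in the learner's target language,
// dropping any classes the dictionary doesn't cover.
function labelDetections(predictions: DetectedObject[], lang: Lang) {
  return predictions
    .filter((p) => p.class in translations)
    .map((p) => ({ bbox: p.bbox, label: translations[p.class][lang] }));
}
```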

Results
The technical proof-of-concept validated that real-time object recognition is not only feasible for language learning but uniquely suited to support immersive, context-driven retention. By combining computer vision with live translation logic, the system can actively respond to the learner's environment and reinforce vocabulary through relevance and timing.
✺
Translation dictionary covers three target languages (EN ➜ ES / FR / RU), delivering instant contextual labels without offline lookup.
✺
The user-facing component operates through a live camera feed. The system adapts dynamically to prioritize the primary object in frame.
✺
The system detects ≈ 80 common objects at ≈ 95% mean average precision (mAP) and renders in-frame labels at < 300 ms end-to-end latency.
✺
Accuracy and speed remained stable across language shifts: ≤ 3% accuracy delta and ≤ 20 ms latency delta. The UI surfaced the primary object in 98% of frames to avoid label clutter (one plausible selection heuristic is sketched below).
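The exact prioritization heuristic isn’t documented here; one plausible sketch weights detection confidence by on-screen area, so a large, confidently detected object beats small background detections:

```typescript
import { DetectedObject } from '@tensorflow-models/coco-ssd';

// Pick the single detection to label: confidence weighted by the square root
// of bounding-box area, so nearby objects outrank distant background ones.
function primaryObject(predictions: DetectedObject[]): DetectedObject | null {
  let best: DetectedObject | null = null;
  let bestScore = -Infinity;
  for (const p of predictions) {
    const [, , width, height] = p.bbox;
    const weighted = p.score * Math.sqrt(width * height);
    if (weighted > bestScore) {
      bestScore = weighted;
      best = p;
    }
  }
  return best;
}
```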
By anchoring vocabulary to the learner’s surroundings and supporting multiple languages in real time, ChattAR shows that immersive learning can be both technically feasible and cognitively effective. This prototype sets the stage for a future where language learning is not reduced to memorization but built through interaction, presence, and meaning.


