The journey to create accessible, emotionally intelligent AI avatars has reached a significant milestone with our D-24 development phase. Our team has made substantial progress in bridging the gap between complex AI technology and user-friendly avatar creation, enabling anyone—regardless of technical background—to build and interact with their own AI companions through face-to-face conversational interfaces.
On April 30th, we will present a demo to our VCs, showcasing our method for cloning our mentor and instructor, Adam Paulisick. We believe this will effectively demonstrate our capabilities, as the VCs are already familiar with him.
Our recent focus has been on optimizing the entire pipeline for real-world deployment.
In the following section, we will showcase some videos highlighting our recent achievements.
To support fully conversational interactions, we must generate a new video each time the avatar speaks and provide smooth transitions between responses. We have developed a QA system that enables the avatar to respond dynamically, complemented by static reference animations, and we use neural network interpolation to blend smoothly between expressions.
First, our baseline static animations provide a foundation for consistent character presentation.
This demonstration showcases our QA model’s ability to generate contextually appropriate facial animations based on real-time conversational input.
Finally, neural network interpolation delivers smooth transitions between expressions; these transitions currently require approximately 10 seconds of processing time.
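For readers curious how such a transition can be computed, the sketch below shows one simple way to crossfade between two expression sequences. It is a minimal illustration, assuming each frame is already encoded as a fixed-length landmark or latent vector; the `blend_expressions` helper, the 24 fps frame rate, and the crossfade length are illustrative placeholders rather than our production implementation.

```python
import numpy as np

def blend_expressions(prev_frames: np.ndarray,
                      next_frames: np.ndarray,
                      crossfade_frames: int = 12) -> np.ndarray:
    """Crossfade from the tail of one expression sequence into the head of the next.

    prev_frames, next_frames: arrays of shape (num_frames, feature_dim),
    e.g. per-frame facial landmark or latent vectors.
    """
    tail = prev_frames[-crossfade_frames:]
    head = next_frames[:crossfade_frames]

    # Interpolation weights ramp from 0 (all previous) to 1 (all next).
    weights = np.linspace(0.0, 1.0, crossfade_frames)[:, None]
    blended = (1.0 - weights) * tail + weights * head

    # Stitch: previous sequence minus its tail, the blended window, rest of next.
    return np.concatenate([prev_frames[:-crossfade_frames],
                           blended,
                           next_frames[crossfade_frames:]], axis=0)

# Example: blend a 2-second idle animation into a 3-second response at 24 fps.
idle = np.random.rand(48, 128)      # placeholder latent frames
response = np.random.rand(72, 128)
smooth = blend_expressions(idle, response)
```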
To enable creators to design AI avatars with personalized voices, we developed functionality that accurately synchronizes the avatar's lip movements with a wide range of audio inputs.
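As a rough illustration of how audio can drive lip movement, the snippet below converts an audio clip into one mel-spectrogram feature vector per video frame, which a lip-sync model can then map to mouth shapes. This is a sketch under common assumptions (librosa for feature extraction, a 24 fps output video); `mouth_model` is a hypothetical stand-in for the actual lip-sync network, not a real API.

```python
import numpy as np
import librosa

VIDEO_FPS = 24  # assumed output frame rate

def audio_to_mouth_features(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Convert an audio file into one feature vector per video frame.

    Each video frame gets the log-mel slice covering its time span,
    which a lip-sync model can map to mouth shapes / visemes.
    """
    audio, _ = librosa.load(wav_path, sr=sr)
    hop_length = sr // VIDEO_FPS  # one spectrogram column per video frame
    mel = librosa.feature.melspectrogram(y=audio, sr=sr,
                                         n_mels=80, hop_length=hop_length)
    log_mel = librosa.power_to_db(mel)   # shape: (80, num_frames)
    return log_mel.T                     # shape: (num_frames, 80)

# Usage (hypothetical model that predicts mouth keypoints per frame):
# features = audio_to_mouth_features("creator_voice_sample.wav")
# mouth_frames = mouth_model.predict(features)   # placeholder, not a real API
```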
Here’s an exclusive glimpse into our backend processing pipeline. At present, when creators submit their Gemini API Key, Avatar Image, and Voice Reference, we can generate a brief response: the avatar image is animated to simulate speech, with audio that matches the provided voice reference.
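To make that flow concrete, here is a simplified outline of the pipeline: ask Gemini for a short text reply, synthesize it in the creator's reference voice, and animate the avatar image to match. The Gemini call uses the public google-generativeai client, but the model name is only an example, and `synthesize_voice` / `animate_avatar` are hypothetical placeholders for the voice-cloning and talking-head stages rather than our internal APIs.

```python
import google.generativeai as genai

def generate_reply(api_key: str, user_message: str) -> str:
    """Ask Gemini for a short conversational reply using the creator's API key."""
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")  # example model choice
    response = model.generate_content(
        f"Reply briefly and conversationally to: {user_message}"
    )
    return response.text

def synthesize_voice(text: str, voice_reference_path: str) -> str:
    """Placeholder: a voice-cloning TTS stage would return a path to generated audio."""
    raise NotImplementedError

def animate_avatar(avatar_image_path: str, audio_path: str) -> str:
    """Placeholder: a talking-head animation stage would return a path to the video."""
    raise NotImplementedError

def respond(api_key: str, avatar_image_path: str, voice_reference_path: str,
            user_message: str) -> str:
    """End-to-end: text reply -> cloned-voice audio -> lip-synced avatar video."""
    reply_text = generate_reply(api_key, user_message)
    audio_path = synthesize_voice(reply_text, voice_reference_path)
    return animate_avatar(avatar_image_path, audio_path)
```

The two placeholder functions mark where the voice-matching and lip-sync animation steps described above slot into the flow.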
The system spans three main areas:
- Frontend Development
- Backend Processing
- AI Components
Current performance benchmarks cover:
- Rendering Speed
- Video Integration
- Real-time Interaction
Our team is actively working on:
- Performance Optimization
- User Interface Enhancement
- Component Integration
- User Interface Development
- Performance Enhancement
The D-24 phase represents a significant step toward our goal of democratizing AI avatar creation. While we’ve made substantial progress in key areas such as facial animation and emotional expression, we continue to push the boundaries of what’s possible in real-time AI interaction. Future work may include retrieval-augmented generation (RAG) optimization and further customization options.
We invite developers and enthusiasts to follow our progress and contribute to this exciting journey. Stay tuned for more updates as we move closer to our vision of accessible, emotionally intelligent AI companions.
Note: This is a technical progress report. For user-friendly guides and general information about our AI companions, please visit our main product pages.