An AI-powered visual assistance platform that provides real-time environmental navigation, image description, and color blindness tools for visually impaired users using YOLO v11 and LLM integration.
Key Features
Real-time environmental navigation with custom-trained YOLO v11 object detection
Instant image description using the Gemini API, with voice output
Color blindness simulation tools for multiple vision deficiencies (Protanopia, Deuteranopia, Tritanopia, Achromatopsia)
Voice-controlled navigation and commands via the Web Speech API
Keyboard shortcuts for mouse-free navigation
Live camera streaming with Socket.IO for real-time guidance
Context-aware navigation instructions generated by a Groq-hosted LLM
Cross-platform compatibility (web/mobile) with responsive design
Open-source codebase focused on accessibility and inclusivity
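As a rough illustration of the color blindness simulation feature listed above, one common approach multiplies each RGB pixel by a 3x3 matrix chosen per deficiency. The sketch below is not the project's implementation: the matrix values are widely circulated approximations, and the names SIMULATION_MATRICES and simulate_pixel are hypothetical.

```python
# Approximate RGB transform matrices often used for quick simulation.
# Assumption: these values are illustrative, not taken from the project.
SIMULATION_MATRICES = {
    "protanopia": [
        (0.567, 0.433, 0.000),
        (0.558, 0.442, 0.000),
        (0.000, 0.242, 0.758),
    ],
    "deuteranopia": [
        (0.625, 0.375, 0.000),
        (0.700, 0.300, 0.000),
        (0.000, 0.300, 0.700),
    ],
    "tritanopia": [
        (0.950, 0.050, 0.000),
        (0.000, 0.433, 0.567),
        (0.000, 0.475, 0.525),
    ],
    # Achromatopsia is approximated here as a luma-weighted grayscale.
    "achromatopsia": [
        (0.299, 0.587, 0.114),
        (0.299, 0.587, 0.114),
        (0.299, 0.587, 0.114),
    ],
}


def simulate_pixel(rgb, deficiency):
    """Return an (r, g, b) tuple approximating how rgb appears under a deficiency."""
    matrix = SIMULATION_MATRICES[deficiency]
    out = []
    for row in matrix:
        # Each output channel is a weighted sum of the input channels.
        value = sum(weight * channel for weight, channel in zip(row, rgb))
        # Round and clamp to the valid 8-bit range.
        out.append(max(0, min(255, round(value))))
    return tuple(out)
```

Calling simulate_pixel((255, 0, 0), "achromatopsia") maps pure red to a gray with three equal channels. A production tool would more likely apply the same matrix to whole frames at once (for example on a canvas or with an array library) and might use a more physiologically accurate model.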
Challenges & Solutions
AI Model Integration: Combined YOLO v11 object detection with LLM reasoning so that the system gives contextual, actionable navigation instructions rather than bare object labels
Real-time Performance: Streamed live camera feeds over Socket.IO while keeping latency low for time-critical navigation guidance
Cross-platform Accessibility: Worked around platform-specific limitations, such as iOS voice restrictions, while keeping the user experience consistent across devices
Custom YOLO Training: Trained a specialized object detection model on environmental obstacles and navigation-relevant objects to improve accuracy
Voice Interface Design: Built a voice command system with clear audio feedback, which is essential for users who cannot see visual confirmations
Accessibility-First UI: Designed the interface to follow WCAG guidelines, with high contrast, screen reader compatibility, and keyboard-only navigation support
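To make the AI Model Integration item more concrete, here is a minimal sketch of how detector output might be turned into a text prompt for an LLM. Everything in it is an assumption for illustration: the Detection fields, the zone and distance thresholds, and the prompt wording are hypothetical, and no real YOLO or LLM client is called.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object. Fields are hypothetical, not the project's schema."""
    label: str
    x_center: float    # horizontal box center, normalized to 0..1
    box_height: float  # box height as a fraction of frame height, 0..1


def horizontal_zone(x_center):
    """Map a normalized x position to a coarse direction."""
    if x_center < 1 / 3:
        return "left"
    if x_center > 2 / 3:
        return "right"
    return "ahead"


def proximity(box_height):
    """Assumption: taller boxes are closer. Thresholds are illustrative."""
    if box_height >= 0.6:
        return "very close"
    if box_height >= 0.3:
        return "nearby"
    return "far"


def describe_scene(detections):
    """Return one summary line per detection, closest objects first."""
    ordered = sorted(detections, key=lambda d: d.box_height, reverse=True)
    return [
        f"{d.label}: {horizontal_zone(d.x_center)}, {proximity(d.box_height)}"
        for d in ordered
    ]


def build_prompt(detections):
    """Assemble the text that would be sent to the LLM."""
    lines = describe_scene(detections)
    if lines:
        scene = "\n".join(f"- {line}" for line in lines)
    else:
        scene = "- no obstacles detected"
    return (
        "You are guiding a visually impaired pedestrian.\n"
        "Detected objects (position, distance):\n"
        f"{scene}\n"
        "Give one short, actionable walking instruction."
    )
```

Summarizing boxes as coarse positions and distances before calling the LLM keeps the prompt short, which matters for latency, and leaves the model to decide what to say rather than what is in the frame.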