Hello,
I’ve built a face-based event registration and attendance system using InsightFace (GPU), where users self-register via camera. I’ve also implemented an agent to automate onboarding and attendance workflows.
I now want to move beyond a POC and take on a technically challenging problem with real-world complexity.
Some directions I’ve explored:
Moving from image-based (2D) recognition to video-based (temporal / 3D understanding)
Robustness under real-world conditions (lighting, occlusion, motion blur)
Real-time multi-camera identity tracking across a venue
Using LLMs to analyze event data (engagement, behavior patterns, etc.)
However, these feel like incremental extensions.
What I’m really looking for is a hard, open problem at the intersection of vision systems, real-time inference, and agentic AI: something where current approaches break down in practice (not just on benchmarks).
For example:
Where do current face recognition / tracking systems fail at scale in real deployments?
Are there unsolved challenges in combining vision models with LLM-based agents for real-world decision-making?
Any known gaps between research and production systems in this space?
Would appreciate pointers to concrete, technically challenging problems worth tackling.
Thanks