Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreigfe5cuzdd5376z6qfhtm667i2ku6xgjaw7fc6hr5zb4ek6dgaflu",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkuw26qvptp2"
  },
  "path": "/t/advice-needed-for-building-interviewai-a-real-time-ai-interview-feedback-project/175712#post_2",
  "publishedAt": "2026-05-02T13:33:21.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I would be careful with the emotion recognition part.\n\nFor a project like this, I would not try to build something that says the candidate is nervous, confident, honest, or anything like that from their face. That is the kind of feature that sounds impressive in a demo, but is very easy to overclaim and very hard to validate.\n\nA more useful version is to measure simple visible signals and report them back as practice feedback. Speaking pace, long pauses, looking away from the screen, too much head movement, slouching, repeated hand-to-face gestures, filler words, very short answers, and whether the answer actually addresses the question. Those are still imperfect, but at least they are observable.\n\nFor the vision part, I would start with MediaPipe or OpenCV landmark tracking. I would not train a custom model at the beginning. Face landmarks and pose landmarks are enough to get rough head direction, shoulder posture, movement stability, and hand proximity to the face. If those basic signals are noisy, a bigger model will mostly give you a more expensive noisy system.\n\nYou can run the webcam at a lower frame rate, extract landmarks every few frames, smooth the values over time, and combine those features with a speech-to-text transcript. Then the feedback layer can use the measured signals plus the transcript instead of pretending the camera alone understands interview quality.\n\nThe part people usually underestimate is evaluation. Do not judge the system by whether the generated feedback sounds nice. Record some mock interviews, manually label simple things, and compare against that. Did it detect long pauses? Did it detect looking away? Did it confuse normal movement with bad posture? Keep the labels boring and concrete.\n\nI would also avoid psychological judgments in the output. Saying “you looked away from the screen during much of the answer” is fine. Saying “you lacked confidence” is mostly a guess dressed up as AI.",
  "title": "Advice needed for building InterviewAI: a real-time AI interview feedback project"
}
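
A rough sketch of the signal pipeline the post describes: smooth a noisy per-frame value, flag looking away, find long pauses in a transcript, and score detections against manual labels. This is only an illustration under assumptions, not the poster's implementation. All function names, the 25-degree yaw threshold, the 2-second pause gap, and the (text, start, end) word format are hypothetical. The head-yaw values are assumed to come from whatever landmark tracker is used upstream.

```python
from typing import List, Tuple


def smooth_ema(values: List[float], alpha: float = 0.3) -> List[float]:
    """Exponential moving average; a lower alpha smooths more heavily."""
    smoothed: List[float] = []
    for v in values:
        if not smoothed:
            smoothed.append(v)  # first sample has nothing to blend with
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def looking_away_fraction(
    yaw_degrees: List[float],
    threshold_deg: float = 25.0,  # assumed cutoff, tune on labeled recordings
    alpha: float = 0.3,
) -> float:
    """Fraction of sampled frames whose smoothed head yaw exceeds the threshold."""
    if not yaw_degrees:
        return 0.0
    smoothed = smooth_ema(yaw_degrees, alpha)
    away = sum(1 for y in smoothed if abs(y) > threshold_deg)
    return away / len(smoothed)


def long_pauses(
    words: List[Tuple[str, float, float]],
    min_gap_s: float = 2.0,  # assumed definition of a "long" pause
) -> List[Tuple[float, float]]:
    """Return (start, end) gaps between consecutive words of at least min_gap_s.

    `words` holds (text, start_s, end_s) tuples sorted by start time, the shape
    many speech-to-text tools can produce with word-level timestamps.
    """
    pauses = []
    for (_, _, prev_end), (_, next_start, _) in zip(words, words[1:]):
        if next_start - prev_end >= min_gap_s:
            pauses.append((prev_end, next_start))
    return pauses


def precision_recall(predicted: List[bool], labeled: List[bool]) -> Tuple[float, float]:
    """Compare per-frame (or per-segment) detections against manual labels."""
    tp = sum(1 for p, l in zip(predicted, labeled) if p and l)
    fp = sum(1 for p, l in zip(predicted, labeled) if p and not l)
    fn = sum(1 for p, l in zip(predicted, labeled) if not p and l)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

With these defaults, a single-frame spike such as `[0, 0, 40, 0, 0]` is damped below the threshold and does not count as looking away, while a sustained turn does. The output stays at the level of "looked away for X% of the answer" or "paused for N seconds", which is the kind of boring, checkable feedback the post argues for.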