{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreieruzcnr3fnqsoalkf5f2ptf66s5jpkovun6fsxzohvr62hcqc5ju",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmku2ty6mby2"
  },
  "path": "/t/is-ai-really-necessary-in-fitness-app-development-today/176158#post_3",
  "publishedAt": "2026-05-24T01:23:52.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Fuller et al. 2020",
    "“Keeping Pace with Wearables”",
    "Schyvens et al. 2024",
    "person-generated wearable data quality",
    "non-wear, missing data, and artifacts",
    "DACIA framework",
    "WHO guidance on ethics and governance of AI for health",
    "transparency for machine-learning-enabled medical devices",
    "orthosomnia",
    "COM-B model",
    "Behavior Change Technique Taxonomy",
    "Just-in-Time Adaptive Interventions",
    "Fuller et al. 2020, JMIR mHealth",
    "Doherty et al. 2024, “Keeping Pace with Wearables”",
    "Baron et al. 2017",
    "Cho et al. 2021",
    "Van Der Donckt et al. 2024",
    "DACIA framework, npj Digital Medicine",
    "Nahum-Shani et al. 2017",
    "Michie et al. 2011",
    "Michie et al. 2013",
    "TRIPOD+AI, BMJ 2024",
    "WHO 2021 guidance",
    "FDA transparency principles",
    "NIST AI RMF 1.0"
  ],
  "textContent": "Same here. I use a cheap consumer wearable almost every day, mostly for sleep duration, rough activity trends, and a quick look at basic vitals. It is genuinely useful for that. But I almost never use the “AI coach” or “AI insights” part, because most of the time I do not need a philosophical interpretation of how many hours I slept. I just want to know: did I sleep enough, did I move enough, and is anything obviously different from my usual baseline?\n\nI discussed this with ChatGPT for a while, and my current conclusion is something like this:\n\n* * *\n\nI do **not** think the interesting question is simply:\n\n> “Is AI necessary in fitness apps?”\n\nA better question may be:\n\n> “What would an _honest_ AI system for fitness look like, given the current limits of consumer wearables, sports science, behavior change science, and everyday life?”\n\nAnd once I phrase it that way, I become much less excited about the usual “AI personal trainer” pitch, but more interested in a much more boring kind of AI.\n\nMaybe the useful version is not an “AI fitness coach.”\n\nMaybe it is an **honest fitness data assistant**.\n\n## 1. Consumer wearable data is useful, but it is not clinical-grade truth\n\nCheap wearables are useful. I do not want to dismiss them.\n\nA smartwatch or ring can be good enough for rough sleep duration, step count, resting heart rate trends, activity reminders, and long-term self-monitoring. That alone can be valuable.\n\nBut we should not confuse this with data collected under medical or research conditions.\n\nConsumer devices are usually working with:\n\n  * cheap, low-power sensors\n  * imperfect skin contact\n  * motion artifacts\n  * missing data\n  * battery gaps\n  * non-wear periods\n  * proprietary algorithms\n  * device firmware/app updates\n  * uncertain ground truth\n  * user behavior that is not protocol-controlled\n\n\n\nThere is a large literature on this already. For example, Fuller et al. 
2020 reviewed commercial wearables for step count, heart rate, and energy expenditure. The rough picture is: step count and heart rate can be useful in many settings, but accuracy varies, and energy expenditure estimates are much less reliable.\n\nA newer broad review, “Keeping Pace with Wearables”, makes the same general point at a larger scale: consumer wearables are promising, but accuracy depends on the metric, the device, the population, and the context.\n\nSleep is another good example. A 2024 systematic review comparing Fitbit Charge 4, Garmin Vivosmart 4, and WHOOP against polysomnography found that these devices can be useful, but sleep-stage estimation still has real limits: Schyvens et al. 2024.\n\nSo I think the first rule for fitness AI should be:\n\n> Do not treat consumer wearable data as if it were a clinical measurement.\n\nFor casual self-monitoring, it is often useful.\nFor strong personalized coaching, it is much weaker.\n\n## 2. “Garbage in, garbage out” is a serious problem here\n\nThe usual AI hype assumes that if we feed enough wearable data into an AI model, the model will become a good coach.\n\nBut in fitness, the input data may not contain the thing we actually want to know.\n\nA wrist wearable can measure or estimate things like:\n\n  * steps\n  * heart rate\n  * heart rate variability\n  * sleep duration\n  * sleep stages\n  * activity type\n  * GPS pace/distance\n  * calories burned\n  * maybe SpO2\n  * maybe skin temperature\n\n\n\nBut for serious fitness decisions, we often want something closer to:\n\n  * actual muscle tension\n  * tendon load\n  * joint stress\n  * local tissue fatigue\n  * mechanical load per muscle group\n  * movement quality\n  * pain risk\n  * recovery capacity\n  * technique degradation\n  * whether the current stimulus is appropriate for this person\n\n\n\nThose are much harder to measure.\n\nA cheap wrist device cannot really tell how much load your quadriceps tendon took during squats, whether your lower back was 
compensating, whether a muscle group was close to failure, or whether a movement pattern increased injury risk.\n\nThere are research tools for parts of this: EMG, NIRS, force plates, motion capture, instrumented treadmills, bar velocity trackers, force sensors, lab-grade metabolic carts, and so on. But that is not the same as a cheap everyday wearable.\n\nSo the issue is not just that the AI is not smart enough. The issue is also physical:\n\n> The sensor layer is often too far away from the real fitness variable we care about.\n\nThat is why I think the “AI coach” claim often jumps too far ahead of the measurement stack.\n\n## 3. Fitness is harder than some other wearable use cases\n\nWearables can be more useful in cases where the measured signal is closer to the target.\n\nFor example, trend monitoring of SpO2 in some respiratory contexts may be useful as a warning signal or as part of a care plan, though of course not as a replacement for medical judgment. In that kind of case, even if the absolute value is imperfect, a trend or drop may still be meaningful enough to investigate.\n\nBut fitness is different.\n\nFitness advice usually requires a chain like this:\n\n\n    sensor signal\n      -> inferred metric\n      -> physiological interpretation\n      -> training interpretation\n      -> user-specific context\n      -> safe recommendation\n      -> behavior change\n      -> long-term adaptation\n\n\nEvery arrow in that chain is hard.\n\nFor a cheap wearable, the system may know that my heart rate increased. 
But why?\n\n  * exercise intensity?\n  * heat?\n  * poor sleep?\n  * caffeine?\n  * stress?\n  * dehydration?\n  * illness?\n  * anxiety?\n  * sensor artifact?\n  * loose watch strap?\n  * different workout type?\n  * unusually high fatigue?\n\n\n\nAnd even if the system correctly detects “more strain,” what should it recommend?\n\n  * rest?\n  * walk?\n  * lift lighter?\n  * do mobility work?\n  * sleep earlier?\n  * eat more?\n  * do nothing?\n  * ask a clinician?\n  * ignore the signal because the data quality is poor?\n\n\n\nThat is not simply an AI problem. It is a measurement problem, a physiology problem, a behavior problem, and a daily-life-context problem.\n\n## 4. The missing data problem is not random\n\nAnother problem is that consumer wearable data is not 24/7/365 reliable.\n\nPeople take devices off to charge them, or when they are showering, sleeping, exercising, traveling, feeling sick, irritated by the strap, or simply tired of tracking. And those gaps may fall in exactly the periods that matter most.\n\nMissing data is not just an empty hole. It can be informative.\n\nMaybe the user removed the device because:\n\n  * they slept badly\n  * they were sick\n  * they were stressed\n  * they were traveling\n  * the device was uncomfortable\n  * they did a sport where the watch was annoying\n  * they did not want to look at the data\n  * they forgot to charge it because life was chaotic\n\n\n\nThere is literature on this too. See, for example, reviews of person-generated wearable data quality and practical work on real-world wearable data problems such as non-wear, missing data, and artifacts.\n\nAn honest fitness AI should not just smooth over missing data and continue speaking confidently.\n\nIt should sometimes say:\n\n> “The data is too incomplete today, so I will not make a training recommendation.”\n\nThat is boring. But it is honest.\n\n## 5. 
The useful AI may be a pipeline, not one magic LLM\n\nI think a serious version of this should not be:\n\n\n    wearable data -> LLM -> advice\n\n\nThat is too dangerous and too vague.\n\nA more honest architecture would be something like:\n\n\n    wearable data\n      -> data quality check\n      -> non-wear / artifact / missingness detection\n      -> metric-specific reliability score\n      -> personal baseline comparison\n      -> context check\n      -> risk/scope check\n      -> intervention policy\n      -> natural language explanation\n      -> user correction / feedback\n\n\nThis is why I like the idea of a “fitness AI pipeline” more than a single “AI coach model.”\n\nThe DACIA framework for digital biomarkers is relevant here. It breaks the path from wearable sensor data to useful action into stages: Data, Aggregation, Contextualization, Interpretation, and Actions.\n\nThat seems much closer to what a serious fitness AI needs.\n\nThe LLM should probably be near the end of the pipeline, not at the beginning. Its job should be to explain already-checked information in a clear, cautious, human-readable way, not to magically infer the user’s body state from noisy sensor data.\n\n## 6. The AI should explain its reasoning in small, readable pieces\n\nOne everyday form of honesty is giving a simple reason the other person can understand.\n\nFor example, not this:\n\n> “You are under-recovered. Take a rest day.”\n\nBut this:\n\n> “Your sleep duration was shorter than your usual baseline, and your resting heart rate is slightly higher than usual. However, the device cannot measure actual muscle or joint load. 
If you also feel tired, a lighter day or rest day is reasonable.”\n\nThat is much better.\n\nA good output format might be:\n\n\n    Conclusion:\n    Light exercise or rest may be reasonable today.\n\n    Why:\n    Sleep was shorter than your usual baseline, and resting heart rate is slightly elevated.\n\n    Limits:\n    This device does not measure actual muscle load, joint stress, pain, or subjective fatigue.\n\n    Options:\n    Rest, take a short walk, or do the planned workout at lower intensity.\n\n\nThis gives the user something to inspect.\n\nThe user can say:\n\n> “Actually, I feel fine today.”\n\nor:\n\n> “The heart rate was probably high because I had coffee.”\n\nor:\n\n> “Yes, I am tired, I will take it easy.”\n\nThat is important. The AI should not override the user’s bodily experience. It should help the user reason.\n\nThis also connects to general guidance around transparency and explainability in health AI. The WHO guidance on ethics and governance of AI for health emphasizes transparency, explainability, human autonomy, safety, and accountability. The FDA / Health Canada / MHRA principles on transparency for machine-learning-enabled medical devices also emphasize communicating information that affects risk, outcomes, context of use, and user understanding.\n\nFitness apps are not always medical devices, but the same spirit matters when an app starts giving health-related advice.\n\n## 7. Maybe the real value is not “do more,” but “worry less”\n\nMany fitness apps implicitly push users to do more:\n\n  * walk more\n  * train more\n  * close the rings\n  * improve the score\n  * optimize sleep\n  * optimize recovery\n  * optimize calories\n  * optimize everything\n\n\n\nBut humans cannot optimize everything. 
Daily life is already full.\n\nA useful AI might do the opposite and tell the user things like:\n\n  * ignore this metric\n  * do not worry about sleep stages\n  * calories burned is too noisy to use as a precise target\n  * your weekly activity trend is enough\n  * today’s data quality is poor, so do not over-interpret it\n  * you already trained enough this week\n  * rest is a valid option\n  * if you are tired, do less\n  * do not add another habit right now\n\n\n\nThis is less glamorous than an AI coach, but probably more useful.\n\nThere is also a known concern around over-tracking. The term orthosomnia was proposed for cases where people become overly concerned with sleep tracker data and “perfect” sleep scores. Although the term is specific to sleep tracking, the broader warning applies to fitness data too:\n\n> More measurement does not automatically mean better self-regulation.\n\nSometimes the best AI feature is to reduce the number of things the user has to think about.\n\n## 8. Fitness advice is behavior-change design, not just recommendation\n\nEven if the AI gives scientifically reasonable advice, it may still be useless if the user cannot act on it.\n\n“Go for a 30-minute run” may be good advice in the abstract. But maybe the user is:\n\n  * on a train\n  * at work\n  * exhausted\n  * wearing the wrong clothes\n  * caring for family\n  * in pain\n  * sleep deprived\n  * in bad weather\n  * socially unable to exercise in that context\n  * mentally overloaded\n\n\n\nThis is where behavior-change frameworks matter.\n\nThe COM-B model says behavior depends on Capability, Opportunity, and Motivation. Fitness apps often over-focus on Motivation, but Opportunity (time, place, equipment, social context) may be the real bottleneck, along with day-to-day Capability limits such as low energy.\n\nThe Behavior Change Technique Taxonomy is also relevant because it reminds us that “advice” is not one thing. 
Goal-setting, self-monitoring, feedback, prompts, action planning, social support, and habit formation are different intervention components.\n\nSo a good fitness AI should not merely ask:\n\n> “What is the optimal workout?”\n\nIt should ask:\n\n> “Is this intervention actually possible for this person today?”\n\nThat is a much harder and more honest question.\n\n## 9. JITAI is close to the real problem, but hard\n\nThe idea of Just-in-Time Adaptive Interventions is very relevant: give the right kind and amount of support, at the right time, based on the person’s changing state and context.\n\nThat sounds exactly like what people want from AI fitness coaching.\n\nBut it is hard.\n\nA system has to know:\n\n  * when to intervene\n  * when not to intervene\n  * what signal is reliable\n  * what signal is not reliable\n  * what the user can actually do now\n  * whether the intervention will help\n  * whether the intervention will annoy the user\n  * whether the intervention might increase anxiety\n  * whether the system should recommend action, rest, or silence\n\n\n\nSo I think consumer fitness AI should start with something weaker than a full JITAI:\n\n> detect obvious deviations, give low-risk suggestions, and stay silent when uncertain.\n\nThat may sound modest, but it is already useful.\n\n## 10. I would define “honest fitness AI” like this\n\nAn honest fitness AI should:\n\n  1. **Check data quality first.**\nIf the device was not worn properly, if the data is missing, or if the metric is unreliable, weaken or suppress the advice.\n\n  2. **Prefer personal trends over universal claims.**\n“Compared with your usual baseline” is often safer than “your sleep is bad” or “your recovery is low.”\n\n  3. **Separate observation, inference, and advice.**\n“Sleep was shorter” is different from “you are under-recovered,” which is different from “you should rest.”\n\n  4. 
**Explain the basis briefly.**\nEvery recommendation should have a small “why” and a small “limits” section.\n\n  5. **Never pretend to measure what it cannot measure.**\nA wrist wearable usually does not measure actual muscle load, tendon stress, joint load, or injury risk directly.\n\n  6. **Use low-risk suggestions by default.**\nRest, short walks, reducing intensity, checking subjective fatigue, or simply logging data are safer than strong workout prescriptions.\n\n  7. **Respect user context and agency.**\nThe user may know something the device does not: pain, mood, stress, schedule, caffeine, illness, family obligations, or simply “today is not the day.”\n\n  8. **Know when to shut up.**\nSilence is a feature. Not every data point needs interpretation.\n\n  9. **Be transparent about scientific limits.**\nIts claims should not go beyond the current state of measurement science, exercise science, or behavior-change evidence.\n\n  10. **Escalate instead of pretending.**\nIf the issue sounds medical (chest pain, fainting, severe breathlessness, unusual symptoms), the app should not “coach.” It should advise seeking appropriate medical help.\n\n\n\n\n## 11. 
My preferred product would be boring\n\nHonestly, the first good version may not be a daily AI coach.\n\nIt might be a weekly review assistant.\n\nSomething like:\n\n\n    This week:\n    - Sleep duration was about 35 minutes shorter than your usual weekly average.\n    - Activity was roughly stable.\n    - Workout frequency dropped from 3 sessions to 2.\n    - Wednesday sleep data is incomplete, so sleep-stage analysis is not used.\n\n    Main thing to watch:\n    - Sleep duration, not sleep stages.\n\n    Low-risk options:\n    - Keep workouts the same, but avoid increasing intensity this week.\n    - Add one short walk if you have time.\n    - If tired, do nothing extra.\n\n\nThat is not sexy.\n\nBut it is probably more useful than:\n\n> “Your AI coach has generated an optimized personalized training plan.”\n\nBecause the weekly review assistant is not pretending to know more than it knows.\n\n## 12. So my answer to the original question is: AI can help, but only if it becomes more modest\n\nI do not think AI is “necessary” for every fitness app.\n\nA simple, reliable app that tracks sleep duration, steps, workouts, and trends may be more valuable than an overconfident AI coach.\n\nBut I also do not think AI is useless.\n\nAI could be useful if it is designed as:\n\n> an uncertainty-aware, evidence-limited, low-risk, user-respecting explanation layer on top of wearable data.\n\nNot a magical trainer.\n\nNot a doctor.\n\nNot a personality that nags you.\n\nMore like:\n\n> “Here is what changed. Here is the weak evidence. Here is what I cannot know. Here are safe options. You decide.”\n\nThat is the kind of fitness AI I would actually trust.\n\n## Useful references / starting points\n\n  * Consumer wearable validity and reliability: Fuller et al. 2020, JMIR mHealth\n  * Broad consumer wearable accuracy review: Doherty et al. 2024, “Keeping Pace with Wearables”\n  * Sleep tracker validation against PSG: Schyvens et al. 
2024\n  * Sleep-tracking anxiety / orthosomnia: Baron et al. 2017\n  * Wearable data quality / person-generated data: Cho et al. 2021\n  * Real-world wearable data issues: Van Der Donckt et al. 2024\n  * Digital biomarker pipeline thinking: DACIA framework, npj Digital Medicine\n  * Just-in-Time Adaptive Interventions: Nahum-Shani et al. 2017\n  * COM-B / Behavior Change Wheel: Michie et al. 2011\n  * Behavior Change Technique Taxonomy: Michie et al. 2013\n  * Prediction model reporting with AI/ML: TRIPOD+AI, BMJ 2024\n  * WHO health AI ethics/governance: WHO 2021 guidance\n  * FDA / Health Canada / MHRA transparency principles for ML-enabled medical devices: FDA transparency principles\n  * NIST AI risk management: NIST AI RMF 1.0\n\n\n\nSo my short version is:\n\nAI in fitness is not automatically bad, but the honest version should probably be much more boring than the marketing version.\n\nThe useful near-term product is not “AI understands your body.”\n\nIt is:\n\n> “AI helps you avoid over-interpreting weak data.”",
  "title": "Is AI Really Necessary in Fitness App Development Today?"
}