{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibtl7qv6yklkfv7xso7pofyif2356zmp7tft6znrfhnhffh5hzuji",
    "uri": "at://did:plc:wwyqal4cnqhuwyacdj7rqq3n/app.bsky.feed.post/3mib56f5yala2"
  },
  "path": "/t/thinking-clearly-about-association-studies-risk-factors-and-causal-salad-included/28679#post_4",
  "publishedAt": "2026-03-29T13:31:44.000Z",
  "site": "https://discourse.datamethods.org",
  "tags": [
    "Initial data analysis"
  ],
  "textContent": "This is one way to think about the stages and goals of research, with some overlap between them.\n\n  * Initial data analysis, which is blinded to any linkage between X and Y and should precede both experimental and observational data analysis\n  * Descriptive studies, which deal mainly with characterizing univariate distributions and relationships among Xs using correlation matrices, variable clustering, and other unsupervised learning techniques\n  * Descriptive studies of relationships between individual Xs and Y without prediction or inference\n  * Prediction, where explanations are not very important and parsimony is not usually a goal\n  * Association studies that try to find associations between some Xs and Y that are not trivial, i.e., are not explained by boring Xs. These are often improperly used to make causal statements, or may be used to inform a causal analysis, assuming that a factor needs to have a nonzero association with Y before you care about causation.\n  * Development of causal diagrams using only subject matter knowledge, which then feeds into the next item\n  * Formal causal inference\n\nAn open question is whether association studies help or hurt the last two steps, i.e., whether empirical association analysis should inform the development of the causal diagram.\n\nThen there is the always important issue of how prospectively the study needs to be designed in the first place. It is extremely important to elicit expert advice about potential confounders before revealing to the experts which variables are available in the dataset. This avoids data availability bias.",
  "title": "Thinking Clearly about Association Studies (Risk Factors and Causal Salad included)"
}