{
  "$type": "site.standard.document",
  "content": {
    "$type": "site.standard.content.markdown",
    "text": "Inspired by [Convoys](https://better.engineering/convoys/), I've tried to model conversion rates in SQL.\n\nThe following macro computes the Kaplan-Meier survival curves and conversion rates given two timestamps and the groups. You can then generate these tables at runtime and plot them in your favorite BI tool.\n\n```sql\n{% macro conversion_rate(relation, id, created, converted, groups, time_unit=\"minute\") %}\n\nwith durations as (\n    select\n        *,\n        timestamp_diff(coalesce({{ converted }}, current_timestamp()), {{ created }}, {{ time_unit }}) as duration\n    from {{ relation }}\n),\n\ntotal_subjects as (\n    select\n        {{ groups }} as g,\n        count(distinct {{ id }}) as num_subjects\n    from durations\n    group by 1\n),\n\ndaily_conversions as (\n    select\n        duration,\n        {{ groups }} as g,\n        count(distinct {{ id }}) as total,\n        sum(if( {{ converted }} is null, 0, 1)) as conversions\n    from durations\n    group by 1, 2\n    order by 1 asc\n),\n\ncumulative_conversions as (\n    select\n        duration,\n        daily_conversions.g,\n        total,\n        conversions,\n        total_subjects.num_subjects - coalesce(sum(total) over (partition by daily_conversions.g order by duration asc rows between unbounded preceding and 1 preceding), 0) as at_risk\n    from daily_conversions\n    left join total_subjects on daily_conversions.g = total_subjects.g\n),\n\nfinal as (\n    select\n        duration,\n        g,\n        at_risk,\n        total,\n        conversions,\n        at_risk - conversions - coalesce(lead(at_risk, 1) over (partition by g order by duration asc), 0) as censored,\n        exp(sum(ln(1 - conversions / at_risk)) over (partition by g order by duration asc rows between unbounded preceding and current row)) as survival_proba,\n        100 * (1 - exp(sum(ln(1 - conversions / at_risk)) over (partition by g order by duration asc rows between unbounded preceding and current row))) as conversion_pct,\n        sum(conversions / at_risk) over (partition by g order by duration asc rows between unbounded preceding and current row) as cumulative_hazard\n    from cumulative_conversions\n    where conversions > 0 and duration < 60 * 2\n)\n\nselect * from final\n\n{% endmacro %}\n```\n\nThe usage then is as simple as:\n\n```sql\nwith dataset as (\n    select\n        id,\n        trial_started_at as created_at,\n        converted_at,\n        plan as g\n    from {{ ref('trials') }}\n)\n\nselect * from ({{\n    conversion_rate(\n        relation=\"dataset\",\n        id=\"id\",\n        created=\"created_at\",\n        converted=\"converted_at\",\n        groups=\"g\",\n        time_unit=\"minute\"\n    )\n}})\n```\n\nI'm sure this can be improved and simplified. For now, is good enough and it works! If you're looking for an alternative in Looker, the folks at Montreal Analytitcs have you covered: [How to Leverage Product Survival Curves in Looker](https://blog.montrealanalytics.com/how-to-leverage-product-survival-curves-in-looker-9a31663d4ae6).",
    "version": "1.0"
  },
  "description": "Inspired by Convoys, I've tried to model conversion rates in SQL. The following macro computes the Kaplan-Meier survival curves and conversion rates given two timestamps and the groups. You can then generate these tables at runtime and plot them in your favorite BI tool. The u...",
  "path": "/dbt-survival",
  "publishedAt": "2022-02-14T00:00:00.000Z",
  "site": "at://did:plc:4z5i7njrld66ew36htufcwry/site.standard.publication/3mo43d2tmt2ov",
  "textContent": "Inspired by Convoys, I've tried to model conversion rates in SQL.\n\nThe following macro computes the Kaplan-Meier survival curves and conversion rates given two timestamps and the groups. You can then generate these tables at runtime and plot them in your favorite BI tool.\n\nThe usage then is as simple as:\n\nI'm sure this can be improved and simplified. For now, is good enough and it works! If you're looking for an alternative in Looker, the folks at Montreal Analytitcs have you covered: How to Leverage Product Survival Curves in Looker.",
  "title": "Kaplan-Meier Survival Curves in dbt"
}