Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiehskmnykl5ic4bww73l5ubcmyccpvejsbepvojcqxfku7lqfxdfq",
    "uri": "at://did:plc:pi6woz4d47bkuws673w2il2r/app.bsky.feed.post/3mhfgy4yngeb2"
  },
  "path": "/t/rfc-sibyl-time-series-analysis-in-haskell/13823#post_1",
  "publishedAt": "2026-03-18T19:55:38.000Z",
  "site": "https://discourse.haskell.org",
  "tags": [
    "DataHaskell",
    "@mchav",
    "@daikonradish",
    "here"
  ],
  "textContent": "Hello! I’ve recently begun work on a new time series analysis and forecasting library called **Sibyl** that will implement functions and models comparable to `statsmodels` in Python or `forecast` in R: things like ARIMA/SARIMA, exponential smoothing, classical decomposition, and automatic model selection via the Hyndman-Khandakar algorithm. I’m working with the folks over at DataHaskell (@mchav, @daikonradish) to flesh out the initial specifications and begin drafting design plans.\n\nDuring initial design discussions, we ran into a roadblock around a major UX choice. My initial approach splits the library into two layers:\n\n  * An **unsafe facade** layer for notebook users and statisticians, imported with `import Sibyl`. Functions throw runtime errors on failure rather than returning `Either`. The goal is R-style convenience.\n  * A **safe layer** for production pipelines, where users import modules individually (`import Sibyl.Safe.TimeSeries`, `import Sibyl.Models.SARIMAX`, etc.) and handle error types explicitly.\n\n\n\nHere’s what each looks like in practice. 
A notebook user or scripter might write:\n\n\n    main :: IO ()\n    main = do\n      raw <- D.readCsv \"./data/sales.csv\"\n      result <- fromDataFrame \"date\" \"sales\" raw\n        |> fit (ARIMA (1, 1, 1))\n        |> forecast 12\n        |> toDataFrame\n      D.writeCsv \"./artifacts/forecast.csv\" result\n    -- Nice and convenient!\n\n\nAnd the production pipeline version, where error handling is explicit:\n\n\n    module Pipeline where\n\n    import qualified Sibyl.Safe.TimeSeries as TS\n    import qualified Sibyl.Models.SARIMAX  as SARIMAX\n    import qualified Sibyl.Model           as M\n    import qualified DataFrame             as D\n\n    main :: IO ()\n    main = do\n      raw <- D.readCsv \"./data/sales.csv\"\n      case TS.fromDataFrame \"date\" \"sales\" raw of\n        Left err     -> putStrLn $ \"Bad series: \" ++ show err\n        Right series ->\n          case SARIMAX.fitSARIMAWith settings series of\n            Left err    -> putStrLn $ \"Fit failed: \" ++ show err\n            Right model -> do\n              M.summarize model\n              let fc = SARIMAX.forecastSARIMA 12 model\n              D.writeCsv \"./artifacts/forecast.csv\" (TS.toDataFrame (M.point fc))\n              putStrLn \"Done.\"\n      where\n        settings = SARIMAX.defaultSARIMASettings\n          { SARIMAX.sarimaP      = 1, SARIMAX.sarimaD      = 1, SARIMAX.sarimaQ      = 1\n          , SARIMAX.sarimaBigP   = 0, SARIMAX.sarimaBigD   = 1, SARIMAX.sarimaBigQ   = 1\n          , SARIMAX.sarimaPeriod = 12, SARIMAX.sarimaMethod = SARIMAX.CSSML\n          }\n\n\nA third option has also come up in our discussions: using `DataKinds` and type families to encode model orders at the type level, so that `Fitted ('ARIMA 1 1 1)` and `Fitted ('ARIMA 2 1 0)` are different types. This gives much stronger guarantees: for example, SARIMAX’s requirement for future regressors at forecast time becomes a type-level constraint rather than a runtime check. 
The tradeoff is that it makes interactive use in GHCi or a notebook harder, since model orders would need to be known at compile time.\n\nThe tension we keep running into is this: the new-to-Haskell R or Python user wants minimal friction, sensible defaults, and something that “just works”. They’d likely use the unsafe facade and never touch `Either`. The Haskell engineer wants (?) strong guarantees, composability, and a type system that does real work rather than just wrapping `error` calls. By trying to serve both audiences, we risk ending up with something that fully satisfies neither: too unergonomic for the statistician, not strong enough for the seasoned Haskeller.\n\nSo, as we work through the initial stages of this library, I’d appreciate comments from anyone who has thoughts on this, especially on the following questions:\n\n  * Is the two-layer approach a reasonable compromise, or is it just two half-baked APIs?\n  * For those of you who do statistical work in Haskell: what would you want out of a library like this?\n  * Are there prior examples in the Haskell ecosystem that handle this well? Libraries that serve both “quick and dirty” and “production-grade” users?\n  * Does the `DataKinds` approach seem worth the added complexity?\n\n\n\nThe GitHub repo is here if you want to see what exists so far. Happy to answer any questions! I’m also an intermediate Haskell programmer, and it’s possible I’m missing something really obvious in the details of this implementation. I welcome any and all feedback, as every opinion will help me work out how best to write this library to serve the most people.",
  "title": "[RFC] Sibyl: Time Series Analysis in Haskell"
}