Raw Record Source

{
  "path": "/juliacon-2024-workshops",
  "site": "at://did:plc:gfrmhdmjvxn2sjedzboeudef/site.standard.publication/3md7ylshxzk2y",
  "$type": "site.standard.document",
  "title": "JuliaCon 2024 Workshops",
  "content": {
    "$type": "site.standard.document#markdown",
    "value": "# JuliaCon 2024\n\nIt's workshop day!\n\n## Parallel processing with Dagger.jl\n\n[Dagger.jl](https://github.com/JuliaParallel/Dagger.jl) is an extremely cool tool. I used Dagger sometime in 2018 I think, but I didn't really have a good distributed computing problem to solve. \n\nJulian Samaroo and [Przemysław Szufel](https://szufel.pl/) presented the workshop. Here's the [workshop materials](https://github.com/jpsamaroo/DaggerWorkshop2024).\n\nMy takeaway was this: Dagger is _fucking crazy_. Essentially, it unifies a bunch of forms of parallel computation: multithread, multiprocess, and GPU. You provide Dagger a collection of resources (such as threads, worker processes, or GPUs) and it handles the scheduling of tasks on those resources. \n\nDagger will pretty much auto-magically figure out things like memory movement between processes -- for cheap tasks, you want to keep data within-process to minimize memory movement, but in some cases a worker may be overloaded and it may be cheaper to move memory to a different worker.\n\nThe simple version of Dagger resembles Julia's [standard task workflow](https://docs.julialang.org/en/v1/base/parallel/):\n\n```julia\nt = Dagger.@spawn 1+2\n@show t\nfetch(t)\n```\n\n`t` here is a `DTask`, which represents a task that will execute on some parallel resource. `fetch(t)` will block and return the result of the task.\n\nDagger will also construct a DAG (hence the name DAGger) of your computation -- you can construct an arbitrary set of tasks, and each task will be handed off to another process upon completion. Take this for example:\n\n```julia\n# Multiple dependencies and parallelism\nx = rand(5000)\na = Dagger.@spawn x .+ 1\nb = Dagger.@spawn a .* 2\nc = Dagger.@spawn a ./ 2 # b and c are independent and be run parallel\nd = Dagger.@spawn b .- c\nfetch(d)\n```\n\nAbove, `b` and `c` are independent and can be run in parallel. 
`d` depends on both `b` and `c`, so it won't start until both are complete.\n\nGPU support is quite straightforward as well. Julia's GPU support is wonderful, and you can use any device type you need (CUDA, ROCm, Metal, oneAPI).\n\nHere's how to set up a GPU in Dagger:\n\n```julia\nusing Dagger\nusing DaggerGPU\nusing CUDA\n\n# Annoying, but we need to restart the scheduler for the below changes to take effect...\n# Will be fixed in future versions of Dagger!\nDagger.cancel!(;halt_sch=true)\n\n# Make sure that we have at least one GPU\n@assert length(CUDA.devices()) > 0 \"You don't have any NVIDIA GPUs!\"\n\n# Pick the first available GPU\nGPUArray = CuArray\nscope = Dagger.scope(;cuda_gpu=1)\n```\n\nOnce you have the `scope` that determines Dagger's available resources (in this case, a GPU), you can hand Dagger whatever operation you want to run:\n\n```julia\n# Run our `sum` function on the GPU!\nA = rand(Float32, 1024)\nDagger.with_options(;scope) do\n    @show fetch(Dagger.@spawn sum(A))\nend\n```\n\nThis also handles multiple GPUs across processes. If the GPUs are full or a computation is not appropriate for a GPU, it can be dispatched to CPU threads instead.\n\nThere's lots of other cool stuff in the workshop, including data dependencies to help the Dagger scheduler, distributed arrays, and a nifty implementation of convolutions + Conway's Game of Life.\n\nHonestly, I was just amazed at how far Dagger.jl has come. They have a ton of stuff on the roadmap as well, including:\n\n- DaggerGraphs.jl for partitioned distributed graph processing\n- Streaming data\n- Auto-GPU processing\n- Expanded data deps support\n- Operator fusion\n- Dagger + Enzyme autodiff"
  },
  "publishedAt": "2024-07-09T07:00:00.000Z",
  "textContent": "JuliaCon 2024\n\nIt's workshop day!\n\nParallel processing with Dagger.jl\n\nDagger.jl is an extremely cool tool. I used Dagger sometime in 2018 I think, but I didn't really have a good distributed computing problem to solve. \n\nJulian Samaroo and Przemysław Szufel presented the workshop. Here's the workshop materials.\n\nMy takeaway was this: Dagger is _fucking crazy_. Essentially, it unifies a bunch of forms of parallel computation: multithread, multiprocess, and GPU. You provide Dagger a collection of resources (such as threads, worker processes, or GPUs) and it handles the scheduling of tasks on those resources. \n\nDagger will pretty much auto-magically figure out things like memory movement between processes -- for cheap tasks, you want to keep data within-process to minimize memory movement, but in some cases a worker may be overloaded and it may be cheaper to move memory to a different worker.\n\nThe simple version of Dagger resembles Julia's standard task workflow:\n\n here is a , which represents a task that will execute on some parallel resource.  will block and return the result of the task.\n\nDagger will also construct a DAG (hence the name DAGger) of your computation -- you can construct an arbitrary set of tasks, and each task will be handed off to another process upon completion. Take this for example:\n\nAbove,  and  are independent and can be run in parallel.  depends on both  and , so it will block until both are complete.\n\nGPU support is quite straightforward as well. Julia's GPU support is wonderful, and you can use any device type you need (CUDA, ROCm, Metal, oneAPI).\n\nHere's how to set up a GPU in Dagger:\n\nOnce you have the  that determines Dagger's available resources (in this case, a GPU), you can let Dagger handle whatever your operation is:\n\nsum\n\nThis also handles multiple GPUs across processes. 
If the GPUs are full or a computation is not appropriate for a GPU, it can be dispatched to CPU threads instead.\n\nThere's lots of other cool stuff in the workshop, including data dependencies to help the Dagger scheduler, distributed arrays, and a nifty implementation of convolutions + Conway's Game of Life.\n\nHonestly, I was just amazed at how far Dagger.jl has come. They have a ton of stuff on the roadmap as well, including:\nDaggerGraphs.jl for partitioned distributed graph processing\nStreaming data\nAuto-GPU processing\nExpanded data deps support\nOperator fusion\nDagger + Enzyme autodiff"
}