{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreih6p44pzmqrddmuqrqcrb6wdsh6gj6mj3c64bw6w4chmk4q277vqu",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mpcuhd7ummy2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreih4somj5naovytnnxb5jbd62fy55x7ujtrstejfrw3a7dqrzdrb6a"
    },
    "mimeType": "image/webp",
    "size": 79626
  },
  "path": "/thrylox/how-i-debugged-and-fixed-memory-goroutine-leaks-in-projectdiscovery-nuclei-engine-4794",
  "publishedAt": "2026-06-28T01:20:58.000Z",
  "site": "https://dev.to",
  "tags": [
    "go",
    "cybersecurity",
    "opensource",
    "performance",
    "Nuclei Issue #7503",
    "Pull Request #7508",
    "fix(engine): resolve memory and goroutine leaks in embedded engine usage (#7503)\n      \n#7508",
    "ThryLox",
    "Jun 27, 2026",
    "View on GitHub",
    "Memory and goroutine leaks in long-running embedded engine usage\n      \n#7503",
    "coderabbitai[bot]",
    "Jun 24, 2026",
    "https://github.com/projectdiscovery/nuclei/pull/7502",
    "https://github.com/projectdiscovery/nuclei/pull/7502#issuecomment-4794470056",
    "@Thrylox",
    "@Mzack9999"
  ],
  "textContent": "If you work in cloud security or vulnerability scanning, chances are high that you rely on **ProjectDiscovery Nuclei** β€”the gold standard open-source vulnerability scanner powered by YAML templates.\n\nWhile Nuclei performs exceptionally well as a standalone CLI tool, embedding it as an underlying SDK engine inside long-running microservices or continuous scanning workers introduces unique architectural challenges: **memory bloat** and **goroutine leaks** over extended execution loops.\n\nRecently, I investigated and resolved these exact engine lifecycle leaks in **Nuclei Issue #7503** and submitted **Pull Request #7508**. Here is a breakdown of what I discovered under the hood and how I fixed it in Go.\n\n##  πŸ” The Problem: Unbounded State & Orphaned Goroutines\n\nWhen embedding `NucleiEngine` into a long-running application loop (where engines are instantiated and closed dynamically per scan target), I noticed that memory consumption climbed steadily over time, and orphaned goroutines remained active long after calling `engine.Close()`.\n\nUpon profiling the engine lifecycle in Go, I identified three primary memory leaks:\n\n  1. **Unbounded`sync.Map` in HTTP-to-HTTPS Port Tracker:** The `HTTPToHTTPSPortTracker` stored host port mapping states in an unbounded `sync.Map`. Over thousands of target scans, this map grew infinitely without eviction.\n  2. **Orphaned Per-Host Rate Limiter Goroutines:** Global protocol state maintained per-execution rate limit pools (`PerHostRateLimitPool`). When an engine execution finished, worker background routines were not cleanly shut down or purged.\n  3. **Cached Template Parsers:** Compiled template ASTs (`parsedTemplatesCache` and `compiledTemplatesCache`) retained parsed representations in memory between engine instances without an explicit cache purging mechanism during engine teardown.\n\n\n\n##  πŸ› οΈ The Solution: Architecture & Code Fixes\n\n###  1. 
Bounded Expirable LRU Caching\n\nInstead of holding unbounded host entries in a `sync.Map`, I replaced the storage structure with an expirable LRU (Least Recently Used) cache configured with a strict capacity bound (4,096 entries) and a 24-hour TTL:\n\n\n\n    // Replacing unbounded sync.Map with bounded expirable LRU cache\n    type HTTPToHTTPSPortTracker struct {\n        cache *expirable.LRU[string, struct{}]\n    }\n\n    func NewHTTPToHTTPSPortTracker() *HTTPToHTTPSPortTracker {\n        return &HTTPToHTTPSPortTracker{\n            cache: expirable.NewLRU[string, struct{}](4096, nil, 24*time.Hour),\n        }\n    }\n\n\nThis guarantees that host mappings automatically expire and memory remains strictly bounded regardless of how many millions of URLs are scanned.\n\n###  2. Lifecycle Cleanup in `protocolstate.Close()`\n\nI updated the global protocol state tear-down procedure in `pkg/protocols/common/protocolstate/state.go` to release rate limiter worker routines and purge trackers upon `Close()`:\n\n\n\n    func Close(executionID string) {\n        stateLock.Lock()\n        defer stateLock.Unlock()\n\n        if state, ok := globalStateMap[executionID]; ok {\n            // Release per-host rate limiters and background goroutines\n            if state.PerHostRateLimitPool != nil {\n                state.PerHostRateLimitPool.Close()\n            }\n            // Purge HTTP to HTTPS tracker entries\n            if state.HTTPToHTTPSPortTracker != nil {\n                state.HTTPToHTTPSPortTracker.Purge()\n            }\n            delete(globalStateMap, executionID)\n        }\n    }\n\n\n###  3. 
Engine Cache Purging Interface\n\nFinally, I added a thread-safe `Purge()` method to the template parser struct and call it from `NucleiEngine.Close()`, using an interface type assertion so that parsers without a `Purge()` method are simply skipped:\n\n\n\n    // Safely purge compiled template caches on engine close\n    func (e *NucleiEngine) closeInternal() error {\n        if e.parser != nil {\n            e.parser.Purge()\n        }\n        // Custom parsers may not implement Purge(); only call it when available\n        if purger, ok := e.executerOpts.Parser.(interface{ Purge() }); ok {\n            purger.Purge()\n        }\n        return nil\n    }\n\n\n##  ⚖️ Technical Trade-offs & Potential Criticisms\n\nEach of these fixes comes with a trade-off worth stating:\n\n  1. **Fixed LRU Capacity vs. Configuration:** A hardcoded capacity of 4,096 entries works as a balanced default for standard worker memory limits. However, in enterprise environments scanning millions of domains concurrently, exposing this bound as a configurable parameter (for example, `Options.HTTPToHTTPSCacheSize`) would be a clean future addition.\n  2. **Runtime Interface Assertion:** Using a runtime type assertion (`interface{ Purge() }`) keeps the codebase decoupled and preserves backward compatibility for third-party SDK consumers with custom parsers. The cost is that a parser lacking `Purge()` is skipped silently rather than flagged at compile time.\n  3. **Memory Reclamation vs. 
Re-parsing Overhead:** Purging compiled template caches on engine teardown prioritizes memory stability over reusing compiled templates across separate engine instances, so each new engine has to parse its templates again.\n\n\n\n##  🧪 Results & Verification\n\nI validated these fixes against the Nuclei unit test packages (`httpclientpool`, `protocolstate`, `templates`, and `lib`): all tests passed, and I observed no memory accumulation between consecutive engine shutdowns.\n\n##  Pull Request #7508: fix(engine): resolve memory and goroutine leaks in embedded engine usage (#7503)\n\n**ThryLox** posted on Jun 27, 2026\n\n### Summary\n\nFixes #7503 by implementing the required leak-prevention cleanup mechanisms outlined in #7502 for long-running embedded engines.\n\n### Key Changes\n\n  1. **Size-Bounded HTTP-to-HTTPS Tracker:** Replaced the unbounded `sync.Map` in `HTTPToHTTPSPortTracker` (`pkg/protocols/http/httpclientpool/http_to_https_tracker.go`) with a size-bounded expirable LRU cache (4096 entries max, 24h TTL) and added `Purge()`.\n  2. **Per-Host Rate Limiter Pool Cleanup:** Updated `protocolstate.Close()` (`pkg/protocols/common/protocolstate/state.go`) to release per-host rate-limit pool goroutines and purge the HTTP-to-HTTPS tracker on shutdown.\n  3. 
**Template Cache Purging:** Updated `NucleiEngine.Close()` / `closeInternal()` (`lib/sdk.go`) and `Parser` (`pkg/templates/parser.go`) to purge parsed and compiled template caches on engine close.\n\n##  Issue #7503: Memory and goroutine leaks in long-running embedded engine usage\n\n**coderabbitai[bot]** posted on Jun 24, 2026\n\n### Summary\n\nThe embedded engine can leak memory and goroutines over time during long-running usage.\n\n### Required changes\n\nImplement the leak-prevention work described in #7502:\n\n  * bound the `HTTPToHTTPS` tracker with an LRU\n  * release the per-host rate-limit pool goroutines on close\n  * purge the template caches on engine close\n\n\n\n### Rationale\n\nWithout explicit cleanup and bounded caching, long-running embedders can accumulate memory usage and leave background goroutines running indefinitely.\n\n### Affected areas\n\n  * `HTTPToHTTPS` tracking / redirect bookkeeping\n  * per-host rate limit pool lifecycle and shutdown\n  * template cache lifecycle during engine close\n\n\n\n### Acceptance criteria\n\n  * The `HTTPToHTTPS` tracker is size-bounded and evicts old entries.\n  * Per-host rate-limit pool goroutines are released when the engine closes.\n  * Template caches are purged on engine close.\n  * Long-running embedded usage no longer shows continued growth from these resources.\n\n\n\n### Backlinks\n\n  * Pull request: https://github.com/projectdiscovery/nuclei/pull/7502\n  * Request comment: https://github.com/projectdiscovery/nuclei/pull/7502#issuecomment-4794470056\n  * Requested by: @Mzack9999\n\n\n\n### Additional context\n\nPR title: fix leaks\n\n##  💡 Key Takeaways for Go Developers\n\n  1. **Beware of Unbounded `sync.Map` in Long-Running Apps:** While `sync.Map` is convenient, it has no eviction policy. Use a bounded LRU cache with a TTL for lookup tables whose key set grows with the input.\n  2. 
**Explicit Teardown Interfaces:** When building Go SDKs meant to be embedded, always provide clean `Close()` / `Purge()` methods to release background channels and goroutines.\n  3. **Decoupled Lifecycle Hooks:** Interface checks like `if purger, ok := obj.(interface{ Purge() }); ok` enable clean resource cleanup without introducing rigid package dependencies.\n\n\n\n_Written by @Thrylox. Connect with me on GitHub!_",
  "title": "How I Debugged and Fixed Memory & Goroutine Leaks in ProjectDiscovery Nuclei Engine πŸš€"
}