Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreigyaqsqp7xxiubs75q72rzice34x3ryvwqxe4326x4dz57bn3uska",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mp7j4kdjxd22"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreicwps7r6d2nt4penripzvamxjcombyahocd5hc2jb4xbavbma6yjm"
    },
    "mimeType": "image/webp",
    "size": 59498
  },
  "path": "/khalidelokiely/building-a-zero-leak-postgres-mcp-gateway-in-go-3be6",
  "publishedAt": "2026-06-26T17:28:58.000Z",
  "site": "https://dev.to",
  "tags": [
    "mcp",
    "go",
    "security",
    "postgres",
    "compounds.id",
    "github.com/lib/pq",
    "khalidelokiely/mcp-postgres-gateway",
    "@modelcontextprotocol"
  ],
  "textContent": "The promise of agentic AI workflows introduces a critical architectural paradox: to make an LLM deeply useful, you must grant it structural awareness of your data layer. Traditional integration patterns force a losing trade-off. Either you hand an external orchestrator direct database access (risking catastrophic data egress), or you must serialize and persist your entire proprietary database schema onto third-party infrastructure. This exposure of internal domain definitions outside the secure perimeter represents a massive intellectual property leak, stalling production AI adoption in highly competitive or regulated sectors. For instance, a localized real estate consultancy managing proprietary compound metrics and high-value transactional ledgers cannot afford to expose its structural competitive edge to a shared cloud context just to run an analytical prompt.\n\nTo bridge this gap, backend teams must shift toward an architectural pattern where the data plane isolates schema definitions and executes only the commands explicitly defined by the MCP server, delivering pre-approved aggregations without ever leaking raw data layouts upstream. This article demonstrates how to build a zero-leak database proxy in Go using the Model Context Protocol (MCP) over a secure `stdio` transport layer. By decoupling the LLM from direct database access, you will implement a live gateway that executes two core tasks: **Dynamic Schema Reflection** to auto-generate tool manifests programmatically, and **Analytical Egress Hardening** to ensure the external AI agent never touches a raw database row.\n\nThe project follows a standard go folder layout - `cmd/` for the entrypoint, `pkg/db` for the Postgres connection and logic. This isn’t a framework requirement, just a convention that keeps schema reflection, query execution and MCP transport cleanly separated. You can flatten this into a single file for prototyping.\n\n###  The MCP zero-leak architecture\n\nThree things make this gateway zero-leak;\n\n  1. **Schema Visibility** : What it’s allowed to see from the data source (`EXPOSED_TABLES`)\n  2. **Aggregation:** What it’s allowed to compute\n  3. **Tool Registration to the MCP server** : How these become callable by the LLM.\n\n\n\n###  The Schema **Visibility**\n\nSchema visibility step utilizes Postgres’ `information_schema.columns` table to actually fetch column metadata from the database - instead of having to hardcode or dump it out of our database every time the LLM needs to know about what schema structure is available in our data layer.\n\nIn `pkg/db/postgres.go` we create an `InspectExposedSchema` function that returns a slice of type `ColumnMetadata` which can eventually be passed into the LLM context window.\n\n\n\n    package db\n\n    import (\n        \"database/sql\"\n    )\n\n    // ColumnMetadata defines a single column in our postgres database\n    type ColumnMetadata struct {\n        TableName  string\n        ColumnName string\n        DataType   string\n    }\n\n    // InspectExposedSchema reads structural layout data dynamically from the system catalog.\n    func InspectExposedSchema(db *sql.DB, exposedTables []string) ([]ColumnMetadata, error) {\n        query := `\n            SELECT table_name, column_name, data_type\n            FROM information_schema.columns\n            WHERE table_schema = 'public'\n            AND table_name = ANY($1)\n            ORDER BY table_name, column_name;`\n\n        rows, err := db.Query(query, exposedTables)\n        if err != nil {\n            return nil, err\n        }\n        defer rows.Close()\n\n        var metadata []ColumnMetadata\n        for rows.Next() {\n            var col ColumnMetadata\n            if err := rows.Scan(&col.TableName, &col.ColumnName, &col.DataType); err != nil {\n                return nil, err\n            }\n            metadata = append(metadata, col)\n        }\n\n        if err := rows.Err(); err != nil {\n            return nil, err\n        }\n\n        return metadata, nil\n    }\n\n\nIn the project we’ve setup a `.env` file with the following variable:\n\n\n\n    EXPOSED_TABLES=compounds,sales_ledger\n\n\nThis variable is read and passed into the `InspectExposedSchema` function to fetch only those tables that we’ve explicitly whitelisted for visibility.\n\nIt’s worth dwelling on why this is a deny-by-default allowlist rather than an exposed-by-default filter.\n\nA more generic approach would remove the filter entirely, but in a regulated FinTech or real estate platform, that's not a hypothetical risk. Staging tables, audit logs, or a `users` table with national ID numbers would become visible to the orchestrator the moment they're created, with zero code change and zero review. The allowlist isn't extra friction, it's the only thing standing between \"the LLM sees what we intended\" and \"the LLM sees whatever the last migration happened to leave lying around.”\n\n####  Important note on the choice of environment variables in this article:\n\nIn this article we’re only highlighting a single filter level (table level). But a more production-ready design would include a deeper deny list on a more granular level for columns such as surrogate keys, create/delete/update timestamps or vector fields if you’re using PGVector.\n\n###  Aggregation\n\nRaw query access is the obvious approach — and the wrong one. Here's why the gateway pre-defines every computation the LLM is allowed to run. For this project, we are taking on one business case where the user of the LLM needs an aggregate of the total number of units sold (`units_sold`), total revenue made (`revenue_egp`) and total cancelled orders (`cancelled_orders`) for a specific `region`\n\nIn the schema provided in the repository, we have 2 entities `compounds` and `sales_ledger` . `sales_ledger` column `compound_id` is a foreign key that references compounds.id .\n\nIn many popular MCP implementations, the LLM would generally create the aggregation query and send it as plain-text for execution. This poses massive security risk - aside from `DELETE` or `DROP` statements which are naive assumptions given a read-only access. The real risk is an exhaustive `SELECT` query. There is no telling what the LLM might decide is the best path. For the majority of cases it might send the correct query for the business need directly.\n\n\n\n    -- Find aggregate of units sold, revenue, cancelled orders\n    -- relative to a select region\n    SELECT compounds.region,\n            sum(units_sold) AS TOTAL_UNITS_SOLD,\n                    sum(revenue_egp) AS TOTAL_REVENUE,\n                    sum(cancelled_orders) AS TOTAL_CANCELLED\n    FROM sales_ledger JOIN compounds ON sales_ledger.compound_id = compounds.id\n    WHERE compounds.region = ANY($1)\n    GROUP BY compounds.region\n\n\nBut if an attacker were to hijack a session or acquire access to the server running the LLM, there is no stopping them from instructing or injecting a prefix to the context window that instructs the LLM to pull raw data to the server and process it instead of aggregate it.\n\nA more secure gateway only allows the LLM to know what it must know - without any possibility of further hijacking.\n\nIn `pkg/db/queries.go` we initialize a Queries struct and constructor for it which accepts a `*sql.DB` connection:\n\n\n\n    type Queries struct {\n        db *sql.DB\n    }\n\n    func NewQueries(db *sql.DB) *Queries {\n        return &Queries{\n            db: db,\n        }\n    }\n\n\nThen we create the result struct for the first type of aggregation which consists of all the fields that represent a single record out from the above query.\n\n\n\n    type RegionalMetricsResult struct {\n        Region          string  `json:\"region\"`\n        UnitsSold       int     `json:\"unitsSold\"`\n        TotalRevenue    float64 `json:\"totalRevenue\"`\n        CancelledOrders int     `json:\"cancelledOrders\"`\n    }\n\n\nFinally, we create `FindRegionalMetrics` method on `Queries` struct with a pointer receiver:\n\n\n\n    func (q *Queries) FindRegionalMetrics(ctx context.Context, regions []string) ([]RegionalMetricsResult, error) {\n        query := `SELECT compounds.region,\n                             sum(units_sold) AS TOTAL_UNITS_SOLD,\n                                 sum(revenue_egp) AS TOTAL_REVENUE,\n                                 sum(cancelled_orders) AS TOTAL_CANCELLED\n                    FROM sales_ledger JOIN compounds ON sales_ledger.compound_id = compounds.id\n                    WHERE compounds.region = ANY($1)\n                    GROUP BY compounds.region`\n\n        rows, err := q.db.QueryContext(ctx, query, pq.Array(regions))\n\n        if err != nil {\n            return nil, err\n        }\n\n        defer rows.Close()\n\n        var result []RegionalMetricsResult\n\n        for rows.Next() {\n            var col RegionalMetricsResult\n            if err := rows.Scan(&col.Region, &col.UnitsSold, &col.TotalRevenue, &col.CancelledOrders); err != nil {\n                return nil, err\n            }\n\n            result = append(result, col)\n        }\n\n        if err := rows.Err(); err != nil {\n            return nil, err\n        }\n\n        return result, nil\n    }\n\n\n`pq.Array` is required here because Go's `database/sql` doesn't natively serialize a string slice to Postgres's `ANY($1)` array syntax - the `lib/pq` driver wrapper handles that translation.\n\nThe `FindRegionalMetrics` and any similar method absolutely doesn’t have to know about who is calling it. It doesn’t care if the caller is an MCP server or a CRUD API server. It is pure business logic that constricts and abstracts flow from the underlying data store, essentially telling the LLM what it is allowed to do with the data.\n\nThis is also true in case your team decides to create a more complex and dynamic aggregate implementation - The end goal remains the same: You give the LLM a sparse set of information proxies that cannot be abused even if an attacker gains access.\n\n###  Registering the functions as MCP Tools\n\nNow comes the part where we register these tools as discoverable and usable utilities to the LLM.\n\nFor this project, we are using `github.com/mark3labs/mcp-go` to register MCP tools and run the MCP server.\n\nFirst, we define a small helper that serializes any result type to indented JSON before returning it to the MCP transport layer. Using `any` as the input type means this same function works for every tool response — schema metadata, regional metrics, or any future query result.\n\n\n\n    func formatResult(v any) string {\n        b, _ := json.MarshalIndent(v, \"\", \"  \")\n        return string(b)\n    }\n\n\nIn production, the marshal error should be handled explicitly. For this gateway, marshaling failures on known struct types are effectively impossible, but the pattern should be hardened before shipping.\n\nThe library makes it easy to add a descriptor for the tools using the `mcp.NewTool`method.\n\nFor the `list_tables` tool - initialize a tool name and the description:\n\n\n\n        listTablesTool := mcp.NewTool(\"list_tables\",\n            mcp.WithDescription(\"Lists all available database schemas and field structures without exposing raw database records.\"),\n        )\n\n\nThen use the `AddTool` method to actually make the tool usable and utilize the `InspectExposedSchema` function we created above:\n\n\n\n        s.AddTool(listTablesTool, func(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {\n            cols, err := db.InspectExposedSchema(database, exposedTables)\n            if err != nil {\n                return mcp.NewToolResultError(fmt.Sprintf(\"Failed to map system constraints: %s\", err.Error())), nil\n            }\n\n            return mcp.NewToolResultText(formatResult(cols)), nil\n        })\n\n\nThe first line of the function has 2 important things to note:\n\n\n\n         cols, err := db.InspectExposedSchema(database, exposedTables)\n\n\n`database` is a variable holding the `*sql.DB` instance.\n\n`exposedTables` is the largely configurable `.env` `EXPOSED_TABLES` variable we introduced earlier. This tells the InspectExposedSchema to only pull the information for the explicitly allowed tables.\n\nNext, comes the aggregate method registration. First, initialize the `Queries` struct:\n\n\n\n        queries := db.NewQueries(database)\n\n\nThe `FindRegionalMetrics` expects a slice of strings for its second argument `regions []string` . The `get_metrics` MCP tool can be configured in the `mcp.NewTool` method to annotate that this tool requires a string slice:\n\n\n\n        metricsTool := mcp.NewTool(\"get_metrics\",\n            mcp.WithDescription(\"Retrieves metrics for specified geographical regions\"),\n\n            // Define your slice parameter here\n            mcp.WithArray(\"region\",\n                mcp.Required(), // <-- This marks the parameter as required in the JSON Schema\n                mcp.Description(\"A list of regions to filter metrics by (e.g. ['New Cairo', 'North Coast'])\"),\n            ),\n        )\n\n\nThe `mcp.WithArray` tells the MCP server to expect a json array.\n\nNext add the tool:\n\n\n\n        s.AddTool(metricsTool, func(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {\n            regions, err := request.RequireStringSlice(\"region\")\n            if err != nil {\n                return mcp.NewToolResultError(err.Error()), nil\n            }\n\n            result, err := queries.FindRegionalMetrics(ctx, regions)\n            if err != nil {\n                return mcp.NewToolResultError(err.Error()), nil\n            }\n\n            return mcp.NewToolResultText(formatResult(result)), nil\n        })\n\n\nThe first line:\n\n\n\n            regions, err := request.RequireStringSlice(\"region\")\n\n\nIs important because in the tool description, we only hinted at providing an `Array` . This method `request.RequireStringSlice`enforces a typed `Array` translating to a go `StringSlice` .\n\nAs covered in the previous section, `pq.Array` handles the Go-to-Postgres array serialization that `database/sql` doesn't provide natively.\n\nThe MCP server now exposes exactly two tools - no more, no less. The LLM can discover what exists and compute what's permitted. Everything else in the database remains invisible.\n\n###  Wiring it all Together\n\nNow to see the entire structure come to life, we wire together all that was built above into an entrypoint.\n\nAs mentioned previously we’re using `github.com/mark3labs/mcp-go` to spin up an MCP server instead of building one from scratch.\n\nIn this project the `main.go` is located in a standard path `cmd/gateway/main.go` . The full main.go looks like this:\n\n\n\n    package main\n\n    import (\n        \"context\"\n        \"database/sql\"\n        \"encoding/json\"\n        \"fmt\"\n        \"log\"\n        \"mcp-postgres-gateway/pkg/db\"\n        \"os\"\n        \"strings\"\n\n        \"github.com/joho/godotenv\"\n        _ \"github.com/lib/pq\" // CRITICAL: Must be explicitly imported here to register the driver\n        \"github.com/mark3labs/mcp-go/mcp\"\n        \"github.com/mark3labs/mcp-go/server\"\n    )\n\n    func formatResult(v any) string {\n        b, _ := json.MarshalIndent(v, \"\", \"  \")\n        return string(b)\n    }\n\n    func main() {\n        err := godotenv.Load(\".env\")\n        // Initialize Postgres Connection\n        connStr := os.Getenv(\"DATABASE_URL\")\n\n        exposedTables := strings.Split(os.Getenv(\"EXPOSED_TABLES\"), \",\")\n        if len(exposedTables) == 0 {\n            log.Fatal(\"EXPOSED_TABLES environment variable is not set\")\n        }\n\n        if connStr == \"\" {\n            log.Fatal(\"DATABASE_URL environment variable is not set\")\n        }\n\n        database, err := sql.Open(\"postgres\", connStr)\n        if err != nil {\n            log.Fatalf(\"Database initialization failure: %v\", err)\n        }\n        defer database.Close()\n\n        // Establish the MCP Core Server Block\n        s := server.NewMCPServer(\"domainai-gateway\", \"1.0.0\")\n\n        // 1. Tool 1 Implementation: Expose Schema Table Information\n        listTablesTool := mcp.NewTool(\"list_tables\",\n            mcp.WithDescription(\"Lists all available database schemas and field structures without exposing raw database records.\"),\n        )\n\n        s.AddTool(listTablesTool, func(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {\n            cols, err := db.InspectExposedSchema(database, exposedTables)\n            if err != nil {\n                return mcp.NewToolResultError(fmt.Sprintf(\"Failed to map system constraints: %s\", err.Error())), nil\n            }\n\n            return mcp.NewToolResultText(formatResult(cols)), nil\n        })\n\n        // Data tools\n        queries := db.NewQueries(database)\n\n        metricsTool := mcp.NewTool(\"get_metrics\",\n            mcp.WithDescription(\"Retrieves metrics for specified geographical regions\"),\n\n            // Define your slice parameter here\n            mcp.WithArray(\"region\",\n                mcp.Required(), // <-- This marks the parameter as required in the JSON Schema\n                mcp.Description(\"A list of regions to filter metrics by (e.g. ['US', 'EU'])\"),\n            ),\n        )\n\n        s.AddTool(metricsTool, func(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {\n            regions, err := request.RequireStringSlice(\"region\")\n            if err != nil {\n                return mcp.NewToolResultError(err.Error()), nil\n            }\n\n            result, err := queries.FindRegionalMetrics(ctx, regions)\n            if err != nil {\n                return mcp.NewToolResultError(err.Error()), nil\n            }\n\n            return mcp.NewToolResultText(formatResult(result)), nil\n        })\n\n        // Start the Server to communicate natively over standard IO channels\n        log.Println(\"MCP Gateway initialized. Establishing communication channel over Stdio...\")\n        if err := server.ServeStdio(s); err != nil {\n            fmt.Fprintf(os.Stderr, \"Server crash anomaly: %v\\n\", err)\n            os.Exit(1)\n        }\n    }\n\n\n\nIn this implementation, `.env` loading failures are intentionally non-fatal. The application falls back to system environment variables, which is the correct behavior in containerized deployments where `.env` files aren't present.\n\nNotice how we must import github.com/lib/pq using alias `_` for side effects. Once registered `database/sql` knows exactly how to handle the postgres protocol behind the scenes when you initialize a connection.\n\nAlso notice this block of code:\n\n\n\n        connStr := os.Getenv(\"DATABASE_URL\")\n\n        exposedTables := strings.Split(os.Getenv(\"EXPOSED_TABLES\"), \",\")\n        if len(exposedTables) == 0 {\n            log.Fatal(\"EXPOSED_TABLES environment variable is not set\")\n        }\n\n        if connStr == \"\" {\n            log.Fatal(\"DATABASE_URL environment variable is not set\")\n        }\n\n\n\nThe application deliberately terminates the program if `DATABASE_URL` is not found in environment. But more notably this pattern is also enforced early in the program when no `EXPOSED_TABLES` are set. This can be helpful to save network resources and give an early failure signal if your MCP server communicates with the database service across another network or if the database service is a microservice in your ecosystem.\n\nTo test the MCP service, you can spin up a quick, on demand MCP inspector UI in your browser by running this npx command:\n\n\n\n    npx -y @modelcontextprotocol/inspector go run cmd/gateway/main.go\n\n\nThis should open up an MCP inspector tab in your browser.\n\n####  Test the tools\n\nRunning `list_tables` tool should yield an output similar to this:\n\n\n\n    [\n      {\n        \"TableName\": \"compounds\",\n        \"ColumnName\": \"developer\",\n        \"DataType\": \"character varying\"\n      },\n      {\n        \"TableName\": \"compounds\",\n        \"ColumnName\": \"id\",\n        \"DataType\": \"integer\"\n      },\n      {\n        \"TableName\": \"compounds\",\n        \"ColumnName\": \"launch_year\",\n        \"DataType\": \"integer\"\n      },\n      {\n        \"TableName\": \"compounds\",\n        \"ColumnName\": \"name\",\n        \"DataType\": \"character varying\"\n      },\n      {\n        \"TableName\": \"compounds\",\n        \"ColumnName\": \"region\",\n        \"DataType\": \"character varying\"\n      },\n      {\n        \"TableName\": \"compounds\",\n        \"ColumnName\": \"total_units\",\n        \"DataType\": \"integer\"\n      },\n      {\n        \"TableName\": \"sales_ledger\",\n        \"ColumnName\": \"cancelled_orders\",\n        \"DataType\": \"integer\"\n      },\n      {\n        \"TableName\": \"sales_ledger\",\n        \"ColumnName\": \"compound_id\",\n        \"DataType\": \"integer\"\n      },\n      {\n        \"TableName\": \"sales_ledger\",\n        \"ColumnName\": \"id\",\n        \"DataType\": \"integer\"\n      },\n      {\n        \"TableName\": \"sales_ledger\",\n        \"ColumnName\": \"quarter\",\n        \"DataType\": \"character varying\"\n      },\n      {\n        \"TableName\": \"sales_ledger\",\n        \"ColumnName\": \"revenue_egp\",\n        \"DataType\": \"numeric\"\n      },\n      {\n        \"TableName\": \"sales_ledger\",\n        \"ColumnName\": \"units_sold\",\n        \"DataType\": \"integer\"\n      }\n    ]\n\n\nRunning the `get_metrics` tool with the input of `[\"New Cairo\", \"North Coast\"]` Should yield the below aggregated metrics for each region.\n\n\n\n    [\n      {\n        \"region\": \"New Cairo\",\n        \"unitsSold\": 165,\n        \"totalRevenue\": 1245000000,\n        \"cancelledOrders\": 3\n      },\n      {\n        \"region\": \"North Coast\",\n        \"unitsSold\": 12,\n        \"totalRevenue\": 180000000,\n        \"cancelledOrders\": 4\n      }\n    ]\n\n\nThe LLM received aggregated metrics - totals, not rows. It knows New Cairo sold 165 units. It has no path to the individual transaction records that produced that number. That's the boundary the gateway enforces.\n\n###  The takeaway\n\nThe Go ecosystem is underrepresented in MCP tooling — most implementations lean on Python or TypeScript. But the real gap isn't language choice. It's architectural discipline.\n\nAn MCP gateway that lets the LLM construct its own queries is only as secure as the LLM's judgment - and judgment is exactly what attackers exploit. The pattern in this article inverts that assumption: the gateway defines what's computable, the LLM executes within those boundaries, and raw data never crosses the perimeter.\n\nThis isn't a limitation of the architecture. It's the feature.\n\nThe full implementation is available at khalidelokiely/mcp-postgres-gateway. Clone it, point it at your own Postgres instance, and extend `queries.go` with the aggregations your business logic actually needs. The schema reflection and transport layer stay unchanged — only the computations you choose to expose are yours to define.",
  "title": "Building a Zero-Leak Postgres MCP Gateway in Go"
}