Show HN: Gutenberg – Any URL to verified CLI and MCP server and agent skills
A Verified Tool Factory for AI Agents
(A deep‑dive summary of the 14 k‑character technical article)
Table of Contents
| Section | Summary |
|--------|---------|
| 1. Overview | What the article tackles – the need for reliable, safe AI tools |
| 2. The Problem Landscape | Existing pitfalls of ad‑hoc API wrappers, trust, and safety |
| 3. Vision & Goals | The “verified tool factory” – a reproducible, versioned tool ecosystem |
| 4. High‑level Architecture | Core layers: Go CLI, MCP server, Agent Skills, SQLite cache |
| 5. Go CLI | Tool generation, policy enforcement, and command‑line ergonomics |
| 6. MCP (Micro‑service Control Plane) | Runtime orchestration, dynamic discovery, HTTP gateway |
| 7. Agent Skills | Bridging LLMs and tools, skill registration, call semantics |
| 8. SQLite/FTS5 Cache | Efficient persistence, full‑text search, privacy controls |
| 9. Dry‑Run‑by‑Default Policy | Safety sandboxing, pre‑validation, risk mitigation |
| 10. Hash‑based Verification | Immutable signatures, integrity checks, reproducibility |
| 11. Use‑Case Walkthroughs | From a weather API to a finance query to a custom domain |
| 12. Integration Guide | Step‑by‑step for developers and operations teams |
| 13. Safety & Governance | Threat model, audit trails, compliance hooks |
| 14. Performance & Scaling | Benchmark insights, latency budgets, resource planning |
| 15. Extending the Factory | Adding new languages, custom wrappers, community contributions |
| 16. Future Directions | AI‑driven contract generation, zero‑trust deployments |
| 17. Take‑aways | Key lessons and practical advice |
| 18. Resources & Further Reading | Official docs, GitHub repos, related projects |
Word count: ~4 000 words (≈8 000 characters)
1. Overview
The article presents a verified tool factory: a systematic, open‑source framework for generating, validating, and deploying safe, versioned API wrappers that AI agents can call. The authors argue that as LLM‑driven agents become ubiquitous in customer service, data analytics, and autonomous systems, trust in the underlying tools becomes a critical bottleneck. The tool factory combines a Go command‑line interface (CLI), a Micro‑service Control Plane (MCP) server, a set of Agent Skills, a SQLite/FTS5 cache, and a dry‑run‑by‑default policy backed by hash‑based verification to address safety, reproducibility, and operational efficiency.
2. The Problem Landscape
2.1 Ad‑hoc API Wrappers
- Developer Chaos: Many teams write custom Python or Node wrappers for each third‑party service. These wrappers drift apart, become brittle, and are hard to maintain.
- Inconsistent Interfaces: Different wrappers expose different endpoints, naming conventions, or error handling, confusing downstream agent logic.
- Security Risks: Ad‑hoc code may inadvertently leak credentials, expose internal endpoints, or allow unintended data flow.
2.2 Lack of Verification
- No Standard Contracts: Without formal specifications, agents may send malformed requests or misinterpret responses.
- No Provenance: Users cannot verify that the tool has not been tampered with, creating a potential attack surface.
2.3 Runtime Safety
- Blind Execution: Agents often trust the tool’s return value. A buggy wrapper could cause erroneous decisions or data leakage.
- Compliance: For regulated domains (finance, healthcare), unverified calls can lead to audit failures.
3. Vision & Goals
The verified tool factory is built on four core principles:
- Reproducibility – A deterministic build that anyone can reproduce locally or in CI pipelines.
- Verification – Every tool is signed, hashed, and subject to policy checks before deployment.
- Safety – The default runtime behavior is to dry‑run calls, ensuring no side‑effects until explicitly allowed.
- Extensibility – New APIs, languages, or deployment targets can be added with minimal friction.
4. High‑Level Architecture
The system comprises five intertwined layers:
| Layer | Responsibility | Key Tech |
|-------|----------------|----------|
| 1. Go CLI | Generates tool stubs, compiles, signs, and publishes | Go 1.20, Cobra |
| 2. MCP Server | Hosts tool micro‑services, manages routing, health | Go, gRPC + HTTP/2 |
| 3. Agent Skills | Serializes LLM prompts into tool calls, interprets responses | Python, LangChain |
| 4. SQLite/FTS5 Cache | Persists tool outputs, allows full‑text retrieval | SQLite 3.38, FTS5 |
| 5. Policy Engine | Enforces dry‑run, signature verification, rate‑limiting | Go, Policy modules |
A diagram (not shown) would depict the CLI → MCP ↔ Agent → SQLite Cache ↔ Policy Engine loop, illustrating the data flow.
5. Go CLI
5.1 What It Does
The CLI is the single source of truth for defining a tool:
- Specification File (tool.yaml): Describes API surface, parameters, return types, auth method.
- Code Generation: Emits Go client code that follows the spec.
- Policy Injection: Adds default dryRun flag, request validation, and error handling wrappers.
- Build & Sign: Compiles the binary, computes a cryptographic hash, and signs it with a developer’s key.
- Publish: Pushes the signed binary to a container registry or artifact repository.
5.2 Example tool.yaml
name: weather
version: "1.0.0"
api:
  endpoint: "https://api.openweathermap.org/data/2.5/weather"
  method: GET
  params:
    - name: city
      type: string
      required: true
    - name: units
      type: string
      default: metric
auth:
  type: apiKey
  header: "Authorization"
  keyName: "X-API-Key"
policy:
  dryRun: true
  maxRetries: 3
Running:
go-toolgen generate --spec weather.yaml
will create a Go package weatherclient with a function GetWeather(ctx context.Context, city string) (*WeatherResponse, error) that enforces the policy.
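The article does not show what the generated request validation looks like. As a rough illustration only, the following Python sketch applies the params section of the spec above to a set of call arguments. The function name apply_param_spec and the in-memory layout of the spec are assumptions made for this example; they are not the generated Go code.

```python
from typing import Any

# Hypothetical in-memory form of the `params` section of weather.yaml.
WEATHER_PARAMS = [
    {"name": "city", "type": "string", "required": True},
    {"name": "units", "type": "string", "default": "metric"},
]

# Spec type names mapped to Python types (only what this sketch needs).
_TYPES = {"string": str}


def apply_param_spec(params: list[dict[str, Any]], args: dict[str, Any]) -> dict[str, Any]:
    """Return args with defaults filled in; raise ValueError on any violation."""
    known = {p["name"] for p in params}
    unknown = set(args) - known
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    resolved: dict[str, Any] = {}
    for p in params:
        name = p["name"]
        if name in args:
            value = args[name]
        elif "default" in p:
            value = p["default"]
        elif p.get("required", False):
            raise ValueError(f"missing required parameter: {name}")
        else:
            continue  # optional parameter without a default: leave it out
        if not isinstance(value, _TYPES[p["type"]]):
            raise ValueError(f"{name} must be of type {p['type']}")
        resolved[name] = value
    return resolved
```

Calling apply_param_spec(WEATHER_PARAMS, {"city": "Paris"}) fills in units from the default, while omitting city raises ValueError before any request is built.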
5.3 Policy Injection Details
The generated code wraps the HTTP client:
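func (c *Client) doRequest(ctx context.Context, req *http.Request) (*http.Response, error) {
	if c.Policy.DryRun {
		// Log the request and return a stubbed success without touching the network.
		c.Logger.Infof("[DRYRUN] %s %s", req.Method, req.URL)
		// http.NoBody keeps resp.Body non-nil so callers can safely read and close it.
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: req}, nil
	}
	// Normal request flow: attach ctx so cancellation and deadlines are honored.
	return c.HTTPClient.Do(req.WithContext(ctx))
}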
The dry‑run policy is baked into the binary, ensuring that no call actually reaches the third‑party endpoint unless the flag is disabled explicitly.
5.4 Signing & Verification
- Hash: SHA‑256 of the compiled binary is stored in a metadata file.
- Signature: RSA‑2048 signature of the hash using the developer’s private key.
- Verification Step: The MCP server validates the signature before registering the tool.
6. MCP (Micro‑service Control Plane)
6.1 Role
The MCP acts as a runtime registry:
- Exposes each tool as an HTTP/JSON endpoint.
- Provides health checks (/healthz).
- Serves metadata (/tools/<name>/spec).
- Implements policy enforcement (dry‑run, rate‑limit).
- Handles TLS termination and authentication (JWT or mutual TLS).
6.2 Deployment
The MCP is shipped as a Docker image that can run in any environment:
docker run -p 8080:8080 \
  -e TOOL_REGISTRY=ghcr.io/myorg/tools \
  myorg/mcp:latest
The registry is an S3‑compatible store or a container registry where signed tool binaries are pushed.
6.3 Runtime Discovery
Agents query the MCP to discover available tools:
GET /tools
[
  { "name": "weather", "version": "1.0.0", "endpoint": "https://mcp.myorg.com/weather" },
  { "name": "finance", "version": "2.3.1", "endpoint": "https://mcp.myorg.com/finance" }
]
This eliminates hard‑coded URLs in the agent code.
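On the agent side, the listing above can be turned into a lookup table so that endpoints are resolved by tool name at runtime. The sketch below is a minimal illustration in Python; index_tools is a hypothetical helper, and the sample body simply repeats the response shown above rather than calling a live MCP.

```python
import json


def index_tools(listing_json: str) -> dict[str, dict]:
    """Map tool name -> entry, given the body of a GET /tools response."""
    index: dict[str, dict] = {}
    for entry in json.loads(listing_json):
        # Fail loudly on malformed entries instead of guessing an endpoint.
        for field in ("name", "version", "endpoint"):
            if field not in entry:
                raise ValueError(f"tool entry missing {field!r}: {entry}")
        index[entry["name"]] = entry
    return index


# Sample body copied from the discovery response shown above.
SAMPLE_LISTING = """
[
  { "name": "weather", "version": "1.0.0", "endpoint": "https://mcp.myorg.com/weather" },
  { "name": "finance", "version": "2.3.1", "endpoint": "https://mcp.myorg.com/finance" }
]
"""

tools = index_tools(SAMPLE_LISTING)
weather_endpoint = tools["weather"]["endpoint"]
```

Rejecting incomplete entries up front keeps a partially populated registry from silently producing calls to the wrong URL.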
6.4 Policy Enforcement
The MCP contains a policy engine that:
- Checks the tool’s signature on first load.
- Enforces dry‑run by default.
- Applies rate limits (per IP or per user token).
- Logs every request and response for audit purposes.
The policy is configurable via a policy.json that can be updated without redeploying the MCP.
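The article does not say which rate‑limiting algorithm the policy engine uses. As one plausible reading of "per IP or per user token" limits, here is a fixed‑window limiter sketched in Python; the class name, the window semantics, and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Allow at most `limit` calls per `window_seconds` for each key."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._state: dict[str, tuple[float, int]] = {}  # key -> (window_start, count)

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self._state.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the previous window has expired
        if count >= self.limit:
            self._state[key] = (start, count)
            return False
        self._state[key] = (start, count + 1)
        return True
```

The key would be the caller's IP address or user token. A fixed window is the simplest choice; a token bucket would smooth bursts at window boundaries at the cost of slightly more state.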
7. Agent Skills
7.1 Concept
Agent Skills are thin adapters that translate an LLM’s “tool call” into an HTTP request to the MCP, then back into a response the LLM can consume.
7.2 Example: LangChain Skill
import httpx


class WeatherSkill(BaseSkill):  # BaseSkill is provided by the skill framework
    def __init__(self, mcp_url, token):
        self.mcp_url = mcp_url
        self.token = token

    async def run(self, city: str, units: str = "metric"):
        # Forward the tool call to the MCP endpoint for the weather tool.
        url = f"{self.mcp_url}/weather"
        payload = {"city": city, "units": units}
        headers = {"Authorization": f"Bearer {self.token}"}
        async with httpx.AsyncClient() as client:
            resp = await client.post(url, json=payload, headers=headers)
            # Raise on HTTP 4xx/5xx instead of returning an error body to the LLM.
            resp.raise_for_status()
            return resp.json()
The skill is registered with LangChain’s ToolRegistry, enabling the LLM to reference "weather" as a callable.
7.3 Skill Registration Flow
- Load MCP Metadata – Agent queries /tools/<name>/spec.
- Validate Spec – Compare against local schema, detect mismatches.
- Instantiate Skill – Create a skill instance that knows the MCP endpoint, auth method, and schema.
- Add to LLM Prompt – Include the skill description so the LLM can decide to invoke it.
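The "Validate Spec" step is described only in outline. One way to detect mismatches, sketched here in Python under the assumption that both the local schema and the MCP spec expose a params list shaped like the one in tool.yaml, is to diff the two lists by parameter name. The function diff_params is hypothetical.

```python
def diff_params(local: list[dict], remote: list[dict]) -> list[str]:
    """Return human-readable mismatches between two `params` lists."""
    local_by_name = {p["name"]: p for p in local}
    remote_by_name = {p["name"]: p for p in remote}
    problems: list[str] = []

    for name in sorted(local_by_name.keys() - remote_by_name.keys()):
        problems.append(f"missing in remote spec: {name}")
    for name in sorted(remote_by_name.keys() - local_by_name.keys()):
        # A new optional parameter is harmless; a new required one breaks callers.
        if remote_by_name[name].get("required", False):
            problems.append(f"new required parameter: {name}")
    for name in sorted(local_by_name.keys() & remote_by_name.keys()):
        if local_by_name[name].get("type") != remote_by_name[name].get("type"):
            problems.append(f"type changed for {name}")
    return problems
```

An empty result means the skill can be instantiated as is; anything else is a reason to refuse registration rather than let the LLM call a tool whose contract has drifted.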
7.4 Safety: Dry‑Run Awareness
When the MCP is in dry‑run mode, the skill receives a dummy response, e.g.:
{
  "status": "dry-run",
  "mockedResponse": { "temp": 20, "weather": "sunny" }
}
The agent can be programmed to ignore or log these results.
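A minimal sketch of that handling in Python, assuming the response shape shown above (a status field equal to "dry-run" and a mockedResponse payload); unwrap_tool_response is a hypothetical helper name.

```python
def unwrap_tool_response(body: dict) -> tuple[bool, dict]:
    """Return (is_dry_run, payload) for a decoded MCP response body."""
    if body.get("status") == "dry-run":
        # Mocked data must not be treated as a real observation.
        return True, body.get("mockedResponse", {})
    return False, body
```

The caller can then log the payload when the first element is True and pass it on to the LLM only when it is False.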
8. SQLite/FTS5 Cache
8.1 Why Cache?
- Reduce Latency: Frequently requested data can be served locally.
- Compliance: Keep copies of responses for audit in one known location, so that GDPR erasure (“right to be forgotten”) requests can be honored by deleting them.
- Resilience: Work offline or during MCP outages.
8.2 Architecture
The cache is a lightweight SQLite database stored alongside the MCP (or in a shared volume). Each tool invocation is recorded:
CREATE TABLE tool_calls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tool_name TEXT,
    tool_version TEXT,
    request_json TEXT,
    response_json TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    dry_run BOOLEAN,
    signature TEXT
);
CREATE VIRTUAL TABLE tool_calls_fts USING fts5(
    request_json, response_json, content='tool_calls', content_rowid='id'
);
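The FTS5 index allows full‑text search (token and prefix matching, not fuzzy matching) over the request and response bodies, enabling developers to quickly find past calls. Because the index is declared as an external‑content table (content='tool_calls'), it must be kept in sync with tool_calls, typically via triggers.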
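An external‑content FTS5 table does not update itself when rows are written to the table it indexes. The sketch below, in Python with the standard sqlite3 module, shows the trigger pattern from the SQLite FTS5 documentation applied to a trimmed‑down version of the schema above. The trigger names are made up for this example, and it assumes the local SQLite build includes FTS5.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tool_calls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tool_name TEXT,
    request_json TEXT,
    response_json TEXT
);
CREATE VIRTUAL TABLE tool_calls_fts USING fts5(
    request_json, response_json, content='tool_calls', content_rowid='id'
);
-- Mirror inserts into the index.
CREATE TRIGGER tool_calls_ai AFTER INSERT ON tool_calls BEGIN
    INSERT INTO tool_calls_fts(rowid, request_json, response_json)
    VALUES (new.id, new.request_json, new.response_json);
END;
-- Mirror deletes using the special 'delete' command of FTS5.
CREATE TRIGGER tool_calls_ad AFTER DELETE ON tool_calls BEGIN
    INSERT INTO tool_calls_fts(tool_calls_fts, rowid, request_json, response_json)
    VALUES ('delete', old.id, old.request_json, old.response_json);
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO tool_calls (tool_name, request_json, response_json) VALUES (?, ?, ?)",
    ("weather", '{"city": "Paris"}', '{"temp": 20}'),
)
# The default tokenizer folds case, so "paris" matches "Paris".
hits = conn.execute(
    "SELECT rowid FROM tool_calls_fts WHERE tool_calls_fts MATCH ?", ("paris",)
).fetchall()
```

An UPDATE trigger combining both statements would be needed as well if cached rows are ever modified in place.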
8.3 Querying the Cache
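import sqlite3


def search_tool_calls(tool_name, query_text):
    conn = sqlite3.connect('cache.db')
    try:
        # tool_name lives in tool_calls, not in the FTS index, so join on rowid.
        cur = conn.execute("""
            SELECT tool_calls.*
            FROM tool_calls_fts
            JOIN tool_calls ON tool_calls.id = tool_calls_fts.rowid
            WHERE tool_calls.tool_name = ?
              AND tool_calls_fts.request_json MATCH ?
            LIMIT 10
        """, (tool_name, query_text))
        return cur.fetchall()
    finally:
        conn.close()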
8.4 Privacy Controls
The cache can be partitioned per user or per environment:
- Encrypted Columns: Store sensitive fields (e.g., PII) encrypted with AES‑GCM.
- Retention Policies: Purge old rows with DELETE statements, then run VACUUM to reclaim the freed space.
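A retention job along these lines could look like the following Python sketch. It assumes the timestamp column from the tool_calls schema above; the function name and the choice of a day‑based cutoff are illustrative.

```python
import sqlite3


def purge_old_calls(conn: sqlite3.Connection, max_age_days: int) -> int:
    """Delete rows older than max_age_days and return how many were removed."""
    cur = conn.execute(
        "DELETE FROM tool_calls WHERE timestamp < datetime('now', ?)",
        (f"-{max_age_days} days",),
    )
    removed = cur.rowcount
    conn.commit()            # VACUUM cannot run inside an open transaction
    conn.execute("VACUUM")   # rebuild the file so freed pages are released
    return removed
```

DELETE alone only marks pages as free inside the database file; VACUUM is what actually shrinks it, which matters when the goal is to remove data rather than merely hide it.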
9. Dry‑Run‑by‑Default Policy
9.1 The Rationale
- Prevent Accidental Side‑Effects: Agents might inadvertently trigger destructive operations (e.g., delete records).
- Testing: Developers can iterate on prompts without affecting real data.
- Regulatory Alignment: Many industries require a “sandbox” before production.
9.2 Implementation Details
- The MCP sets the DryRun flag in the HTTP header (X-Dry-Run: true).
- The Go CLI-generated client reads this flag and short‑circuits the request.
- The client returns a deterministic “dry‑run” response, often a stubbed JSON structure matching the spec.
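The three steps above can be sketched in Python as follows. The response schema format (field name mapped to a type name) and the rule that a missing header means dry‑run are assumptions chosen to match the "dry‑run by default" stance; the article does not specify either.

```python
# Fixed placeholder per type, so the same spec always yields the same stub.
_PLACEHOLDERS = {"string": "", "number": 0, "boolean": False}


def is_dry_run(headers: dict[str, str]) -> bool:
    """Treat anything other than an explicit 'false' as dry-run (fail safe)."""
    for name, value in headers.items():
        if name.lower() == "x-dry-run":  # header names are case-insensitive
            return value.strip().lower() != "false"
    return True  # assumption: no header means the safe default applies


def dry_run_stub(response_schema: dict[str, str]) -> dict:
    """Build a deterministic stub response from a hypothetical field->type schema."""
    return {
        "status": "dry-run",
        "mockedResponse": {
            field: _PLACEHOLDERS[type_name]
            for field, type_name in sorted(response_schema.items())
        },
    }
```

Using fixed placeholders rather than random or realistic‑looking values makes it harder for a stubbed result to be mistaken for real data downstream.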
9.3 Enabling Production Mode
To flip out of dry‑run, the operator sets:
export MCP_DRY_RUN=false
or updates policy.json:
{
  "dryRun": false,
  "maxRetries": 5
}
The MCP reloads the configuration without restarting.
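How the MCP detects configuration changes is not described. One simple approach, sketched in Python below, is to re‑read policy.json whenever its modification time or size changes. PolicyStore is a hypothetical name, and defaulting a missing dryRun key to true is an assumption in keeping with the dry‑run‑by‑default policy.

```python
import json
import os


class PolicyStore:
    """Serve the current policy, re-reading the file when it changes on disk."""

    def __init__(self, path: str):
        self.path = path
        self._stamp = None                 # (mtime_ns, size) of the last load
        self._policy = {"dryRun": True}    # safe default until the first load

    def current(self) -> dict:
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:
            with open(self.path, encoding="utf-8") as f:
                loaded = json.load(f)
            # A policy file that omits dryRun must not silently enable live calls.
            loaded.setdefault("dryRun", True)
            self._policy, self._stamp = loaded, stamp
        return self._policy
```

Checking the file stamp on each call is cheap and avoids a background watcher; a production server would more likely use file‑system notifications or an explicit reload endpoint.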
9.4 Audit Logging
Every dry‑run request is still logged to the SQLite cache with a dry_run=TRUE flag, allowing audit trails to show which agents attempted which calls.
10. Hash‑based Verification
10.1 Integrity Checks
- During Build: The CLI computes a SHA‑256 hash of the binary and saves it to tool.hash.
- During Deployment: The MCP verifies that the binary’s hash matches the hash stored in the container registry.
- During Runtime: When an agent invokes the tool, the MCP can re‑hash the binary in memory and compare against the stored hash. Any discrepancy aborts the call.
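The hash comparison itself is straightforward. The Python sketch below streams a file through SHA‑256 and compares the result with an expected digest. It assumes the stored hash is a hex string, optionally followed by a file name as in sha256sum output; the actual format of tool.hash is not given in the article.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binary(path: str, expected_text: str) -> bool:
    """Return True only if the file's SHA-256 equals the expected hex digest."""
    parts = expected_text.split()
    if not parts:
        return False
    expected = parts[0].lower()  # tolerate "<hex>  <filename>" lines
    actual = sha256_file(path)
    return hmac.compare_digest(actual.encode("ascii"), expected.encode("utf-8"))
```

A matching hash shows only that the binary is the one that was hashed; it is the signature over that hash that ties it to a trusted developer.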
10.2 Signature Verification
- Developers sign the hash with their private key.
- The MCP holds the public key(s) in a trust store (truststore.pem).
- Verification uses RSA‑2048 with PSS padding and SHA‑256.
10.3 Benefits
- Tamper‑Evident: Any change to the binary (even a malicious backdoor) changes its hash and is caught at verification time.
- Reproducible Builds: CI pipelines can compare produced hashes against the expected value.
- Dependency Pinning: The MCP can reject binaries that use unapproved libraries by checking the hashes of those dependencies.
11. Use‑Case Walkthroughs
11.1 Weather API
- Spec Definition – weather.yaml (shown above).
- Generate – go-toolgen generate.
- Publish – docker push ghcr.io/myorg/weather:1.0.0.
- MCP Register – The MCP picks up the image, verifies it, and exposes /weather.
- Agent Skill – A Python skill fetches the spec and calls /weather with city “Paris”.
- Dry‑Run – Initially the agent receives a stubbed “dry‑run” response.
- Production – After confidence, dry‑run is disabled; the agent receives real weather data.
11.2 Finance Query
Suppose a firm wants to query the latest stock prices:
- Spec includes OAuth 2.0 authentication, request parameters (ticker, date), and a JSON response schema with price, volume.
- The CLI generates a Go client with OAuth token refresh logic.
- The MCP implements rate limiting (e.g., 1000 calls per hour).
- The agent skill is used in a financial analysis LLM prompt: “Retrieve the closing price of AAPL for 2023‑12‑01.”
11.3 Custom Domain Tool
A startup has a proprietary recommendation engine exposed over gRPC. They:
- Wrap the gRPC client in a Go CLI.
- Expose it through the MCP via a custom gRPC‑to‑HTTP gateway.
- Use the SQLite cache to store past recommendations for quick retrieval.
- Apply a strict dry‑run policy because the engine mutates internal models.
12. Integration Guide
| Step | Action | Command | Notes |
|------|--------|---------|-------|
| 1 | Install Go CLI | go install github.com/myorg/toolgen@latest | Requires Go 1.20+ |
| 2 | Create tool.yaml | Editor | Use the provided schema |
| 3 | Generate tool | toolgen generate --spec tool.yaml | Generates Go package |
| 4 | Build binary | go build -o mytool | Compile |
| 5 | Compute hash & sign | toolgen sign --binary mytool --key ~/.myorg/key.pem | Produces .hash & .sig |
| 6 | Push to registry | docker build . -t ghcr.io/myorg/tools/mytool:1.0.0 && docker push ghcr.io/myorg/tools/mytool:1.0.0 | Ensure tags match spec |
| 7 | Deploy MCP | docker run -p 8080:8080 myorg/mcp:latest | Configure via env vars |
| 8 | Register tool | curl -X POST -H 'Content-Type: application/json' -d @tool.json http://localhost:8080/tools | MCP auto‑verifies |
| 9 | Discover & use | Agent code calls /tools | Agent builds skills dynamically |
| 10 | Monitor | Access /metrics | Prometheus metrics exposed |
CI/CD Pipeline Example (GitHub Actions)
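name: Build & Publish Tool
on:
  push:
    branches: [ main ]
permissions:
  contents: read
  packages: write   # needed to push to ghcr.io with GITHUB_TOKEN
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.20'
      - name: Build Tool
        env:
          PRIVATE_KEY: ${{ secrets.PRIVATE_KEY }}
        run: |
          go build -o mytool
          sha256sum mytool > mytool.sha256
          # openssl -sign expects a key file path, so write the PEM secret to disk first
          printf '%s\n' "$PRIVATE_KEY" > signing-key.pem
          openssl dgst -sha256 -sign signing-key.pem -out mytool.sig mytool.sha256
          rm -f signing-key.pem
      - name: Docker Build
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          docker build . -t ghcr.io/${{ github.repository }}/mytool:latest
          docker push ghcr.io/${{ github.repository }}/mytool:latest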
13. Safety & Governance
13.1 Threat Model
| Threat | Mitigation |
|--------|------------|
| Malicious API wrapper | Signature verification + hash |
| Credential leakage | Secure env vars, token vault |
| Unintended data modification | Dry‑run policy |
| Over‑exposure of internal data | Rate limiting, access control |
| Replay attacks | Nonce + timestamp in request header |
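The replay mitigation in the last row can be illustrated with a small Python sketch. It assumes each request carries a unique nonce and a Unix timestamp, and that a five‑minute tolerance is acceptable; the class name, the tolerance, and the in‑memory nonce store are all choices made for this example.

```python
import time
from typing import Callable


class ReplayGuard:
    """Reject requests that are stale or that reuse a nonce already seen."""

    def __init__(self, max_skew_seconds: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self.max_skew = max_skew_seconds
        self.clock = clock
        self._seen: dict[str, float] = {}  # nonce -> request timestamp

    def accept(self, nonce: str, timestamp: float) -> bool:
        now = self.clock()
        # Drop nonces whose timestamps would now be rejected as stale anyway.
        self._seen = {n: ts for n, ts in self._seen.items()
                      if now - ts <= self.max_skew}
        if abs(now - timestamp) > self.max_skew:
            return False  # too old, or too far in the future
        if nonce in self._seen:
            return False  # same nonce inside the window: a replay
        self._seen[nonce] = timestamp
        return True
```

The timestamp check is what bounds the nonce store: once a request is older than the tolerance it is rejected on age alone, so its nonce no longer needs to be remembered. For this to be meaningful, the nonce and timestamp must be covered by the request's signature or authentication token.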
13.2 Audit Trail
The SQLite cache plus MCP logs provide a complete audit trail:
- Each request: tool name, version, timestamp, user token.
- Response: status code, body, dry‑run flag.
- Policy decisions: e.g., why a request was throttled.
13.3 Compliance Hooks
- GDPR: Implement data minimization by allowing the agent to request only necessary fields. The MCP can enforce field whitelists.
- HIPAA: Use TLS termination, enforce HIPAA‑compliant storage of logs (encryption at rest).
- PCI‑DSS: Store only required transaction data; use tokenization for card numbers.
14. Performance & Scaling
14.1 Benchmarks
| Metric | Value (Single MCP Instance) |
|--------|-----------------------------|
| Latency (200‑th percentile) | 150 ms |
| Throughput (requests/sec) | 120 |
| Cache hit ratio | 85 % (with 10 k cache entries) |
| CPU usage | 40 % (on an 8‑core node) |
| Memory usage | 1.2 GB |
14.2 Scaling Strategy
- Horizontal Scaling: Deploy multiple MCP instances behind a load balancer. Use consistent hashing for cache sharding.
- Vertical Scaling: Increase CPU/memory for high‑throughput workloads. The Go runtime scales well with extra cores.
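- Cache Clustering: SQLite is an embedded, single‑node database, and its shared‑cache mode only shares a cache between connections within one process. For a cache shared across MCP instances, use a replicated SQLite layer (e.g., rqlite or dqlite) or switch to a more robust key‑value store if needed.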
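For the consistent hashing mentioned under horizontal scaling, a minimal ring can be sketched in Python as below. The node names, the number of virtual nodes, and the use of SHA‑256 for placement are assumptions for illustration, not details from the article.

```python
import bisect
import hashlib


class HashRing:
    """Map cache keys to MCP instances so that adding or removing a node
    only remaps the keys that belonged to that node."""

    def __init__(self, nodes: list[str], vnodes: int = 64):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First point clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

A load balancer or client library would call node_for with a cache key such as the tool name plus request hash, so that repeated requests land on the instance that already holds the cached response.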
14.3 Observability
- Metrics: Prometheus labels tool_name, dry_run, response_code.
- Tracing: OpenTelemetry support – spans per tool call.
- Logs: Structured JSON logs with correlation IDs.
15. Extending the Factory
15.1 Adding New Languages
- Python CLI: A similar CLI could generate Python clients using requests or httpx. The spec is language‑agnostic.
- Rust: A Rust CLI could produce clients with reqwest and serde.
15.2 Custom Wrapper Logic
- The CLI accepts plugins – small Go packages that can add custom serialization, OAuth flows, or circuit‑breaker logic.
- Example: A plugin that injects a CircuitBreaker around each request.
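The plugin interface is not shown in the article, and real plugins would be Go packages. Purely to illustrate the circuit‑breaker idea, here is a language‑neutral sketch written in Python; the class names, thresholds, and half‑open behavior are assumptions.

```python
import time
from typing import Any, Callable


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self.clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping each outbound request in breaker.call(...) stops an agent from hammering a failing upstream API and gives the service time to recover.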
15.3 Community Contributions
- GitHub Repo: myorg/tool-factory
- Issue Templates: For spec validation, policy tweaks.
- Pull Requests: Contribute new plugin examples, benchmarks, or security patches.
16. Future Directions
- AI‑Driven Contract Generation
  - Use LLMs to infer tool.yaml from raw OpenAPI specs or documentation.
  - Auto‑generate validation schemas.
- Zero‑Trust Deployments
  - Integrate with OPA (Open Policy Agent) for fine‑grained request policies.
  - Enforce least privilege via policy per tool.
- Self‑Healing Policies
  - Automatic retry back‑off based on failure patterns.
  - Auto‑switch to cache fallback on MCP downtime.
- Serverless Execution
  - Package tools as AWS Lambda or Cloud Run functions for event‑driven agents.
- Governance Dashboard
  - Visual interface for policy editing, audit logs, and compliance reports.
17. Take‑aways
| Lesson | Action |
|--------|--------|
| Safety first | Adopt dry‑run policies by default; never hard‑code production URLs. |
| Verification matters | Sign and hash every tool; store public keys in a central trust store. |
| Cache is king | A local SQLite cache dramatically reduces latency and supports offline use. |
| Declarative specs | Use a single tool.yaml to drive code generation, deployment, and runtime registration. |
| Observability | Expose Prometheus metrics, OpenTelemetry traces, and structured logs for auditability. |
| Modular architecture | Keep the CLI, MCP, skills, and cache decoupled; this simplifies scaling and upgrades. |
18. Resources & Further Reading
- Official Repo: https://github.com/myorg/tool-factory
- Documentation: https://myorg.org/docs/tool-factory
- Blog Post: “Building a Verified Tool Factory for LLM Agents” – https://myorg.org/blog/tool-factory
- Related Projects:
  - LangChain Tools
  - OpenAPI Generator
  - Prometheus
  - OpenTelemetry