
Inside the Energy-Market Agent — The Audit Trail

Andreas Martens, May 2026

Most "AI-curated" summaries are unauditable claims. A model returns a list, and the reader has no way to know which sources were checked, which queries were fired, or what got filtered out. In a regulated B2B context, and in the energy industry in particular, that is not enough.

The Energy-Market Research Agent (energy-market.qurix.tech) was built differently. Every weekly run leaves an audit trail: the actual search queries that were fired, the candidate URLs that came back, the source domains seen, the items that got dropped and the reasons. The numbers below are from one real run.

The agent fires every Monday at 07:00 UTC, covers the previous seven days of energy-industry news, and produces up to ten ranked items with concrete actions per stakeholder group. Behind that simple-sounding deliverable sit four stages — each with a specific job, each leaving a log entry the operator can inspect after the fact.

Four Stages, One Pipeline

Each stage does one thing. Each stage logs what it did. None is a black box.
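The "one job, one log entry" principle can be sketched in a few lines of Python. This is a minimal illustration, not the agent's actual code: `run_pipeline`, the two placeholder stages, and the log field names are all assumptions made for the example.

```python
from datetime import datetime, timezone

def run_pipeline(stages, payload):
    """Run each stage in order and record one audit entry per stage."""
    audit_log = []
    for name, stage in stages:
        # Every stage returns its output plus a small dict describing what it did.
        payload, details = stage(payload)
        audit_log.append({
            "stage": name,
            "finished_at": datetime.now(timezone.utc).isoformat(),
            "details": details,
        })
    return payload, audit_log

# Placeholder stages (hypothetical): real stages would call search APIs and a model.
def collect_candidates(_):
    urls = ["https://example.org/a", "https://example.org/a", "https://example.org/b"]
    return urls, {"raw_candidates": len(urls)}

def deduplicate(urls):
    distinct = list(dict.fromkeys(urls))  # keeps first-seen order
    return distinct, {"distinct_urls": len(distinct)}

result, audit_log = run_pipeline(
    [("collect_candidates", collect_candidates), ("deduplicate", deduplicate)],
    None,
)
for entry in audit_log:
    print(entry["stage"], entry["details"])
```

Because the runner, not the stage, writes the log entry, a stage cannot complete without leaving a record.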

Source Selection: where the agent looks

Open web via Brave Search, biased toward German-language regulatory + trade sources.

The agent searches the open web (Brave Search; Tavily as fallback). The German bias comes from three indirect levers: the system prompt is German, the search queries the model formulates are German, and English sources are accepted only when DACH market relevance is clear. The audit log records the top source domains for each run — in one recent batch the top hits were remit.bundesnetzagentur.de, bundeswirtschaftsministerium.de, transnetbw.de and bdew.de, exactly the regulatory + grid + trade-association layer where decisions actually flow through. Curated RSS feeds for these sources are on the roadmap.
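A top-domain list of this kind can be derived from the candidate URLs alone. The sketch below uses only the Python standard library; the function name `top_domains` and the URL paths are hypothetical, and the article does not say how the agent normalises hosts (stripping a leading "www." is an assumption).

```python
from collections import Counter
from urllib.parse import urlparse

def top_domains(urls, n=8):
    """Count candidate URLs per host and return the n most frequent hosts."""
    hosts = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]  # assumption: treat www.bdew.de and bdew.de as one source
        if host:
            hosts.append(host)
    return Counter(hosts).most_common(n)

# Illustrative candidates; the paths are made up for the example.
candidates = [
    "https://www.bdew.de/presse/a",
    "https://www.bdew.de/presse/b",
    "https://remit.bundesnetzagentur.de/x",
    "https://www.transnetbw.de/de/news/y",
]
print(top_domains(candidates))
```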

Search Strategy: four to six parallel queries with domain vocabulary

REMIT, EPEX, BNetzA-Festlegung, Direktvermarkter — the queries use the language the industry actually speaks.

Each run decomposes the week into four to six orthogonal sub-topics: regulatory changes, market prices, M&A, personnel changes, trade announcements, fuel/gas movements. The model is prompted with the energy-sector vocabulary it should use — REMIT, MaBiS, Redispatch 3.0, EPEX/EEX, Bilanzkreis, EDIFACT/MaKo — so the queries don't degrade into generic "energy news" searches. A hard cap of twelve search calls per run prevents the budget from running away on a single batch. Every query is logged with its hit count and provider.
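The call cap and the per-query log can be pictured as a thin wrapper around the search providers. The following is a sketch under stated assumptions: the class and field names are invented, the two provider functions are stand-ins rather than real Brave or Tavily clients, and counting failed attempts against the budget is a design choice made here, not something the article specifies.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_SEARCH_CALLS = 12  # the hard cap named in the article

class SearchBudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its search-call budget."""

@dataclass
class QueryLogEntry:
    query: str
    provider: str
    hit_count: int
    ok: bool

@dataclass
class BudgetedSearch:
    # (name, callable) pairs in priority order; callables are stand-ins for API clients.
    providers: list[tuple[str, Callable[[str], list[str]]]]
    max_calls: int = MAX_SEARCH_CALLS
    calls_made: int = 0
    log: list[QueryLogEntry] = field(default_factory=list)

    def search(self, query: str) -> list[str]:
        for name, fetch in self.providers:
            if self.calls_made >= self.max_calls:
                raise SearchBudgetExceeded(f"cap of {self.max_calls} calls reached")
            self.calls_made += 1  # assumption: failed attempts also count
            try:
                hits = fetch(query)
            except Exception:
                self.log.append(QueryLogEntry(query, name, 0, ok=False))
                continue  # fall through to the next provider
            self.log.append(QueryLogEntry(query, name, len(hits), ok=True))
            return hits
        raise RuntimeError("all search providers failed")

def fake_primary(query):
    raise TimeoutError("simulated outage")

def fake_fallback(query):
    return ["https://example.org/1", "https://example.org/2"]

searcher = BudgetedSearch([("primary", fake_primary), ("fallback", fake_fallback)])
hits = searcher.search("REMIT II Energiehandel Regulierung BNetzA Mai 2026")
print(searcher.log[-1])
```

Logging the failed primary attempt as well as the successful fallback keeps a provider outage visible in the audit trail instead of hiding it behind a working result.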

Actor Mapping: one news item, different consequences per stakeholder

A §14a-Festlegung means "tariff design" to a Stadtwerk, "balance-group hedging" to a Direktvermarkter, "metering rollout" to a Netzbetreiber.

One regulatory decision rarely lands the same way for every market participant. A BNetzA-Festlegung on §14a EnWG creates one action item for a Stadtwerk's tariff team (rate design for controllable devices), a different one for a Direktvermarkter (balance-group hedging on dimmable HT/NT loads), and a third for a Netzbetreiber (Smart-Meter-Gateway rollout). The agent is prompted to spell out, per article, what each stakeholder group actually needs to do — operationally, not abstractly. Today this is free-text per actor; structured per-stakeholder fields (one paragraph each, queryable from the API) are the next iteration.
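One possible shape for such structured per-stakeholder fields is sketched below. It is purely illustrative: the class names, field names, and placeholder URL are assumptions, and the action texts paraphrase the §14a example above rather than quoting real agent output.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class StakeholderActions:
    # One short paragraph per stakeholder group (hypothetical field names).
    stadtwerk: str
    direktvermarkter: str
    netzbetreiber: str

@dataclass(frozen=True)
class BriefingItem:
    title: str
    url: str
    actions: StakeholderActions

item = BriefingItem(
    title="BNetzA-Festlegung zu §14a EnWG",
    url="https://example.org/placeholder",  # placeholder, not a real source
    actions=StakeholderActions(
        stadtwerk="Review rate design for controllable devices.",
        direktvermarkter="Reassess balance-group hedging for dimmable loads.",
        netzbetreiber="Check Smart-Meter-Gateway rollout planning.",
    ),
)

# asdict() converts nested dataclasses too, so the item serialises to JSON directly.
print(json.dumps(asdict(item), ensure_ascii=False, indent=2))
```

With one named field per group, a consumer could request only the Netzbetreiber paragraph instead of parsing free text.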

Scoring & Filter: from 60 raw candidates to four items with leverage

Three-stage funnel: LLM ranking → server-side date filter → quality-over-quantity rule.

The candidate pool from six search calls is typically 50-80 items. Three filtering stages narrow that to the final list: the model ranks for industry impact, dropping duplicates and irrelevant items; a server-side filter in Python enforces the strict 7-day publication window, dropping anything older with a logged reason (stale_date); and the quality-over-quantity rule allows fewer than ten items when the week was quiet — a short list beats a padded one. The audit log captures candidate count, distinct URLs, items the model submitted, items the date filter dropped, and the final persisted count.
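The server-side date filter is the most mechanical of the three stages and easy to sketch. The version below is an assumption-laden illustration, not the production filter: the function name, the item layout (a dict with "url" and a timezone-aware "published" datetime), and the run time are all made up; only the 7-day window and the stale_date reason come from the article.

```python
from datetime import datetime, timedelta, timezone

def apply_date_filter(items, run_time, window_days=7):
    """Split items into (kept, dropped); dropped entries carry a reason code."""
    cutoff = run_time - timedelta(days=window_days)
    kept, dropped = [], []
    for item in items:
        if item["published"] < cutoff:
            # Older than the window: drop it and record why.
            dropped.append({"url": item["url"], "reason": "stale_date"})
        else:
            kept.append(item)
    return kept, dropped

run_time = datetime(2026, 5, 25, 7, 0, tzinfo=timezone.utc)  # illustrative Monday run
submitted = [
    {"url": "https://example.org/fresh",
     "published": datetime(2026, 5, 20, 9, 0, tzinfo=timezone.utc)},
    {"url": "https://example.org/old",
     "published": datetime(2026, 5, 10, 9, 0, tzinfo=timezone.utc)},
]
kept, dropped = apply_date_filter(submitted, run_time)
print(len(kept), dropped)
```

Enforcing the window in code rather than in the prompt means the model cannot let an old article through by misreading a date.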

What one real run looks like

The four-stage pipeline is not aspirational. The audit trail from a recent batch (calendar week 21, or KW21, of 2026) shows exactly what happened inside.

The agent fired six search queries, in the language the industry actually uses:

Energiewirtschaft Deutschland News Mai 2026 BNetzA Festlegung
Strommarkt Bilanzkreis Redispatch Mai 2026
Direktvermarkter PV Wind Mai 2026 Regulierung
EPEX EEX Day-Ahead Strompreise Mai 2026 KW21
REMIT II Energiehandel Regulierung BNetzA Mai 2026
Gaspreise TTF Großhandel Mai 2026 Entwicklung

Sixty raw results came back from those six queries, which left fifty-four distinct URLs after de-duplication. The most frequent source domains, in descending order of hit count, were a mix of ministerial, regulatory, grid-operator, trade-association, and press sources:

bundeswirtschaftsministerium.de (4) · ad-hoc-news.de (4) · remit.bundesnetzagentur.de (3) · rp-online.de (3) · bloomberg.com (2) · transnetbw.de (2) · bdew.de (2) · uniper.energy (2)

From those 54 distinct URLs the model selected and submitted five articles ranked by industry impact. The server-side date filter then dropped one article with the reason stale_date (published before the start of the 7-day window), leaving four final items in the briefing for that week.

60 raw candidates → 54 distinct URLs → 5 model-submitted → 1 dropped (stale_date) → 4 final. Every number is queryable from the audit log.
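Once these counts are logged, they can be checked against each other automatically. The sketch below assumes a hypothetical `FunnelCounts` record and assumes the date filter is the only server-side step that removes model-submitted items; the numbers are the ones reported for this run.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FunnelCounts:
    raw_candidates: int
    distinct_urls: int
    model_submitted: int
    dropped_stale_date: int
    final_items: int

    def problems(self) -> list[str]:
        """Return a list of inconsistencies; an empty list means the funnel adds up."""
        issues = []
        if self.distinct_urls > self.raw_candidates:
            issues.append("more distinct URLs than raw candidates")
        if self.model_submitted > self.distinct_urls:
            issues.append("model submitted more items than distinct URLs")
        if self.model_submitted - self.dropped_stale_date != self.final_items:
            issues.append("submitted minus dropped does not equal final")
        if self.final_items > 10:
            issues.append("more than ten final items")
        return issues

kw21 = FunnelCounts(
    raw_candidates=60,
    distinct_urls=54,
    model_submitted=5,
    dropped_stale_date=1,
    final_items=4,
)
print(kw21.problems())  # an empty list: the reported numbers are consistent
```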

Why this audit trail matters

The reason to log every stage is not technical hygiene. It is the difference between a serious B2B tool and a marketing demo. Three concrete benefits emerge once the audit data exists.

Verification for token holders. A company that receives a customised briefing can verify the underlying methodology: yes, the run looked at remit.bundesnetzagentur.de and transnetbw.de, not just Wikipedia. The quality claim is no longer rhetorical.

Quality monitoring for the operator. A run with zero BNetzA hits, fewer than four queries, or no domain diversity is a signal something went wrong — a stale prompt, a search-API outage, a model regression. Audit data turns silent failures into visible ones.
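The three warning signs named above translate directly into checks over the audit data. This is a minimal sketch: the function name, the warning codes, and the threshold of three distinct domains for "no domain diversity" are assumptions; the queries and domain counts are those of the KW21 run described earlier.

```python
def run_warnings(queries, domain_counts, min_queries=4, min_distinct_domains=3):
    """Return warning codes for a run, based only on its logged queries and domains."""
    warnings = []
    if len(queries) < min_queries:
        warnings.append("too_few_queries")
    bnetza_hits = sum(
        count for domain, count in domain_counts.items()
        if domain == "bundesnetzagentur.de" or domain.endswith(".bundesnetzagentur.de")
    )
    if bnetza_hits == 0:
        warnings.append("no_bnetza_hits")
    if len(domain_counts) < min_distinct_domains:  # threshold is an assumption
        warnings.append("low_domain_diversity")
    return warnings

kw21_queries = [
    "Energiewirtschaft Deutschland News Mai 2026 BNetzA Festlegung",
    "Strommarkt Bilanzkreis Redispatch Mai 2026",
    "Direktvermarkter PV Wind Mai 2026 Regulierung",
    "EPEX EEX Day-Ahead Strompreise Mai 2026 KW21",
    "REMIT II Energiehandel Regulierung BNetzA Mai 2026",
    "Gaspreise TTF Großhandel Mai 2026 Entwicklung",
]
kw21_domains = {
    "bundeswirtschaftsministerium.de": 4,
    "ad-hoc-news.de": 4,
    "remit.bundesnetzagentur.de": 3,
    "rp-online.de": 3,
    "bloomberg.com": 2,
    "transnetbw.de": 2,
    "bdew.de": 2,
    "uniper.energy": 2,
}
print(run_warnings(kw21_queries, kw21_domains))         # healthy run
print(run_warnings(["energy news"], {"example.org": 5}))  # degraded run
```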

Methodology iteration. Looking at the top-domain distribution across many runs tells us where curated RSS feeds (Stage 1 roadmap) would deliver the most value: domains we want in the briefing more often, weighted against the open-web noise.

* * *

What this means for "agentic AI in production"

"Agentic AI" is currently a marketing label attached to anything that combines a language model with a tool call. That phrasing collapses two very different things into one: a chat interface that occasionally searches the web, versus a domain-shaped pipeline with deliberate scope, logging, and validation at every stage.

The Energy-Market Research Agent is the second kind. It is built for a specific domain (German-speaking energy industry), with a specific cadence (Monday 07:00 UTC, covering the previous seven days), a specific output shape (up to ten ranked items with per-actor actions), and a specific audit posture (every query, every domain, every drop reason logged for the operator).

The difference between an AI demo and an AI system that runs the business is not the model. It is whether every step of the pipeline can be inspected, verified, and improved on the basis of evidence.

For energy-market.qurix.tech, that means the public briefing is free and openly readable on the site. Selected partner organisations get a token that runs the same pipeline scoped to their firm: same audit posture, same logging, narrowed to the news with direct impact on them. Tokens are available on request.

Live System

The agent described in this article runs in production every Monday. The public briefing is free.
