Bedrock Structured Outputs: From Begging to Contract

Table of Contents

  1. The System: AWS News Agent 🗞️
  2. The Problem: Asking vs. Guaranteeing
  3. Bedrock Structured Outputs: What It Is and How It Works 🔧
    1. Supported Models
  4. The Migration: Three Transformations
    1. 1. Relevance Analysis — The Most Dramatic Change
    2. 2. Social Post Generation — From 2 API Calls to 1
    3. 3. Newsletter — New Functionality without Extra Code
  5. Results: Before and After
  6. Practical Considerations
  7. Conclusion
  8. Resources 📚

I had a working system. An agent that processed the AWS RSS feed several times a day, filtered relevant news with Claude, and generated posts for LinkedIn and X. I’d built it, deployed it, and monitored its logs with some satisfaction.

And yet, there was something I disliked every time I opened the code: three methods whose sole purpose was to distrust the LLM.

_extract_json_from_text. _validate_analysis_structure. _create_fallback_analysis.

Together they added up to more than 130 lines. All that code existed to handle a single possibility: that the model would respond with something different from what I’d asked for. That it would include an apology before the JSON. That it would forget a field. That it would format the output incorrectly.

When Amazon announced Bedrock Structured Outputs, I immediately understood what I’d been doing wrong. It wasn’t a prompting problem. It was an architecture problem: I had been asking the model to be consistent, when what I needed was to guarantee it.

The System: AWS News Agent 🗞️

Before diving in, some context on the system. The agent processes the AWS RSS feed several times a day with three responsibilities:

  1. Analyze relevance of each item (0-10 score and metadata for the technical audience)
  2. Generate social content — a LinkedIn post and an X post per relevant item
  3. Compose the weekly newsletter, including email subject and preview text

Everything runs on Lambda, uses DynamoDB for state, and Bedrock with the converse API for Claude interactions.

The system worked. The problem was the amount of defensive code needed to trust its outputs.

The Problem: Asking vs. Guaranteeing

The content_analyzer.py had this system prompt:

# Before: format instructions in natural language
system_prompts = [{
    "text": "You are an expert AWS news analyst...\n\n"
            "MANDATORY RESPONSE FORMAT:\n"
            "You must respond ONLY with a valid JSON object. "
            "Do not include explanations, comments, or additional text.\n\n"
            "REQUIRED JSON STRUCTURE:\n"
            "{\n"
            "  \"relevance\": 7,\n"
            "  \"analysis\": {\n"
            "    \"article\": true,\n"
            "    \"keyPoints\": [\"Key point 1\", \"Key point 2\"],\n"
            "    \"emojis\": [\"🚀\", \"☁️\"],\n"
            "    \"relevance\": 7\n"
            "  }\n"
            "}\n\n"
            "IMPORTANT: Respond ONLY with the JSON. "
            "Do not add text before or after."
}]

That block is natural language begging the model to be consistent. The model usually was. But “usually” isn’t enough for production.

The direct consequence was this code:

# Before: defensive JSON extraction
output_message = response['output']['message']['content'][0]['text']
cleaned_output = output_message.strip()

# Did the model put text before the JSON? Search for it manually.
if not cleaned_output.startswith('{'):
    start_idx = cleaned_output.find('{')
    end_idx = cleaned_output.rfind('}')
    if start_idx != -1 and end_idx != -1:
        cleaned_output = cleaned_output[start_idx:end_idx+1]
    else:
        # No JSON → retry
        continue

# Is the JSON parseable?
try:
    analysis = json.loads(cleaned_output)
    # Does it have all the fields?
    if self._validate_analysis_structure(analysis, news['news_id']):
        return analysis
    else:
        continue  # retry
except json.JSONDecodeError:
    continue  # retry

# All attempts failed → keyword-based fallback
return self._create_fallback_analysis(news)

And on top of that, _validate_analysis_structure (45 lines) checking types and fields, and _create_fallback_analysis (65 lines) doing keyword-based analysis when the model failed.

In total: ~130 lines of code whose sole function was managing model inconsistency.
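To make that before-state concrete, the keyword fallback worked roughly like this. This is a hypothetical, much-shortened reconstruction of what the article describes; the keyword list, scores, and function name are invented for illustration:

```python
# Hypothetical sketch of the old keyword fallback (illustrative only).
# Keyword list and scores are invented for this example.
_HIGH_VALUE_KEYWORDS = {"bedrock": 9, "generative ai": 9, "lambda": 8, "aurora": 7}

def create_fallback_analysis(news: dict) -> dict:
    """Crude keyword scoring, used only when the model's output was unusable."""
    text = f"{news.get('title', '')} {news.get('description', '')}".lower()
    score = max((s for kw, s in _HIGH_VALUE_KEYWORDS.items() if kw in text), default=3)
    return {
        "relevance": score,
        "analysis": {
            "article": True,
            "keyPoints": [news.get("title", "")],
            "emojis": ["☁️"],
            "relevance": score,
        },
    }
```

Sixty-five lines of this kind of logic existed only because the model's output could not be trusted to parse.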

Bedrock Structured Outputs: What It Is and How It Works 🔧

Bedrock Structured Outputs is a feature that guarantees the model’s response will be valid JSON that exactly matches a schema you define (JSON Schema Draft 2020-12).

The important word is guarantees. Not “the model will try”. Not “usually produces”. Guarantees.

The implementation is an additional parameter in the converse API request:

response = self.bedrock.converse(
    modelId=Config.BEDROCK_MODEL_ID,
    messages=messages,
    system=system_prompts,
    inferenceConfig=inference_config,
    outputConfig={                          # ← this is the change
        'textFormat': {
            'type': 'json_schema',
            'structure': {
                'jsonSchema': {
                    'schema': json.dumps(MY_SCHEMA),   # serialized schema
                    'name': 'schema_name',
                    'description': 'Description of the schema'
                }
            }
        }
    }
)

Bedrock compiles the schema into a grammar and guarantees that the response meets the contract — this isn’t post-generation validation, it’s compliance during generation.

🧠 How it works internally: Bedrock validates the schema against JSON Schema Draft 2020-12, compiles a grammar (may take a few minutes the first time), and caches it for 24 hours encrypted with AWS-managed keys. Subsequent requests with the same schema have latency comparable to standard calls.
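To see what the contract amounts to, here is a stdlib-only sketch of the kind of check the grammar makes unnecessary. This is an illustration of the guarantee, not Bedrock’s actual mechanism, and it covers only a small subset of JSON Schema:

```python
# Minimal illustrative validator for a subset of JSON Schema:
# type, required, properties, additionalProperties. NOT Bedrock's mechanism.
_TYPE_MAP = {"object": dict, "integer": int, "string": str,
             "boolean": bool, "array": list}

def conforms(instance, schema: dict) -> bool:
    expected = _TYPE_MAP.get(schema.get("type"))
    if expected is int and isinstance(instance, bool):
        return False  # bool is a subclass of int in Python; reject it explicitly
    if expected and not isinstance(instance, expected):
        return False
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        if any(key not in instance for key in schema.get("required", [])):
            return False
        if schema.get("additionalProperties") is False and set(instance) - set(props):
            return False
        return all(conforms(value, props[key])
                   for key, value in instance.items() if key in props)
    return True
```

With Structured Outputs enabled, every response already passes this kind of check by construction, which is exactly why the defensive extraction shown earlier disappears.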

Supported Models

An important point that took me a while to discover: Amazon Nova does not support Structured Outputs.

Compatible models as of March 2026 include:

  • Anthropic: Claude Haiku 4.5, Sonnet 4.5, Opus 4.5, Opus 4.6
  • Qwen: Qwen3 series (235B, 32B, Coder)
  • DeepSeek: DeepSeek-V3.1
  • Google: Gemma 3 (12B, 27B)
  • Mistral AI: Mistral Large 3, Magistral Small
  • NVIDIA: Nemotron Nano series

Not supported: Amazon Nova (all versions), Amazon Titan.

My system used amazon.nova-2-lite for relevance analysis — originally chosen for cost. I had to migrate to Claude Haiku 4.5 to use the feature. In practice, Haiku 4.5 cost is comparable, and analysis quality improved.

The Migration: Three Transformations

1. Relevance Analysis — The Most Dramatic Change

The schema defines exactly what structure the model should return:

# Analysis schema: defined once, at module level
_ANALYSIS_SCHEMA = {
    "type": "object",
    "properties": {
        "relevance": {"type": "integer"},
        "analysis": {
            "type": "object",
            "properties": {
                "article":   {"type": "boolean"},
                "keyPoints": {"type": "array", "items": {"type": "string"}},
                "emojis":    {"type": "array", "items": {"type": "string"}},
                "relevance": {"type": "integer"}
            },
            "required": ["article", "keyPoints", "emojis", "relevance"],
            "additionalProperties": False  # ← no extra fields possible
        }
    },
    "required": ["relevance", "analysis"],
    "additionalProperties": False
}

With the schema defined, the analysis method simplifies radically:

# After: no defensive parsing, no fallbacks, no manual validation
def _analyze_single_news_with_retry(self, news, system_prompts, inference_config, max_retries=3):
    for attempt in range(max_retries):
        try:
            messages = [{
                "role": "user",
                "content": [{"text": f"Title: {news['title']}\nDescription: {news['description']}"}]
            }]

            response = self.bedrock.converse(
                modelId=Config.BEDROCK_MODEL_ID,
                messages=messages,
                system=system_prompts,
                inferenceConfig=inference_config,
                outputConfig={
                    'textFormat': {
                        'type': 'json_schema',
                        'structure': {
                            'jsonSchema': {
                                'schema': json.dumps(_ANALYSIS_SCHEMA),
                                'name': 'news_analysis',
                                'description': 'Relevance analysis of an AWS news item'
                            }
                        }
                    }
                }
            )

            output_message = response['output']['message']['content'][0]['text']

            if not output_message or not output_message.strip():
                continue

            # json.loads never raises JSONDecodeError here; the schema guarantees it
            return json.loads(output_message)

        except Exception as e:
            # Only network or service errors, not parsing errors
            logger.error(f"Error on attempt {attempt + 1}: {str(e)}")
            if attempt < max_retries - 1:
                continue

    return None  # No keyword fallback anymore: if Bedrock fails, the item is skipped

The result: from ~90 lines to ~30. And the system prompt also changes — it no longer needs format instructions:

# After: business criteria only, no JSON format instructions
system_prompts = [{
    "text": "You are an expert AWS news analyst...\n\n"
            "RELEVANCE CRITERIA (0-10 scale):\n"
            "• 9-10: Bedrock, GenAI, AI services, serverless core\n"
            "• 7-8: RDS, Aurora, databases, data services\n"
            "...\n\n"
            "FIELDS TO FILL IN:\n"
            "• relevance: integer from 0-10\n"
            "• analysis.keyPoints: array of 2-3 strings with key points\n"
            "• analysis.emojis: array of 2-3 relevant emojis"
            # ← No mention of JSON. No structure examples.
            # The schema in outputConfig already defines the contract.
}]

This change felt elegant: the prompt talks about business logic, the schema talks about structure. Everything in its place.
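A side benefit of the guaranteed shape: downstream code can map the parsed JSON straight into typed objects, with no .get() defaults. A sketch; the dataclass and helper are my own illustration, not part of the system:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical typed wrapper around the guaranteed analysis payload.
@dataclass
class Analysis:
    article: bool
    key_points: List[str]
    emojis: List[str]
    relevance: int

def parse_analysis(payload: dict) -> Analysis:
    """No defensive defaults needed: the schema guarantees every field exists."""
    inner = payload["analysis"]
    return Analysis(
        article=inner["article"],
        key_points=inner["keyPoints"],
        emojis=inner["emojis"],
        relevance=inner["relevance"],
    )
```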

2. Social Post Generation — From 2 API Calls to 1

Before, the system generated the LinkedIn post and the X post in separate calls. The reason: without structured outputs, mixing two outputs in a single request increased the probability that the model would “get lost” in the format.

With structured outputs, that disappears:

# Schema for generating both posts in a single call
_SOCIAL_CONTENT_SCHEMA = {
    "type": "object",
    "properties": {
        "linkedin_post": {"type": "string"},
        "X_post":  {"type": "string"}
    },
    "required": ["linkedin_post", "X_post"],
    "additionalProperties": False
}

def _generate_social_posts(self, news: Dict) -> Dict:
    """A single call generates guaranteed LinkedIn + X posts."""
    # ... prompt construction with the news item's context ...

    response_text = self._invoke_bedrock(prompt, output_schema=_SOCIAL_CONTENT_SCHEMA)
    return json.loads(response_text)
    # → {"linkedin_post": "...", "X_post": "..."}

The pattern that makes this work cleanly is an _invoke_bedrock helper with an optional schema:

def _invoke_bedrock(self, prompt: str, output_schema: dict = None) -> str:
    """Invokes Bedrock. Passing output_schema activates Structured Outputs."""
    converse_kwargs = {
        'modelId': Config.SOCIAL_BEDROCK_MODEL_ID,
        'messages': [{"role": "user", "content": [{"text": prompt}]}],
        'inferenceConfig': {"temperature": 0.7, "maxTokens": 2000}
    }

    if output_schema:
        converse_kwargs['outputConfig'] = {
            'textFormat': {
                'type': 'json_schema',
                'structure': {
                    'jsonSchema': {
                        'schema': json.dumps(output_schema),
                        'name': 'structured_output',
                        'description': 'Structured output guaranteed by Bedrock'
                    }
                }
            }
        }

    response = self.bedrock.converse(**converse_kwargs)
    return response['output']['message']['content'][0]['text']

When output_schema=None, the behavior is identical to before — useful for cases where the output is free text (like HTML generation for the newsletter).
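One practical consequence of this pattern: the request construction is a pure function of its inputs, so it can be unit-tested without AWS credentials. A sketch that factors the kwargs building out of the class (the standalone helper name is mine):

```python
import json

def build_converse_kwargs(model_id: str, prompt: str, output_schema: dict = None) -> dict:
    """Builds the converse() kwargs; adds outputConfig only when a schema is given."""
    kwargs = {
        'modelId': model_id,
        'messages': [{"role": "user", "content": [{"text": prompt}]}],
        'inferenceConfig': {"temperature": 0.7, "maxTokens": 2000},
    }
    if output_schema:
        kwargs['outputConfig'] = {
            'textFormat': {
                'type': 'json_schema',
                'structure': {
                    'jsonSchema': {
                        'schema': json.dumps(output_schema),
                        'name': 'structured_output',
                        'description': 'Structured output guaranteed by Bedrock',
                    }
                }
            }
        }
    return kwargs
```

A test can then assert that outputConfig appears exactly when a schema is supplied, which pins down the contract without a single network call.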

Cost impact: with ~90 executions/month and ~10 relevant items per execution, I went from ~900 to ~450 monthly calls for social content generation. Half.

3. Newsletter — New Functionality without Extra Code

The newsletter_generator.py had a _generate_subject method that returned the email subject as a string. Fine.

But there was a field I had never implemented: the preview text, those 80-100 characters that Gmail, Outlook, and Apple Mail show below the subject before opening the email. A wasted engagement opportunity.

Adding preview text before would have required either a second Bedrock call or more complex instructions in the prompt, with the risk that the model would mix up the two fields.

With structured outputs it was straightforward:

_SUBJECT_SCHEMA = {
    "type": "object",
    "properties": {
        "subject":      {"type": "string"},  # max 60 chars
        "preview_text": {"type": "string"}   # 80-100 chars, complements the subject
    },
    "required": ["subject", "preview_text"],
    "additionalProperties": False
}

One call, two guaranteed fields. The newsletter now automatically includes preview_text — and the next step is passing it to Mailchimp when creating the campaign so it appears in subscribers’ email clients.

Results: Before and After

|  | Before | After |
| --- | --- | --- |
| Lines of defensive parsing | ~130 | 1 (json.loads) |
| Bedrock calls per item | 2 | 1 |
| JSONDecodeError possible | Yes | Impossible |
| Helper methods | _extract_json_from_text, _validate_analysis_structure, _create_fallback_analysis | Eliminated |
| Preview text in newsletter | Didn’t exist | Auto-generated |
| Analysis model | Nova 2 Lite | Claude Haiku 4.5 |

The most important change doesn’t appear in that table: the mental model with which I write prompts changed. I no longer need to think about how to instruct the model to be consistent. I define the contract in code — JSON Schema — and the prompt can focus exclusively on business behavior.

Practical Considerations

The schema doesn’t replace the prompt, it complements it. The schema guarantees structure; the prompt defines behavior. If the schema has "relevance": {"type": "integer"} but the prompt doesn’t explain what scale to use, the model will invent one. Both pieces are necessary.

additionalProperties: False is important. Without it, the model can add extra fields you didn’t expect. With it, the contract is exact in both directions.

Incompatibility with Anthropic Citations. If you use the Anthropic citations feature (for referencing document fragments), you can’t combine it with Structured Outputs in the same request. Choose one or the other based on use case.

Invalid schema → immediate HTTP 400. If the schema has syntax errors, Bedrock returns an error on the call, not during generation. Useful for catching problems early.
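That behavior suggests a cheap pre-flight check in unit tests, so an unserializable or obviously malformed schema fails in CI instead of with a 400 at runtime. A minimal sketch (the function name is mine, and it checks only the basics):

```python
import json

def preflight_schema(schema: dict) -> str:
    """Serializes a schema the way converse() expects, failing fast on obvious mistakes."""
    if not isinstance(schema, dict) or "type" not in schema:
        raise ValueError("schema must be a dict with a top-level 'type'")
    return json.dumps(schema)  # raises TypeError on non-JSON-serializable values
```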

24-hour cache. Bedrock caches the compiled grammar for each schema for 24 hours (encrypted with AWS-managed keys). The first time you use a schema it may take a few extra seconds. Subsequent requests are immediate.

Conclusion

There’s a fundamental difference between asking an LLM to be consistent and guaranteeing that it will be. For months I wrote increasingly detailed prompts, with structural examples, with warnings in all-caps. And I built defensive code to handle the cases where the model decided not to follow them.

Bedrock Structured Outputs solves that problem at the right layer. The schema lives in code, is versioned with code, and is validated as code. The prompt can speak about business logic. And the defensive parsing disappears because it no longer has any reason to exist.

The next time you open a file in a system that calls Bedrock, ask yourself: how many lines of this code exist solely to distrust the model? If the answer is more than ten, you know what to do.


Resources 📚

Written by

Gerardo Arroyo Arce

Solutions Architect, AWS Golden Jacket with a passion for sharing knowledge. As an active AWS Community Builders member, former AWS Ambassador, and AWS User Group Leader, I dedicate myself to building bridges between technology and people. A Java developer at heart and independent consultant, I take cloud architecture beyond theory through international conferences and real-world solutions. My insatiable curiosity for learning and sharing keeps me in constant evolution alongside the tech community.
