Enterprise Streaming Architecture: Kafka, Pulsar, and Platform Strategy
Introduction
Event streaming has evolved from niche technology to enterprise infrastructure standard. What began as a solution for log aggregation now powers real-time analytics, event-driven architectures, and the connective tissue between modern distributed systems.
For CTOs evaluating streaming investments, the decisions are consequential. Streaming platforms become deeply embedded in application architectures—the data they carry, the integrations they enable, and the operational patterns they require all create long-term dependencies.
This guide examines enterprise streaming strategy: when streaming delivers value, how to evaluate platform options, and what operational maturity is required for production success.
The Strategic Case for Streaming
Beyond Batch Processing
Traditional enterprise data architectures relied on batch processing:
- Collect data throughout the day
- Process overnight in batch jobs
- Deliver insights the next morning
This pattern served enterprises for decades, but it carries inherent limitations:
- Decisions based on stale data
- Batch job failures create cascading delays
- No support for real-time customer experiences
- Integration complexity with point-to-point connections
Streaming inverts this model: process data as it arrives, deliver insights immediately, react to events in real-time.
Use Cases Driving Adoption
Real-Time Analytics
Dashboards and metrics updated within seconds:
- Operational monitoring and alerting
- Customer behaviour analysis
- Financial transaction monitoring
- Supply chain visibility
Event-Driven Architecture
Decoupled services communicating through events:
- Microservices integration
- Domain event propagation
- Saga orchestration
- CQRS implementations

Data Integration
Central nervous system for enterprise data:
- Change data capture from databases
- System-to-system integration
- Data lake and warehouse population
- Third-party data ingestion
Stream Processing
Continuous computation on flowing data:
- Fraud detection with immediate response
- Personalisation engines
- IoT data processing
- Machine learning feature computation
Business Impact Potential
Streaming investment typically delivers value through:
Speed to Insight
- Reducing analytics latency from hours to seconds
- Enabling proactive rather than reactive decisions
- Supporting real-time customer experiences
Operational Efficiency
- Eliminating batch job maintenance burden
- Reducing integration complexity through standardisation
- Enabling self-service data access patterns
New Capabilities
- Real-time personalisation previously impossible
- Event-driven automation opportunities
- Competitive differentiation through responsiveness
Platform Landscape
Apache Kafka
The market leader, originally developed at LinkedIn and now stewarded by the Apache Software Foundation with commercial support from Confluent.
Architecture Characteristics
Kafka uses a distributed commit log architecture:
- Partitioned topics for parallel processing
- Consumer groups for scalable consumption
- Replication for fault tolerance
- Log compaction for state management
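The model is easiest to see in client code. Below is a minimal sketch using the confluent-kafka Python client; the broker address, topic, key, and group names are illustrative placeholders:

```python
# Minimal sketch of Kafka's core model with the confluent-kafka client.
# Broker address, topic, and group names are placeholders.
from confluent_kafka import Producer, Consumer

conf = {"bootstrap.servers": "localhost:9092"}

# Producers append to a topic; messages with the same key land on the
# same partition, preserving per-key ordering.
producer = Producer(conf)
producer.produce("orders", key="customer-42", value=b'{"total": 99.5}')
producer.flush()  # block until delivery is confirmed

# Consumers in the same group split the partitions among themselves,
# which is how Kafka scales consumption horizontally.
consumer = Consumer({**conf, "group.id": "billing", "auto.offset.reset": "earliest"})
consumer.subscribe(["orders"])
msg = consumer.poll(timeout=10.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```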
Strengths
- Ecosystem maturity: Kafka Connect, Kafka Streams, ksqlDB (formerly KSQL), extensive third-party integrations
- Performance: Proven at extreme scale (trillions of messages per day at leading tech companies)
- Community: Largest community, most resources, widest talent pool
- Stability: Battle-tested across thousands of production deployments
Considerations
- Operational complexity: Requires expertise for production management
- Storage architecture: Coupled storage and compute, less flexible for some patterns
- Multi-tenancy: Requires careful configuration for isolation
- Geo-replication: Possible but complex to configure correctly
Deployment Options
- Self-managed on infrastructure
- Confluent Cloud (fully managed)
- Amazon MSK (managed Kafka on AWS)
- Azure Event Hubs (Kafka-compatible)
- Instaclustr, Aiven, and other managed providers
Apache Pulsar
Developed at Yahoo and contributed to the Apache Software Foundation, with commercial support from StreamNative.
Architecture Characteristics
Pulsar separates storage and compute:
- Apache BookKeeper for persistent storage
- Stateless broker layer for serving
- Native multi-tenancy with isolation
- Built-in geo-replication
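For comparison, a minimal sketch with the pulsar-client Python library; the service URL, tenant/namespace path, and subscription name are illustrative:

```python
# Minimal sketch with the pulsar-client library. Service URL,
# tenant/namespace, and subscription name are placeholders.
import pulsar

client = pulsar.Client("pulsar://localhost:6650")

# Topics are addressed as tenant/namespace/topic, reflecting
# Pulsar's built-in multi-tenancy.
producer = client.create_producer("persistent://acme/payments/events")
producer.send(b"payment-received")

# A Shared subscription gives queue semantics; Exclusive or Failover
# subscriptions give stream semantics on the same topic.
consumer = client.subscribe(
    "persistent://acme/payments/events",
    subscription_name="audit",
    consumer_type=pulsar.ConsumerType.Shared,
)
msg = consumer.receive(timeout_millis=10000)
consumer.acknowledge(msg)
client.close()
```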
Strengths
- Architecture flexibility: Separated storage enables independent scaling
- Multi-tenancy: Built-in tenant and namespace isolation
- Geo-replication: Native support for cross-region deployment
- Unified messaging: Supports both streaming and queuing patterns
- Tiered storage: Automatic offloading to object storage
Considerations
- Ecosystem maturity: Smaller ecosystem than Kafka, though growing
- Operational complexity: More components to manage (brokers + BookKeeper)
- Community size: Smaller talent pool, fewer resources
- Commercial support: Less extensive than Confluent offerings
Deployment Options
- Self-managed on infrastructure
- StreamNative Cloud (fully managed)
- DataStax Astra Streaming
- Limited cloud provider managed options
Amazon Kinesis
AWS-native streaming service, tightly integrated with AWS ecosystem.
Strengths
- AWS integration: Native connections to Lambda, Kinesis Data Firehose, and AWS analytics services
- Operational simplicity: Serverless model reduces management
- Scaling: Automatic scaling within limits
- Cost model: Pay-per-use without infrastructure overhead
Considerations
- AWS lock-in: Not portable to other environments
- Capability limitations: Less flexible than Kafka for complex patterns
- Scale constraints: Per-shard limits require careful design
- Ecosystem: Limited to AWS services and partners
Best fit: AWS-committed organisations with moderate scale requirements.
Azure Event Hubs
Microsoft’s managed streaming service with Kafka compatibility.
Strengths
- Azure integration: Native connections to Azure services
- Kafka compatibility: Can migrate existing Kafka workloads
- Enterprise features: Built-in security and compliance
- Serverless tier: Consumption-based pricing option
Considerations
- Azure lock-in: Primary value within Azure ecosystem
- Kafka compatibility gaps: Not full feature parity with open-source Kafka
- Performance variance: Shared infrastructure can introduce latency variability
Best fit: Azure-committed enterprises seeking managed streaming.
Google Cloud Pub/Sub
Google’s managed messaging service for event-driven systems.
Strengths
- Global scale: Google’s infrastructure backing
- Simplicity: Minimal operational overhead
- GCP integration: Native BigQuery, Dataflow connections
- Serverless: True pay-per-message pricing
Considerations
- Different semantics: Not Kafka-compatible, different API model
- GCP lock-in: Limited value outside GCP
- Ordering limitations: Requires careful design for ordered processing
Best fit: GCP-native workloads prioritising simplicity over flexibility.
Evaluation Framework
Requirements Assessment
Before platform selection, clarify requirements:
Volume and Velocity
- Messages per second (average and peak)
- Message size distribution
- Growth projections over 3-5 years
- Burst handling requirements
Latency Requirements
- End-to-end latency tolerance
- Processing latency budgets
- Delivery guarantee needs (at-least-once, exactly-once)
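Delivery guarantees are largely a client configuration concern. A sketch of the two common settings with the confluent-kafka Python client; the broker address and transactional ID are placeholders:

```python
# Sketch of delivery-guarantee configuration with confluent-kafka.
# Broker address and transactional.id are placeholders.
from confluent_kafka import Producer

# At-least-once: idempotence deduplicates broker-side retries, but
# consumers may still see duplicates after producer restarts.
p_alo = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,
    "acks": "all",
})

# Exactly-once (within Kafka): transactions make a batch of writes,
# and optionally consumer offset commits, atomic.
p_eos = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "payments-processor-1",
})
p_eos.init_transactions()
p_eos.begin_transaction()
p_eos.produce("payments", value=b"debit:42")
p_eos.commit_transaction()
```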
Durability and Retention
- Data retention requirements
- Compliance and audit needs
- Replay capability requirements
- Recovery time objectives
Operational Context
- Team expertise and preferences
- Existing infrastructure investments
- Multi-cloud or hybrid requirements
- Security and compliance constraints

Comparison Matrix
| Factor | Kafka | Pulsar | Kinesis | Event Hubs |
|---|---|---|---|---|
| Ecosystem | Excellent | Good | AWS-focused | Azure-focused |
| Performance | Excellent | Excellent | Good | Good |
| Operations | Complex | Complex | Simple | Moderate |
| Multi-tenancy | Configuration | Native | Limited | Good |
| Geo-replication | Complex | Native | Limited | Built-in |
| Cost at Scale | Lower | Lower | Higher | Moderate |
| Portability | High | High | AWS-only | Azure-primary |
Decision Framework
Choose Kafka when:
- Ecosystem and community support are priorities
- Team has or can develop Kafka expertise
- Running in multiple cloud environments
- Maximum third-party integration options needed
- Confluent partnership provides value
Choose Pulsar when:
- Multi-tenancy is primary requirement
- Geo-replication is critical from day one
- Tiered storage offers significant cost benefit
- Starting fresh without Kafka investment
- Queue semantics needed alongside streaming
Choose Managed Cloud Services when:
- Minimising operational overhead is priority
- Committed to single cloud provider
- Scale requirements fit within service limits
- Integration with cloud-native services dominates
Architecture Patterns
Event-Driven Microservices
Streaming platforms enable decoupled service communication:
```
Service A → Event Topic → Service B
                        → Service C
                        → Service D
```
Design Considerations
- Event schema management and evolution
- Consumer group coordination
- Error handling and dead letter patterns
- Event sourcing and replay capabilities
Best Practices
- Define clear event contracts with schema registry
- Design for consumer failure and retry
- Implement idempotency in consumers
- Plan for schema evolution from the start
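Consumer idempotency deserves illustration, since redelivery is a fact of life under at-least-once semantics. A sketch with the confluent-kafka Python client; the event_id field, handle function, and in-memory set are hypothetical stand-ins for real business logic and a durable dedup store:

```python
# Sketch of an idempotent consumer: deduplicate on a business key so
# redelivered events are processed safely.
import json
from confluent_kafka import Consumer

def handle(event: dict) -> None:
    """Hypothetical business logic, e.g. writing an order to a database."""

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-service",
    "enable.auto.commit": False,     # commit only after successful processing
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

processed_ids = set()  # stand-in for a durable dedup store (DB table, Redis)

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    if event["event_id"] not in processed_ids:  # skip redelivered duplicates
        handle(event)
        processed_ids.add(event["event_id"])
    consumer.commit(message=msg)  # at-least-once: commit after processing
```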
Change Data Capture
Stream database changes for downstream consumption:
```
Database → CDC Connector → Streaming Platform → Consumers
```
Popular CDC Tools
- Debezium (open source, built on Kafka Connect; see the registration sketch after this list)
- AWS DMS for Kinesis
- Fivetran, Airbyte for managed CDC
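Connectors such as Debezium are registered through the Kafka Connect REST API. A sketch in Python, assuming a Debezium 2.x PostgreSQL connector; host names, credentials, and the Connect endpoint are placeholders:

```python
# Sketch: registering a Debezium PostgreSQL connector via the Kafka
# Connect REST API. Hosts, credentials, and URLs are placeholders.
import requests

connector = {
    "name": "inventory-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres.internal",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "secret",       # inject from a secret store
        "database.dbname": "inventory",
        "topic.prefix": "inventory",          # topics: inventory.<schema>.<table>
        "snapshot.mode": "initial",           # consistent snapshot before streaming
    },
}

resp = requests.post("http://connect.internal:8083/connectors", json=connector)
resp.raise_for_status()
```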
Considerations
- Database performance impact
- Schema change handling
- Ordering guarantees
- Initial snapshot management
Real-Time Analytics Pipeline
Feed analytical systems with streaming data:
```
Sources → Streaming Platform → Stream Processing → Analytics Store
                                                 → Data Lake/Warehouse
                                                 → Real-time Dashboards
```
Processing Options
- Kafka Streams (embedded in applications)
- Apache Flink (dedicated stream processor)
- Spark Structured Streaming (micro-batch execution model)
- Cloud-native options (Kinesis Analytics, Dataflow)
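Whatever the engine, the underlying shape is a continuous consume-transform-produce loop. A stripped-down sketch with plain Kafka clients; real processors add state, windowing, and fault tolerance on top. Topic names and the enrichment logic are illustrative:

```python
# Sketch of the consume-transform-produce shape that stream
# processors automate. Topic names and logic are placeholders.
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "enrichment",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["page-views"])
producer = Producer({"bootstrap.servers": "localhost:9092"})

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    view = json.loads(msg.value())
    view["is_mobile"] = "Mobile" in view.get("user_agent", "")  # the transform
    producer.produce("page-views-enriched", key=msg.key(), value=json.dumps(view))
    producer.poll(0)  # serve delivery callbacks without blocking
```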
Hybrid Architecture
Most enterprises combine streaming with existing patterns:
Streaming for:
- Real-time event propagation
- Low-latency integration
- Event sourcing
Batch for:
- Large-scale transformations
- Historical reprocessing
- Cost-optimised analytics
The goal is appropriate tool selection, not dogmatic streaming everywhere.
Production Operations
Capacity Planning
Sizing Considerations
Broker sizing depends on:
- Message throughput requirements
- Message size distribution
- Replication factor
- Consumer parallelism needs
- Storage retention period
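A back-of-envelope calculation makes the interaction of these factors concrete. The workload figures below are illustrative assumptions, not recommendations:

```python
# Back-of-envelope cluster storage sizing under stated assumptions:
# 50k msgs/s average, 1 KB messages, replication factor 3, 7-day
# retention. Numbers are illustrative only.
msgs_per_sec = 50_000
avg_msg_bytes = 1_024
replication_factor = 3
retention_days = 7

ingest_mb_per_sec = msgs_per_sec * avg_msg_bytes / 1e6           # ~51 MB/s
stored_tb = (ingest_mb_per_sec * 86_400 * retention_days
             * replication_factor) / 1e6                         # ~93 TB
print(f"Ingest: {ingest_mb_per_sec:.0f} MB/s, retained: {stored_tb:.0f} TB")
```

Peak throughput, not the average, should drive broker count; retention and replication drive storage, as the arithmetic shows.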
Scaling Patterns
Kafka:
- Add brokers and rebalance partitions
- Increase partition count (a one-way operation; partitions cannot be removed)
- Separate clusters for isolation
Pulsar:
- Scale brokers independently from storage
- Add BookKeeper nodes for capacity
- Leverage tiered storage for cost
Cost Optimisation
- Right-size retention periods
- Use tiered storage where available
- Optimise message formats (compression, serialisation)
- Monitor and eliminate unused topics
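Message format optimisation is often the cheapest win. A sketch of producer settings that trade a little latency for better compression; the values are starting points to benchmark, not universal defaults:

```python
# Sketch of producer settings that reduce network and storage cost:
# compression plus modest batching. Values are starting points only.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "compression.type": "zstd",   # often substantially smaller for JSON payloads
    "linger.ms": 20,              # wait briefly to form larger batches
    "batch.size": 262144,         # larger batches compress better
})
```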
Monitoring and Observability
Key Metrics
Platform health:
- Broker availability and leadership distribution
- Replication lag and under-replicated partitions
- Request latency (produce, consume)
- Disk and memory utilisation
Consumer health:
- Consumer lag by group and partition
- Processing throughput
- Error rates and retries
- Rebalance frequency
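Consumer lag can be measured directly by comparing committed offsets against log-end (high watermark) offsets. A sketch with the confluent-kafka Python client; the group name, topic, and partition count are placeholders:

```python
# Sketch: measuring consumer lag with confluent-kafka. Calling
# committed() only queries offsets; it does not join the group.
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "billing",   # the group whose lag we are measuring
})

partitions = [TopicPartition("orders", p) for p in range(6)]  # 6 partitions assumed
for tp in consumer.committed(partitions, timeout=10):
    low, high = consumer.get_watermark_offsets(tp, timeout=10)
    lag = high - tp.offset if tp.offset >= 0 else high - low  # no commit yet
    print(f"orders[{tp.partition}] lag={lag}")
consumer.close()
```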
Tooling
- Prometheus/Grafana for metrics
- Confluent Control Center or equivalent
- Custom dashboards for business metrics
- Alerting for operational thresholds
Security Implementation
Authentication
- SASL mechanisms (SCRAM, OAUTHBEARER, Kerberos)
- TLS client certificates
- Cloud provider IAM integration
Authorisation
- ACLs for topic-level access control
- RBAC for administrative operations
- Integration with enterprise identity systems
Encryption
- TLS for data in transit
- At-rest encryption for persistent storage
- Key management integration
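On the client side, these controls reduce to a handful of configuration keys. A sketch of a SASL/SCRAM-over-TLS baseline with the confluent-kafka Python client; endpoints, credentials, and certificate paths are placeholders:

```python
# Sketch of a client security baseline: SASL/SCRAM authentication
# over TLS. Endpoints, credentials, and paths are placeholders.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "kafka.internal:9093",
    "security.protocol": "SASL_SSL",           # TLS in transit + SASL auth
    "sasl.mechanism": "SCRAM-SHA-512",
    "sasl.username": "order-service",
    "sasl.password": "********",               # inject from a secret store
    "ssl.ca.location": "/etc/ssl/certs/ca.pem",
})
```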
Compliance
- Audit logging for access and operations
- Data classification and handling
- Retention policy enforcement
- Cross-border data considerations
Disaster Recovery
Recovery Objectives
Define requirements:
- RPO (Recovery Point Objective): How much data loss is acceptable?
- RTO (Recovery Time Objective): How quickly must service restore?
Replication Strategies
Kafka:
- MirrorMaker 2 for cross-cluster replication
- Confluent Replicator as a commercial alternative
- Consider replication lag in RPO calculations
Pulsar:
- Native geo-replication between clusters
- Synchronous or asynchronous options
- Built-in failover capabilities
Recovery Procedures
- Documented failover processes
- Regular DR testing
- Automated failover where appropriate
- Client reconfiguration strategies
Organisational Readiness
Team Capabilities
Successful streaming requires expertise:
Core Skills
- Distributed systems fundamentals
- Platform administration and tuning
- Performance analysis and optimisation
- Security configuration
Development Skills
- Event-driven design patterns
- Schema management
- Consumer implementation best practices
- Testing approaches for streaming
Build vs Buy
Consider managed services if:
- Limited streaming expertise available
- Operational simplicity prioritised
- Scale fits within managed service limits
- Budget allows for premium pricing
Invest in self-managed if:
- Scale economics favour self-hosting
- Customisation requirements exist
- Multi-cloud deployment needed
- Team expertise available or buildable
Governance Framework
Platform Standards
Establish and enforce:
- Topic naming conventions
- Schema requirements and registry usage
- Retention and partitioning standards
- Security baseline configurations
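Standards are easiest to enforce in code at creation time. A sketch using the confluent-kafka AdminClient; the naming pattern and configuration values are illustrative, not prescriptive:

```python
# Sketch: enforcing topic standards at creation time. The naming
# convention (<domain>.<entity>.<event>) and configs are illustrative.
import re
from confluent_kafka.admin import AdminClient, NewTopic

NAME_PATTERN = re.compile(r"^[a-z]+\.[a-z-]+\.[a-z-]+$")  # e.g. sales.order.created

def create_governed_topic(admin: AdminClient, name: str) -> None:
    if not NAME_PATTERN.match(name):
        raise ValueError(f"topic {name!r} violates naming convention")
    topic = NewTopic(
        name,
        num_partitions=6,
        replication_factor=3,
        config={"retention.ms": str(7 * 24 * 3600 * 1000)},  # 7-day standard
    )
    futures = admin.create_topics([topic])
    futures[name].result()  # raises if creation failed

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
create_governed_topic(admin, "sales.order.created")
```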
Change Management
Control changes that affect consumers:
- Schema evolution policies
- Breaking change communication
- Topic lifecycle management
- Consumer coordination
Cost Management
Track and allocate costs:
- Chargeback by team or application
- Usage monitoring and optimisation
- Budget alerting and controls
Implementation Roadmap
Phase 1: Foundation (Months 1-3)
Infrastructure
- Deploy platform in non-production
- Establish monitoring and alerting
- Configure security baseline
- Document operational procedures
Enablement
- Train core team on platform
- Develop internal documentation
- Create starter templates
- Establish support model
Pilot
- Select low-risk use case
- Implement end-to-end
- Validate operational readiness
- Gather lessons learned
Phase 2: Expansion (Months 4-9)
Production Deployment
- Promote to production environment
- Implement disaster recovery
- Establish SLAs and monitoring
- Build operational runbooks
Use Case Expansion
- Onboard additional applications
- Develop patterns and libraries
- Create self-service capabilities
- Expand team expertise
Governance
- Implement topic lifecycle management
- Deploy schema registry
- Establish change management
- Create cost tracking
Phase 3: Optimisation (Months 10-12+)
Scale and Performance
- Tune for production workloads
- Optimise resource utilisation
- Implement auto-scaling where possible
- Address operational pain points
Advanced Capabilities
- Stream processing implementations
- Complex event processing
- Real-time ML feature stores
- Event sourcing patterns
Platform Maturity
- Self-service onboarding
- Internal platform as a product
- Continuous improvement process
- Advanced monitoring and automation
Strategic Recommendations
For Greenfield Implementations
Starting fresh offers flexibility:
- Start with clear use cases rather than building infrastructure seeking applications
- Choose managed services initially to reduce operational burden
- Establish governance early before technical debt accumulates
- Plan for growth but don’t over-engineer for hypothetical scale
For Migration from Existing Systems
Moving from legacy integration:
- Map current integration patterns to understand full scope
- Prioritise by value and complexity for migration sequencing
- Run parallel systems during transition with careful cutover
- Preserve optionality for rollback during migration
Platform Strategy
For enterprise-wide streaming:
- Standardise on primary platform to consolidate expertise
- Allow exceptions with justification for specific requirements
- Invest in platform team to support adoption
- Treat platform as product with roadmap and stakeholder management
Conclusion
Streaming platforms represent fundamental infrastructure for modern enterprises. They enable the shift from batch to real-time, from point-to-point integration to event-driven architecture, and from rigid systems to responsive ones.
The platform selection decision should match organisational context: technical requirements, team capabilities, existing investments, and strategic direction. Kafka remains the safe choice with maximum ecosystem support. Pulsar offers architectural advantages for specific requirements. Managed services trade flexibility for operational simplicity.
Whatever platform is selected, success depends on operational maturity, clear governance, and sustained investment in team capabilities. The technology is proven—the challenge is organisational readiness to leverage it effectively.
Strategic guidance for technology leaders building real-time data infrastructure.