# NATS EventBus Integration Guide

## Overview

This document describes how NATS is integrated as an EventBus in Coze Studio, covering architecture design, implementation details, configuration instructions, and usage guidelines.

## Integration Background

### Why Choose NATS?

In Coze Studio's architecture, the EventBus handles asynchronous message delivery for workflow execution, Agent communication, data processing pipelines, and other core functions. NATS is a lightweight, high-performance messaging system that brings the following advantages to Coze Studio:

1. **Lightweight**: NATS has a minimal resource footprint and a simple deployment architecture, which suits cloud-native environments
2. **High Performance**: Provides low-latency, high-throughput messaging that can support Coze Studio's large-scale concurrent Agent execution
3. **Simplicity**: A clean and intuitive API that reduces development and maintenance costs
4. **JetStream Support**: Provides message persistence, replay, and stream processing capabilities through JetStream
5. **Cloud Native**: Native support for Kubernetes, easy to deploy and manage in containerized environments
6. **Security**: Built-in authentication and authorization mechanisms with TLS encryption support

### Comparison with Other MQ Systems

| Feature | NATS | NSQ | Kafka | RocketMQ | Pulsar |
| -------------------------- | --------- | ------- | ------ | -------- | --------- |
| **Deployment Complexity** | Very Low | Low | Medium | Medium | Medium |
| **Performance** | Very High | Medium | High | High | High |
| **Resource Usage** | Very Low | Low | Medium | Medium | Medium |
| **Message Persistence** | JetStream | Limited | Strong | Strong | Strong |
| **Message Ordering** | Supported | Weak | Strong | Strong | Strong |
| **Horizontal Scaling** | Good | Medium | Good | Good | Excellent |
| **Operational Complexity** | Very Low | Low | High | Medium | Medium |
| **Cloud Native Support** | Excellent | Medium | Medium | Medium | Good |

#### NATS Core Advantages

**Lightweight and High Performance**:

- **Memory Usage**: A NATS server typically needs only tens of MB of memory while handling millions of messages
- **Startup Speed**: Starts within seconds, which suits microservices and containerized deployments
- **Latency**: Sub-millisecond message latency, suitable for real-time scenarios
- **Throughput**: A single node can handle millions of messages per second, depending on hardware and payload size

**Simplicity**:

- **Simple Configuration**: Runs with minimal configuration; no complex cluster setup is needed
- **Clean API**: The publish/subscribe pattern is simple and intuitive, with a low learning curve
- **Operations Friendly**: Rich monitoring and debugging tools make troubleshooting easy

**Cloud Native Features**:

- **Kubernetes Integration**: Official Helm Charts and Operators are available
- **Service Discovery**: Built-in service discovery mechanism, no external dependencies
- **Elastic Scaling**: Supports dynamic cluster membership changes

## Architecture Design

### Overall Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Coze Studio   │    │   NATS Server   │    │    JetStream    │
│   Application   │    │                 │    │     Storage     │
├─────────────────┤    ├─────────────────┤    ├─────────────────┤
│    Producer     │───▶│    Core NATS    │    │     Streams     │
│    Consumer     │◀───│    JetStream    │◀───│    Consumers    │
│    EventBus     │    │   Clustering    │    │    Key-Value    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

### Message Flow Patterns

NATS supports two messaging modes in Coze Studio:

1. **Core NATS**: For real-time, lightweight message delivery
   - Publish/Subscribe pattern
   - Request/Response pattern
   - Queue Group pattern
2. **JetStream**: For messages requiring persistence and high reliability
   - Stream storage
   - Message replay
   - Consumer acknowledgment mechanism

## Implementation Details

### Producer Implementation

The Producer sends messages to NATS and supports both Core NATS and JetStream modes:

```go
import (
	"context"
	"sync"

	"github.com/nats-io/nats.go"
)

type Producer struct {
	nc     *nats.Conn            // nats.Connect returns a *nats.Conn
	js     nats.JetStreamContext // nil when JetStream is disabled
	closed bool
	mu     sync.RWMutex
}

func (p *Producer) SendMessage(ctx context.Context, topic string, message []byte) error {
	// Supports both Core NATS and JetStream modes
	if p.js != nil {
		// JetStream mode: supports message persistence
		_, err := p.js.Publish(topic, message)
		return err
	}
	// Core NATS mode: lightweight publishing
	return p.nc.Publish(topic, message)
}
```

### Consumer Implementation

The Consumer receives and processes messages from NATS:

```go
func (c *Consumer) RegisterConsumer(serverURL, topic, group string, handler ConsumerHandler) error {
	// Assumption: the snippet does not show where ctx comes from,
	// so a background context is used here.
	ctx := context.Background()

	// Choose JetStream or Core NATS based on configuration
	if c.useJetStream {
		return c.startJetStreamConsumer(ctx, topic, group, handler)
	}
	return c.startCoreConsumer(ctx, topic, group, handler)
}
```

#### JetStream Consumer Features

- **Message Acknowledgment**: Supports manual acknowledgment to ensure messages are processed successfully
- **Retry Mechanism**: Automatic retry for failed messages, with exponential backoff support
- **Sequential Processing**: Messages are processed one at a time to avoid the complexity of batch processing
- **Flow Control**: Precise message flow control to prevent consumer overload

#### Core NATS Consumer Features

- **Queue Groups**: Supports load-balanced message distribution
- **Lightweight**: No persistence overhead, suitable for real-time message processing
- **High Performance**: Extremely low message processing latency

## Configuration Guide

### Environment Variables

Add the following NATS-related configuration to `docker/.env.example`:

```bash
# Backend Event Bus
export COZE_MQ_TYPE="nats"          # Set message queue type to NATS
export MQ_NAME_SERVER="nats:4222"   # NATS server address

# NATS specific configuration
# NATS_SERVER_URL: NATS server connection URL, supports nats:// and tls:// protocols
# For cluster setup, use comma-separated URLs: "nats://nats1:4222,nats://nats2:4222"
# For TLS connection: "tls://nats:4222"
export NATS_SERVER_URL="nats://nats:4222"

# NATS_JWT_TOKEN: JWT token for NATS authentication (leave empty for no auth)
export NATS_JWT_TOKEN=""

# NATS_NKEY_SEED: Path to NATS seed file for NKey authentication (optional)
export NATS_NKEY_SEED=""

# NATS_USERNAME: Username for NATS authentication (optional)
export NATS_USERNAME=""

# NATS_PASSWORD: Password for NATS authentication (optional)
export NATS_PASSWORD=""

# NATS_TOKEN: Token for NATS authentication (optional)
export NATS_TOKEN=""

# NATS_STREAM_REPLICAS: Number of replicas for JetStream streams (default: 1)
export NATS_STREAM_REPLICAS="1"

# NATS_USE_JETSTREAM: Enable JetStream mode for message persistence and reliability (default: false)
export NATS_USE_JETSTREAM="true"
```

### Docker Compose Configuration

NATS service configuration in `docker-compose.yml`:

```yaml
nats:
  image: nats:2.10.24-alpine
  container_name: nats
  restart: unless-stopped
  command:
    - "--jetstream"              # Enable JetStream
    - "--store_dir=/data"        # Data storage directory
    - "--max_memory_store=1GB"   # Memory storage limit
    - "--max_file_store=10GB"    # File storage limit
  ports:
    - "4222:4222"   # Client connection port
    - "8222:8222"   # HTTP monitoring port
    - "6222:6222"   # Cluster communication port
  volumes:
    - ./volumes/nats:/data
  networks:
    - coze-network
  healthcheck:
    test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8222/"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 40s
```

### Application Configuration

Configure NATS in the Coze Studio application through environment variables:

```go
// Note: nats.Config and nats.NewProducer are not part of github.com/nats-io/nats.go,
// so the nats package here appears to be the project's own EventBus package.

// Read configuration from environment variables
mqType := os.Getenv("COZE_MQ_TYPE")
natsURL := os.Getenv("NATS_SERVER_URL")
jwtToken := os.Getenv("NATS_JWT_TOKEN")
seedFile := os.Getenv("NATS_NKEY_SEED")
streamReplicas := os.Getenv("NATS_STREAM_REPLICAS")

// Create NATS EventBus
if mqType == "nats" {
	config := &nats.Config{
		ServerURL:      natsURL,
		JWTToken:       jwtToken,
		SeedFile:       seedFile,
		StreamReplicas: streamReplicas,
	}
	eventBus, err := nats.NewProducer(config)
	if err != nil {
		log.Fatalf("Failed to create NATS producer: %v", err)
	}
	_ = eventBus // hand the producer over to the rest of the application
}
```

## Deployment Guide

### Docker Deployment

1. **Configure Environment Variables**:

   ```bash
   cp docker/.env.example docker/.env
   # Edit .env file, set COZE_MQ_TYPE="nats"
   ```

2. **Start Services**:

   ```bash
   cd docker
   docker-compose up -d nats
   ```

3. **Verify Deployment**:

   ```bash
   # Check NATS service status
   docker-compose ps nats

   # View NATS monitoring interface
   curl http://localhost:8222/varz
   ```

### Kubernetes Deployment

Deploy NATS using the official Helm Chart. The value paths in this section follow the `nats.*` layout used by the 0.x chart; newer chart versions use different paths (for example `config.jetstream.enabled`), so check the values reference for the chart version you install.

```bash
# Add NATS Helm repository
helm repo add nats https://nats-io.github.io/k8s/helm/charts/

# Install NATS
helm install nats nats/nats --set nats.jetstream.enabled=true
```

### Production Environment Configuration

For production environments, the following configuration is recommended:

1. **Cluster Deployment**:

   ```yaml
   nats:
     cluster:
       enabled: true
       replicas: 3
   ```

2. **Persistent Storage**:

   ```yaml
   nats:
     jetstream:
       fileStore:
         pvc:
           size: 100Gi
           storageClassName: fast-ssd
   ```

3. **Resource Limits**:

   ```yaml
   nats:
     resources:
       limits:
         cpu: 2000m
         memory: 4Gi
       requests:
         cpu: 500m
         memory: 1Gi
   ```

4. **Security Configuration**:

   ```yaml
   nats:
     auth:
       enabled: true
       token: "your-secure-token"
     tls:
       enabled: true
   ```

## Monitoring and Operations

### Monitoring Metrics

NATS exposes monitoring data through HTTP endpoints on the monitoring port:

- **Server Information**: `GET /varz`
- **Connection Information**: `GET /connz`
- **Subscription Information**: `GET /subsz`
- **JetStream Information**: `GET /jsz`

### Key Monitoring Metrics

1. **Performance Metrics**:
   - Message throughput (messages/sec)
   - Message latency
   - Connection count
2. **Resource Metrics**:
   - Memory usage
   - CPU utilization
   - Disk usage
3. **JetStream Metrics**:
   - Stream count
   - Consumer count
   - Storage usage

### Log Management

NATS supports debug-level logging, file output, and log file size limits:

```bash
# Enable debug logging
nats-server --debug

# Log output to file
nats-server --log /var/log/nats.log

# Timestamp log entries and cap the log file at about 100 MB (size in bytes)
nats-server --log /var/log/nats.log --logtime --log_size_limit 104857600
```

## Performance Optimization

### Connection Pool Optimization

```go
// Configure connection options
opts := []nats.Option{
	nats.MaxReconnects(10),
	nats.ReconnectWait(2 * time.Second),
	nats.Timeout(5 * time.Second),
}
nc, err := nats.Connect(serverURL, opts...)
```

### JetStream Optimization

```go
// Configure JetStream options
jsOpts := []nats.JSOpt{
	nats.PublishAsyncMaxPending(1000),
	nats.PublishAsyncErrHandler(func(js nats.JetStream, originalMsg *nats.Msg, err error) {
		log.Printf("Async publish error: %v", err)
	}),
}
js, err := nc.JetStream(jsOpts...)
```

### Consumer Optimization

```go
// Configure consumer options
consumerOpts := []nats.SubOpt{
	nats.MaxDeliver(3),
	nats.AckWait(30 * time.Second),
	nats.MaxAckPending(100),
}
// The second argument of PullSubscribe is the durable consumer name,
// so nats.Durable is not passed again in the options.
sub, err := js.PullSubscribe(topic, "coze-consumer", consumerOpts...)
```

## Troubleshooting

### Common Issues

1. **Connection Failures**:
   - Check if the NATS service is running
   - Verify network connectivity
   - Confirm the port configuration is correct
2. **Message Loss**:
   - Check if JetStream is enabled
   - Verify the message acknowledgment mechanism
   - Review error logs
3. **Performance Issues**:
   - Monitor resource usage
   - Check for message backlog
   - Optimize consumer configuration

### Debugging Tools

The NATS CLI provides commands for inspecting the server and the message flow:

```bash
# NATS CLI tools
nats server info
nats stream list
nats consumer list

# Monitor message flow
nats sub "coze.>"
nats pub "coze.test" "hello world"
```

## Best Practices

### Subject Naming Conventions

Use hierarchical subject names:

```
coze.workflow.{workflow_id}.{event_type}
coze.agent.{agent_id}.{action}
coze.knowledge.{kb_id}.{operation}
```

### Error Handling

Handle both processing errors and panics so that failed messages are redelivered:

```go
func (c *Consumer) handleMessage(msg *nats.Msg) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("Message processing panic: %v", r)
			msg.Nak() // Reject message, trigger retry
		}
	}()

	if err := c.processMessage(msg.Data); err != nil {
		log.Printf("Message processing error: %v", err)
		msg.Nak()
		return
	}

	msg.Ack() // Acknowledge successful message processing
}
```

### Resource Management

Properly manage NATS connections and resources:

```go
func (p *Producer) Close() error {
	p.mu.Lock()
	defer p.mu.Unlock()

	// Close is idempotent: a second call is a no-op
	if p.closed {
		return nil
	}
	p.closed = true

	if p.nc != nil {
		p.nc.Close()
	}
	return nil
}
```

## Summary

As Coze Studio's EventBus solution, NATS provides lightweight, high-performance, and easy-to-deploy messaging. With JetStream, NATS also provides enterprise-grade message persistence and stream processing.

Key advantages of choosing NATS:

- **Simplicity**: Low deployment and maintenance costs
- **Performance**: Extremely high message processing performance
- **Cloud Native**: Perfect fit for containerized and Kubernetes environments
- **Reliability**: JetStream provides message persistence and acknowledgment mechanisms
- **Scalability**: Supports cluster deployment and horizontal scaling

NATS is particularly suitable for the following scenarios:

- Inter-service communication in microservice architectures
- Real-time data stream processing
- Message delivery for cloud-native applications
- Low-latency messaging systems
- Resource-constrained deployment environments
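
As a minimal sketch of the hierarchical subject naming convention recommended under Best Practices, the Go program below builds a `coze.{domain}.{id}.{event}` subject and checks it against subscription patterns using the NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens). The helper names `subject` and `matches` and the ID `wf42` are hypothetical and exist only for illustration; they are not part of Coze Studio or the nats.go client, and in a real deployment the NATS server performs this matching itself.

```go
package main

import (
	"fmt"
	"strings"
)

// subject builds a hierarchical subject such as "coze.workflow.wf42.started".
// Hypothetical helper: the "coze" prefix follows the naming convention in this guide.
func subject(domain, id, event string) string {
	return strings.Join([]string{"coze", domain, id, event}, ".")
}

// matches reports whether subj matches pattern under NATS wildcard rules:
// "*" matches exactly one token, ">" matches one or more trailing tokens
// and is only valid as the last token of the pattern.
func matches(pattern, subj string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subj, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" must be last and needs at least one remaining token to match.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false // pattern is longer than the subject
		}
		if tok != "*" && tok != s[i] {
			return false // literal token mismatch
		}
	}
	// Without ">", pattern and subject must have the same number of tokens.
	return len(p) == len(s)
}

func main() {
	s := subject("workflow", "wf42", "started")
	fmt.Println(s)                                     // coze.workflow.wf42.started
	fmt.Println(matches("coze.>", s))                  // true
	fmt.Println(matches("coze.workflow.*.started", s)) // true
	fmt.Println(matches("coze.agent.>", s))            // false
	fmt.Println(matches("coze.workflow.*", s))         // false
}
```

Because `.` separates tokens, IDs embedded in a subject should not contain dots themselves; otherwise a pattern such as `coze.workflow.*.started` would no longer line up with the intended fields.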