Troubleshooting Guide: RabbitMQ Connectivity Issues in R2R

RabbitMQ is a critical component in the R2R architecture, used for message queuing and task orchestration. Connectivity issues can disrupt the entire system. This guide will help you diagnose and resolve common RabbitMQ connectivity problems.

1. Verify RabbitMQ Service Status

First, ensure that the RabbitMQ service is running:

docker ps | grep rabbitmq

If you don’t see the RabbitMQ container running, start it:

docker-compose up -d hatchet-rabbitmq

2. Check RabbitMQ Logs

View the RabbitMQ container logs:

docker logs r2r-hatchet-rabbitmq-1

Look for error messages related to connectivity, authentication, or resource issues.

3. Verify RabbitMQ Connection Settings

Ensure that the connection settings in your R2R configuration match the RabbitMQ service:

  1. Check the SERVER_TASKQUEUE_RABBITMQ_URL environment variable in the hatchet-setup-config service.
  2. Verify that the URL format is correct: amqp://user:password@hatchet-rabbitmq:5672/

4. Common Issues and Solutions

4.1 Authentication Failures

Symptom: Logs show authentication errors.

Solution:

  1. Verify the RabbitMQ credentials:
    docker exec r2r-hatchet-rabbitmq-1 rabbitmqctl list_users
    
  2. If necessary, reset the password:
    docker exec r2r-hatchet-rabbitmq-1 rabbitmqctl change_password user newpassword
    
  3. Update the SERVER_TASKQUEUE_RABBITMQ_URL in your R2R configuration with the new credentials.

4.2 Network Connectivity

Symptom: Services can’t connect to RabbitMQ.

Solution:

  1. Ensure all services are on the same Docker network:
    docker network inspect r2r-network
    
  2. Verify that the RabbitMQ service is accessible within the network:
    docker run --rm --network r2r-network alpine ping hatchet-rabbitmq
    

4.3 Port Conflicts

Symptom: RabbitMQ fails to start due to port conflicts.

Solution:

  1. Check if the ports are already in use:
    sudo lsof -i :5672
    sudo lsof -i :15672
    
  2. Modify the port mappings in your Docker Compose file if necessary.

4.4 Resource Constraints

Symptom: RabbitMQ becomes unresponsive or crashes frequently.

Solution:

  1. Check RabbitMQ resource usage:
    docker stats r2r-hatchet-rabbitmq-1
    
  2. Increase resources allocated to the RabbitMQ container in your Docker Compose file:
    hatchet-rabbitmq:
      # ... other configurations ...
      deploy:
        resources:
          limits:
            cpus: '1'
            memory: 1G
    

4.5 File Descriptor Limits

Symptom: RabbitMQ logs show warnings about file descriptor limits.

Solution:

  1. Increase the file descriptor limit for the RabbitMQ container:
    hatchet-rabbitmq:
      # ... other configurations ...
      ulimits:
        nofile:
          soft: 65536
          hard: 65536
    

5. Advanced Troubleshooting

5.1 RabbitMQ Management Interface

Access the RabbitMQ Management Interface for detailed diagnostics:

  1. Enable the management plugin if not already enabled:
    docker exec r2r-hatchet-rabbitmq-1 rabbitmq-plugins enable rabbitmq_management
    
  2. Access the interface at http://localhost:15672 (use the credentials defined in your Docker Compose file).

5.2 Network Packet Capture

If you suspect network issues, capture and analyze network traffic:

docker run --net=container:r2r-hatchet-rabbitmq-1 --rm -v $(pwd):/cap nicolaka/netshoot tcpdump -i eth0 -w /cap/rabbitmq_traffic.pcap

Analyze the captured file with Wireshark for detailed network diagnostics.

5.3 RabbitMQ Cluster Status

If you’re running a RabbitMQ cluster, check its status:

docker exec r2r-hatchet-rabbitmq-1 rabbitmqctl cluster_status

6. Preventive Measures

  1. Implement health checks in your Docker Compose file:

    hatchet-rabbitmq:
      # ... other configurations ...
      healthcheck:
        test: ["CMD", "rabbitmqctl", "status"]
        interval: 30s
        timeout: 10s
        retries: 5
    
  2. Set up monitoring and alerting for RabbitMQ using tools like Prometheus and Grafana.

  3. Regularly backup RabbitMQ definitions and data:

    docker exec r2r-hatchet-rabbitmq-1 rabbitmqctl export_definitions /tmp/rabbitmq_defs.json
    docker cp r2r-hatchet-rabbitmq-1:/tmp/rabbitmq_defs.json ./rabbitmq_defs.json
    

By following this guide, you should be able to diagnose and resolve most RabbitMQ connectivity issues in your R2R deployment. If problems persist, consider seeking help from the RabbitMQ community or consulting the official RabbitMQ documentation for more advanced troubleshooting techniques.