Troubleshooting Guide: High Resource Usage in R2R Docker Deployments

High resource usage in R2R Docker deployments can lead to performance issues, system instability, or even service outages. This guide will help you identify the cause of high resource usage and provide steps to mitigate the problem.

1. Identifying the Issue

First, determine which resources are being overused:

1.1 Check Overall System Resources

Use the top or htop command to get an overview of system resource usage:

top

Look for high CPU usage, memory consumption, or swap usage.

1.2 Check Docker-specific Resource Usage

Use Docker’s built-in commands to check resource usage for containers:

docker stats

This will show CPU, memory, and I/O usage for each container.

2. Common Causes and Solutions

2.1 High CPU Usage

Possible Causes:

  • Inefficient queries or data processing
  • Continuous background tasks
  • Improperly configured LLM inference

Solutions:

  1. Optimize queries:

    • Review and optimize database queries, especially those involving large datasets.
    • Use indexing in Postgres and Neo4j where appropriate.
  2. Adjust background task frequency:

    • Review Hatchet workflows and adjust the frequency of recurring tasks.
    • Implement rate limiting for resource-intensive operations.
  3. LLM configuration:

    • If using local LLMs via Ollama, consider adjusting model parameters or switching to a lighter model.
    • For cloud LLMs, implement caching to reduce redundant API calls.
  4. Scale horizontally:

    • Consider distributing the workload across multiple R2R instances.

2.2 High Memory Usage

Possible Causes:

  • Memory leaks in custom code
  • Inefficient caching
  • Large in-memory datasets

Solutions:

  1. Identify memory-hungry containers:

    docker stats --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
    
  2. Analyze R2R application logs:

    • Look for patterns of increasing memory usage over time.
  3. Optimize memory usage:

    • Implement proper garbage collection in custom code.
    • Review and optimize caching strategies.
    • Consider using streaming for large data processing tasks instead of loading entire datasets into memory.
  4. Adjust container memory limits:

    • Update the Docker Compose file to set appropriate memory limits for containers:
      services:
        r2r:
          deploy:
            resources:
              limits:
                memory: 4G
      

2.3 High Disk I/O

Possible Causes:

  • Frequent logging
  • Inefficient database operations
  • Large file ingestion processes

Solutions:

  1. Monitor disk I/O:

    docker stats --format "table {{.Name}}\t{{.BlockIO}}"
    
  2. Optimize logging:

    • Reduce log verbosity for non-critical information.
    • Implement log rotation to manage file sizes.
  3. Database optimizations:

    • Ensure proper indexing in Postgres and Neo4j.
    • Optimize query patterns to reduce full table scans.
  4. File ingestion improvements:

    • Implement chunking for large file ingestion.
    • Consider using a dedicated storage service for large files.

3. Monitoring and Prevention

Implement proactive monitoring to catch resource issues early:

  1. Set up Docker monitoring:

    • Use tools like Prometheus and Grafana to monitor Docker metrics.
    • Set up alerts for when resource usage exceeds certain thresholds.
  2. Implement application-level metrics:

    • Use libraries like prometheus_client in Python to expose custom metrics.
    • Monitor key performance indicators specific to your R2R usage.
  3. Regular performance audits:

    • Periodically review resource usage patterns.
    • Conduct load testing to identify potential bottlenecks before they impact production.

4. Advanced Debugging

For persistent issues:

  1. Profile the R2R application:

    • Use Python profiling tools like cProfile or memory_profiler to identify resource-intensive code sections.
  2. Analyze Docker logs:

    docker logs r2r-container-name
    
  3. Inspect container details:

    docker inspect r2r-container-name
    
  4. Review orchestration logs:

    • Check Hatchet logs for insights into task execution and resource allocation.

5. Scaling Considerations

If high resource usage persists despite optimizations:

  1. Vertical scaling:

    • Increase resources (CPU, RAM) for the Docker host.
  2. Horizontal scaling:

    • Implement load balancing across multiple R2R instances.
    • Consider using Docker Swarm or Kubernetes for orchestration.
  3. Service separation:

    • Move resource-intensive components (e.g., database, LLM inference) to dedicated hosts.

Conclusion

High resource usage in R2R Docker deployments can be challenging but is often resolvable through careful analysis and optimization. Always ensure you have proper monitoring in place, and regularly review your deployment’s performance to catch issues early. If problems persist, don’t hesitate to reach out to the R2R community or consider consulting with cloud infrastructure experts.