R2R Troubleshooting Guide: Insufficient Instance Resources

When deploying R2R, you may encounter issues related to insufficient instance resources. This guide will help you identify, troubleshoot, and resolve these problems.

Symptoms of Insufficient Resources

  1. Containers fail to start or crash frequently
  2. Slow response times or timeouts
  3. Out of memory errors
  4. High CPU usage alerts
  5. Disk space warnings

Diagnosing Resource Issues

1. Check Docker Resource Usage

docker stats

This command shows a live stream of container resource usage statistics.
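
For a one-shot snapshot rather than a live stream, the output can be filtered. The following is a minimal sketch, assuming you want to list containers above a memory threshold; the function name `flag_high_mem` and the default of 80 percent are illustrative and not part of R2R or Docker.

```shell
# Print the names of containers whose memory usage exceeds a threshold (percent).
# Expects space-separated "name mem%" lines on stdin.
flag_high_mem() {
  threshold="${1:-80}"
  awk -v t="$threshold" '{
    pct = $2
    sub(/%/, "", pct)              # strip the trailing percent sign
    if (pct + 0 > t + 0) print $1  # compare numerically
  }'
}

# Usage (needs a running Docker daemon):
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | flag_high_mem 80
```

The `--no-stream` flag makes `docker stats` print a single sample and exit, which is what makes it usable in a pipeline.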

2. Check Host System Resources

top
free -h
df -h

top shows live per-process CPU and memory usage, free -h shows total, used, and available memory (including swap), and df -h shows disk usage per filesystem.
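
To turn the disk check into something scriptable, the sketch below filters `df` output for nearly full filesystems. The function name `full_filesystems` and the 90 percent default are assumptions chosen for illustration.

```shell
# Print mount points whose usage is at or above a threshold (percent).
# Expects `df -P` output on stdin; mount points containing spaces are not handled.
full_filesystems() {
  threshold="${1:-90}"
  awk -v t="$threshold" 'NR > 1 {   # skip the header row
    use = $5
    sub(/%/, "", use)               # "95%" -> "95"
    if (use + 0 >= t + 0) print $6  # column 6 is the mount point
  }'
}

# Usage:
#   df -P | full_filesystems 90
```

The `-P` flag requests the POSIX output format, which keeps each filesystem on one line and makes the columns predictable.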

3. Review Container Logs

docker logs <container_name>

Look for error messages related to resource constraints.
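
Long logs can be filtered for the most common resource-related messages. This is a rough sketch: the pattern list is an assumption and is not exhaustive, and `resource_errors` is a hypothetical helper name.

```shell
# Keep only log lines that look like resource-exhaustion errors.
# Reads log text on stdin; matching is case-insensitive.
resource_errors() {
  grep -iE 'out of memory|oom-kill|oomkilled|cannot allocate memory|no space left on device'
}

# Usage (docker logs writes to both stdout and stderr, so merge them first):
#   docker logs <container_name> 2>&1 | resource_errors
```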

Common Resource Issues and Solutions

1. Insufficient Memory

Symptom: The container exits with an out-of-memory (OOM) error, typically exit code 137, or the host system shows high memory usage.

Solution:

  • Increase Docker memory limit:
    docker run --memory=4g ...
    
  • For Docker Desktop, increase memory allocation in settings.
  • For cloud instances, upgrade to a larger instance type.
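
Before raising a limit, it can help to confirm that the container was actually killed for exceeding its memory. The sketch below interprets the two state fields that `docker inspect` reports; the function name `classify_exit` and its labels are illustrative.

```shell
# Classify a container exit from the two values printed by:
#   docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <container_name>
classify_exit() {
  oom_killed="$1"
  exit_code="$2"
  if [ "$oom_killed" = "true" ]; then
    echo "oom-killed"           # Docker recorded an OOM kill for the container
  elif [ "$exit_code" = "137" ]; then
    echo "sigkill"              # 128 + 9: killed by SIGKILL, possibly by the host OOM killer
  else
    echo "other"
  fi
}

# Usage (word splitting of the inspect output is intentional):
#   classify_exit $(docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <container_name>)
```

A result of "sigkill" without the OOM flag is not conclusive on its own, since `docker kill` and a stop timeout also end a container with SIGKILL.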

2. CPU Constraints

Symptom: High CPU usage, slow response times.

Solution:

  • Limit CPU usage for non-critical containers:
    docker run --cpus=0.5 ...
    
  • Upgrade to an instance with more CPU cores.
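
When reading CPU figures, note that `docker stats` reports CPU relative to a single core, so a container on a multi-core host can show more than 100%. The sketch below converts such a reading into a share of the whole host; `normalize_cpu` is a hypothetical helper.

```shell
# Convert a docker stats CPU reading (e.g. "250.00%") into a percentage
# of total host capacity, given the number of CPU cores.
normalize_cpu() {
  awk -v p="$1" -v n="$2" 'BEGIN {
    sub(/%/, "", p)            # strip the trailing percent sign
    printf "%.1f\n", p / n     # share of all cores combined
  }'
}

# Usage on Linux (nproc is part of GNU coreutils):
#   normalize_cpu "250.00%" "$(nproc)"
```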

3. Disk Space Issues

Symptom: “No space left on device” errors.

Solution:

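  • Clean up unused Docker resources (stopped containers, unused networks, dangling images, and build cache; volumes are kept unless you pass --volumes):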
    docker system prune
    
  • Increase disk space allocation for Docker (in Docker Desktop settings or cloud instance).
  • Use volume mounts for large data directories.
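
If a full `docker system prune` feels too broad, the cleanup can be done in narrower steps. The sketch below is one possible sequence, assuming a dry-run default so nothing is deleted by accident; the function name `cleanup_docker` and the `DRY_RUN` variable are illustrative.

```shell
# Print (default) or run a step-by-step Docker cleanup.
# Set DRY_RUN=0 to execute the commands instead of printing them.
cleanup_docker() {
  if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; else run=""; fi
  $run docker system df            # report space used by images, containers, volumes, build cache
  $run docker container prune -f   # remove stopped containers
  $run docker image prune -f       # remove dangling images
  $run docker builder prune -f     # remove unused build cache
}

# Usage:
#   cleanup_docker              # show what would be run
#   DRY_RUN=0 cleanup_docker    # actually run it
```

Running `docker system df` first shows where the space is going, which is useful for deciding whether pruning will help at all.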

4. Network Resource Constraints

Symptom: Network-related timeouts or slow connections.

Solution:

  • Inspect the Docker network configuration (this command only reports settings; it does not change any limits):
    docker network inspect bridge
    
  • In cloud environments, ensure proper network configuration and bandwidth allocation.

R2R-Specific Resource Considerations

1. Neo4j Memory Configuration

Neo4j can be memory-intensive. Adjust its memory settings in the Docker Compose file:

environment:
  - NEO4J_server_memory_pagecache_size=2G
  - NEO4J_server_memory_heap_max__size=2G

2. Postgres with pgvector

Vector operations can be CPU-intensive. Ensure your instance has sufficient CPU resources, or consider using a managed database service.

3. Ollama for Local LLM

Local LLM inference can be very resource-intensive. Ensure your instance has:

  • At least 8GB of RAM (16GB+ recommended)
  • Sufficient disk space for model storage
  • A capable CPU or GPU for inference
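
The RAM requirement above can be checked quickly on a Linux host. This is a minimal sketch that reads the kernel's memory report; `mem_total_gib` is a hypothetical helper, and `/proc/meminfo` is not available on macOS or Windows.

```shell
# Print total system memory in GiB, rounded to one decimal place.
# Expects the contents of /proc/meminfo on stdin (MemTotal is reported in kB).
mem_total_gib() {
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 }'
}

# Usage (Linux only):
#   mem_total_gib < /proc/meminfo
```

Compare the result with the 8GB minimum and the 16GB recommendation, keeping in mind that the other R2R containers share the same memory.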

4. Hatchet Engine

The Hatchet workflow engine may require significant resources depending on your workload. Monitor its resource usage and adjust as necessary.

Optimizing Resource Usage

  1. Use Resource Limits: Set appropriate CPU and memory limits for each container.
  2. Optimize Configurations: Fine-tune application configs (e.g., Neo4j memory settings, Postgres work_mem).
  3. Scale Horizontally: Consider splitting services across multiple smaller instances instead of one large instance.
  4. Use Managed Services: For production, consider using managed services for databases and other resource-intensive components.
  5. Monitor and Alert: Set up monitoring and alerting for resource usage to catch issues early.

When to Upgrade Resources

Consider upgrading your instance or allocating more resources when:

  1. You consistently see high resource utilization (>80% CPU, >90% memory).
  2. Response times are consistently slow and not improving with optimization.
  3. You’re frequently hitting resource limits and it’s affecting system stability.
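
The utilization thresholds in the first item can be expressed as a simple check. The sketch below tests a single pair of readings against the 80% CPU and 90% memory figures; `needs_upgrade` is a hypothetical helper, and since the guidance is about consistent utilization, it is best applied to averages over time rather than to one sample.

```shell
# Succeed (exit 0) when CPU is above 80% or memory is above 90%.
# Arguments: CPU percent, memory percent (plain numbers, no percent sign).
needs_upgrade() {
  awk -v cpu="$1" -v mem="$2" 'BEGIN {
    exit !(cpu + 0 > 80 || mem + 0 > 90)   # awk exit 0 means "threshold exceeded"
  }'
}

# Usage:
#   if needs_upgrade 85 40; then echo "consider upgrading"; fi
```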

Seeking Further Help

If you’ve tried these solutions and still face resource issues:

  1. Review the R2R documentation for specific resource recommendations.
  2. Check the R2R GitHub issues for similar problems and solutions.
  3. Reach out to the R2R community on Discord or GitHub for advice.
  4. Consider engaging with R2R maintainers or professional services for complex deployments.

Always test significant changes to resource allocations or instance types in a non-production environment before applying them in production.