
# Ansible Playbooks Documentation

This document describes the available playbooks, what each one does, and how to run them in day-to-day operation.

## Table of Contents

1. [Deployment Workflow](#deployment-workflow)
2. [Initial Setup Playbooks](#initial-setup-playbooks)
3. [Operational Playbooks](#operational-playbooks)
4. [Monitoring and Inspection](#monitoring-and-inspection)
5. [Common Operations](#common-operations)

## Deployment Workflow

The typical deployment workflow follows these steps:

1. **Initial Setup**: Configure machines, deploy proxies, install dependencies
2. **Master Setup**: Configure Redis, MinIO, and other infrastructure services
3. **Worker Setup**: Configure auth generators and download simulators
4. **Operational Management**: Start/stop processes, monitor status, cleanup profiles

## Initial Setup Playbooks

For a complete, automated installation on fresh nodes, run the main `playbook-full-install.yml` playbook:

```bash
# Run the complete installation process on all nodes
ansible-playbook ansible/playbook-full-install.yml -i ansible/inventory.green.ini
```

The sections below run each part of the installation separately.

### Base Installation

```bash
# Deploy base system requirements to all nodes
ansible-playbook ansible/playbook-base-system.yml -i ansible/inventory.green.ini
```

### Proxy Deployment

```bash
# Deploy Shadowsocks proxies to all nodes (as defined in the cluster config)
ansible-playbook ansible/playbook-proxies.yml -i ansible/inventory.green.ini
```

### Code, Environment, and Dependencies

```bash
# Sync code to all nodes
ansible-playbook ansible/playbook-stress-sync-code.yml -i ansible/inventory.green.ini

# Generate .env files for all nodes
ansible-playbook ansible/playbook-stress-generate-env.yml -i ansible/inventory.green.ini

# Install dependencies on all nodes
ansible-playbook ansible/playbook-stress-install-deps.yml -i ansible/inventory.green.ini
```
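
The manual steps above can be chained so that they run in order and stop at the first failure. The following is a minimal sketch, not part of the repository: the function name `run_manual_install` and the `PLAYBOOK_CMD` and `INVENTORY` variables are hypothetical, and the ordering simply mirrors the order of the sections above (base system, proxies, code sync, `.env` generation, dependencies).

```bash
# Hypothetical helper, not part of the repository.
# PLAYBOOK_CMD and INVENTORY are illustrative overrides; setting
# PLAYBOOK_CMD=echo previews the commands without running them.
PLAYBOOK_CMD="${PLAYBOOK_CMD:-ansible-playbook}"
INVENTORY="${INVENTORY:-ansible/inventory.green.ini}"

run_manual_install() {
    # Same order as the sections above.
    for name in base-system proxies stress-sync-code stress-generate-env stress-install-deps; do
        playbook="ansible/playbook-${name}.yml"
        # Stop at the first playbook that fails and report which one it was.
        "$PLAYBOOK_CMD" "$playbook" -i "$INVENTORY" || {
            echo "FAILED: $playbook" >&2
            return 1
        }
    done
}
```

After sourcing the snippet, `run_manual_install` runs the five playbooks against the green inventory. A non-zero return status means one of them failed, and its name is printed to stderr.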

## Operational Playbooks

### Master Node Services

Redis and MinIO are deployed as Docker containers during the initial setup (`playbook-full-install.yml`).

To start the policy enforcer and monitoring tmux sessions on the master node:
```bash
# Start policy enforcer and monitoring on master
ansible-playbook ansible/playbook-stress-manage-processes.yml -i ansible/inventory.green.ini -e "start_enforcer=true start_monitor=true"
```

To stop processes on the master node:
```bash
# Stop ONLY the policy enforcer
ansible-playbook ansible/playbook-stress-manage-processes.yml -i ansible/inventory.green.ini -e "stop_enforcer=true"

# Stop ONLY the monitor
ansible-playbook ansible/playbook-stress-manage-processes.yml -i ansible/inventory.green.ini -e "stop_monitor=true"

# Stop BOTH the enforcer and monitor
ansible-playbook ansible/playbook-stress-manage-processes.yml -i ansible/inventory.green.ini -e "stop_sessions=true"
```

### Worker Node Processes

```bash
# Start auth generators and download simulators on all workers
ansible-playbook ansible/playbook-stress-lifecycle.yml -i ansible/inventory.green.ini -e "action=start"

# Start ONLY auth generators on workers
ansible-playbook ansible/playbook-stress-lifecycle.yml -i ansible/inventory.green.ini -e "action=start-auth"

# Start ONLY download simulators on workers
ansible-playbook ansible/playbook-stress-lifecycle.yml -i ansible/inventory.green.ini -e "action=start-download"

# Stop all processes on workers
ansible-playbook ansible/playbook-stress-lifecycle.yml -i ansible/inventory.green.ini -e "action=stop"

# Stop ONLY auth generators on workers
ansible-playbook ansible/playbook-stress-lifecycle.yml -i ansible/inventory.green.ini -e "action=stop-auth"

# Stop ONLY download simulators on workers
ansible-playbook ansible/playbook-stress-lifecycle.yml -i ansible/inventory.green.ini -e "action=stop-download"

# Check status of all worker processes
ansible-playbook ansible/playbook-stress-lifecycle.yml -i ansible/inventory.green.ini -e "action=status"
```

### Profile Management

The `cleanup-profiles` action removes profiles from Redis. By default, it removes only "ungrouped" profiles.

```bash
# Perform a dry run of cleaning up ungrouped profiles
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=cleanup-profiles" -e "dry_run=true"

# Clean up ungrouped profiles
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=cleanup-profiles"

# Clean up ALL profiles (destructive) by setting cleanup_mode=full
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=cleanup-profiles" -e "cleanup_mode=full"

# Use a custom setup policy file for the cleanup
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=cleanup-profiles" -e "setup_policy=policies/my_custom_setup_policy.yaml"
```
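
Because cleanup deletes data from Redis, it can be worth making the dry run the default and the real deletion an explicit opt-in. The following is a minimal sketch under that assumption. The function name `cleanup_ungrouped_profiles`, the `--yes` argument, and the `PLAYBOOK_CMD` variable are hypothetical and exist only in this example. The playbook, action, and `dry_run` variable are the ones shown above.

```bash
# Hypothetical wrapper, not part of the repository.
# PLAYBOOK_CMD is an illustrative override (for example PLAYBOOK_CMD=echo
# to preview the command line).
PLAYBOOK_CMD="${PLAYBOOK_CMD:-ansible-playbook}"

cleanup_ungrouped_profiles() {
    if [ "${1:-}" = "--yes" ]; then
        # Explicit confirmation: perform the real cleanup of ungrouped profiles.
        "$PLAYBOOK_CMD" ansible/playbook-stress-control.yml -i ansible/inventory.green.ini \
            -e "action=cleanup-profiles"
    else
        # Default: only report what would be removed.
        "$PLAYBOOK_CMD" ansible/playbook-stress-control.yml -i ansible/inventory.green.ini \
            -e "action=cleanup-profiles" -e "dry_run=true" || return 1
        echo "Dry run only. Re-run with --yes to delete the ungrouped profiles."
    fi
}
```

Run `cleanup_ungrouped_profiles` first and review its output, then run `cleanup_ungrouped_profiles --yes` once the list looks right.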

## Monitoring and Inspection

### Status Checks

```bash
# Check status of all processes on all nodes
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=status"

# Check enforcer status on master
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=check-enforcer"

# Check profile status
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=profile-status"
```

### Log Inspection

```bash
# Log in to the node, then attach to the relevant tmux session (detach with Ctrl-b d)
ssh user@hostname
tmux attach -t stress-auth-user1,user2  # For auth generator
tmux attach -t stress-download-user1,user2  # For download simulator
tmux attach -t stress-enforcer  # For policy enforcer on master
```
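
`tmux attach` takes over the terminal. To read recent output without attaching, `tmux capture-pane -p` prints a pane's contents to stdout, which also works over a one-shot SSH command. The following is a minimal sketch: the function name `session_tail` and the `SSH_CMD` variable are hypothetical, `user@hostname` is the same placeholder as above, and the session name must be one that actually exists on the node.

```bash
# Hypothetical helper, not part of the repository.
# SSH_CMD is an illustrative override (for example SSH_CMD=echo to preview).
SSH_CMD="${SSH_CMD:-ssh}"

session_tail() {
    host="$1"            # e.g. user@hostname
    session="$2"         # e.g. stress-enforcer
    lines="${3:-200}"    # how many lines of scrollback to include
    # -p prints the pane to stdout; -S -N starts N lines back in the history.
    "$SSH_CMD" "$host" "tmux capture-pane -p -t '$session' -S -$lines"
}
```

For example, `session_tail user@hostname stress-enforcer 50` prints roughly the last 50 lines of scrollback plus the visible pane of the enforcer session. If you are unsure of the session names on a node, `ssh user@hostname tmux ls` lists them.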

## Common Operations

### Code and Policy Updates

First, sync your local changes to the jump host from your development machine:
```bash
# Sync project to jump host
./tools/sync-to-jump.sh
```

Then, from the jump host, sync code or policies to the cluster nodes:
```bash
# Sync all application code (Python sources, scripts, etc.)
ansible-playbook ansible/playbook-stress-sync-code.yml -i ansible/inventory.green.ini

# Sync only policies and CLI configs
ansible-playbook ansible/playbook-stress-sync-policies.yml -i ansible/inventory.green.ini

# Sync from a custom source directory on the Ansible controller via the 'source_base_dir' extra variable
ansible-playbook ansible/playbook-stress-sync-policies.yml -i ansible/inventory.green.ini -e "source_base_dir=/path/to/my-custom-source"
```

### Docker Image Updates

To update the `yt-dlp` Docker image used by the download simulators, run the following playbook, which builds the image locally on each worker node.

```bash
# Build the yt-dlp Docker image locally on each worker node
ansible-playbook ansible/playbook-update-yt-dlp-docker.yml -i ansible/inventory.green.ini
```

### Adding a New Worker

1. Update `cluster.green.yml` with the new worker definition:
   ```yaml
   workers:
     new-worker:
       ip: x.x.x.x
       port: 22
       profile_prefixes:
         - "user4"
       proxies:
         - "sslocal-rust-1090"
   ```

2. Regenerate inventory:
   ```bash
   ./tools/generate-inventory.py cluster.green.yml
   ```

3. Run the full installation playbook, limiting it to the new worker:
   ```bash
   ansible-playbook ansible/playbook-full-install.yml -i ansible/inventory.green.ini --limit new-worker
   ```

4. Start processes on the new worker:
   ```bash
   ansible-playbook ansible/playbook-stress-lifecycle.yml -i ansible/inventory.green.ini -e "action=start" --limit new-worker
   ```
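
Between regenerating the inventory (step 2) and running the installation (step 3), it can help to confirm that the new host is actually in the inventory and reachable. The following is a minimal sketch using the standard `ansible-inventory --host` command and the `ping` module. The function name `preflight_worker` and the `INVENTORY_CMD` and `ANSIBLE_CMD` variables are hypothetical, and `new-worker` is the example host name from step 1.

```bash
# Hypothetical pre-flight check, not part of the repository.
# INVENTORY_CMD and ANSIBLE_CMD are illustrative overrides
# (for example, set both to "echo" to preview the commands).
INVENTORY_CMD="${INVENTORY_CMD:-ansible-inventory}"
ANSIBLE_CMD="${ANSIBLE_CMD:-ansible}"

preflight_worker() {
    worker="$1"
    inventory="ansible/inventory.green.ini"
    # Print the variables Ansible resolved for this host; this should fail
    # if the host is missing from the regenerated inventory.
    "$INVENTORY_CMD" -i "$inventory" --host "$worker" || return 1
    # Check that Ansible can log in and find a usable Python on the host.
    "$ANSIBLE_CMD" "$worker" -i "$inventory" -m ping
}
```

Run `preflight_worker new-worker` and continue with the installation only if it returns successfully.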

### Removing a Worker

1. Stop all processes on the worker:
   ```bash
   ansible-playbook ansible/playbook-stress-lifecycle.yml -i ansible/inventory.green.ini -e "action=stop" --limit worker-to-remove
   ```

2. Remove the worker from `cluster.green.yml`.

3. Regenerate inventory:
   ```bash
   ./tools/generate-inventory.py cluster.green.yml
   ```

### Emergency Stop All

```bash
# Stop all processes on all nodes
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=stop-all"
```

### Stopping Specific Nodes

To stop processes on a specific worker or group of workers, use the `stop-nodes` action and restrict the playbook run with `--limit`.

```bash
# Stop all processes on a single worker (e.g., dl003)
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=stop-nodes" --limit dl003

# Stop all processes on all workers
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=stop-nodes" --limit workers
```
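
`--limit` accepts any Ansible host pattern, not only a single host or group, and `--list-hosts` shows which hosts a pattern matches without running any task. The following is a minimal sketch for previewing a `stop-nodes` run. The function name `preview_stop_nodes` and the `PLAYBOOK_CMD` variable are hypothetical.

```bash
# Hypothetical helper, not part of the repository.
# PLAYBOOK_CMD is an illustrative override (for example PLAYBOOK_CMD=echo).
PLAYBOOK_CMD="${PLAYBOOK_CMD:-ansible-playbook}"

preview_stop_nodes() {
    pattern="$1"
    # --list-hosts prints the hosts matched by the pattern and exits
    # without running any task.
    "$PLAYBOOK_CMD" ansible/playbook-stress-control.yml -i ansible/inventory.green.ini \
        -e "action=stop-nodes" --limit "$pattern" --list-hosts
}
```

For example, `preview_stop_nodes 'dl003,dl004'` targets two hosts (`dl004` is a made-up name for illustration), and `preview_stop_nodes 'workers:!dl003'` targets every host in `workers` except `dl003`. Single quotes keep an interactive shell from interpreting the `!`. Once the preview looks right, run the same command without `--list-hosts`.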

### Restart Enforcer and Monitoring

```bash
# Restart monitoring and enforcer on master using default policies
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=restart-monitoring"

# Restart using a custom enforcer policy
ansible-playbook ansible/playbook-stress-control.yml -i ansible/inventory.green.ini -e "action=restart-monitoring" -e "enforcer_policy=policies/my_other_enforcer.yaml"
```