Remove intermediate

This commit is contained in:
parent b3f8597e81
commit de609aaecd

@@ -1,249 +0,0 @@
# Proxy and Account Management Strategy

This document describes the intelligent resource management strategy (for proxies and accounts) used by the `ytdlp-ops-server`. The goal of this system is to maximize the success rate, minimize blocks, and ensure fault tolerance.

The server can run in different roles to support a distributed architecture, separating management tasks from token generation work.

---

## Service Roles and Architecture

The server is designed to run in one of three roles, specified by the `--service-role` flag:

- **`management`**: A single, lightweight service instance responsible for all management API calls.
  - **Purpose**: Provides a centralized entry point for monitoring and managing the state of all proxies and accounts in the system.
  - **Behavior**: Exposes only management functions (`getProxyStatus`, `banAccount`, etc.). Calls to token generation functions will fail.
  - **Deployment**: Runs as a single container (`ytdlp-ops-management`) and exposes its port directly to the host (e.g., port `9091`), bypassing Envoy.

- **`worker`**: The primary workhorse for token and `info.json` generation.
  - **Purpose**: Handles all token generation requests.
  - **Behavior**: Implements the full API, but its management functions are scoped to its own `server_identity`.
  - **Deployment**: Runs as a scalable service (`ytdlp-ops-worker`) behind the Envoy load balancer (e.g., port `9080`).

- **`all-in-one`** (Default): A single instance that performs both management and worker roles. Ideal for local development or small deployments.

This architecture allows for a robust, federated system where workers manage their own resources locally, while a central service provides a global view for management and monitoring.

---

## 1. Account Lifecycle Management (Cooldown / Resting)

**Goal:** To prevent excessive use and subsequent blocking of accounts by giving them "rest" periods after intensive work.

### How It Works:
The account lifecycle consists of three states:
- **`ACTIVE`**: The account is active and used for tasks. An activity timer starts on its first successful use.
- **`RESTING`**: If an account has been `ACTIVE` for longer than the configured limit, the `AccountManager` automatically moves it to a "resting" state. In this state the Airflow worker will not select it for new jobs.
- **Return to `ACTIVE`**: After the rest period ends, the `AccountManager` automatically returns the account to the `ACTIVE` state, making it available again.

### Configuration:
These parameters are configured when starting the `ytdlp-ops-server`.

- `--account-active-duration-min`: The "active time" in **minutes** an account can be continuously active before being moved to `RESTING`.
  - **Default:** `30` (minutes).
- `--account-cooldown-duration-min`: The "rest time" in **minutes** an account must remain in the `RESTING` state.
  - **Default:** `60` (minutes).

**Where to Configure:**
The parameters are passed as command-line arguments when starting the server. When using Docker Compose, this is done in `airflow/docker-compose-ytdlp-ops.yaml`:
```yaml
command:
  # ... other parameters
  - "--account-active-duration-min"
  - "${ACCOUNT_ACTIVE_DURATION_MIN:-30}"
  - "--account-cooldown-duration-min"
  - "${ACCOUNT_COOLDOWN_DURATION_MIN:-60}"
```
You can change the default values by setting the `ACCOUNT_ACTIVE_DURATION_MIN` and `ACCOUNT_COOLDOWN_DURATION_MIN` environment variables in your `.env` file.

**Relevant Files:**
- `server_fix/account_manager.py`: Contains the core logic for state transitions.
- `ytdlp_ops_server_fix.py`: Parses the command-line arguments.
- `airflow/docker-compose-ytdlp-ops.yaml`: Passes the arguments to the server container.
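For illustration, a minimal sketch of this timer-based transition is shown below. The class, field, and method names (`AccountLifecycle`, `first_success_ts`, `resting_until`) are assumptions for the example and do not reflect the actual `account_manager.py` API.

```python
import time

# Hypothetical illustration of the ACTIVE -> RESTING -> ACTIVE cycle described above.
class AccountLifecycle:
    def __init__(self, active_duration_min=30, cooldown_duration_min=60):
        self.active_duration_s = active_duration_min * 60
        self.cooldown_duration_s = cooldown_duration_min * 60
        self.first_success_ts = None   # set on the first successful use
        self.resting_until = None      # set when the account enters RESTING

    def report_success(self):
        # Start the activity timer on the first successful use.
        if self.first_success_ts is None:
            self.first_success_ts = time.time()

    def current_state(self):
        now = time.time()
        if self.resting_until is not None:
            if now < self.resting_until:
                return "RESTING"
            # Rest period is over: return to ACTIVE and reset the timer.
            self.resting_until = None
            self.first_success_ts = None
        if self.first_success_ts and now - self.first_success_ts > self.active_duration_s:
            # Continuously active for too long: move to RESTING.
            self.resting_until = now + self.cooldown_duration_s
            return "RESTING"
        return "ACTIVE"
```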
---

## 2. Smart Banning Strategy

**Goal:** To avoid unjustified bans of good proxies. The problem is often with the account, not with the proxy it is using.

### How It Works:

#### Stage 1: Ban the Account First
- When a serious, bannable error occurs (e.g., `BOT_DETECTED` or `SOCKS5_CONNECTION_FAILED`), the system penalizes **only the account** that caused the error.
- For the proxy, the error is simply recorded as a single failure; the proxy itself is **not banned** and remains in rotation.

#### Stage 2: Ban the Proxy via a "Sliding Window"
- A proxy is banned automatically only if it shows **systematic failures with DIFFERENT accounts** over a short period of time.
- This is a reliable indicator that the proxy itself is the problem. The `ProxyManager` on the server tracks this and automatically bans such a proxy.

### Configuration:
These parameters are **hard-coded** as constants in the source code; changing them requires editing the file.

**Where to Configure:**
- **File:** `server_fix/proxy_manager.py`
- **Constants** in the `ProxyManager` class:
  - `FAILURE_WINDOW_SECONDS`: The time window in seconds for analyzing failures.
    - **Default:** `3600` (1 hour).
  - `FAILURE_THRESHOLD_COUNT`: The minimum total number of failures to trigger a check.
    - **Default:** `3`.
  - `FAILURE_THRESHOLD_UNIQUE_ACCOUNTS`: The minimum number of **unique accounts** that must have failed with the proxy for it to be banned.
    - **Default:** `3`.

**Relevant Files:**
- `server_fix/proxy_manager.py`: Contains the sliding-window logic and the constants.
- `airflow/dags/ytdlp_ops_worker_per_url.py`: The `handle_bannable_error_callable` function implements the "account-only" ban policy.

---

### Account Statuses Explained

You can view the status of all accounts using the `ytdlp_mgmt_proxy_account` DAG. The statuses have the following meanings:

- **`ACTIVE`**: The account is healthy and available for use. An account is considered `ACTIVE` by default if it has no specific status set.
- **`BANNED`**: The account has been temporarily disabled due to repeated failures (e.g., `BOT_DETECTED` errors) or a manual ban. The status shows the time remaining until it automatically returns to `ACTIVE` (e.g., `BANNED (active in 55m)`).
- **`RESTING`**: The account has been used for an extended period and is in a mandatory "rest" period to prevent burnout. The status shows the time remaining until it returns to `ACTIVE` (e.g., `RESTING (active in 25m)`).
- **(Blank Status)**: In older versions, an account that had only ever failed (and never succeeded) could appear with a blank status. This has been fixed; such accounts are now correctly shown as `ACTIVE`.

---

## 3. End-to-End Rotation Flow: How It All Works Together

This section describes, step by step, how a worker obtains an account and a proxy for a single job, bringing together all the management strategies described above.

1. **Worker Initialization (`ytdlp_ops_worker_per_url`)**
   - The DAG run starts, triggered either by the orchestrator or by its own previous successful run.
   - The `pull_url_from_redis` task fetches a URL from the Redis `_inbox` queue.

2. **Account Selection (Airflow Worker)**
   - The `assign_account` task is executed.
   - It generates the full list of potential account IDs based on the `account_pool` parameter (e.g., `my_prefix_01` to `my_prefix_50`).
   - It connects to Redis and checks the status of each account in this list.
   - It builds a new, temporary list containing only accounts that are **not** in a `BANNED` or `RESTING` state.
   - If the resulting list of active accounts is empty, the worker fails (unless auto-creation is enabled).
   - It then uses **`random.choice()`** to select one account from the filtered list of active accounts.
   - The chosen `account_id` is passed to the next task.

3. **Proxy Selection (`ytdlp-ops-server`)**
   - The `get_token` task runs, sending the randomly chosen `account_id` in a Thrift RPC call to the `ytdlp-ops-server`.
   - On the server, the `ProxyManager` is asked for a proxy.
   - The `ProxyManager`:
     a. Refreshes its internal state by loading the statuses of all proxies from Redis.
     b. Filters the list, keeping only proxies with an `ACTIVE` status.
     c. Applies the sliding-window ban policy, potentially banning proxies that have failed too often recently.
     d. Selects the next available proxy from the active list using a **round-robin** index.
     e. Returns the selected `proxy_url`.

4. **Execution and Reporting**
   - The server now has both the `account_id` (from Airflow) and the `proxy_url` (from its `ProxyManager`).
   - It proceeds with the token generation process using these resources.
   - Upon completion (success or failure), it reports the outcome to Redis, updating the statuses of the specific account and proxy that were used. This affects their failure counters, cooldown timers, etc. for the next run.

This separation of concerns is key:
- **The Airflow worker (the `assign_account` task)** is responsible for the **random selection of an active account**, while maintaining affinity (re-using the same account after a success).
- **The `ytdlp-ops-server`** is responsible for the **round-robin selection of an active proxy**.

---

## 4. Automatic Account Ban Based on Failure Count

**Goal:** To automatically remove from rotation accounts that constantly cause errors not related to a ban (e.g., a wrong password or authorization problems).

### How It Works:
- The `AccountManager` tracks the number of **consecutive** failures for each account.
- On a successful operation, the counter is reset.
- If the number of consecutive failures reaches the configured threshold, the account is automatically banned for a set duration.

### Configuration:
These parameters are set in the `AccountManager` constructor.

**Where to Configure:**
- **File:** `server_fix/account_manager.py`
- **Parameters** in the `__init__` method of `AccountManager`:
  - `failure_threshold`: The number of consecutive failures before a ban.
    - **Default:** `5`.
  - `ban_duration_s`: The duration of the ban in seconds.
    - **Default:** `3600` (1 hour).

---

## 5. Monitoring and Recovery

### How to Check Statuses
The **`ytdlp_mgmt_proxy_account`** DAG is the primary tool for monitoring the state of your resources. It connects directly to the **management service** to perform actions.

- **DAG ID:** `ytdlp_mgmt_proxy_account`
- **How to Use:** Trigger the DAG from the Airflow UI. Make sure the `management_host` and `management_port` parameters correctly point to your `ytdlp-ops-management` service instance. To get a full overview, set the parameters:
  - `entity`: `all`
  - `action`: `list`
- **Result:** The DAG log will display tables with the current status of all accounts and proxies. For accounts in the `BANNED` or `RESTING` state, the time remaining until they become active again is shown (e.g., `RESTING (active in 45m)`). For proxies, the one that is `(next)` in the rotation for a specific worker is highlighted.

### What Happens If All Accounts Are Banned or Resting?
If the entire pool of accounts becomes unavailable (in the `BANNED` or `RESTING` state), the system pauses by default.
- The `ytdlp_ops_worker_per_url` DAG fails with an `AirflowException` at the `assign_account` step, because the pool of active accounts is empty.
- This stops the processing loops. The system remains paused until accounts are manually unbanned or their ban/rest timers expire. After that, you can restart the processing loops using the `ytdlp_ops_orchestrator` DAG.
- The `ytdlp_ops_worker_per_url` DAG graph now explicitly shows tasks such as `assign_account`, `get_token`, `ban_account`, `retry_get_token`, etc., making the execution flow and failure points much clearer.

The system can be configured to create new accounts automatically, preventing processing from halting completely.

#### Automatic Account Creation on Exhaustion
- **Goal**: Keep the processing pipeline running even if all accounts in the primary pool are temporarily banned or resting.
- **How it works**: If the `auto_create_new_accounts_on_exhaustion` parameter is set to `True` and the account pool is defined with a prefix (rather than an explicit list), the system generates a new, unique account ID when it finds the active pool empty. A short sketch of the naming scheme follows this list.
- **New Account Naming**: New accounts are created in the format `{prefix}-auto-{unique_id}`.
- **Configuration**:
  - **Parameter**: `auto_create_new_accounts_on_exhaustion`
  - **Where to set**: In the `ytdlp_ops_orchestrator` DAG configuration when triggering a run.
  - **Default**: `True`.
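The exact ID-generation code is not shown in this document; a minimal sketch of what such a generator could look like, assuming a UUID-based suffix, is:

```python
import uuid

def make_auto_account_id(prefix: str) -> str:
    """Illustrative only: build an account ID in the {prefix}-auto-{unique_id} format."""
    return f"{prefix}-auto-{uuid.uuid4().hex[:8]}"

# Example: make_auto_account_id("my_prefix") -> "my_prefix-auto-3f9c1a2b"
```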
---

## 6. Failure Handling and Retry Policy

**Goal:** To provide flexible control over how the system behaves when a worker encounters a bannable error (e.g., `BOT_DETECTED`).

### How It Works
When a worker's `get_token` task fails with a bannable error, the system's behavior is determined by the `on_bannable_failure` policy, which can be configured when starting the `ytdlp_ops_orchestrator`.

### Configuration
- **Parameter**: `on_bannable_failure`
- **Where to set**: In the `ytdlp_ops_orchestrator` DAG configuration.
- **Options**:
  - `stop_loop` (Strictest):
    - The account that was used is banned.
    - The URL is marked as failed in the `_fail` Redis hash.
    - The worker's processing loop is **stopped**. The processing "lane" becomes inactive.
  - `retry_with_new_account` (Default, most resilient):
    - The account that caused the failure is banned.
    - The worker immediately retries the **same URL** with a new, unused account from the pool.
    - If the retry succeeds, the worker continues its loop with the next URL.
    - If the retry also fails, the second account **and the proxy that was used** are banned as well, and the worker's loop stops.
  - `retry_and_ban_account_only`:
    - Similar to `retry_with_new_account`, but on the second failure **only the second account** is banned, not the proxy.
    - This is useful when you trust your proxies but want to cycle aggressively through failing accounts.
  - `retry_without_ban` (Most lenient):
    - The worker retries with a new account, but **no accounts or proxies are ever banned**.
    - This policy is useful for debugging, or when you are confident that failures are transient and not caused by the resources.

This policy makes the system resilient to individual account failures without losing the URL, while providing granular control over when to ban accounts and/or proxies if the problem persists.

---

## 7. Worker DAG Logic (`ytdlp_ops_worker_per_url`)

This DAG is the "workhorse" of the system. It is designed as a self-sustaining loop that processes one URL per run. The failure-handling and retry logic is now explicitly visible in the DAG's task graph.

### Tasks and Their Purpose:

- **`pull_url_from_redis`**: Fetches one URL from the Redis `_inbox` queue. If the queue is empty, the DAG run finishes with a `skipped` status, stopping this processing "lane".
- **`assign_account`**: Selects an account for the job. It supports **account affinity**, re-using the same account from the previous successful run in its "lane". If this is the first run, or the previous run failed, it picks a random active account.
- **`get_token`**: The primary attempt to obtain tokens and `info.json` by calling the `ytdlp-ops-server`.
- **`handle_bannable_error_branch`**: A branching task that runs if `get_token` fails. It inspects the error and decides the next step based on the `on_bannable_failure` policy.
- **`ban_account_and_prepare_for_retry`**: If a retry is allowed, this task bans the failed account and selects a new one.
- **`retry_get_token`**: A second attempt to obtain the token using the new account.
- **`ban_second_account_and_proxy`**: If the retry also fails, this task bans the second account and the proxy that was used.
- **`download_and_probe`**: If `get_token` or `retry_get_token` succeeds, this task uses `yt-dlp` to download the media and `ffmpeg` to verify the file's integrity.
- **`mark_url_as_success`**: If `download_and_probe` succeeds, this task records the successful result in the Redis `_result` hash.
- **`handle_generic_failure`**: If any task fails with an unrecoverable error, this task records the detailed error information in the Redis `_fail` hash.
- **`decide_what_to_do_next`**: The final branching task that decides whether to continue the loop (`trigger_self_run`), stop it gracefully (`stop_loop`), or mark it as failed (`fail_loop`).
- **`trigger_self_run`**: The task that actually triggers the next DAG run, creating the continuous loop.
@@ -1,322 +0,0 @@
# Proxy and Account Management Strategy

This document describes the intelligent resource management strategy (for proxies and accounts) used by the `ytdlp-ops-server`. The goal of this system is to maximize the success rate, minimize blocks, and ensure fault tolerance.

The server can run in different roles to support a distributed architecture, separating management tasks from token generation work.

---

## Service Roles and Architecture

The server is designed to run in one of three roles, specified by the `--service-role` flag:

- **`management`**: A single, lightweight service instance responsible for all management API calls.
  - **Purpose**: Provides a centralized endpoint for monitoring and managing the state of all proxies and accounts across the system.
  - **Behavior**: Exposes only management functions (`getProxyStatus`, `banAccount`, etc.). Calls to token generation functions will fail.
  - **Deployment**: Runs as a single container (`ytdlp-ops-management`) and exposes its port directly to the host (e.g., port `9091`), bypassing Envoy.

- **`worker`**: The primary workhorse for token and `info.json` generation.
  - **Purpose**: Handles all token generation requests.
  - **Behavior**: Implements the full API, but its management functions are scoped to its own `server_identity`.
  - **Deployment**: Runs as a scalable service (`ytdlp-ops-worker`) behind the Envoy load balancer (e.g., port `9080`).

- **`all-in-one`** (Default): A single instance that performs both management and worker roles. Ideal for local development or small-scale deployments.

This architecture allows for a robust, federated system where workers manage their own resources locally, while a central service provides a global view for management and monitoring.

---

## 1. Account Lifecycle Management (Cooldown / Resting)

**Goal:** To prevent excessive use and subsequent blocking of accounts by providing them with "rest" periods after intensive work.

### How It Works:
The account lifecycle consists of three states:
- **`ACTIVE`**: The account is active and used for tasks. An activity timer starts on its first successful use.
- **`RESTING`**: If an account has been `ACTIVE` for longer than the configured limit, the `AccountManager` automatically moves it to a "resting" state. The Airflow worker will not select it for new jobs.
- **Return to `ACTIVE`**: After the cooldown period ends, the `AccountManager` automatically returns the account to the `ACTIVE` state, making it available again.

### Configuration:
These parameters are configured when starting the `ytdlp-ops-server`.

- `--account-active-duration-min`: The "active time" in **minutes** an account can be continuously active before being moved to `RESTING`.
  - **Default:** `30` (minutes).
- `--account-cooldown-duration-min`: The "rest time" in **minutes** an account must remain in the `RESTING` state.
  - **Default:** `60` (minutes).

**Where to Configure:**
The parameters are passed as command-line arguments to the server. When using Docker Compose, this is done in `airflow/docker-compose-ytdlp-ops.yaml`:
```yaml
command:
  # ... other parameters
  - "--account-active-duration-min"
  - "${ACCOUNT_ACTIVE_DURATION_MIN:-30}"
  - "--account-cooldown-duration-min"
  - "${ACCOUNT_COOLDOWN_DURATION_MIN:-60}"
```
You can change the default values by setting the `ACCOUNT_ACTIVE_DURATION_MIN` and `ACCOUNT_COOLDOWN_DURATION_MIN` environment variables in your `.env` file.

**Relevant Files:**
- `server_fix/account_manager.py`: Contains the core logic for state transitions.
- `ytdlp_ops_server_fix.py`: Parses the command-line arguments.
- `airflow/docker-compose-ytdlp-ops.yaml`: Passes the arguments to the server container.
---

## 2. Smart Banning Strategy

**Goal:** To avoid unfairly banning good proxies. The problem is often with the account, not the proxy it's using.

### How It Works:

#### Stage 1: Ban the Account First
- When a serious, bannable error occurs (e.g., `BOT_DETECTED` or `SOCKS5_CONNECTION_FAILED`), the system penalizes **only the account** that caused the error.
- For the proxy, this error is simply recorded as a single failure, but the proxy itself is **not banned** and remains in rotation.

#### Stage 2: Ban the Proxy via "Sliding Window"
- A proxy is banned automatically only if it shows **systematic failures with DIFFERENT accounts** over a short period.
- This is a reliable indicator that the proxy itself is the problem. The `ProxyManager` on the server tracks this and automatically bans such a proxy.

### Configuration:
These parameters are **hard-coded** as constants in the source code. Changing them requires editing the file.

**Where to Configure:**
- **File:** `server_fix/proxy_manager.py`
- **Constants** in the `ProxyManager` class:
  - `FAILURE_WINDOW_SECONDS`: The time window in seconds for analyzing failures.
    - **Default:** `3600` (1 hour).
  - `FAILURE_THRESHOLD_COUNT`: The minimum total number of failures to trigger a check.
    - **Default:** `3`.
  - `FAILURE_THRESHOLD_UNIQUE_ACCOUNTS`: The minimum number of **unique accounts** that must have failed with the proxy to trigger a ban.
    - **Default:** `3`.

**Relevant Files:**
- `server_fix/proxy_manager.py`: Contains the sliding window logic and constants.
- `airflow/dags/ytdlp_ops_worker_per_url.py`: The `handle_bannable_error_callable` function implements the "account-only" ban policy.
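The decision driven by these constants is conceptually simple. The following is a minimal sketch of the sliding-window check, assuming failures are available as `(timestamp, account_id)` pairs; the real `proxy_manager.py` keeps them in Redis, as described in section 8.

```python
import time

# Constants mirroring the documented defaults.
FAILURE_WINDOW_SECONDS = 3600
FAILURE_THRESHOLD_COUNT = 3
FAILURE_THRESHOLD_UNIQUE_ACCOUNTS = 3

def should_ban_proxy(failures, now=None):
    """failures: list of (timestamp, account_id) tuples recorded for one proxy.

    Returns True when the proxy shows systematic failures with enough
    different accounts inside the sliding window.
    """
    now = now or time.time()
    recent = [(ts, acc) for ts, acc in failures if now - ts <= FAILURE_WINDOW_SECONDS]
    if len(recent) < FAILURE_THRESHOLD_COUNT:
        return False
    unique_accounts = {acc for _, acc in recent}
    return len(unique_accounts) >= FAILURE_THRESHOLD_UNIQUE_ACCOUNTS
```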
---

### Account Statuses Explained

You can view the status of all accounts using the `ytdlp_mgmt_proxy_account` DAG. The statuses have the following meanings:

- **`ACTIVE`**: The account is healthy and available for use. An account is considered `ACTIVE` by default if it has no specific status set.
- **`BANNED`**: The account has been temporarily disabled due to repeated failures (e.g., `BOT_DETECTED` errors) or by a manual ban. The status will show the time remaining until it automatically becomes `ACTIVE` again (e.g., `BANNED (active in 55m)`).
- **`RESTING`**: The account has been used for an extended period and is in a mandatory "cooldown" period to prevent burnout. The status will show the time remaining until it becomes `ACTIVE` again (e.g., `RESTING (active in 25m)`).
- **(Blank Status)**: In older versions, an account that had only ever failed (and never succeeded) might appear with a blank status. This has been fixed; these accounts are now correctly shown as `ACTIVE`.

---

## 3. End-to-End Rotation Flow: How It All Works Together

This section describes the step-by-step flow of how a worker gets assigned an account and a proxy for a single job, integrating all the management strategies described above.

1. **Worker Initialization (`ytdlp_ops_worker_per_url`)**
   - The DAG run starts, triggered either by the orchestrator or by its previous successful run.
   - The `pull_url_from_redis` task fetches a URL from the Redis `_inbox` queue.

2. **Account Selection (Airflow Worker)**
   - The `assign_account` task is executed.
   - It generates the full list of potential account IDs based on the `account_pool` (e.g., `my_prefix_01` to `my_prefix_50`).
   - It connects to Redis and iterates through this list, checking the status of each account.
   - It builds a new, temporary list containing only accounts that are **not** in a `BANNED` or `RESTING` state.
   - If the resulting list of active accounts is empty, the worker fails (unless auto-creation is enabled).
   - It then takes the filtered list of active accounts and uses **`random.choice()`** to select one.
   - The chosen `account_id` is passed to the next task.

3. **Proxy Selection (`ytdlp-ops-server`)**
   - The `get_token` task runs, sending the randomly chosen `account_id` in a Thrift RPC call to the `ytdlp-ops-server`.
   - On the server, the `ProxyManager` is asked for a proxy. This happens on **every single request**.
   - The `ProxyManager` performs the following steps on every call to ensure it has the most up-to-date information:
     a. **Query Redis:** It fetches the *entire* current state of all proxies from Redis. This ensures it immediately knows about any status changes (e.g., a ban) made by other workers.
     b. **Rebuild Active List:** It rebuilds its internal in-memory list of proxies, including only those with an `ACTIVE` status.
     c. **Apply Sliding Window Ban:** It checks the recent failure history for each active proxy. If a proxy has failed too many times with different accounts, it is banned on the spot, even if its status was `ACTIVE`.
     d. **Select Proxy:** It selects the next available proxy from the final, filtered active list using a **round-robin** index.
     e. **Return Proxy:** It returns the selected `proxy_url` to be used for the token generation task.
   - **Worker Affinity**: Crucially, even though workers may share a proxy state in Redis under a common `server_identity`, each worker instance will **only ever use the proxies it was configured with at startup**. It uses Redis to check the status of its own proxies but will ignore other proxies in the shared pool.

4. **Execution and Reporting**
   - The server now has both the `account_id` (from Airflow) and the `proxy_url` (from its `ProxyManager`).
   - It proceeds with the token generation process using these resources.
   - Upon completion (success or failure), it reports the outcome to Redis, updating the status for both the specific account and proxy that were used. This affects their failure counters, cooldown timers, etc., for the next run.

This separation of concerns is key:
- **The Airflow worker (`assign_account` task)** is responsible for the **random selection of an active account**, while maintaining affinity (re-using the same account after a success).
- **The `ytdlp-ops-server`** is responsible for the **round-robin selection of an active proxy**.
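To make this split concrete, here is a minimal sketch of the two selection mechanisms. The helper `get_account_status(account_id)` and the class names are invented for illustration; they are not the actual DAG or `ProxyManager` code.

```python
import random

# --- Airflow side (assign_account): random choice among non-BANNED/RESTING accounts ---
def pick_account(account_ids, get_account_status):
    """get_account_status(account_id) -> 'ACTIVE' | 'BANNED' | 'RESTING' (assumed helper)."""
    active = [a for a in account_ids if get_account_status(a) not in ("BANNED", "RESTING")]
    if not active:
        # The real task raises AirflowException here (or auto-creates an account, if enabled).
        raise RuntimeError("No active accounts in the pool")
    return random.choice(active)

# --- Server side (ProxyManager): round-robin over the freshly rebuilt active list ---
class RoundRobinProxyPicker:
    def __init__(self):
        self._index = 0

    def next_proxy(self, active_proxies):
        # The active list is rebuilt from Redis on every call; the index persists across calls.
        if not active_proxies:
            raise RuntimeError("NO_ACTIVE_PROXIES")  # mirrors the error name used in this document
        proxy = active_proxies[self._index % len(active_proxies)]
        self._index += 1
        return proxy
```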
---

## 4. Automatic Account Ban on Consecutive Failures

**Goal:** To automatically remove accounts from rotation that consistently cause non-bannable errors (e.g., incorrect password, authorization issues).

### How It Works:
- The `AccountManager` tracks the number of **consecutive** failures for each account.
- On any successful operation, this counter is reset.
- If the number of consecutive failures reaches a set threshold, the account is automatically banned for a specified duration.

### Configuration:
These parameters are set in the `AccountManager` constructor.

**Where to Configure:**
- **File:** `server_fix/account_manager.py`
- **Parameters** in the `__init__` method of `AccountManager`:
  - `failure_threshold`: The number of consecutive failures before a ban.
    - **Default:** `5`.
  - `ban_duration_s`: The duration of the ban in seconds.
    - **Default:** `3600` (1 hour).
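A minimal sketch of this counter logic, with hypothetical class and method names rather than the real `AccountManager` API, looks like:

```python
import time

class ConsecutiveFailureTracker:
    """Illustrative only: mirrors the failure_threshold / ban_duration_s behavior described above."""

    def __init__(self, failure_threshold=5, ban_duration_s=3600):
        self.failure_threshold = failure_threshold
        self.ban_duration_s = ban_duration_s
        self.consecutive_failures = {}   # account_id -> current streak
        self.banned_until = {}           # account_id -> unix timestamp

    def report_success(self, account_id):
        # Any success resets the streak.
        self.consecutive_failures[account_id] = 0

    def report_failure(self, account_id):
        count = self.consecutive_failures.get(account_id, 0) + 1
        self.consecutive_failures[account_id] = count
        if count >= self.failure_threshold:
            # Threshold reached: ban for the configured duration and reset the streak.
            self.banned_until[account_id] = time.time() + self.ban_duration_s
            self.consecutive_failures[account_id] = 0
```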
---

## 5. Monitoring and Recovery

### How to Check Statuses
The **`ytdlp_mgmt_proxy_account`** DAG is the primary tool for monitoring the health of your resources. It connects directly to the **management service** to perform actions.

- **DAG ID:** `ytdlp_mgmt_proxy_account`
- **How to Use:** Trigger the DAG from the Airflow UI. Ensure the `management_host` and `management_port` parameters are correctly set to point to your `ytdlp-ops-management` service instance. To get a full overview, set the parameters:
  - `entity`: `all`
  - `action`: `list`
- **Result:** The DAG log will display tables with the current status of all accounts and proxies. For `BANNED` or `RESTING` accounts, it shows the time remaining until they become active again (e.g., `RESTING (active in 45m)`). For proxies, it highlights which proxy is `(next)` in the round-robin rotation for a specific worker.

### Worker vs. Management Service Roles in Automatic State Changes

It is important to understand the distinct roles each service plays in the automatic state management of accounts and proxies. The system uses a reactive, "on-read" update mechanism.

- **The `worker` service is proactive.** It is responsible for putting resources into a "bad" state.
  - When a worker encounters too many failures with an account, it moves the account to `BANNED`.
  - When an account's activity timer expires, the worker moves it to `RESTING`.
  - When a proxy fails the sliding window check during a token request, the worker bans it.

- **The `management` service is reactive but crucial for recovery.** It is responsible for taking resources out of a "bad" state.
  - The logic to check if a ban has expired or a rest period is over is located in the `getAccountStatus` and `getProxyStatus` methods.
  - This means an account or proxy is only returned to an `ACTIVE` state **when its status is queried**.
  - Since the `ytdlp_mgmt_proxy_account` DAG calls these methods on the `management` service, running this DAG is the primary mechanism for automatically clearing expired bans and rest periods.

In summary, workers put resources into timeout, and the management service (when queried) brings them back. This makes periodic checks with the management DAG important for overall system health and recovery.
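A sketch of this "on-read" recovery check is shown below. The field names (`banned_until`, `resting_until`) are assumptions; the actual logic lives in `getAccountStatus` / `getProxyStatus`, which also persist the cleared state back to Redis.

```python
import time

def resolve_status(stored_status, banned_until=None, resting_until=None, now=None):
    """Return the effective status, clearing expired BANNED/RESTING states on read."""
    now = now or time.time()
    if stored_status == "BANNED" and banned_until is not None and now >= banned_until:
        return "ACTIVE"   # ban expired; the real code also writes the change back to Redis
    if stored_status == "RESTING" and resting_until is not None and now >= resting_until:
        return "ACTIVE"   # rest period is over
    return stored_status or "ACTIVE"  # a blank status is treated as ACTIVE
```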
### Important Note on Unbanning Proxies

When a proxy is unbanned (either individually via `unban` or collectively via `unban_all`), the system performs two critical actions:
1. It sets the proxy's status back to `ACTIVE`.
2. It **deletes the proxy's entire failure history** from Redis.

This second step is crucial. Without it, the `ProxyManager`'s "Sliding Window" check would see the old failures, immediately re-ban the "active" proxy on its next use, and lead to a `NO_ACTIVE_PROXIES` error. Clearing the history ensures that an unbanned proxy gets a truly fresh start.

### What Happens When All Accounts Are Banned or Resting?
If the entire pool of accounts becomes unavailable (either `BANNED` or `RESTING`), the system will effectively pause by default.
- The `ytdlp_ops_worker_per_url` DAG will fail at the `assign_account` step with an `AirflowException` because the active account pool will be empty.
- This will stop the processing loops. The system will remain paused until accounts are either manually unbanned or their ban/rest timers expire, at which point you can re-start the processing loops using the `ytdlp_ops_orchestrator` DAG.
- The DAG graph for `ytdlp_ops_worker_per_url` now explicitly shows tasks for `assign_account`, `get_token`, `ban_account`, `retry_get_token`, etc., making the process flow and failure points much clearer.

The system can be configured to automatically create new accounts to prevent processing from halting completely.

#### Automatic Account Creation on Exhaustion
- **Goal**: Ensure the processing pipeline continues to run even if all accounts in the primary pool are temporarily banned or resting.
- **How it works**: If the `auto_create_new_accounts_on_exhaustion` parameter is set to `True` and the account pool is defined using a prefix (not an explicit list), the system will generate a new, unique account ID when it finds the active pool empty.
- **New Account Naming**: New accounts are created with the format `{prefix}-auto-{unique_id}`.
- **Configuration**:
  - **Parameter**: `auto_create_new_accounts_on_exhaustion`
  - **Where to set**: In the `ytdlp_ops_orchestrator` DAG configuration when triggering a run.
  - **Default**: `True`.

---

## 6. Failure Handling and Retry Policy

**Goal:** To provide flexible control over how the system behaves when a worker encounters a "bannable" error (e.g., `BOT_DETECTED`).

### How It Works
When a worker's `get_token` task fails with a bannable error, the system's behavior is determined by the `on_bannable_failure` policy, which can be configured when starting the `ytdlp_ops_orchestrator`.

### Configuration
- **Parameter**: `on_bannable_failure`
- **Where to set**: In the `ytdlp_ops_orchestrator` DAG configuration.
- **Options**:
  - `stop_loop` (Strictest):
    - The account used is banned.
    - The URL is marked as failed in the `_fail` Redis hash.
    - The worker's processing loop is **stopped**. The lane becomes inactive.
  - `retry_with_new_account` (Default, Most Resilient):
    - The failing account is banned.
    - The worker immediately retries the **same URL** with a new, unused account from the pool.
    - If the retry succeeds, the worker continues its loop to the next URL.
    - If the retry also fails, the second account **and the proxy** are also banned, and the worker's loop is stopped.
  - `retry_and_ban_account_only`:
    - Similar to `retry_with_new_account`, but on the second failure, it bans **only the second account**, not the proxy.
    - This is useful when you trust your proxies but want to aggressively cycle through failing accounts.
  - `retry_without_ban` (Most Lenient):
    - The worker retries with a new account, but **no accounts or proxies are ever banned**.
    - This policy is useful for debugging or when you are confident that failures are transient and not the fault of the resources.

This policy allows the system to be resilient to single account failures without losing the URL, while providing granular control over when to ban accounts and/or proxies if the problem persists.
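For reference, the options above can be summarized as a small decision table; this is data for illustration only, not the actual branch logic in the worker DAG.

```python
# Illustrative summary of the on_bannable_failure options (not actual DAG code).
# All retrying policies except retry_without_ban also ban the second account on a second failure.
ON_BANNABLE_FAILURE_POLICIES = {
    "stop_loop": {
        "retry_same_url": False, "ban_first_account": True, "ban_proxy_on_second_failure": False,
    },
    "retry_with_new_account": {
        "retry_same_url": True, "ban_first_account": True, "ban_proxy_on_second_failure": True,
    },
    "retry_and_ban_account_only": {
        "retry_same_url": True, "ban_first_account": True, "ban_proxy_on_second_failure": False,
    },
    "retry_without_ban": {
        "retry_same_url": True, "ban_first_account": False, "ban_proxy_on_second_failure": False,
    },
}
```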
---

## 7. Worker DAG Logic (`ytdlp_ops_worker_per_url`)

This DAG is the "workhorse" of the system. It is designed as a self-sustaining loop to process one URL per run. The logic for handling failures and retries is now explicitly visible in the DAG's task graph.

### Tasks and Their Purpose:

- **`pull_url_from_redis`**: Fetches one URL from the Redis `_inbox` queue. If the queue is empty, the DAG run is skipped, stopping this worker's processing "lane".
- **`assign_account`**: Selects an account for the job. It maintains **account affinity** by re-using the same account from the previous successful run in its "lane". If it's the first run or the previous run failed, it picks a random active account.
- **`get_token`**: The primary attempt to get tokens and `info.json` by calling the `ytdlp-ops-server`.
- **`handle_bannable_error_branch`**: A branching task that runs if `get_token` fails. It inspects the error and decides the next step based on the `on_bannable_failure` policy.
- **`ban_account_and_prepare_for_retry`**: If a retry is permitted, this task bans the failed account and selects a new one.
- **`retry_get_token`**: A second attempt to get the token using the new account.
- **`ban_second_account_and_proxy`**: If the retry also fails, this task bans the second account and the proxy that was used.
- **`download_and_probe`**: If `get_token` or `retry_get_token` succeeds, this task uses `yt-dlp` to download the media and `ffmpeg` to verify that the downloaded file is a valid media file.
- **`mark_url_as_success`**: If `download_and_probe` succeeds, this task records the successful result in the Redis `_result` hash.
- **`handle_generic_failure`**: If any task fails non-recoverably, this task records the detailed error information in the Redis `_fail` hash.
- **`decide_what_to_do_next`**: A final branching task that decides whether to continue the loop (`trigger_self_run`), stop it gracefully (`stop_loop`), or mark it as failed (`fail_loop`).
- **`trigger_self_run`**: The task that actually triggers the next DAG run, creating the continuous loop.

---

## 8. Proxy State Lifecycle in Redis

This section details how a proxy's state (e.g., `ACTIVE`, `BANNED`) is managed and persisted in Redis. The system uses a "lazy initialization" pattern, meaning a proxy's state is only written to Redis when it is first needed.

### Step 1: Configuration and In-Memory Initialization
The server first learns about the list of available proxies from its startup configuration, not from Redis.

1. **Source of Truth**: Proxies are defined in the `.env` file (e.g., `CAMOUFOX_PROXIES`, `SOCKS5_SOCK_SERVER_IP`).
2. **Injection**: The `airflow/generate_envoy_config.py` script aggregates these into a single list, which is passed to the `ytdlp-ops-server` via the `--proxies` command-line argument during Docker Compose startup.
3. **In-Memory State**: The `ProxyManager` in `server_fix/proxy_manager.py` receives this list and holds it in memory. At this point, Redis is not involved.

### Step 2: First Write to Redis (Lazy Initialization)
A proxy's state is only persisted to Redis the first time it is actively managed or queried.

* **Trigger**: This typically happens on the first API call that requires proxy state, such as `getProxyStatus`.
* **Action**: The `ProxyManager` checks Redis for a hash with the key `proxies:<server_identity>` (e.g., `proxies:ytdlp-ops-airflow-service`).
* **Initialization**: If the key does not exist, the `ProxyManager` iterates through its in-memory list of proxies and writes each one to the Redis hash with a default state of `ACTIVE`.

### Step 3: Runtime Updates (Success and Failure)
The proxy's state in Redis is updated in real time based on the outcome of token generation tasks.

* **On Success**: When a task using a proxy succeeds, `ProxyManager.report_success()` is called. This updates the proxy's `success_count` and `last_success_timestamp` in the Redis hash.
* **On Failure**: When a task fails, `ProxyManager.report_failure()` is called.
  1. A record of the failure (including the account ID and job ID) is added to a separate Redis sorted set with the key `proxy_failures:<proxy_url>`. This key has a TTL and is used for the sliding window ban strategy.
  2. The proxy's `failure_count` and `last_failure_timestamp` are updated in the main Redis hash.
* **Automatic Ban**: If the conditions for the "Sliding Window" ban are met (too many failures from different accounts in a short time), `ProxyManager.ban_proxy()` is called, which updates the proxy's `status` to `BANNED` in the Redis hash.

### Step 4: Observation and Manual Control
You can view and modify the proxy states stored in Redis using the provided management tools.

* **Observation**:
  * **Airflow DAG**: The `ytdlp_mgmt_proxy_account` DAG (`action: list_statuses`, `entity: proxy`).
  * **CLI Client**: The `proxy_manager_client.py` script (`list` command).
  * These tools call the `getProxyStatus` API endpoint, which reads directly from the `proxies:<server_identity>` hash in Redis.
* **Manual Control**:
  * The same tools provide `ban`, `unban`, and `reset` actions.
  * These actions call API endpoints that directly modify the `status` field for a proxy in the `proxies:<server_identity>` Redis hash.
  * The `delete_from_redis` action in the DAG provides a way to completely remove a proxy's state and failure history from Redis, forcing it to be re-initialized as `ACTIVE` on its next use.

### Summary of Redis Keys

| Redis Key Pattern | Type | Purpose |
| :--- | :--- | :--- |
| `proxies:<server_identity>` | Hash | The primary store for proxy state. Maps `proxy_url` to a JSON string containing its status (`ACTIVE`/`BANNED`), success/failure counts, and timestamps. |
| `proxy_failures:<proxy_url>` | Sorted Set | A temporary log of recent failures for a specific proxy, used by the sliding window ban logic. The score is the timestamp of the failure. |
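These keys can also be inspected directly with `redis-py`. The read-only sketch below assumes the `server_identity` shown earlier in this section (`ytdlp-ops-airflow-service`) and placeholder connection settings; adjust them to your deployment.

```python
import json
import time

import redis

# Assumed connection details; replace host/port/password with your own.
r = redis.Redis(host="localhost", port=6379, password=None, decode_responses=True)

server_identity = "ytdlp-ops-airflow-service"

# 1. Primary proxy state: a hash mapping proxy_url -> JSON blob with status and counters.
for proxy_url, raw in r.hgetall(f"proxies:{server_identity}").items():
    state = json.loads(raw)
    print(proxy_url, state.get("status"), state.get("failure_count"))

# 2. Recent failures for one proxy: a sorted set scored by failure timestamp.
proxy_url = "socks5://example-proxy:1080"  # hypothetical entry
one_hour_ago = time.time() - 3600
recent_failures = r.zrangebyscore(f"proxy_failures:{proxy_url}", one_hour_ago, "+inf")
print(f"{len(recent_failures)} failures in the last hour for {proxy_url}")
```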
@@ -1,97 +0,0 @@
# YTDLP Client Side Integration

This document describes how to integrate and use the YTDLP client with the token service.

## Build

1. **Pull, configure and start the server if needed:**
   ```bash
   cd /srv/airflow_worker/
   docker login pangramia  # Usually already done beforehand; otherwise ask for the pull password
   docker compose -f docker-compose-ytdlp-ops.yaml up -d
   docker compose -f docker-compose-ytdlp-ops.yaml logs -f
   ```
   The server is bound to a specific proxy, e.g. "socks5://sslocal-rust-1084:1084".

   Also check that Redis is bound to 0.0.0.0 in its config.

2. **Build airflow-worker with custom dependencies:**
   ```bash
   cd /srv/airflow_worker/
   docker compose build airflow-worker
   docker compose down airflow-worker
   docker compose up -d --no-deps airflow-worker
   ```

3. **Test the built-in client:**
   ```bash
   # Show client help
   docker compose exec airflow-worker python /app/ytdlp_ops_client.py --help

   # Get token and info.json
   docker compose exec airflow-worker python /app/ytdlp_ops_client.py --host 16.162.82.212 --port 9080 getToken --url 'https://www.youtube.com/watch?v=vKTVLpmvznI'

   # List formats using saved info.json
   docker compose exec airflow-worker yt-dlp --load-info-json "latest.json" -F

   # Simulate download using saved info.json
   docker compose exec airflow-worker yt-dlp --load-info-json "latest.json" --proxy "socks5://89.253.221.173:1084" --simulate --verbose

   # Extract metadata and download URLs using jq
   docker compose exec airflow-worker jq -r '"Title: \(.title)", "Date: \(.upload_date | strptime("%Y%m%d") | strftime("%Y-%m-%d"))", "Author: \(.uploader)", "Length: \(.duration_string)", "", "Download URLs:", (.formats[] | select(.vcodec != "none" or .acodec != "none") | .url)' latest.json
   ```

4. **Test the Airflow task:**

   To run the `ytdlp_client_dag_v2.1` DAG:

   Set up the required Airflow variables:
   ```bash
   docker compose exec airflow-worker airflow variables set DOWNLOAD_OPTIONS '{"formats": ["bestvideo[height<=1080]+bestaudio/best[height<=1080]"]}'
   docker compose exec airflow-worker airflow variables set DOWNLOADS_TEMP '/opt/airflow/downloadfiles'
   docker compose exec airflow-worker airflow variables set DOWNLOADS_PATH '/opt/airflow/downloadfiles'

   docker compose exec airflow-worker airflow variables list
   docker compose exec airflow-worker airflow variables set TOKEN_TIMEOUT '300'

   docker compose exec airflow-worker airflow connections import /opt/airflow/config/docker_hub_repo.json
   docker compose exec airflow-worker airflow connections delete redis_default
   docker compose exec airflow-worker airflow connections import /opt/airflow/config/redis_default_conn.json
   ```

   **Using a direct connection with `airflow tasks test`:**
   ```bash
   docker compose exec airflow-worker airflow db reset
   docker compose exec airflow-worker airflow dags reserialize

   docker compose exec airflow-worker airflow dags list
   docker compose exec airflow-worker airflow dags list-import-errors
   docker compose exec airflow-worker airflow tasks test ytdlp_client_dag_v2.1 get_token $(date -u +"%Y-%m-%dT%H:%M:%S+00:00") --task-params '{"url": "https://www.youtube.com/watch?v=sOlTX9uxUtM", "redis_enabled": false, "service_ip": "16.162.82.212", "service_port": 9080}'
   docker compose exec airflow-worker yt-dlp --load-info-json /opt/airflow/downloadfiles/latest.json --proxy "socks5://89.253.221.173:1084" --verbose --simulate

   docker compose exec airflow-worker airflow dags list-runs -d ytdlp_client_dag
   ```

   Or deploy using a trigger:
   ```bash
   docker compose exec airflow-worker airflow dags list
   docker compose exec airflow-worker airflow dags unpause ytdlp_client_dag_v2.1

   # Try the UI, or re-check that it works from the server deployment
   docker compose exec airflow-worker airflow dags trigger ytdlp_client_dag_v2.1 -c '{"url": "https://www.youtube.com/watch?v=sOlTX9uxUtM", "redis_enabled": false, "service_ip": "16.162.82.212", "service_port": 9080}'
   ```

   Check Redis for the stored data by video ID:
   ```bash
   docker compose exec redis redis-cli -a XXXXXX -h 89.253.221.173 -p 52909 HGETALL "token_info:sOlTX9uxUtM" | jq -R -s 'split("\n") | del(.[] | select(. == "")) | [.[range(0;length;2)]]'
   ```
@@ -1,93 +0,0 @@
# Airflow DAGs Explanation

## ytdlp_ops_worker_per_url.py

This DAG processes a single YouTube URL passed via DAG run configuration. It's the "Worker" part of a Sensor/Worker pattern and uses the TaskFlow API to implement worker affinity, ensuring all tasks for a single URL run on the same machine.

### DAG Structure and Flow

**Legend:**

* `TaskName`: An Airflow task.
* `-->`: Successful execution flow.
* `--(fail)-->`: Execution flow triggered by the failure of the preceding task.
* `--(success)-->`: Execution flow triggered only if the preceding task succeeds.
* `[Group: GroupName]`: A TaskGroup containing sub-tasks.

**Execution Flow:**

1. **Start:** The DAG run is triggered (e.g., by the dispatcher).
2. **`get_url_and_assign_account`**
   * Purpose: Gets the URL and assigns the first account.
   * Flow:
     * `--> get_token` (Success path)
     * `--(fail)--> handle_bannable_error_branch` (Failure path)

3. **`get_token`** (Initial attempt)
   * Purpose: Calls the Thrift service to get a token using the assigned account.
   * Flow:
     * `--(success)--> download_and_probe` (Success path, passed via `coalesce_token_data`)
     * `--(fail)--> handle_bannable_error_branch` (Failure path)

4. **`handle_bannable_error_branch`**
   * Purpose: Checks the error from `get_token` and decides the next step based on error type and policy.
   * Flow (Branches):
     * If bannable error & retry policy:
       * `--> [Group: ban_account_and_prepare_for_retry]`
         * `--> check_sliding_window_for_ban`
         * `--> ban_account_task` (if ban criteria met)
         * `--> skip_ban_task` (if ban criteria not met)
       * `--> assign_new_account_for_retry` (after group)
       * `--> retry_get_token` (using new account)
     * If bannable error & stop policy:
       * `--> ban_and_fail` (Bans account and fails DAG)
     * If connection error & retry policy:
       * `--> assign_new_account_for_retry`
       * `--> retry_get_token`
     * If non-bannable/connection error:
       * (No specific path defined; the DAG likely fails)

5. **`retry_get_token`**
   * Purpose: Calls the Thrift service again using the new account.
   * Flow:
     * `--(success)--> download_and_probe` (Success path, passed via `coalesce_token_data`)
     * `--(fail)--> handle_generic_failure` (Failure path)

6. **`coalesce_token_data`**
   * Purpose: Selects the successful token data from either the initial or retry attempt.
   * Flow:
     * `--> download_and_probe` (Success path)

7. **`download_and_probe`**
   * Purpose: Uses the token data to download the media file and probes it with ffmpeg.
   * Flow:
     * `--(success)--> mark_url_as_success` (Success path)
     * `--(fail)--> handle_generic_failure` (Failure path)

8. **`mark_url_as_success`**
   * Purpose: Records the successful processing result.
   * Flow:
     * `--(success)--> continue_processing_loop` (Success path)
     * `--(fail)--> handle_generic_failure` (Failure path)

9. **`continue_processing_loop`**
   * Purpose: Triggers a new run of the dispatcher DAG.
   * Flow:
     * (End of this DAG run)

10. **`handle_generic_failure`**
    * Purpose: Catches any unhandled failures and marks the DAG run as failed.
    * Flow:
      * (End of this DAG run, marked as failed)

### Purpose of Orchestrator and Dispatcher

The system uses separate orchestrator and dispatcher components for several key reasons:

1. **Worker Affinity/Pinning:** One of the main reasons is to ensure that all tasks related to processing a single URL run on the same worker machine. This is crucial because the `get_token` task generates an `info.json` file that contains session-specific data (like cookies and tokens). The subsequent `download_and_probe` task needs to use this exact `info.json` file. By using a dedicated worker DAG (`ytdlp_ops_worker_per_url.py`) with worker affinity, we guarantee that the file system where `info.json` is stored is accessible to both tasks.

2. **Scalability and Load Distribution:** The dispatcher can monitor queues or sources of URLs and trigger individual worker DAG runs. This decouples the discovery of work from the execution of work, allowing for better scaling and management of processing load across multiple workers. A minimal sketch of such a trigger is shown after this list.

3. **Fault Isolation:** If processing a single URL fails, it only affects that specific worker DAG run, not the entire pipeline. The dispatcher can continue to trigger other worker runs for other URLs.

4. **Flexibility:** The orchestrator/dispatcher pattern allows for more complex scheduling, prioritization, and routing logic to be implemented in the dispatcher, while keeping the worker DAG focused on the core processing steps for a single unit of work.
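The sketch below shows, in simplified and hypothetical form, how a dispatcher task could hand one URL to the worker DAG using Airflow's `TriggerDagRunOperator`. The DAG id `example_dispatcher_sketch` and the `conf` keys are assumptions for illustration; the example URL is reused from this repo's docs.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

# Hypothetical, simplified dispatcher: one trigger per URL.
with DAG(
    dag_id="example_dispatcher_sketch",
    start_date=datetime(2024, 1, 1),
    schedule=None,          # triggered manually or by an orchestrator, not on a schedule
    catchup=False,
) as dag:
    trigger_worker = TriggerDagRunOperator(
        task_id="trigger_worker_for_url",
        trigger_dag_id="ytdlp_ops_worker_per_url",
        conf={
            # The conf keys are assumptions; the real worker DAG reads its own parameters.
            "url": "https://www.youtube.com/watch?v=sOlTX9uxUtM",
        },
    )
```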
@@ -1,144 +0,0 @@
# Airflow remote DL worker configuration.
|
||||
# This file should be used on a remote machine to run a download worker.
|
||||
# It requires a master Airflow instance running with services exposed.
|
||||
#
|
||||
# Before running, create a .env file in this directory with:
|
||||
# MASTER_HOST_IP=... a.b.c.d ... # IP address of the machine running docker-compose-master.yaml
|
||||
# POSTGRES_PASSWORD=... # The password for the PostgreSQL database from the master compose file
|
||||
# REDIS_PASSWORD=... # The password for Redis from the master compose file
|
||||
# AIRFLOW_UID=... # User ID for file permissions, should match master
|
||||
---
|
||||
x-airflow-common:
|
||||
&airflow-common
|
||||
# This should point to the same image used by the master.
|
||||
# If you built a custom image for master, you need to push it to a registry
|
||||
# and reference it here.
|
||||
image: ${AIRFLOW_IMAGE_NAME:-pangramia/ytdlp-ops-airflow:latest}
|
||||
build: .
|
||||
# Add extra hosts here to allow workers to resolve other hosts by name.
|
||||
# This section is auto-generated from cluster.yml
|
||||
extra_hosts:
|
||||
{% for host_name, host_ip in all_hosts.items() %}
|
||||
- "{{ host_name }}:{{ host_ip }}"
|
||||
{% endfor %}
|
||||
env_file:
|
||||
- .env
|
||||
environment:
|
||||
&airflow-common-env
|
||||
# Airflow Core
|
||||
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
|
||||
AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
|
||||
AIRFLOW__CORE__FERNET_KEY: '' # Should be same as master, but worker does not need it.
|
||||
|
||||
# Backend connections - These should point to the master node
|
||||
# Set MASTER_HOST_IP, POSTGRES_PASSWORD, and REDIS_PASSWORD in your .env file
|
||||
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD:-pgdb_pwd_A7bC2xY9zE1wV5uP}@${MASTER_HOST_IP}:5432/airflow
|
||||
AIRFLOW__CELERY__BROKER_URL: redis://:${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT}@${MASTER_HOST_IP}:52909/0
|
||||
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD:-pgdb_pwd_A7bC2xY9zE1wV5uP}@${MASTER_HOST_IP}:5432/airflow
|
||||
|
||||
# Remote Logging - connection is fetched from DB, which is on master
|
||||
AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
|
||||
AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://airflow-logs"
|
||||
AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: minio_default
|
||||
AIRFLOW__LOGGING__ENCRYPT_S3_LOGS: "False"
|
||||
AIRFLOW__WEBSERVER__SECRET_KEY: 'qmALu5JCAW0518WGAqkVZQ=='
|
||||
AIRFLOW__CORE__INTERNAL_API_SECRET_KEY: 'qmALu5JCAW0518WGAqkVZQ=='
|
||||
AIRFLOW__CORE__LOCAL_SETTINGS_PATH: "/opt/airflow/config/custom_task_hooks.py"
|
||||
|
||||
volumes:
|
||||
# Mount dags to get any utility scripts, but the worker will pull the DAG from the DB
|
||||
- ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
|
||||
# Mount logs locally in case remote logging fails
|
||||
- ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
|
||||
# Mount config for local settings and other configurations
|
||||
- ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
|
||||
# Mount download directories
|
||||
- ${AIRFLOW_PROJ_DIR:-.}/downloadfiles:/opt/airflow/downloadfiles
|
||||
- ${AIRFLOW_PROJ_DIR:-.}/addfiles:/opt/airflow/addfiles
|
||||
- ${AIRFLOW_PROJ_DIR:-.}/inputfiles:/opt/airflow/inputfiles
|
||||
# Use AIRFLOW_UID and AIRFLOW_GID from .env file to fix permission issues.
|
||||
user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
|
||||
|
||||
services:
|
||||
airflow-worker:
|
||||
<<: *airflow-common
|
||||
container_name: airflow-dl-worker-1
|
||||
hostname: ${HOSTNAME:-dl001}
|
||||
# The worker now listens on the generic queue AND its own dedicated queue.
|
||||
# The hostname is dynamically inserted into the queue name.
|
||||
command: airflow celery worker -q queue-dl,queue-dl-${HOSTNAME:-dl001}
|
||||
deploy:
|
||||
resources:
|
||||
limits:
|
||||
# Increased from 4G to 8G to support higher memory per child process.
|
||||
memory: ${AIRFLOW_WORKER_DOWNLOAD_MEM_LIMIT:-8G}
|
||||
reservations:
|
||||
memory: ${AIRFLOW_WORKER_DOWNLOAD_MEM_RESERV:-2G}
|
||||
healthcheck:
|
||||
test:
|
||||
- "CMD-SHELL"
|
||||
- 'celery --app airflow.providers.celery.executors.celery_executor.app inspect ping -d "worker-dl@$$(hostname)"'
|
||||
interval: 30s
|
||||
timeout: 30s
|
||||
retries: 5
|
||||
start_period: 30s
|
||||
environment:
|
||||
<<: *airflow-common-env
|
||||
HOSTNAME: ${HOSTNAME:-dl001} # Explicitly set inside container
|
||||
DUMB_INIT_SETSID: "0"
|
||||
AIRFLOW__CELERY__WORKER_QUEUES: "queue-dl,queue-dl-${HOSTNAME:-dl001}"
|
||||
AIRFLOW__CELERY__WORKER_TAGS: "dl"
|
||||
AIRFLOW__CELERY__WORKER_PREFETCH_MULTIPLIER: "1"
|
||||
AIRFLOW__CELERY__WORKER_CONCURRENCY: ${AIRFLOW_WORKER_DOWNLOAD_CONCURRENCY:-16}
|
||||
AIRFLOW__CELERY__TASK_ACKS_LATE: "False"
|
||||
AIRFLOW__CELERY__OPERATION_TIMEOUT: "2.0"
|
||||
AIRFLOW__CELERY__WORKER_NAME: "worker-dl@%h"
|
||||
AIRFLOW__CELERY__WORKER_MAX_TASKS_PER_CHILD: "100"
|
||||
# Increased from 256MB to 512MB for memory-intensive yt-dlp tasks.
|
||||
# This value is in KB. 512 * 1024 = 524288.
|
||||
AIRFLOW__CELERY__WORKER_MAX_MEMORY_PER_CHILD: "524288" # 512MB
|
||||
# The hostname is now managed by Docker Compose to ensure uniqueness when scaling.
|
||||
# It will be generated based on project, service, and replica number (e.g., airflow-airflow-dl-worker-1).
|
||||
# hostname: "dl-worker-${HOSTNAME_SUFFIX:-$$(hostname)}"
|
||||
ports:
|
||||
- "8793:8793"
|
||||
networks:
|
||||
- default
|
||||
- proxynet
|
||||
restart: always
|
||||
|
||||
airflow-triggerer:
|
||||
<<: *airflow-common
|
||||
container_name: airflow-dl-triggerer-1
|
||||
hostname: ${HOSTNAME}
|
||||
command: triggerer
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
|
||||
interval: 30s
|
||||
timeout: 30s
|
||||
retries: 5
|
||||
start_period: 60s
|
||||
environment:
|
||||
<<: *airflow-common-env
|
||||
PYTHONASYNCIODEBUG: 1
|
||||
DUMB_INIT_SETSID: 0
|
||||
restart: always
|
||||
|
||||
docker-socket-proxy:
|
||||
profiles:
|
||||
- disabled
|
||||
image: tecnativa/docker-socket-proxy:0.1.1
|
||||
environment:
|
||||
CONTAINERS: 1
|
||||
IMAGES: 1
|
||||
AUTH: 1
|
||||
POST: 1
|
||||
privileged: true
|
||||
volumes:
|
||||
- /var/run/docker.sock:/var/run/docker.sock:ro
|
||||
restart: always
|
||||
|
||||
networks:
|
||||
proxynet:
|
||||
name: airflow_proxynet
|
||||
external: true
|
||||
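
The queue layout used by the DL worker above (a shared `queue-dl` plus a per-host `queue-dl-${HOSTNAME}` queue) is what lets a DAG either spread work across all download workers or pin a task to one specific machine. Below is a minimal, hypothetical sketch of that routing from the DAG side; the DAG and task ids are illustrative, only the queue names mirror the compose file.

```python
# Hypothetical worker-side DAG showing per-queue task routing.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="download_worker",          # assumed name
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Runs on whichever DL worker picks it up from the shared queue.
    probe = BashOperator(task_id="probe", bash_command="hostname", queue="queue-dl")
    # Runs only on the worker whose dedicated queue is queue-dl-dl001.
    download = BashOperator(task_id="download", bash_command="echo downloading", queue="queue-dl-dl001")
    probe >> download
```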
@ -1,534 +0,0 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
#
# WARNING: This configuration is for local development. Do not use it in a production deployment.
#
# This configuration supports basic configuration using environment variables or an .env file
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
#                                Default: apache/airflow:2.10.5
# AIRFLOW_UID                  - User ID in Airflow containers
#                                Default: 50000
# AIRFLOW_PROJ_DIR             - Base path to which all the files will be volumed.
#                                Default: .
# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
#
# _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
#                                Default: airflow
# _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
#                                Default: airflow
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
#                                Use this option ONLY for quick checks. Installing requirements at container
#                                startup is done EVERY TIME the service is started.
#                                A better way is to build a custom image or extend the official image
#                                as described in https://airflow.apache.org/docs/docker-stack/build.html.
#                                Default: ''
#
# Feel free to modify this file to suit your needs.
---
name: airflow-master
x-minio-common: &minio-common
  image: quay.io/minio/minio:RELEASE.2025-07-23T15-54-02Z
  command: server --console-address ":9001" http://minio{1...3}/data{1...2}
  expose:
    - "9000"
    - "9001"
  networks:
    - proxynet
  env_file:
    - .env
  environment:
    MINIO_ROOT_USER: ${MINIO_ROOT_USER:-admin}
    MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-0153093693-0009}
  healthcheck:
    test: ["CMD", "mc", "ready", "local"]
    interval: 5s
    timeout: 5s
    retries: 5
  restart: always

x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider packages you can use your extended image.
  # This will build the image from the Dockerfile in this directory and tag it.
  image: ${AIRFLOW_IMAGE_NAME:-pangramia/ytdlp-ops-airflow:latest}
  build: .
  # Add extra hosts here to allow the master services (webserver, scheduler) to resolve
  # the hostnames of your remote DL workers. This is crucial for fetching logs.
  # Format: - "hostname:ip_address"
  # IMPORTANT: This section is auto-generated from cluster.yml
  extra_hosts:
    - "af-test:89.253.223.97"
    - "dl001:109.107.189.106"
  env_file:
    - .env
  networks:
    - proxynet
  environment:
    &airflow-common-env
    AIRFLOW__CORE__PARALLELISM: 64
    AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: 32
    AIRFLOW__SCHEDULER__PARSING_PROCESSES: 4

    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD:-pgdb_pwd_A7bC2xY9zE1wV5uP}@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD:-pgdb_pwd_A7bC2xY9zE1wV5uP}@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT}@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
    AIRFLOW_CONFIG: '/opt/airflow/config/airflow.cfg'
    AIRFLOW__WEBSERVER__SECRET_KEY: 'qmALu5JCAW0518WGAqkVZQ=='
    AIRFLOW__CORE__INTERNAL_API_SECRET_KEY: 'qmALu5JCAW0518WGAqkVZQ=='
    # yamllint disable rule:line-length
    # Use simple http server on scheduler for health checks
    # See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server
    # yamllint enable rule:line-length
    AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
    # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks
    # for other purpose (development, test and especially production usage) build/extend Airflow image.
    #_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- apache-airflow-providers-docker apache-airflow-providers-http thrift>=0.16.0,<=0.20.0 backoff>=2.2.1 python-dotenv==1.0.1 psutil>=5.9.0}
    # The following line can be used to set a custom config file, stored in the local config folder
    # If you want to use it, outcomment it and replace airflow.cfg with the name of your config file
    AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://airflow-logs"
    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: minio_default
    AIRFLOW__LOGGING__ENCRYPT_S3_LOGS: "False"

    AIRFLOW__LOGGING__REMOTE_LOG_FORMAT: "[%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s"
    AIRFLOW__LOGGING__LOG_LEVEL: "INFO"
    AIRFLOW__LOGGING__LOG_FILENAME_TEMPLATE: "{{ ti.dag_id }}/{{ ti.run_id }}/{{ ti.task_id }}/attempt={{ try_number }}.log"

    AIRFLOW__CORE__LOCAL_SETTINGS_PATH: "/opt/airflow/config/custom_task_hooks.py"
  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
    - ${AIRFLOW_PROJ_DIR:-.}/downloadfiles:/opt/airflow/downloadfiles
    - ${AIRFLOW_PROJ_DIR:-.}/addfiles:/opt/airflow/addfiles
    - ${AIRFLOW_PROJ_DIR:-.}/inputfiles:/opt/airflow/inputfiles
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy
    nginx-minio-lb:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    env_file:
      - .env
    networks:
      - proxynet
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-pgdb_pwd_A7bC2xY9zE1wV5uP}
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 10s
      retries: 5
      start_period: 5s
    restart: always

  redis:
    # Redis is limited to 7.2-bookworm due to licencing change
    # https://redis.io/blog/redis-adopts-dual-source-available-licensing/
    image: redis:7.2-bookworm
    env_file:
      - .env
    networks:
      - proxynet
    command: sh -c "redis-server --requirepass ${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT} --bind 0.0.0.0 --save 60 1 --loglevel warning --appendonly yes"
    volumes:
      - ./redis-data:/data
    expose:
      - 6379
    ports:
      - "52909:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT}", "ping"]
      interval: 10s
      timeout: 30s
      retries: 50
      start_period: 30s
    restart: always

  redis-proxy-account-clear:
    image: redis:7.2-bookworm
    container_name: redis-proxy-account-clear
    env_file:
      - .env
    networks:
      - proxynet
    command: >
      sh -c "
        echo 'Clearing proxy and account statuses from Redis...';
        redis-cli -h redis -a $${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT} --scan --pattern 'proxy_status:*' | xargs -r redis-cli -h redis -a $${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT} DEL;
        redis-cli -h redis -a $${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT} --scan --pattern 'account_status:*' | xargs -r redis-cli -h redis -a $${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT} DEL;
        echo 'Redis cleanup complete.'
      "
    depends_on:
      redis:
        condition: service_healthy

  minio1:
    <<: *minio-common
    hostname: minio1
    volumes:
      - ./minio-data/1/1:/data1
      - ./minio-data/1/2:/data2

  minio2:
    <<: *minio-common
    hostname: minio2
    volumes:
      - ./minio-data/2/1:/data1
      - ./minio-data/2/2:/data2
    depends_on:
      minio1:
        condition: service_started

  minio3:
    <<: *minio-common
    hostname: minio3
    volumes:
      - ./minio-data/3/1:/data1
      - ./minio-data/3/2:/data2
    depends_on:
      minio2:
        condition: service_started

  nginx-minio-lb:
    image: nginx:1.19.2-alpine
    hostname: nginx-minio-lb
    networks:
      - proxynet
    command: sh -c "apk add --no-cache curl >/dev/null 2>&1 && exec nginx -g 'daemon off;'"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "9000:9000"
      - "9001:9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9001/minio/health/live"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
    depends_on:
      minio1:
        condition: service_healthy
      minio2:
        condition: service_healthy
      minio3:
        condition: service_healthy
    restart: always

  minio-init:
    image: minio/mc
    container_name: minio-init
    networks:
      - proxynet
    depends_on:
      nginx-minio-lb:
        condition: service_healthy
    entrypoint: >
      /bin/sh -c "
      set -e;
      /usr/bin/mc alias set minio http://nginx-minio-lb:9000 $$MINIO_ROOT_USER $$MINIO_ROOT_PASSWORD;
      # Retry loop for bucket creation
      MAX_ATTEMPTS=10
      SUCCESS=false
      # Use a for loop for robustness, as it's generally more portable than `until`.
      for i in $$(seq 1 $$MAX_ATTEMPTS); do
        # Check if the bucket exists. If so, we're done.
        if /usr/bin/mc ls minio/airflow-logs > /dev/null 2>&1; then
          echo 'MinIO bucket already exists.'
          SUCCESS=true
          break
        fi
        # If not, try to create it. If successful, we're done.
        # We redirect output because `mc mb` can error if another process creates it in the meantime.
        if /usr/bin/mc mb minio/airflow-logs > /dev/null 2>&1; then
          echo 'MinIO bucket created.'
          SUCCESS=true
          break
        fi
        # If we reach here, both checks failed. Wait and retry.
        echo "Attempt $$i/$$MAX_ATTEMPTS: Waiting for MinIO bucket..."
        sleep 2
      done

      # After the loop, check if we succeeded.
      if [ "$$SUCCESS" = "false" ]; then
        echo "Failed to create MinIO bucket after $$MAX_ATTEMPTS attempts."
        exit 1
      fi
      /usr/bin/mc anonymous set download minio/airflow-logs;
      echo 'MinIO initialized: bucket airflow-logs created and policy set to download.';
      "
    env_file:
      - .env
    environment:
      MINIO_ROOT_USER: ${MINIO_ROOT_USER:-admin}
      MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-0153093693-0009}
    restart: on-failure

  nginx-healthcheck:
    image: nginx:alpine
    container_name: nginx-healthcheck
    networks:
      - proxynet
    ports:
      - "8888:80"
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8974/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-master-worker:
    <<: *airflow-common
    command: airflow celery worker -q main,default
    healthcheck:
      # yamllint disable rule:line-length
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.providers.celery.executors.celery_executor.app inspect ping -d "worker-master@$$(hostname)"'
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    environment:
      <<: *airflow-common-env
      # Required to handle warm shutdown of the celery workers properly
      # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
      DUMB_INIT_SETSID: 0
      AIRFLOW__CELERY__WORKER_QUEUES: "main,default"
      AIRFLOW__CELERY__WORKER_TAGS: "master"
      AIRFLOW__CELERY__WORKER_CONCURRENCY: "16"
      AIRFLOW__CELERY__WORKER_PREFETCH_MULTIPLIER: "1"
      AIRFLOW__CELERY__TASK_ACKS_LATE: "False"
      AIRFLOW__CELERY__OPERATION_TIMEOUT: "2.0"
      AIRFLOW__CELERY__WORKER_NAME: "worker-master@%h"
      AIRFLOW__CELERY__WORKER_MAX_TASKS_PER_CHILD: "100"
      # Max memory per child process before it's recycled. Helps prevent memory leaks.
      # 256MB is sufficient for master worker tasks. DL workers use a higher limit.
      AIRFLOW__CELERY__WORKER_MAX_MEMORY_PER_CHILD: "262144"  # 256MB

    hostname: ${HOSTNAME}
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-triggerer:
    <<: *airflow-common
    command: triggerer
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-init:
    <<: *airflow-common
    depends_on:
      <<: *airflow-common-depends-on
      minio-init:
        condition: service_completed_successfully
      redis-proxy-account-clear:
        condition: service_completed_successfully
    entrypoint: /bin/bash
    # yamllint disable rule:line-length
    command:
      - -c
      - |
        # This container runs as root and is responsible for initializing the environment.
        # It sets permissions on mounted directories to ensure the 'airflow' user (running with AIRFLOW_UID)
        # can write to them. This is crucial for logs, dags, and plugins.
        echo "Initializing permissions for Airflow directories..."
        chown -R "${AIRFLOW_UID}:${AIRFLOW_GID}" /opt/airflow/dags /opt/airflow/logs /opt/airflow/plugins /opt/airflow/config /opt/airflow/downloadfiles /opt/airflow/addfiles /opt/airflow/inputfiles
        echo "Permissions set."
        if [[ -z "${AIRFLOW_UID}" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
          echo "If you are on Linux, you SHOULD follow the instructions below to set "
          echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
          echo "For other operating systems you can get rid of the warning with manually created .env file:"
          echo "    See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
          echo
        fi
        # This container's job is to initialize the database, create a user, and import connections.
        # Wait for db to be ready.
        airflow db check --retry 30 --retry-delay 5

        # Run database migrations.
        echo "Running database migrations..."
        airflow db upgrade
        echo "Database migrations complete."

        # Create the admin user if it doesn't exist.
        # The '|| true' prevents the script from failing if the user already exists.
        echo "Checking for and creating admin user..."
        airflow users create \
          --username "admin" \
          --password "${AIRFLOW_ADMIN_PASSWORD:-admin_pwd_X9yZ3aB1cE5dF7gH}" \
          --firstname Admin \
          --lastname User \
          --role Admin \
          --email admin@example.com || true
        echo "Admin user check/creation complete."

        # Import connections from any .json file in the config directory.
        echo "Searching for connection files in /opt/airflow/config..."
        if [ -d "/opt/airflow/config" ] && [ -n "$(ls -A /opt/airflow/config/*.json 2>/dev/null)" ]; then
          for conn_file in /opt/airflow/config/*.json; do
            if [ -f "$$conn_file" ]; then
              # Exclude files that are not meant to be Airflow connections.
              if [ "$(basename "$$conn_file")" = "camoufox_endpoints.json" ]; then
                echo "Skipping '$$conn_file' as it is not an Airflow connection file."
                continue
              fi
              echo "Importing connections from $$conn_file"
              airflow connections import "$$conn_file" || echo "Failed to import $$conn_file, but continuing."
            fi
          done
        else
          echo "No connection files found to import, or /opt/airflow/config is empty/missing."
        fi
        echo "Connection import process complete."
    # yamllint enable rule:line-length
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_MIGRATE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'false'  # Set to false as we handle it manually
      _PIP_ADDITIONAL_REQUIREMENTS: ''
    user: "0:0"

  airflow-cli:
    <<: *airflow-common
    profiles:
      - debug
    environment:
      <<: *airflow-common-env
      CONNECTION_CHECK_MAX_COUNT: "0"
    # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
    command:
      - bash
      - -c
      - airflow

  # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up
  # or by explicitly targeted on the command line e.g. docker-compose up flower.
  # See: https://docs.docker.com/compose/profiles/
  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - "5555:5555"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  docker-socket-proxy:
    profiles:
      - disabled
    image: tecnativa/docker-socket-proxy:0.1.1
    networks:
      - proxynet
    environment:
      CONTAINERS: 1
      IMAGES: 1
      AUTH: 1
      POST: 1
    privileged: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: always

volumes:
  postgres-db-volume:

networks:
  proxynet:
    name: airflow_proxynet
    external: true
@ -1,96 +0,0 @@
name: ytdlp-ops
include:
  # This automatically includes the generated camoufox service definitions and dependencies.
  # It simplifies the docker-compose command, as you no longer need to specify both files with -f.
  # The file is generated by the config-generator service and will be created even if empty.
  - docker-compose.camoufox.yaml

services:
  envoy:
    image: envoyproxy/envoy:v1.29-latest
    container_name: envoy-thrift-lb
    restart: unless-stopped
    volumes:
      # Mount the generated config file from the host
      - ./envoy.yaml:/etc/envoy/envoy.yaml:ro
    ports:
      # This is the single public port for all Thrift traffic
      - "${ENVOY_PORT:-9080}:${ENVOY_PORT:-9080}"
      # Expose the admin port for debugging
      - "${ENVOY_ADMIN_PORT:-9901}:${ENVOY_ADMIN_PORT:-9901}"
    networks:
      - proxynet
    # This service depends on ytdlp-ops-service, which in turn waits for camoufox.
    depends_on:
      - ytdlp-ops-service

  ytdlp-ops-service:
    image: pangramia/ytdlp-ops-server:latest  # Don't comment out or remove, build is performed externally
    # container_name is omitted; Docker will use the service name for DNS.
    # This service depends on the 'camoufox-group' service, which is defined in the
    # generated docker-compose.camoufox.yaml file. This ensures all camoufox
    # instances are started before this service.
    depends_on:
      - camoufox-group
    # Ports are no longer exposed directly. Envoy will connect to them on the internal network.
    env_file:
      - ./.env  # Path is relative to the compose file
    volumes:
      - context-data:/app/context-data
      # Mount the generated config directory to make endpoints available to the server
      - ./config:/app/config:ro
      # Mount the plugin source code for live updates without rebuilding the image.
      # Assumes the plugin source is in a 'bgutil-ytdlp-pot-provider' directory
      # next to your docker-compose.yaml file.
      #- ./bgutil-ytdlp-pot-provider:/app/bgutil-ytdlp-pot-provider
    networks:
      - proxynet
    command:
      # --- Parameters for ALL service roles ---
      - "--port"
      - "${YTDLP_BASE_PORT:-9090}"
      - "--timeout"
      - "${YTDLP_TIMEOUT:-600}"
      - "--workers"
      - "${YTDLP_WORKERS:-3}"
      - "--verbose"
      - "--server-identity"
      - "${SERVER_IDENTITY:-ytdlp-ops-airflow-service}"
      - "--redis-host"
      - "${REDIS_HOST:-redis}"
      - "--redis-port"
      - "${REDIS_PORT:-6379}"
      - "--redis-password"
      - "${REDIS_PASSWORD}"
      - "--account-active-duration-min"
      - "${ACCOUNT_ACTIVE_DURATION_MIN:-30}"
      - "--account-cooldown-duration-min"
      - "${ACCOUNT_COOLDOWN_DURATION_MIN:-60}"
      - "--service-role"
      - "all-in-one"

      # --- Parameters for worker/all-in-one roles ONLY ---
      - "--script-dir"
      - "/app"
      - "--context-dir"
      - "/app/context-data"
      - "--clean-context-dir"
      - "--clients"
      - "${YT_CLIENTS:-web,mweb,ios,android}"
      - "--proxies"
      - "socks5://172.17.0.1:1087"
      - "--camoufox-endpoints-file"
      - "/app/config/camoufox_endpoints.json"
      - "--print-tokens"
      - "--stop-if-no-proxy"
    restart: unless-stopped
    pull_policy: always

volumes:
  context-data:
    name: context-data

networks:
  proxynet:
    name: airflow_proxynet
    external: true