Proxy and Account Management Strategy
This document describes the intelligent resource management strategy (for proxies and accounts) used by the ytdlp-ops-server. The goal of this system is to maximize the success rate, minimize blocks, and ensure fault tolerance.
The server can run in different roles to support a distributed architecture, separating management tasks from token generation work.
Service Roles and Architecture
The server is designed to run in one of three roles, specified by the --service-role flag:
- management: A single, lightweight service instance responsible for all management API calls.
  - Purpose: Provides a centralized endpoint for monitoring and managing the state of all proxies and accounts across the system.
  - Behavior: Exposes only management functions (getProxyStatus, banAccount, etc.). Calls to token generation functions will fail.
  - Deployment: Runs as a single container (ytdlp-ops-management) and exposes its port directly to the host (e.g., port 9091), bypassing Envoy.
- worker: The primary workhorse for token and info.json generation.
  - Purpose: Handles all token generation requests.
  - Behavior: Implements the full API, but its management functions are scoped to its own server_identity.
  - Deployment: Runs as a scalable service (ytdlp-ops-worker) behind the Envoy load balancer (e.g., port 9080).
- all-in-one (Default): A single instance that performs both management and worker roles. Ideal for local development or small-scale deployments.
This architecture allows for a robust, federated system where workers manage their own resources locally, while a central service provides a global view for management and monitoring.
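As a rough illustration of the role switch, the sketch below parses a --service-role flag with argparse and decides whether token generation is served. The three role names and the default come from the text above; the parser wiring and the helper serves_token_generation are hypothetical and not taken from ytdlp_ops_server_fix.py.

```python
import argparse

# Role names as documented above; everything else here is an illustrative assumption.
SERVICE_ROLES = ("management", "worker", "all-in-one")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ytdlp-ops-server")
    parser.add_argument(
        "--service-role",
        choices=SERVICE_ROLES,
        default="all-in-one",
        help="Which subset of the API this instance serves.",
    )
    return parser


def serves_token_generation(role: str) -> bool:
    # A management-only instance exposes management calls and rejects token generation.
    return role in ("worker", "all-in-one")


args = build_parser().parse_args(["--service-role", "management"])
print(args.service_role, serves_token_generation(args.service_role))
```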
1. Account Lifecycle Management (Cooldown / Resting)
Goal: To prevent excessive use and subsequent blocking of accounts by providing them with "rest" periods after intensive work.
How It Works:
The account lifecycle consists of three stages:
- ACTIVE: The account is active and used for tasks. An activity timer starts on its first successful use.
- RESTING: If an account has been ACTIVE for longer than the configured limit, the AccountManager automatically moves it to a "resting" state. The Airflow worker will not select it for new jobs.
- Return to ACTIVE: After the cooldown period ends, the AccountManager automatically returns the account to the ACTIVE state, making it available again.
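The transitions above can be sketched as a small state machine. This is a minimal sketch, not the AccountManager code: the Account dataclass, the refresh and record_success helpers, and the field names are hypothetical, and the 30 and 60 minute values are the documented defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    status: str = "ACTIVE"
    active_since: Optional[float] = None   # set on the first successful use
    resting_until: Optional[float] = None  # set when the account starts resting


def record_success(acc: Account, now: float) -> None:
    # The activity timer starts on the first successful use only.
    if acc.status == "ACTIVE" and acc.active_since is None:
        acc.active_since = now


def refresh(acc: Account, now: float, active_min: int = 30, cooldown_min: int = 60) -> str:
    """Apply ACTIVE -> RESTING -> ACTIVE transitions for time `now` (epoch seconds)."""
    if acc.status == "ACTIVE" and acc.active_since is not None:
        if now - acc.active_since > active_min * 60:
            acc.status = "RESTING"
            acc.resting_until = now + cooldown_min * 60
            acc.active_since = None
    elif acc.status == "RESTING" and acc.resting_until is not None:
        if now >= acc.resting_until:
            acc.status = "ACTIVE"
            acc.resting_until = None
    return acc.status
```

Passing the current time in as an argument keeps the transitions easy to test without waiting for real timers.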
Configuration:
These parameters are configured when starting the ytdlp-ops-server.
- --account-active-duration-min: The "action time" in minutes an account can be continuously active before being moved to RESTING.
  - Default: 30 (minutes).
- --account-cooldown-duration-min: The "rest time" in minutes an account must remain in the RESTING state.
  - Default: 60 (minutes).
Where to Configure:
The parameters are passed as command-line arguments to the server. When using Docker Compose, this is done in airflow/docker-compose-ytdlp-ops.yaml:
command:
# ... other parameters
- "--account-active-duration-min"
- "${ACCOUNT_ACTIVE_DURATION_MIN:-30}"
- "--account-cooldown-duration-min"
- "${ACCOUNT_COOLDOWN_DURATION_MIN:-60}"
You can change the default values by setting the ACCOUNT_ACTIVE_DURATION_MIN and ACCOUNT_COOLDOWN_DURATION_MIN environment variables in your .env file.
Relevant Files:
- server_fix/account_manager.py: Contains the core logic for state transitions.
- ytdlp_ops_server_fix.py: Parses the command-line arguments.
- airflow/docker-compose-ytdlp-ops.yaml: Passes the arguments to the server container.
2. Smart Banning Strategy
Goal: To avoid unfairly banning good proxies. The problem is often with the account, not the proxy it's using.
How It Works:
Stage 1: Ban the Account First
- When a serious, bannable error occurs (e.g., BOT_DETECTED or SOCKS5_CONNECTION_FAILED), the system penalizes only the account that caused the error.
- For the proxy, this error is simply recorded as a single failure, but the proxy itself is not banned and remains in rotation.
Stage 2: Ban the Proxy via "Sliding Window"
- A proxy is banned automatically only if it shows systematic failures with DIFFERENT accounts over a short period.
- This is a reliable indicator that the proxy itself is the problem. The ProxyManager on the server tracks this and automatically bans such a proxy.
Configuration:
These parameters are hard-coded as constants in the source code. Changing them requires editing the file.
Where to Configure:
- File: server_fix/proxy_manager.py
- Constants in the ProxyManager class:
  - FAILURE_WINDOW_SECONDS: The time window in seconds for analyzing failures.
    - Default: 3600 (1 hour).
  - FAILURE_THRESHOLD_COUNT: The minimum total number of failures to trigger a check.
    - Default: 3.
  - FAILURE_THRESHOLD_UNIQUE_ACCOUNTS: The minimum number of unique accounts that must have failed with the proxy to trigger a ban.
    - Default: 3.
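To make the interaction between the three constants concrete, here is a minimal sketch of a sliding window check. The constant names and defaults are the documented ones; the function should_ban_proxy and the representation of failures as (timestamp, account_id) pairs are assumptions for illustration, not the actual ProxyManager code.

```python
import time
from typing import List, Optional, Tuple

FAILURE_WINDOW_SECONDS = 3600
FAILURE_THRESHOLD_COUNT = 3
FAILURE_THRESHOLD_UNIQUE_ACCOUNTS = 3


def should_ban_proxy(failures: List[Tuple[float, str]], now: Optional[float] = None) -> bool:
    """`failures` holds (timestamp, account_id) pairs recorded for one proxy."""
    now = time.time() if now is None else now
    # Keep only failures that fall inside the sliding window.
    recent = [(ts, acc) for ts, acc in failures if now - ts <= FAILURE_WINDOW_SECONDS]
    if len(recent) < FAILURE_THRESHOLD_COUNT:
        return False
    # Ban only if the failures are spread across enough different accounts.
    unique_accounts = {acc for _, acc in recent}
    return len(unique_accounts) >= FAILURE_THRESHOLD_UNIQUE_ACCOUNTS
```

Under these assumptions, three failures from a single account never ban the proxy, while three failures from three different accounts within the hour do.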
Relevant Files:
- server_fix/proxy_manager.py: Contains the sliding window logic and constants.
- airflow/dags/ytdlp_ops_worker_per_url.py: The handle_bannable_error_callable function implements the "account-only" ban policy.
Account Statuses Explained
You can view the status of all accounts using the ytdlp_mgmt_proxy_account DAG. The statuses have the following meanings:
- ACTIVE: The account is healthy and available for use. An account is considered ACTIVE by default if it has no specific status set.
- BANNED: The account has been temporarily disabled due to repeated failures (e.g., BOT_DETECTED errors) or by a manual ban. The status will show the time remaining until it automatically becomes ACTIVE again (e.g., BANNED (active in 55m)).
- RESTING: The account has been used for an extended period and is in a mandatory "cooldown" period to prevent burnout. The status will show the time remaining until it becomes ACTIVE again (e.g., RESTING (active in 25m)).
- (Blank Status): In older versions, an account that had only ever failed (and never succeeded) might appear with a blank status. This has been fixed; these accounts are now correctly shown as ACTIVE.
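The status labels with a remaining time could be produced along these lines. This is a hypothetical helper: the function name format_status and the choice to round up to whole minutes are assumptions, and only the label format is taken from the examples above.

```python
import math


def format_status(status: str, seconds_remaining: float = 0) -> str:
    """Render a status label such as 'RESTING (active in 25m)'."""
    if status in ("BANNED", "RESTING") and seconds_remaining > 0:
        minutes = math.ceil(seconds_remaining / 60)  # rounding up is an assumption
        return f"{status} (active in {minutes}m)"
    # An account without a stored status is treated as ACTIVE.
    return status or "ACTIVE"
```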
3. End-to-End Rotation Flow: How It All Works Together
This section describes the step-by-step flow of how a worker gets assigned an account and a proxy for a single job, integrating all the management strategies described above.
- Worker Initialization (ytdlp_ops_worker_per_url)
  - The DAG run starts, triggered either by the orchestrator or by its previous successful run.
  - The pull_url_from_redis task fetches a URL from the Redis _inbox queue.
- Account Selection (Airflow Worker)
  - The assign_account task is executed.
  - It generates the full list of potential account IDs based on the account_pool (e.g., my_prefix_01 to my_prefix_50).
  - It connects to Redis and iterates through this list, checking the status of each account.
  - It builds a new, temporary list containing only accounts that are not in a BANNED or RESTING state.
  - If the resulting list of active accounts is empty, the worker fails (unless auto-creation is enabled).
  - It then takes the filtered list of active accounts and uses random.choice() to select one.
  - The chosen account_id is passed to the next task.
- Proxy Selection (ytdlp-ops-server)
  - The get_token task runs, sending the randomly chosen account_id in a Thrift RPC call to the ytdlp-ops-server.
  - On the server, the ProxyManager is asked for a proxy. This happens on every single request.
  - The ProxyManager performs the following steps on every call to ensure it has the most up-to-date information:
    a. Query Redis: It fetches the entire current state of all proxies from Redis. This ensures it immediately knows about any status changes (e.g., a ban) made by other workers.
    b. Rebuild Active List: It rebuilds its internal in-memory list of proxies, including only those with an ACTIVE status.
    c. Apply Sliding Window Ban: It checks the recent failure history for each active proxy. If a proxy has failed too many times with different accounts, it is banned on the spot, even if its status was ACTIVE.
    d. Select Proxy: It selects the next available proxy from the final, filtered active list using a round-robin index.
    e. Return Proxy: It returns the selected proxy_url to be used for the token generation task.
  - Worker Affinity: Crucially, even though workers may share a proxy state in Redis under a common server_identity, each worker instance will only ever use the proxies it was configured with at startup. It uses Redis to check the status of its own proxies but will ignore other proxies in the shared pool.
- Execution and Reporting
  - The server now has both the account_id (from Airflow) and the proxy_url (from its ProxyManager).
  - It proceeds with the token generation process using these resources.
  - Upon completion (success or failure), it reports the outcome to Redis, updating the status for both the specific account and proxy that were used. This affects their failure counters, cooldown timers, etc., for the next run.
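The account selection step described in this flow can be sketched as follows. This is a minimal illustration rather than the assign_account task itself: the helper names, the plain dict standing in for the statuses read from Redis, the zero-padded ID width, and the use of RuntimeError in place of the Airflow exception are all assumptions.

```python
import random
from typing import Dict, List


def build_account_pool(prefix: str, count: int) -> List[str]:
    # e.g. my_prefix_01 ... my_prefix_50; the two-digit padding is an assumption.
    return [f"{prefix}_{i:02d}" for i in range(1, count + 1)]


def pick_account(pool: List[str], statuses: Dict[str, str]) -> str:
    """Filter out BANNED and RESTING accounts, then pick one at random."""
    # An account with no stored status counts as ACTIVE.
    active = [a for a in pool if statuses.get(a, "ACTIVE") not in ("BANNED", "RESTING")]
    if not active:
        # Stand-in for the task failure described above.
        raise RuntimeError("no active accounts available")
    return random.choice(active)


pool = build_account_pool("my_prefix", 3)
chosen = pick_account(pool, {"my_prefix_01": "BANNED", "my_prefix_03": "RESTING"})
print(chosen)
```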
This separation of concerns is key:
- The Airflow worker (assign_account task) is responsible for the random selection of an active account, while maintaining affinity (re-using the same account after a success).
- The ytdlp-ops-server is responsible for the round-robin selection of an active proxy.
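For the server side, a round-robin pick restricted to the worker's own configured proxies might look like the sketch below. The class RoundRobinProxies and the dict standing in for the shared Redis state are hypothetical; the real ProxyManager also applies the sliding window check before selecting.

```python
from typing import Dict, List


class RoundRobinProxies:
    def __init__(self, configured: List[str]):
        self.configured = list(configured)  # proxies given to this worker at startup
        self._index = 0

    def next_proxy(self, shared_state: Dict[str, str]) -> str:
        # Ignore proxies in the shared state that this worker was not configured with,
        # and keep only the ones currently marked ACTIVE.
        active = [p for p in self.configured if shared_state.get(p, "ACTIVE") == "ACTIVE"]
        if not active:
            raise RuntimeError("NO_ACTIVE_PROXIES")
        proxy = active[self._index % len(active)]
        self._index += 1
        return proxy


rr = RoundRobinProxies(["p1", "p2", "p3"])
state = {"p2": "BANNED", "p9": "ACTIVE"}  # p9 belongs to another worker
picks = [rr.next_proxy(state) for _ in range(3)]
print(picks)
```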
4. Automatic Account Ban on Consecutive Failures
Goal: To automatically remove accounts from rotation that consistently cause non-bannable errors (e.g., incorrect password, authorization issues).
How It Works:
- The AccountManager tracks the number of consecutive failures for each account.
- On any successful operation, this counter is reset.
- If the number of consecutive failures reaches a set threshold, the account is automatically banned for a specified duration.
Configuration:
These parameters are set in the AccountManager constructor.
Where to Configure:
- File: server_fix/account_manager.py
- Parameters in the __init__ method of AccountManager:
  - failure_threshold: The number of consecutive failures before a ban.
    - Default: 5.
  - ban_duration_s: The duration of the ban in seconds.
    - Default: 3600 (1 hour).
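A minimal sketch of the consecutive-failure counter, assuming in-memory dictionaries instead of Redis. The parameter names failure_threshold and ban_duration_s and their defaults are the documented ones; the class FailureTracker and its methods are hypothetical.

```python
from typing import Dict


class FailureTracker:
    def __init__(self, failure_threshold: int = 5, ban_duration_s: int = 3600):
        self.failure_threshold = failure_threshold
        self.ban_duration_s = ban_duration_s
        self.consecutive: Dict[str, int] = {}
        self.banned_until: Dict[str, float] = {}

    def report_success(self, account_id: str) -> None:
        # Any success resets the consecutive-failure counter.
        self.consecutive[account_id] = 0

    def report_failure(self, account_id: str, now: float) -> bool:
        """Record a failure; return True if the account was banned by this call."""
        count = self.consecutive.get(account_id, 0) + 1
        self.consecutive[account_id] = count
        if count >= self.failure_threshold:
            self.banned_until[account_id] = now + self.ban_duration_s
            return True
        return False
```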
5. Monitoring and Recovery
How to Check Statuses
The ytdlp_mgmt_proxy_account DAG is the primary tool for monitoring the health of your resources. It connects directly to the management service to perform actions.
- DAG ID: ytdlp_mgmt_proxy_account
- How to Use: Trigger the DAG from the Airflow UI. Ensure the management_host and management_port parameters are correctly set to point to your ytdlp-ops-management service instance. To get a full overview, set the parameters:
  - entity: all
  - action: list
- Result: The DAG log will display tables with the current status of all accounts and proxies. For BANNED or RESTING accounts, it shows the time remaining until they become active again (e.g., RESTING (active in 45m)). For proxies, it highlights which proxy is (next) in the round-robin rotation for a specific worker.
Worker vs. Management Service Roles in Automatic State Changes
It is important to understand the distinct roles each service plays in the automatic state management of accounts and proxies. The system uses a reactive, "on-read" update mechanism.
- The worker service is proactive. It is responsible for putting resources into a "bad" state.
  - When a worker encounters too many failures with an account, it moves the account to BANNED.
  - When an account's activity timer expires, the worker moves it to RESTING.
  - When a proxy fails the sliding window check during a token request, the worker bans it.
- The management service is reactive but crucial for recovery. It is responsible for taking resources out of a "bad" state.
  - The logic to check if a ban has expired or a rest period is over is located in the getAccountStatus and getProxyStatus methods.
  - This means an account or proxy is only returned to an ACTIVE state when its status is queried.
  - Since the ytdlp_mgmt_proxy_account DAG calls these methods on the management service, running this DAG is the primary mechanism for automatically clearing expired bans and rest periods.
In summary, workers put resources into timeout, and the management service (when queried) brings them back. This makes periodic checks with the management DAG important for overall system health and recovery.
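The "on-read" recovery can be illustrated with a small sketch. The record layout (a dict with status and until fields) and the function get_status_on_read are assumptions for illustration; the point is only that the stored status changes when it is read, not when the timer expires.

```python
from typing import Any, Dict


def get_status_on_read(record: Dict[str, Any], now: float) -> str:
    """Return the status, first clearing an expired ban or rest period."""
    if record.get("status") in ("BANNED", "RESTING"):
        until = record.get("until")
        if until is not None and now >= until:
            # The expiry is applied lazily, at query time.
            record["status"] = "ACTIVE"
            record["until"] = None
    return record.get("status", "ACTIVE")


rec = {"status": "BANNED", "until": 100.0}
# Until something queries the status, the stored value stays BANNED even after t=100.
print(get_status_on_read(rec, now=150.0))
```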
Important Note on Unbanning Proxies
When a proxy is unbanned (either individually via unban or collectively via unban_all), the system performs two critical actions:
- It sets the proxy's status back to ACTIVE.
- It deletes the proxy's entire failure history from Redis.
This second step is crucial. Without it, the ProxyManager's "Sliding Window" check would see the old failures, immediately re-ban the "active" proxy on its next use, and lead to a NO_ACTIVE_PROXIES error. Clearing the history ensures that an unbanned proxy gets a truly fresh start.
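The two actions can be sketched with plain dictionaries standing in for the Redis status hash and the per-proxy failure history. The function unban_proxy and both data structures are hypothetical.

```python
from typing import Dict, List, Tuple


def unban_proxy(
    states: Dict[str, str],
    failure_log: Dict[str, List[Tuple[float, str]]],
    proxy_url: str,
) -> None:
    # 1. Mark the proxy as usable again.
    states[proxy_url] = "ACTIVE"
    # 2. Drop its failure history so the sliding window check starts from zero.
    failure_log.pop(proxy_url, None)


states = {"socks5://proxy-a:1080": "BANNED"}
failure_log = {"socks5://proxy-a:1080": [(1.0, "acc_01"), (2.0, "acc_02"), (3.0, "acc_03")]}
unban_proxy(states, failure_log, "socks5://proxy-a:1080")
print(states, failure_log)
```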
What Happens When All Accounts Are Banned or Resting?
If the entire pool of accounts becomes unavailable (either BANNED or RESTING), the system will effectively pause by default.
- The ytdlp_ops_worker_per_url DAG will fail at the assign_account step with an AirflowException because the active account pool will be empty.
- This will stop the processing loops. The system will remain paused until accounts are either manually unbanned or their ban/rest timers expire, at which point you can re-start the processing loops using the ytdlp_ops_orchestrator DAG.
- The DAG graph for ytdlp_ops_worker_per_url now explicitly shows tasks for assign_account, get_token, ban_account, retry_get_token, etc., making the process flow and failure points much clearer.
The system can be configured to automatically create new accounts to prevent processing from halting completely.
Automatic Account Creation on Exhaustion
- Goal: Ensure the processing pipeline continues to run even if all accounts in the primary pool are temporarily banned or resting.
- How it works: If the auto_create_new_accounts_on_exhaustion parameter is set to True and the account pool is defined using a prefix (not an explicit list), the system will generate a new, unique account ID when it finds the active pool empty.
- New Account Naming: New accounts are created with the format {prefix}-auto-{unique_id}.
- Configuration:
  - Parameter: auto_create_new_accounts_on_exhaustion
  - Where to set: In the ytdlp_ops_orchestrator DAG configuration when triggering a run.
  - Default: True.
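As an illustration of the naming format, the sketch below builds an ID of the form {prefix}-auto-{unique_id}. The source does not say how unique_id is generated; using the first 8 hex characters of a UUID4 is purely an assumption, as is the function name.

```python
import uuid


def new_auto_account_id(prefix: str) -> str:
    # The unique_id scheme (8 hex chars of a UUID4) is an assumption for illustration.
    unique_id = uuid.uuid4().hex[:8]
    return f"{prefix}-auto-{unique_id}"


account_id = new_auto_account_id("my_prefix")
print(account_id)
```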
6. Failure Handling and Retry Policy
Goal: To provide flexible control over how the system behaves when a worker encounters a "bannable" error (e.g., BOT_DETECTED).
How It Works
When a worker's get_token task fails with a bannable error, the system's behavior is determined by the on_bannable_failure policy, which can be configured when starting the ytdlp_ops_orchestrator.
Configuration
- Parameter: on_bannable_failure
- Where to set: In the ytdlp_ops_orchestrator DAG configuration.
- Options:
  - stop_loop (Strictest):
    - The account used is banned.
    - The URL is marked as failed in the _fail Redis hash.
    - The worker's processing loop is stopped. The lane becomes inactive.
  - retry_with_new_account (Default, Most Resilient):
    - The failing account is banned.
    - The worker immediately retries the same URL with a new, unused account from the pool.
    - If the retry succeeds, the worker continues its loop to the next URL.
    - If the retry also fails, the second account and the proxy are also banned, and the worker's loop is stopped.
  - retry_and_ban_account_only:
    - Similar to retry_with_new_account, but on the second failure, it bans only the second account, not the proxy.
    - This is useful when you trust your proxies but want to aggressively cycle through failing accounts.
  - retry_without_ban (Most Lenient):
    - The worker retries with a new account, but no accounts or proxies are ever banned.
    - This policy is useful for debugging or when you are confident that failures are transient and not the fault of the resources.
This policy allows the system to be resilient to single account failures without losing the URL, while providing granular control over when to ban accounts and/or proxies if the problem persists.
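The four policies can be summarized as a small decision function. This is a sketch of the rules as described above, not the DAG code: the function bans_for_policy and the labels first_account, second_account, and proxy are hypothetical.

```python
from typing import List


def bans_for_policy(policy: str, retry_failed: bool = False) -> List[str]:
    """Return which resources end up banned after a bannable error.

    `retry_failed` only matters for policies that retry with a second account.
    """
    if policy == "stop_loop":
        return ["first_account"]  # no retry is attempted
    if policy == "retry_without_ban":
        return []
    if policy == "retry_with_new_account":
        bans = ["first_account"]
        if retry_failed:
            bans += ["second_account", "proxy"]
        return bans
    if policy == "retry_and_ban_account_only":
        bans = ["first_account"]
        if retry_failed:
            bans.append("second_account")
        return bans
    raise ValueError(f"unknown on_bannable_failure policy: {policy}")
```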
7. Worker DAG Logic (ytdlp_ops_worker_per_url)
This DAG is the "workhorse" of the system. It is designed as a self-sustaining loop to process one URL per run. The logic for handling failures and retries is now explicitly visible in the DAG's task graph.
Tasks and Their Purpose:
- pull_url_from_redis: Fetches one URL from the Redis _inbox queue. If the queue is empty, the DAG run is skipped, stopping this worker's processing "lane".
- assign_account: Selects an account for the job. It maintains account affinity by re-using the same account from the previous successful run in its "lane". If it's the first run or the previous run failed, it picks a random active account.
- get_token: The primary attempt to get tokens and info.json by calling the ytdlp-ops-server.
- handle_bannable_error_branch: A branching task that runs if get_token fails. It inspects the error and decides the next step based on the on_bannable_failure policy.
- ban_account_and_prepare_for_retry: If a retry is permitted, this task bans the failed account and selects a new one.
- retry_get_token: A second attempt to get the token using the new account.
- ban_second_account_and_proxy: If the retry also fails, this task bans the second account and the proxy that was used.
- download_and_probe: If get_token or retry_get_token succeeds, this task uses yt-dlp to download the media and ffmpeg to verify that the downloaded file is a valid media file.
- mark_url_as_success: If download_and_probe succeeds, this task records the successful result in the Redis _result hash.
- handle_generic_failure: If any task fails non-recoverably, this task records the detailed error information in the Redis _fail hash.
- decide_what_to_do_next: A final branching task that decides whether to continue the loop (trigger_self_run), stop it gracefully (stop_loop), or mark it as failed (fail_loop).
- trigger_self_run: The task that actually triggers the next DAG run, creating the continuous loop.
8. Proxy State Lifecycle in Redis
This section details how a proxy's state (e.g., ACTIVE, BANNED) is managed and persisted in Redis. The system uses a "lazy initialization" pattern, meaning a proxy's state is only written to Redis when it is first needed.
Step 1: Configuration and In-Memory Initialization
The server first learns about the list of available proxies from its startup configuration, not from Redis.
- Source of Truth: Proxies are defined in the .env file (e.g., CAMOUFOX_PROXIES, SOCKS5_SOCK_SERVER_IP).
- Injection: The airflow/generate_envoy_config.py script aggregates these into a single list, which is passed to the ytdlp-ops-server via the --proxies command-line argument during Docker Compose startup.
- In-Memory State: The ProxyManager in server_fix/proxy_manager.py receives this list and holds it in memory. At this point, Redis is not involved.
Step 2: First Write to Redis (Lazy Initialization)
A proxy's state is only persisted to Redis the first time it is actively managed or queried.
- Trigger: This typically happens on the first API call that requires proxy state, such as getProxyStatus.
- Action: The ProxyManager checks Redis for a hash with the key proxies:<server_identity> (e.g., proxies:ytdlp-ops-airflow-service).
- Initialization: If the key does not exist, the ProxyManager iterates through its in-memory list of proxies and writes each one to the Redis hash with a default state of ACTIVE.
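The lazy initialization can be sketched with a plain dict standing in for Redis. The key format proxies:<server_identity> and the JSON-string values follow the description in this document; the function ensure_proxy_state, the in-memory store, and the example proxy URL are assumptions.

```python
import json
from typing import Dict, List


def ensure_proxy_state(
    store: Dict[str, Dict[str, str]], server_identity: str, proxies: List[str]
) -> Dict[str, str]:
    """Create the proxy hash on first access; leave it untouched afterwards."""
    key = f"proxies:{server_identity}"
    if key not in store:
        # First access: seed every configured proxy with a default ACTIVE state.
        store[key] = {p: json.dumps({"status": "ACTIVE"}) for p in proxies}
    return store[key]


store: Dict[str, Dict[str, str]] = {}
state = ensure_proxy_state(store, "ytdlp-ops-airflow-service", ["socks5://proxy-a:1080"])
print(state)
```

A second call with the same server_identity returns the existing hash, so a ban recorded in the meantime is not overwritten.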
Step 3: Runtime Updates (Success and Failure)
The proxy's state in Redis is updated in real-time based on the outcome of token generation tasks.
- On Success: When a task using a proxy succeeds, ProxyManager.report_success() is called. This updates the proxy's success_count and last_success_timestamp in the Redis hash.
- On Failure: When a task fails, ProxyManager.report_failure() is called.
  - A record of the failure (including the account ID and job ID) is added to a separate Redis sorted set with the key proxy_failures:<proxy_url>. This key has a TTL and is used for the sliding window ban strategy.
  - The proxy's failure_count and last_failure_timestamp are updated in the main Redis hash.
- Automatic Ban: If the conditions for the "Sliding Window" ban are met (too many failures from different accounts in a short time), ProxyManager.ban_proxy() is called, which updates the proxy's status to BANNED in the Redis hash.
Step 4: Observation and Manual Control
You can view and modify the proxy states stored in Redis using the provided management tools.
- Observation:
  - Airflow DAG: The ytdlp_mgmt_proxy_account DAG (action: list_statuses, entity: proxy).
  - CLI Client: The proxy_manager_client.py script (list command).
  - These tools call the getProxyStatus API endpoint, which reads directly from the proxies:<server_identity> hash in Redis.
- Manual Control:
  - The same tools provide ban, unban, and reset actions.
  - These actions call API endpoints that directly modify the status field for a proxy in the proxies:<server_identity> Redis hash.
  - The delete_from_redis action in the DAG provides a way to completely remove a proxy's state and failure history from Redis, forcing it to be re-initialized as ACTIVE on its next use.
Summary of Redis Keys
| Redis Key Pattern | Type | Purpose |
|---|---|---|
| proxies:<server_identity> | Hash | The primary store for proxy state. Maps proxy_url to a JSON string containing its status (ACTIVE/BANNED), success/failure counts, and timestamps. |
| proxy_failures:<proxy_url> | Sorted Set | A temporary log of recent failures for a specific proxy, used by the sliding window ban logic. The score is the timestamp of the failure. |