feat: complete Ansible setup, correct file paths, and remove redundant files

aperez 2025-08-27 02:32:27 +03:00
parent 7b13257bd1
commit 2be7716bed
142 changed files with 4905 additions and 2058 deletions

File diff suppressed because it is too large


@@ -13,3 +13,168 @@
# 2025-08-19 17:56:18.910148
+/read-only ../yt-dlp-deployment/ansible/airflow_worker
# 2025-08-26 19:27:21.057865
+/model openrouter/qwen/qwen3-coder
# 2025-08-26 19:29:07.866685
+OK. I have copied the major files from the other repo. You need to analyze what is necessary, fix the paths, and you may remove or restructure ansible and create a README, but don't change the code much; it is workable. So now recheck whether we need the top-level folders as they are, or should change them (duplicates: thrift_model and pangramia). Take a look at ansible/.
# 2025-08-26 19:30:47.790538
+D
# 2025-08-26 19:30:56.392322
+No
# 2025-08-26 19:32:26.827936
+Sorry, this thrift and pangramia are only needed to 1) let clients address the services and 2) be copied into the Airflow Dockerfiles so the DAGs can work with them... So examine first /add ansible/
# 2025-08-26 19:32:47.626964
+D
# 2025-08-26 19:32:51.274712
+No
# 2025-08-26 19:32:53.860479
+D
# 2025-08-26 19:33:07.362019
+No
# 2025-08-26 19:33:20.996991
+/undo setup.py
# 2025-08-26 19:33:49.644236
+/add ansible
# 2025-08-26 19:34:17.769816
+Please examine which files are referenced from ansible but are probably missing?
# 2025-08-26 19:34:59.954086
+/model openrouter/qwen/qwen3-coder
# 2025-08-26 19:35:21.760002
+/ask please analyze which files are referenced from ansible but probably missing from this repo
# 2025-08-26 19:48:30.224955
+D
# 2025-08-26 19:58:41.483184
+please create those folders if missing, but also say whether we might remove some top-level folders /run ls
# 2025-08-26 19:59:25.580454
+D
# 2025-08-26 19:59:31.175830
+S
# 2025-08-26 20:04:17.221108
+/run ls
# 2025-08-26 20:09:50.479402
+/add airflow/Dockerfile
# 2025-08-26 20:10:42.985326
+Sorry, could you analyze whether all of those top-level folders are needed and whether they are referenced properly?
# 2025-08-26 20:11:08.062598
+D
# 2025-08-26 20:15:58.400588
+Sorry... it seems to me that thrift_model, yt_ops, and ytdlp_ops_auth are all there for the Thrift dependencies used in the airflow/Dockerfile build. But we need to check whether they should stay top-level or move into a dedicated folder, and update the ansible references, since ansible copies them into both master and worker for the build. Same for setup.py: it only installs that package locally (e.g. pangramia) so it can be used by clients (two similar files...). So we can probably move them, but you need to recheck the ansible refs.
# 2025-08-26 20:17:25.069042
+/run ls
# 2025-08-26 20:18:34.524722
+we can probably move both the top-level .py files and setup.py into that yt_ops_package/, while adding a pangramia symlink inside (ln -s to thrift_model/gen_py/pangramia).
# 2025-08-26 20:19:52.199608
+/model
# 2025-08-26 20:20:05.979177
+/model openrouter/qwen/qwen3-coder
# 2025-08-26 20:20:19.321643
+sorry, you probably failed with the previous one
# 2025-08-26 20:21:40.616011
+may I move the top-level Python files into the package, OK? Then you will update ansible?
# 2025-08-26 20:22:51.544085
+/model openrouter/moonshotai/kimi-k2
# 2025-08-26 20:23:00.944724
+exit
# 2025-08-26 20:25:10.944346
+/add ansible
# 2025-08-26 20:36:07.449988
+/add README.md
# 2025-08-26 20:36:15.896777
+/run ls
# 2025-08-26 20:36:58.783499
+I haven't finished; please move thrift_model, setup.py, and all top-level .py files to a new folder yt_ops_package, and update the references
# 2025-08-26 20:37:10.904873
+D
# 2025-08-26 20:42:07.375209
+No
# 2025-08-26 20:42:14.699143
+D
# 2025-08-26 20:42:23.537622
+S
# 2025-08-26 20:43:16.492226
+No, you don't need to update those files; you only need to configure ansible to copy them into the proper folder
# 2025-08-26 20:47:19.166850
+D
# 2025-08-26 21:03:13.329925
+/clear
# 2025-08-26 21:03:48.279889
+Please recheck that ansible points at the proper folders and files, change only the ansible scripts, and also add README.md
# 2025-08-26 21:13:26.425838
+D
# 2025-08-26 21:13:28.330054
+S
# 2025-08-27 02:03:38.074031
+/clear
# 2025-08-27 02:04:48.031688
+does ansible need airflow/docker-compose.camoufox.yaml, airflow/docker-compose-master.yaml, and docker-compose-dl.yaml, or do we use only the .j2 versions? Also, do we need airflow/init-airflow.sh, airflow/init-yt-service.sh, and init-yt-service.sh?
# 2025-08-27 02:04:53.172717
+D
# 2025-08-27 02:06:03.759537
+No
# 2025-08-27 02:13:28.788492
+exit
# 2025-08-27 02:15:18.658843
+could you recheck that the ansible/ folder has proper links to the files and folders from this repo, so it's ready for deployment? Also recheck that it's OK from a logging perspective, i.e. that the workers and the master have proper S3 logging configured
# 2025-08-27 02:15:49.033642
+/add ansible/
# 2025-08-27 02:15:51.656556
+added
# 2025-08-27 02:16:44.736374
+D
# 2025-08-27 02:17:22.140783
+S
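The session above converges on the restructuring that the Ansible diffs further down reflect: the Thrift model, packaging files, and top-level client scripts end up under yt_ops_package/, with pangramia exposed inside the package (the session suggests a symlink onto thrift_model/gen_py/pangramia). A minimal shell sketch of those steps, assuming the files start at the repository root; the exact command sequence is illustrative, not taken from the repo:

```bash
# Group the Thrift/packaging pieces under one folder, as discussed above.
mkdir -p yt_ops_package

# Move the Thrift model, packaging files, and top-level client scripts.
git mv thrift_model yt_ops_services setup.py VERSION \
       get_info_json_client.py proxy_manager_client.py yt_ops_package/

# Expose the generated bindings as "pangramia" inside the package, per the
# ln -s suggestion from the session (target is relative to yt_ops_package/).
ln -s thrift_model/gen_py/pangramia yt_ops_package/pangramia
```

After this, the playbooks only need their src paths re-pointed at yt_ops_package/, which is what the hunks below do.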

Binary file not shown.


@@ -62,3 +62,11 @@ Thumbs.db
 # Build artifacts
 target/
+# yt_ops_package build artifacts
+yt_ops_package/__pycache__/
+yt_ops_package/*.py[cod]
+yt_ops_package/*$py.class
+yt_ops_package/build/
+yt_ops_package/dist/
+yt_ops_package/*.egg-info/

.gitignore (new vendored file)

@@ -0,0 +1 @@
+.aider*

README.md (new file)

@@ -0,0 +1,4 @@
+# YT Ops Services

Dockerfile.thrift (deleted)

@@ -1,19 +0,0 @@
FROM python:3.9-slim as builder
WORKDIR /app
#COPY ../setup.py /app/setup.py
#COPY ../requirements.txt /app/requirements.txt
#COPY ../yt_ops_services /app/yt_ops_services
#COPY ../thrift_model /app/thrift_model
#COPY ../server /app/server
COPY requirements.txt /app/requirements.txt
# Install dependencies
RUN pip install --user --no-cache-dir -r /app/requirements.txt
# Install the custom package in editable mode
#RUN pip install --user -e /app

File diff suppressed because one or more lines are too long


@@ -1,534 +0,0 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
#
# WARNING: This configuration is for local development. Do not use it in a production deployment.
#
# This configuration supports basic configuration using environment variables or an .env file
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME - Docker image name used to run Airflow.
# Default: apache/airflow:2.10.5
# AIRFLOW_UID - User ID in Airflow containers
# Default: 50000
# AIRFLOW_PROJ_DIR - Base path to which all the files will be volumed.
# Default: .
# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
#
# _AIRFLOW_WWW_USER_USERNAME - Username for the administrator account (if requested).
# Default: airflow
# _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account (if requested).
# Default: airflow
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
# Use this option ONLY for quick checks. Installing requirements at container
# startup is done EVERY TIME the service is started.
# A better way is to build a custom image or extend the official image
# as described in https://airflow.apache.org/docs/docker-stack/build.html.
# Default: ''
#
# Feel free to modify this file to suit your needs.
---
name: airflow-master
x-minio-common: &minio-common
image: quay.io/minio/minio:RELEASE.2025-07-23T15-54-02Z
command: server --console-address ":9001" http://minio{1...3}/data{1...2}
expose:
- "9000"
- "9001"
networks:
- proxynet
env_file:
- .env
environment:
MINIO_ROOT_USER: ${MINIO_ROOT_USER:-admin}
MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-0153093693-0009}
healthcheck:
test: ["CMD", "mc", "ready", "local"]
interval: 5s
timeout: 5s
retries: 5
restart: always
x-airflow-common:
&airflow-common
# In order to add custom dependencies or upgrade provider packages you can use your extended image.
# This will build the image from the Dockerfile in this directory and tag it.
image: ${AIRFLOW_IMAGE_NAME:-pangramia/ytdlp-ops-airflow:latest}
build: .
# Add extra hosts here to allow the master services (webserver, scheduler) to resolve
# the hostnames of your remote DL workers. This is crucial for fetching logs.
# Format: - "hostname:ip_address"
# IMPORTANT: This section is auto-generated from cluster.yml
extra_hosts:
- "af-test:89.253.223.97"
- "dl001:109.107.189.106"
env_file:
- .env
networks:
- proxynet
environment:
&airflow-common-env
AIRFLOW__CORE__PARALLELISM: 64
AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: 32
AIRFLOW__SCHEDULER__PARSING_PROCESSES: 4
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD:-pgdb_pwd_A7bC2xY9zE1wV5uP}@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD:-pgdb_pwd_A7bC2xY9zE1wV5uP}@postgres/airflow
AIRFLOW__CELERY__BROKER_URL: redis://:${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT}@redis:6379/0
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
AIRFLOW_CONFIG: '/opt/airflow/config/airflow.cfg'
AIRFLOW__WEBSERVER__SECRET_KEY: 'qmALu5JCAW0518WGAqkVZQ=='
AIRFLOW__CORE__INTERNAL_API_SECRET_KEY: 'qmALu5JCAW0518WGAqkVZQ=='
# yamllint disable rule:line-length
# Use simple http server on scheduler for health checks
# See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server
# yamllint enable rule:line-length
AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
# WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks
# for other purpose (development, test and especially production usage) build/extend Airflow image.
#_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- apache-airflow-providers-docker apache-airflow-providers-http thrift>=0.16.0,<=0.20.0 backoff>=2.2.1 python-dotenv==1.0.1 psutil>=5.9.0} # The following line can be used to set a custom config file, stored in the local config folder
# If you want to use it, outcomment it and replace airflow.cfg with the name of your config file
AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://airflow-logs"
AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: minio_default
AIRFLOW__LOGGING__ENCRYPT_S3_LOGS: "False"
AIRFLOW__LOGGING__REMOTE_LOG_FORMAT: "[%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s"
AIRFLOW__LOGGING__LOG_LEVEL: "INFO"
AIRFLOW__LOGGING__LOG_FILENAME_TEMPLATE: "{{ ti.dag_id }}/{{ ti.run_id }}/{{ ti.task_id }}/attempt={{ try_number }}.log"
AIRFLOW__CORE__LOCAL_SETTINGS_PATH: "/opt/airflow/config/custom_task_hooks.py"
volumes:
- ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
- ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
- ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
- ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
- ${AIRFLOW_PROJ_DIR:-.}/downloadfiles:/opt/airflow/downloadfiles
- ${AIRFLOW_PROJ_DIR:-.}/addfiles:/opt/airflow/addfiles
- ${AIRFLOW_PROJ_DIR:-.}/inputfiles:/opt/airflow/inputfiles
user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
depends_on:
&airflow-common-depends-on
redis:
condition: service_healthy
postgres:
condition: service_healthy
nginx-minio-lb:
condition: service_healthy
services:
postgres:
image: postgres:13
env_file:
- .env
networks:
- proxynet
environment:
POSTGRES_USER: airflow
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-pgdb_pwd_A7bC2xY9zE1wV5uP}
POSTGRES_DB: airflow
volumes:
- postgres-db-volume:/var/lib/postgresql/data
ports:
- "5432:5432"
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 10s
retries: 5
start_period: 5s
restart: always
redis:
# Redis is limited to 7.2-bookworm due to licencing change
# https://redis.io/blog/redis-adopts-dual-source-available-licensing/
image: redis:7.2-bookworm
env_file:
- .env
networks:
- proxynet
command: sh -c "redis-server --requirepass ${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT} --bind 0.0.0.0 --save 60 1 --loglevel warning --appendonly yes"
volumes:
- ./redis-data:/data
expose:
- 6379
ports:
- "52909:6379"
healthcheck:
test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT}", "ping"]
interval: 10s
timeout: 30s
retries: 50
start_period: 30s
restart: always
redis-proxy-account-clear:
image: redis:7.2-bookworm
container_name: redis-proxy-account-clear
env_file:
- .env
networks:
- proxynet
command: >
sh -c "
echo 'Clearing proxy and account statuses from Redis...';
redis-cli -h redis -a $${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT} --scan --pattern 'proxy_status:*' | xargs -r redis-cli -h redis -a $${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT} DEL;
redis-cli -h redis -a $${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT} --scan --pattern 'account_status:*' | xargs -r redis-cli -h redis -a $${REDIS_PASSWORD:-redis_pwd_K3fG8hJ1mN5pQ2sT} DEL;
echo 'Redis cleanup complete.'
"
depends_on:
redis:
condition: service_healthy
minio1:
<<: *minio-common
hostname: minio1
volumes:
- ./minio-data/1/1:/data1
- ./minio-data/1/2:/data2
minio2:
<<: *minio-common
hostname: minio2
volumes:
- ./minio-data/2/1:/data1
- ./minio-data/2/2:/data2
depends_on:
minio1:
condition: service_started
minio3:
<<: *minio-common
hostname: minio3
volumes:
- ./minio-data/3/1:/data1
- ./minio-data/3/2:/data2
depends_on:
minio2:
condition: service_started
nginx-minio-lb:
image: nginx:1.19.2-alpine
hostname: nginx-minio-lb
networks:
- proxynet
command: sh -c "apk add --no-cache curl >/dev/null 2>&1 && exec nginx -g 'daemon off;'"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
ports:
- "9000:9000"
- "9001:9001"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:9001/minio/health/live"]
interval: 10s
timeout: 5s
retries: 5
start_period: 10s
depends_on:
minio1:
condition: service_healthy
minio2:
condition: service_healthy
minio3:
condition: service_healthy
restart: always
minio-init:
image: minio/mc
container_name: minio-init
networks:
- proxynet
depends_on:
nginx-minio-lb:
condition: service_healthy
entrypoint: >
/bin/sh -c "
set -e;
/usr/bin/mc alias set minio http://nginx-minio-lb:9000 $$MINIO_ROOT_USER $$MINIO_ROOT_PASSWORD;
# Retry loop for bucket creation
MAX_ATTEMPTS=10
SUCCESS=false
# Use a for loop for robustness, as it's generally more portable than `until`.
for i in $$(seq 1 $$MAX_ATTEMPTS); do
# Check if the bucket exists. If so, we're done.
if /usr/bin/mc ls minio/airflow-logs > /dev/null 2>&1; then
echo 'MinIO bucket already exists.'
SUCCESS=true
break
fi
# If not, try to create it. If successful, we're done.
# We redirect output because `mc mb` can error if another process creates it in the meantime.
if /usr/bin/mc mb minio/airflow-logs > /dev/null 2>&1; then
echo 'MinIO bucket created.'
SUCCESS=true
break
fi
# If we reach here, both checks failed. Wait and retry.
echo "Attempt $$i/$$MAX_ATTEMPTS: Waiting for MinIO bucket..."
sleep 2
done
# After the loop, check if we succeeded.
if [ "$$SUCCESS" = "false" ]; then
echo "Failed to create MinIO bucket after $$MAX_ATTEMPTS attempts."
exit 1
fi
/usr/bin/mc anonymous set download minio/airflow-logs;
echo 'MinIO initialized: bucket airflow-logs created and policy set to download.';
"
env_file:
- .env
environment:
MINIO_ROOT_USER: ${MINIO_ROOT_USER:-admin}
MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-0153093693-0009}
restart: on-failure
nginx-healthcheck:
image: nginx:alpine
container_name: nginx-healthcheck
networks:
- proxynet
ports:
- "8888:80"
restart: always
airflow-webserver:
<<: *airflow-common
command: webserver
ports:
- "8080:8080"
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 30s
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-scheduler:
<<: *airflow-common
command: scheduler
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8974/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 30s
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-master-worker:
<<: *airflow-common
command: airflow celery worker -q main,default
healthcheck:
# yamllint disable rule:line-length
test:
- "CMD-SHELL"
- 'celery --app airflow.providers.celery.executors.celery_executor.app inspect ping -d "worker-master@$$(hostname)"'
interval: 30s
timeout: 10s
retries: 5
start_period: 30s
environment:
<<: *airflow-common-env
# Required to handle warm shutdown of the celery workers properly
# See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
DUMB_INIT_SETSID: 0
AIRFLOW__CELERY__WORKER_QUEUES: "main,default"
AIRFLOW__CELERY__WORKER_TAGS: "master"
AIRFLOW__CELERY__WORKER_CONCURRENCY: "16"
AIRFLOW__CELERY__WORKER_PREFETCH_MULTIPLIER: "1"
AIRFLOW__CELERY__TASK_ACKS_LATE: "False"
AIRFLOW__CELERY__OPERATION_TIMEOUT: "2.0"
AIRFLOW__CELERY__WORKER_NAME: "worker-master@%h"
AIRFLOW__CELERY__WORKER_MAX_TASKS_PER_CHILD: "100"
# Max memory per child process before it's recycled. Helps prevent memory leaks.
# 256MB is sufficient for master worker tasks. DL workers use a higher limit.
AIRFLOW__CELERY__WORKER_MAX_MEMORY_PER_CHILD: "262144" # 256MB
hostname: ${HOSTNAME}
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-triggerer:
<<: *airflow-common
command: triggerer
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
interval: 30s
timeout: 10s
retries: 5
start_period: 30s
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-init:
<<: *airflow-common
depends_on:
<<: *airflow-common-depends-on
minio-init:
condition: service_completed_successfully
redis-proxy-account-clear:
condition: service_completed_successfully
entrypoint: /bin/bash
# yamllint disable rule:line-length
command:
- -c
- |
# This container runs as root and is responsible for initializing the environment.
# It sets permissions on mounted directories to ensure the 'airflow' user (running with AIRFLOW_UID)
# can write to them. This is crucial for logs, dags, and plugins.
echo "Initializing permissions for Airflow directories..."
chown -R "${AIRFLOW_UID}:${AIRFLOW_GID}" /opt/airflow/dags /opt/airflow/logs /opt/airflow/plugins /opt/airflow/config /opt/airflow/downloadfiles /opt/airflow/addfiles /opt/airflow/inputfiles
echo "Permissions set."
if [[ -z "${AIRFLOW_UID}" ]]; then
echo
echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
echo "If you are on Linux, you SHOULD follow the instructions below to set "
echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
echo "For other operating systems you can get rid of the warning with manually created .env file:"
echo " See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
echo
fi
# This container's job is to initialize the database, create a user, and import connections.
# Wait for db to be ready.
airflow db check --retry 30 --retry-delay 5
# Run database migrations.
echo "Running database migrations..."
airflow db upgrade
echo "Database migrations complete."
# Create the admin user if it doesn't exist.
# The '|| true' prevents the script from failing if the user already exists.
echo "Checking for and creating admin user..."
airflow users create \
--username "admin" \
--password "${AIRFLOW_ADMIN_PASSWORD:-admin_pwd_X9yZ3aB1cE5dF7gH}" \
--firstname Admin \
--lastname User \
--role Admin \
--email admin@example.com || true
echo "Admin user check/creation complete."
# Import connections from any .json file in the config directory.
echo "Searching for connection files in /opt/airflow/config..."
if [ -d "/opt/airflow/config" ] && [ -n "$(ls -A /opt/airflow/config/*.json 2>/dev/null)" ]; then
for conn_file in /opt/airflow/config/*.json; do
if [ -f "$$conn_file" ]; then
# Exclude files that are not meant to be Airflow connections.
if [ "$(basename "$$conn_file")" = "camoufox_endpoints.json" ]; then
echo "Skipping '$$conn_file' as it is not an Airflow connection file."
continue
fi
echo "Importing connections from $$conn_file"
airflow connections import "$$conn_file" || echo "Failed to import $$conn_file, but continuing."
fi
done
else
echo "No connection files found to import, or /opt/airflow/config is empty/missing."
fi
echo "Connection import process complete."
# yamllint enable rule:line-length
environment:
<<: *airflow-common-env
_AIRFLOW_DB_MIGRATE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'false' # Set to false as we handle it manually
_PIP_ADDITIONAL_REQUIREMENTS: ''
user: "0:0"
airflow-cli:
<<: *airflow-common
profiles:
- debug
environment:
<<: *airflow-common-env
CONNECTION_CHECK_MAX_COUNT: "0"
# Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
command:
- bash
- -c
- airflow
# You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up
# or by explicitly targeted on the command line e.g. docker-compose up flower.
# See: https://docs.docker.com/compose/profiles/
flower:
<<: *airflow-common
command: celery flower
ports:
- "5555:5555"
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
interval: 30s
timeout: 10s
retries: 5
start_period: 30s
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
docker-socket-proxy:
profiles:
- disabled
image: tecnativa/docker-socket-proxy:0.1.1
networks:
- proxynet
environment:
CONTAINERS: 1
IMAGES: 1
AUTH: 1
POST: 1
privileged: true
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
restart: always
volumes:
postgres-db-volume:
networks:
proxynet:
name: airflow_proxynet
external: true


@@ -1,47 +0,0 @@
# THIS FILE IS AUTO-GENERATED BY generate_envoy_config.py
# DO NOT EDIT MANUALLY.
#
# It contains the service definitions for the camoufox instances
# and adds the necessary dependencies to the main services.
services:
camoufox-1:
image: ghcr.io/safing/camoufox:latest
container_name: ytdlp-ops-camoufox-1-1
restart: unless-stopped
ports:
- "12345:12345"
environment:
- DISPLAY=:99
- CAMOUFOX_MAX_MEMORY_MB=2048
- CAMOUFOX_MAX_CONCURRENT_CONTEXTS=8
- CAMOUFOX_RESTART_THRESHOLD_MB=1500
volumes:
- /tmp/.X11-unix:/tmp/.X11-unix:rw
- camoufox-data-1:/app/context-data
command: [
"--ws-host", "0.0.0.0",
"--port", "12345",
"--ws-path", "mypath",
"--headless",
"--monitor-resources",
"--memory-restart-threshold", "1800"
]
deploy:
resources:
limits:
memory: 2.5G
logging:
driver: "json-file"
options:
max-size: "100m"
max-file: "3"
networks:
- camoufox-network
volumes:
camoufox-data-1:
networks:
camoufox-network:
driver: bridge


@@ -1,102 +0,0 @@
#!/bin/bash
#
# This script should be run on the Airflow host (master or worker)
# to initialize the environment. It creates the .env file and sets
# up permissions.
#
set -e
# --- Configuration ---
# The directory where docker-compose.yaml is located
AIRFLOW_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
cd "$AIRFLOW_DIR"
echo "--- Initializing Airflow Environment in $AIRFLOW_DIR ---"
# --- Step 1: Create or update .env file for Docker permissions ---
if [ -f ".env" ]; then
echo ".env file already exists. Ensuring correct permissions are set..."
# Ensure AIRFLOW_UID is set to the current user's ID.
if ! grep -q "^AIRFLOW_UID=" .env; then
echo "AIRFLOW_UID not found in .env. Appending..."
echo "AIRFLOW_UID=$(id -u)" >> .env
fi
# Ensure HOSTNAME is set for worker identity.
if ! grep -q "^HOSTNAME=" .env; then
echo "HOSTNAME not found in .env. Appending..."
echo "HOSTNAME=$(hostname)" >> .env
fi
# Force AIRFLOW_GID to be 0, as required by the Airflow image.
# This removes any existing AIRFLOW_GID line and adds the correct one.
if grep -q "^AIRFLOW_GID=" .env; then
echo "Found existing AIRFLOW_GID. Forcing it to 0..."
# The sed command works on both Linux and macOS, creating a .env.bak file.
sed -i.bak '/^AIRFLOW_GID=/d' .env
fi
echo "AIRFLOW_GID=0" >> .env
echo "Permissions updated in .env file."
else
echo "Creating .env file..."
# Note: On Linux hosts, this is crucial for permissions.
echo "AIRFLOW_UID=$(id -u)" > .env
echo "AIRFLOW_GID=0" >> .env
# Add HOSTNAME for worker-specific queueing and container identity
echo "HOSTNAME=$(hostname)" >> .env
# Add default passwords. These should be changed for production.
echo "POSTGRES_PASSWORD=pgdb_pwd_A7bC2xY9zE1wV5uP" >> .env
echo "REDIS_PASSWORD=redis_pwd_K3fG8hJ1mN5pQ2sT" >> .env
echo "AIRFLOW_ADMIN_PASSWORD=admin_pwd_X9yZ3aB1cE5dF7gH" >> .env
echo ".env file created. For a DL worker, you must also add MASTER_HOST_IP. Please review and update passwords."
fi
echo "Current .env contents:"
cat .env
echo "----------------------------------------"
# --- Step 2: Create directories and set permissions ---
# These directories are mounted into the containers and need to exist on the host.
echo "Ensuring mounted directories exist..."
# Define directories in an array for reuse
DIRS_TO_CREATE=(dags logs plugins config inputfiles downloadfiles addfiles)
mkdir -p "${DIRS_TO_CREATE[@]}"
echo "Directories checked/created."
# Load .env to get AIRFLOW_UID. The `set -o allexport` command exports all variables defined from now on.
if [ -f .env ]; then
set -o allexport
source .env
set +o allexport
else
echo "ERROR: .env file not found. Cannot determine AIRFLOW_UID for setting permissions."
exit 1
fi
# Set permissions on the directories. This is crucial for the Airflow user inside the container.
# The airflow-init container on the master does this, but for workers, we must do it here.
echo "Setting ownership for mounted directories to AIRFLOW_UID=${AIRFLOW_UID}..."
if command -v sudo &> /dev/null; then
sudo chown -R "${AIRFLOW_UID}:0" "${DIRS_TO_CREATE[@]}"
echo "Permissions set successfully."
else
echo "WARNING: 'sudo' command not found. Attempting 'chown' as current user."
chown -R "${AIRFLOW_UID}:0" "${DIRS_TO_CREATE[@]}" || (
echo "ERROR: Failed to set permissions. Please run the following command manually with appropriate privileges:"
echo "chown -R \"${AIRFLOW_UID}:0\" dags logs plugins config inputfiles downloadfiles addfiles"
exit 1
)
echo "Permissions set successfully."
fi
echo "----------------------------------------"
# --- Step 3: Instructions for creating admin user ---
echo "--- Next Steps ---"
echo "1. Ensure your docker-compose.yaml (and -master.yaml, -dl.yaml) files are present."
echo "2. Start Airflow services: docker compose up -d"
echo "3. The admin user will be created automatically with the password from your .env file."
echo " Default username: admin"
echo " Default password can be found in .env as AIRFLOW_ADMIN_PASSWORD"
echo
echo "Initialization complete."


@@ -1,32 +0,0 @@
#!/bin/bash
#
# This script should be run on the YT Service host to initialize the environment.
# It creates the .env file from the example if it doesn't exist.
#
set -e
# --- Configuration ---
# The directory where docker-compose-ytdlp-ops.yaml is located
SERVICE_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
cd "$SERVICE_DIR"
echo "--- Initializing YT Service Environment in $SERVICE_DIR ---"
# --- Step 1: Create .env file from .env.example ---
if [ -f ".env" ]; then
echo ".env file already exists. Skipping creation."
else
if [ -f ".env.example" ]; then
echo "Creating .env file from .env.example..."
cp .env.example .env
echo ".env file created. IMPORTANT: Please edit it with your production values."
else
echo "Warning: .env.example not found. Cannot create .env file."
echo "Please create a .env file manually."
fi
fi
echo "----------------------------------------"
echo "Initialization check complete."
echo "Please review the .env file and then follow the 'Next Steps' from the deployment script."


@@ -1,49 +0,0 @@
# Ansible-driven YT-DLP / Airflow Cluster Quick-Start & Cheat-Sheet
> One playbook = one command to **deploy**, **update**, **restart**, or **re-configure** the entire cluster.
---
## 0. Prerequisites (run once on the **tower** server)
```
---
## 1. Ansible Vault Setup (run once on your **local machine**)
This project uses Ansible Vault to encrypt sensitive data like passwords and API keys. To run the playbooks, you need to provide the vault password. The recommended way is to create a file named `.vault_pass` in the root of the project directory.
1. **Create the Vault Password File:**
From the project's root directory (e.g., `/opt/yt-ops-services`), create the file. The file should contain only your vault password on a single line.
```bash
# Replace 'your_secret_password_here' with your actual vault password
echo "your_secret_password_here" > .vault_pass
```
2. **Secure the File:**
It's good practice to restrict permissions on this file so only you can read it.
```bash
chmod 600 .vault_pass
```
The `ansible.cfg` file is configured to automatically look for this `.vault_pass` file in the project root.
---
## 2. Common Operations
### Running Ansible Commands
**IMPORTANT:** All `ansible-playbook` commands should be run from within the `ansible/` directory. This allows Ansible to automatically find the `ansible.cfg` and `inventory.ini` files.
```bash
cd ansible
ansible-playbook <playbook_name>.yml
```
If you run the command from the project root, you will see warnings about the inventory not being parsed, because Ansible does not automatically find `ansible/ansible.cfg`.
The `ansible.cfg` file has been configured to look for the `.vault_pass` file in the project root directory (one level above `ansible/`). Ensure your `.vault_pass` file is located there.
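The vault workflow described above pairs with the vaulted variables used in the templates further down (for example vault_minio_root_password and vault_aws_access_key_id). A hedged example of producing one such value with ansible-vault, assuming the .vault_pass file from step 1 sits in the project root:

```bash
# Encrypt a single value so it can be pasted into group_vars/ as
# vault_minio_root_password; .vault_pass supplies the vault password.
ansible-vault encrypt_string \
  --vault-password-file .vault_pass \
  'super-secret-value' \
  --name 'vault_minio_root_password'
```

The output is a `!vault |` block that can be committed safely, since only the encrypted form is stored.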


@@ -42,15 +42,16 @@
     - "airflow/docker-compose-master.yaml"
     - "airflow/dags"
     - "airflow/config"
-    - "setup.py"
-    - "yt_ops_services"
-    - "thrift_model"
-    - "VERSION"
+    - "yt_ops_package/setup.py"
+    - "yt_ops_package/yt_ops_services"
+    - "yt_ops_package/thrift_model"
+    - "yt_ops_package/VERSION"
+    - "yt_ops_package/pangramia"
     - "airflow/init-airflow.sh"
     - "airflow/update-yt-dlp.sh"
     - "airflow/nginx.conf"
-    - "get_info_json_client.py"
-    - "proxy_manager_client.py"
+    - "yt_ops_package/get_info_json_client.py"
+    - "yt_ops_package/proxy_manager_client.py"
     - "token_generator"
     - "utils"
@@ -68,7 +69,7 @@
 - name: Sync pangramia thrift files
   synchronize:
-    src: "../thrift_model/gen_py/pangramia/"
+    src: "../yt_ops_package/thrift_model/gen_py/pangramia/"
     dest: "{{ airflow_master_dir }}/pangramia/"
     archive: yes
     recursive: yes
@@ -89,7 +90,7 @@
 - name: Template Minio connection file
   template:
-    src: "../airflow/config/minio_default_conn.json.j2"
+    src: "../templates/minio_default_conn.json.j2"
     dest: "{{ airflow_master_dir }}/config/minio_default_conn.json"
     mode: "{{ file_permissions }}"
     owner: "{{ ssh_user }}"
@@ -98,7 +99,7 @@
 - name: Template YT-DLP Redis connection file
   template:
-    src: "../airflow/config/ytdlp_redis_conn.json.j2"
+    src: "../templates/ytdlp_redis_conn.json.j2"
     dest: "{{ airflow_master_dir }}/config/ytdlp_redis_conn.json"
     mode: "{{ file_permissions }}"
     owner: "{{ ssh_user }}"
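Before a real deployment against the re-pointed paths above, a syntax check plus a dry run from the ansible/ directory will catch most broken references; the playbook name below is a placeholder, since the actual file name is not shown in this diff:

```bash
cd ansible

# Parse-time validation of the playbook and its includes.
ansible-playbook deploy-airflow-master.yml --syntax-check

# Dry run: reports what would be copied/templated without touching the hosts.
ansible-playbook deploy-airflow-master.yml --check --diff
```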


@@ -31,14 +31,15 @@
     - "airflow/.dockerignore"
     - "airflow/dags"
     - "airflow/config"
-    - "setup.py"
-    - "yt_ops_services"
-    - "thrift_model"
-    - "VERSION"
+    - "yt_ops_package/setup.py"
+    - "yt_ops_package/yt_ops_services"
+    - "yt_ops_package/thrift_model"
+    - "yt_ops_package/VERSION"
+    - "yt_ops_package/pangramia"
     - "airflow/init-airflow.sh"
     - "airflow/update-yt-dlp.sh"
-    - "get_info_json_client.py"
-    - "proxy_manager_client.py"
+    - "yt_ops_package/get_info_json_client.py"
+    - "yt_ops_package/proxy_manager_client.py"
     - "token_generator"
     - "utils"
@@ -66,7 +67,7 @@
 - name: Sync pangramia thrift files
   synchronize:
-    src: "../thrift_model/gen_py/pangramia/"
+    src: "../yt_ops_package/thrift_model/gen_py/pangramia/"
     dest: "{{ airflow_worker_dir }}/pangramia/"
     archive: yes
     recursive: yes
@@ -112,12 +113,6 @@
     recurse: yes
   become: yes
-# - name: Login to Docker Hub
-#   community.docker.docker_login:
-#     username: "{{ dockerhub_user }}"
-#     password: "{{ vault_dockerhub_token }}"
-#     no_log: true
 - name: Verify Dockerfile exists in build directory
   stat:
     path: "{{ airflow_worker_dir }}/Dockerfile"


@@ -10,3 +10,10 @@ AIRFLOW_GID=0
 MINIO_ROOT_USER=admin
 MINIO_ROOT_PASSWORD={{ vault_minio_root_password }}
 AIRFLOW_VAR_MASTER_HOST_IP={{ hostvars[groups['airflow_master'][0]].ansible_host }}
+# S3 Logging Configuration
+AIRFLOW_VAR_S3_LOG_BUCKET=your-s3-bucket-name
+AIRFLOW_VAR_S3_LOG_FOLDER=airflow-logs/master
+AWS_ACCESS_KEY_ID={{ vault_aws_access_key_id | default('') }}
+AWS_SECRET_ACCESS_KEY={{ vault_aws_secret_access_key | default('') }}
+AWS_DEFAULT_REGION={{ aws_region | default('us-east-1') }}


@@ -20,3 +20,10 @@ ACCOUNT_COOLDOWN_DURATION_MIN=30
 MINIO_ROOT_USER=admin
 MINIO_ROOT_PASSWORD={{ vault_minio_root_password }}
 AIRFLOW_GID=0
+# S3 Logging Configuration
+AIRFLOW_VAR_S3_LOG_BUCKET=your-s3-bucket-name
+AIRFLOW_VAR_S3_LOG_FOLDER=airflow-logs/workers/{{ inventory_hostname }}
+AWS_ACCESS_KEY_ID={{ vault_aws_access_key_id | default('') }}
+AWS_SECRET_ACCESS_KEY={{ vault_aws_secret_access_key | default('') }}
+AWS_DEFAULT_REGION={{ aws_region | default('us-east-1') }}
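These AWS_*/MinIO values back the remote-logging settings that the removed compose file ties to the minio_default connection (s3://airflow-logs served through nginx-minio-lb:9000). The deployment itself templates minio_default_conn.json and imports it during airflow-init; as a manual equivalent, here is a hedged sketch of creating the same connection from inside an Airflow container (the exact extra key for the endpoint depends on the Amazon provider version):

```bash
# Run inside a webserver/scheduler container, e.g.:
#   docker compose exec airflow-webserver bash
airflow connections add minio_default \
  --conn-type aws \
  --conn-login "${MINIO_ROOT_USER}" \
  --conn-password "${MINIO_ROOT_PASSWORD}" \
  --conn-extra '{"endpoint_url": "http://nginx-minio-lb:9000", "region_name": "us-east-1"}'
```

With that in place, the master and the workers write task logs to the shared airflow-logs bucket, which is presumably what the per-host AIRFLOW_VAR_S3_LOG_FOLDER above is meant to distinguish.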

Some files were not shown because too many files have changed in this diff.