containerize with Haloy: Dockerfile, Caddy basicauth, haloy.yml for db.xn--2dk.xyz

- Dockerfile: debian slim, installs DuckDB CLI, Python deps, Caddy
- start.sh: runs prepara_db.py → starts Caddy (basicauth) → starts DuckDB UI
- Caddyfile: updated for container (no TLS, port 8080, Haloy handles HTTPS)
- haloy.yml: deploys to db.xn--2dk.xyz on port 8080
- requirements.txt: duckdb, boto3, python-dotenv
- prepara_db.py, open_gui.sh, duckdb-ui.service: add previously untracked files
- remove prepara_gui.py (replaced by prepara_db.py)
2026-03-25 13:23:59 +01:00
parent 03758acdd9
commit 9eb2dee013
10 changed files with 193 additions and 62 deletions

8
Caddyfile Normal file

@@ -0,0 +1,8 @@
:8080 {
    basicauth /* {
        # user: admin (the plaintext password is intentionally not recorded here)
        # regenerate: htpasswd -nbB -C 10 admin NEWPWD | cut -d: -f2 | base64
        admin JDJ5JDEwJHlaV2tLUzBQL2ZsSndBL2g4WDZBNk9NdEZtTnVqcThOOHZ2aXNGRVVMWHhJUDB0WHhNanZD
    }
    reverse_proxy localhost:4213
}
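
The `regenerate` comment in the Caddyfile ends its pipeline with `| base64`, so the value after `admin` should be a base64-wrapped bcrypt hash. A minimal sketch, using only the Python standard library, that reverses that last step and reads the cost factor back out. The helper names `decode_caddy_hash` and `bcrypt_cost` are hypothetical; the encoded string is the one from the diff above.

```python
import base64


def decode_caddy_hash(encoded: str) -> str:
    """Reverse the final `| base64` step of the regenerate pipeline."""
    return base64.b64decode(encoded, validate=True).decode("ascii")


def bcrypt_cost(hash_text: str) -> int:
    """Read the cost factor from a hash shaped like $2y$<cost>$<salt+digest>."""
    _, variant, cost, _ = hash_text.split("$", 3)
    if variant not in {"2a", "2b", "2y"}:
        raise ValueError(f"not a bcrypt hash: ${variant}$")
    return int(cost)


# Value copied from the Caddyfile in this commit.
ENCODED = (
    "JDJ5JDEwJHlaV2tLUzBQL2ZsSndBL2g4WDZBNk9NdEZtTnVqcThOOHZ2"
    "aXNGRVVMWHhJUDB0WHhNanZD"
)

decoded = decode_caddy_hash(ENCODED)
print(decoded[:7], len(decoded), bcrypt_cost(decoded))
```

A check like this can catch a hash that was pasted without the base64 step, or generated with a different `-C` cost than the comment documents.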

31
Dockerfile Normal file

@@ -0,0 +1,31 @@
FROM debian:12-slim

ENV DEBIAN_FRONTEND=noninteractive

# System deps + Caddy
RUN apt-get update -qq && \
    apt-get install -y --no-install-recommends \
        python3 python3-pip python3-venv \
        curl ca-certificates unzip && \
    # Caddy
    curl -fsSL https://caddyserver.com/install.sh | bash && \
    # DuckDB CLI
    curl -fsSL \
        "https://github.com/duckdb/duckdb/releases/latest/download/duckdb_cli-linux-amd64.zip" \
        -o /tmp/duckdb.zip && \
    unzip /tmp/duckdb.zip -d /usr/local/bin && \
    chmod +x /usr/local/bin/duckdb && \
    rm /tmp/duckdb.zip && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip3 install --no-cache-dir --break-system-packages -r requirements.txt

COPY prepara_db.py Caddyfile start.sh ./
RUN chmod +x start.sh

EXPOSE 8080

ENTRYPOINT ["./start.sh"]


@@ -101,6 +101,51 @@ duckdb --ui basedosdados.duckdb
python gera_schemas.py # generates schemas.json and file_tree.md (~21 MB of egress)
```

## Server with a password-protected UI

To expose the DuckDB UI on a server with HTTPS and basic authentication, use [Caddy](https://caddyserver.com/) as a reverse proxy.

**Server prerequisites:** `caddy`, `htpasswd` (package `apache2-utils`), `duckdb`

**1. Install the DuckDB UI service**

Edit `duckdb-ui.service` with the correct user and path, then copy it into systemd:

```bash
# edit User=, WorkingDirectory= and EnvironmentFile= in the file
cp duckdb-ui.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now duckdb-ui
```

**2. Configure Caddy**

Edit `Caddyfile`, replacing `your.domain.com` with the real domain, then:

```bash
cp Caddyfile /etc/caddy/Caddyfile
systemctl reload caddy
```

Caddy obtains the TLS certificate from Let's Encrypt automatically, provided ports 80 and 443 are open in the firewall.

**Changing the password:**

```bash
htpasswd -nbB -C 10 admin NEW_PASSWORD | cut -d: -f2 | base64
# paste the result into the Caddyfile in place of the current hash, then:
systemctl reload caddy
```

**Relevant files:**

| File | Purpose |
|---|---|
| `Caddyfile` | Caddy config: HTTPS + basicauth → proxy to localhost:4213 |
| `duckdb-ui.service` | systemd service that runs the DuckDB UI in the background |

---

### `--gcloud-run`

Creates a Debian 12 `e2-standard-4` VM in `us-central1-a`, copies the script and the `.env`, installs the dependencies, and runs it over SSH. Optional variables:

16
duckdb-ui.service Normal file

@@ -0,0 +1,16 @@
[Unit]
Description=DuckDB UI - basedosdados explorer
After=network.target
[Service]
Type=simple
User=YOUR_USER
WorkingDirectory=/path/to/baseldosdados
ExecStartPre=/usr/bin/python3 prepara_db.py
ExecStart=/usr/bin/duckdb --ui basedosdados3.duckdb
Restart=on-failure
RestartSec=5s
EnvironmentFile=/path/to/baseldosdados/.env
[Install]
WantedBy=multi-user.target

10
haloy.yml Normal file

@@ -0,0 +1,10 @@
name: basedosdados
server: YOUR_SERVER_IP_OR_HOSTNAME
domains:
- domain: db.xn--2dk.xyz
port: 8080
env:
- HETZNER_S3_BUCKET
- HETZNER_S3_ENDPOINT
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
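
`haloy.yml` passes four variables into the container, and `prepara_db.py` reads each one with `os.environ[...]`, which raises a bare `KeyError` on the first one missing. A minimal sketch of a preflight check that reports all missing names at once. The function name `missing_env` is hypothetical and not part of this commit.

```python
import os

# Variable names taken from haloy.yml in this commit.
REQUIRED = (
    "HETZNER_S3_BUCKET",
    "HETZNER_S3_ENDPOINT",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_env(environ=None, required=REQUIRED) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def require_env(environ=None) -> None:
    """Raise one readable error that lists every missing variable."""
    missing = missing_env(environ)
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
```

Calling `require_env()` right after `load_dotenv()` would make a misconfigured deploy fail with a single message instead of one `KeyError` per restart.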

6
open_gui.sh Executable file

@@ -0,0 +1,6 @@
#!/bin/bash
cd "$(dirname "$0")"
INIT=$(mktemp /tmp/duckdb_init_XXXX)
printf "LOAD httpfs;\nATTACH 'basedosdados.duckdb' AS bd (READ_ONLY);\n" > "$INIT"
duckdb --ui ui.duckdb -init "$INIT"
rm -f "$INIT"

63
prepara_db.py Normal file

@@ -0,0 +1,63 @@
import os
import duckdb
import boto3
from dotenv import load_dotenv

load_dotenv()

BUCKET = os.environ['HETZNER_S3_BUCKET']
ENDPOINT_URL = os.environ['HETZNER_S3_ENDPOINT']
ACCESS_KEY = os.environ['AWS_ACCESS_KEY_ID']
SECRET_KEY = os.environ['AWS_SECRET_ACCESS_KEY']

# DuckDB expects the endpoint without scheme
s3_endpoint = ENDPOINT_URL.removeprefix('https://').removeprefix('http://')

# List all prefixes in the bucket (dataset/table)
s3 = boto3.client('s3',
                  endpoint_url=ENDPOINT_URL,
                  aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)

paginator = s3.get_paginator('list_objects_v2')
datasets = {}
for page in paginator.paginate(Bucket=BUCKET, Delimiter='/'):
    for prefix in page.get('CommonPrefixes', []):
        dataset = prefix['Prefix'].rstrip('/')
        datasets[dataset] = []
        for page2 in paginator.paginate(Bucket=BUCKET,
                                        Prefix=dataset+'/',
                                        Delimiter='/'):
            for p in page2.get('CommonPrefixes', []):
                table = p['Prefix'].rstrip('/').split('/')[-1]
                datasets[dataset].append(table)

# Create the DuckDB connection and configure S3
con = duckdb.connect('basedosdados3.duckdb')
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute(f"""
    SET s3_endpoint='{s3_endpoint}';
    SET s3_access_key_id='{ACCESS_KEY}';
    SET s3_secret_access_key='{SECRET_KEY}';
    SET s3_url_style='path';
""")

# Create schemas and views
for dataset, tables in datasets.items():
    con.execute(f"CREATE SCHEMA IF NOT EXISTS {dataset}")
    for table in tables:
        path = f"s3://{BUCKET}/{dataset}/{table}/*.parquet"
        try:
            con.execute(f"""
                CREATE OR REPLACE VIEW {dataset}.{table} AS
                SELECT * FROM read_parquet('{path}', hive_partitioning=true)
            """)
            print(f"{dataset}.{table}")
        except Exception as e:
            # Skip tables DuckDB cannot read because of geometry columns
            if 'Geoparquet' in str(e) or 'geometria' in str(e) or 'geometry' in str(e).lower():
                print(f" skip (geoparquet) {dataset}.{table}")
            else:
                raise

con.close()
print("Done! Open with: duckdb --ui basedosdados3.duckdb")
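
`prepara_db.py` interpolates S3 prefix names straight into `CREATE SCHEMA` and `CREATE VIEW` statements, so a dataset or table name containing a hyphen, a space, or a quote would break the SQL. A minimal sketch of quoting those names with standard SQL rules (double quotes for identifiers, single quotes for string literals, each escaped by doubling). The helpers `quote_ident`, `quote_literal`, and `view_sql` are hypothetical and not part of this commit.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def view_sql(bucket: str, dataset: str, table: str) -> str:
    """Build the CREATE VIEW statement for one dataset/table prefix."""
    path = f"s3://{bucket}/{dataset}/{table}/*.parquet"
    return (
        f"CREATE OR REPLACE VIEW {quote_ident(dataset)}.{quote_ident(table)} AS "
        f"SELECT * FROM read_parquet({quote_literal(path)}, hive_partitioning=true)"
    )


# Example with a name that would fail unquoted because of the hyphen.
print(view_sql("my-bucket", "br-ibge", "pib"))
```

The same `quote_ident` call would apply to the `CREATE SCHEMA IF NOT EXISTS` statement.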

prepara_gui.py Deleted file

@@ -1,62 +0,0 @@
import os
import duckdb
import boto3
from dotenv import load_dotenv

load_dotenv()

S3_ENDPOINT = os.environ["HETZNER_S3_ENDPOINT"]  # https://hel1.your-objectstorage.com
S3_BUCKET = os.environ["HETZNER_S3_BUCKET"]  # baseldosdados
ACCESS_KEY = os.environ["AWS_ACCESS_KEY_ID"]
SECRET_KEY = os.environ["AWS_SECRET_ACCESS_KEY"]

# Strip protocol for DuckDB httpfs (expects bare hostname)
s3_host = S3_ENDPOINT.removeprefix("https://").removeprefix("http://")

con = duckdb.connect('basedosdados.duckdb')
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute(f"""
    CREATE OR REPLACE PERSISTENT SECRET hetzner (
        TYPE S3,
        KEY_ID '{ACCESS_KEY}',
        SECRET '{SECRET_KEY}',
        ENDPOINT '{s3_host}',
        URL_STYLE 'path'
    );
""")

# List all dataset/table prefixes in the bucket
s3 = boto3.client(
    's3',
    endpoint_url=S3_ENDPOINT,
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
)

paginator = s3.get_paginator('list_objects_v2')
datasets = {}
for page in paginator.paginate(Bucket=S3_BUCKET, Delimiter='/'):
    for prefix in page.get('CommonPrefixes', []):
        dataset = prefix['Prefix'].rstrip('/')
        datasets[dataset] = []
        for page2 in paginator.paginate(Bucket=S3_BUCKET,
                                        Prefix=dataset + '/',
                                        Delimiter='/'):
            for p in page2.get('CommonPrefixes', []):
                table = p['Prefix'].rstrip('/').split('/')[-1]
                datasets[dataset].append(table)

# Create schemas and views
for dataset, tables in datasets.items():
    con.execute(f"CREATE SCHEMA IF NOT EXISTS {dataset}")
    for table in tables:
        path = f"s3://{S3_BUCKET}/{dataset}/{table}/*.parquet"
        con.execute(f"""
            CREATE OR REPLACE VIEW {dataset}.{table} AS
            SELECT * FROM '{path}'
        """)
        print(f"{dataset}.{table}")

con.close()
print("Done! Open with: duckdb --ui basedosdados.duckdb")

3
requirements.txt Normal file

@@ -0,0 +1,3 @@
duckdb
boto3
python-dotenv

11
start.sh Normal file

@@ -0,0 +1,11 @@
#!/bin/bash
set -euo pipefail
echo "[start] Building DuckDB views from S3..."
python3 prepara_db.py
echo "[start] Starting Caddy..."
caddy start --config /app/Caddyfile --adapter caddyfile
echo "[start] Starting DuckDB UI on :4213..."
exec duckdb --ui basedosdados3.duckdb
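
`start.sh` starts Caddy before the DuckDB UI, so requests that arrive before port 4213 is listening have no upstream to reach. A minimal sketch of a readiness probe that a health check could use to wait for the UI port. The function name `wait_for_port` and its defaults are assumptions, not part of this commit.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.2) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 4213)` returns `True` once the UI accepts connections and `False` if nothing is listening after 30 seconds.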