Compare commits


3 Commits

Author SHA1 Message Date
8c22944bbb fix duckdb filename, add db file and gitignore cleanup
- Dockerfile + start.sh: use basedosdados.duckdb (not basedosdados3.duckdb)
- add basedosdados.duckdb (3.5 MB, needed for Docker build)
- add requirements.txt (local dev use)
- .gitignore: remove *.duckdb exclusion, add .DS_Store
2026-03-25 13:30:25 +01:00
0d77f83045 simplify container: skip db prep, password via env var, fixed server IP
- start.sh: remove prepara_db.py step; load S3 creds via DuckDB init file
- Caddyfile: switch to basic_auth with {env.BASIC_AUTH_HASH} — no rebuild to rotate password
- Dockerfile: drop Python/pip layers (no longer needed at runtime)
- haloy.yml: set server to 89.167.95.136, add BASIC_AUTH_HASH to env
- remove requirements.txt (only needed for local prepara_db.py, not the container)
2026-03-25 13:27:51 +01:00
9eb2dee013 containerize with Haloy: Dockerfile, Caddy basicauth, haloy.yml for db.xn--2dk.xyz
- Dockerfile: debian slim, installs DuckDB CLI, Python deps, Caddy
- start.sh: runs prepara_db.py → starts Caddy (basicauth) → starts DuckDB UI
- Caddyfile: updated for container (no TLS, port 8080, Haloy handles HTTPS)
- haloy.yml: deploys to db.xn--2dk.xyz on port 8080
- requirements.txt: duckdb, boto3, python-dotenv
- prepara_db.py, open_gui.sh, duckdb-ui.service: add previously untracked files
- remove prepara_gui.py (replaced by prepara_db.py)
2026-03-25 13:23:59 +01:00
12 changed files with 193 additions and 64 deletions

3
.gitignore vendored

@@ -1,6 +1,5 @@
 .env
+.DS_Store
 logs/
-*.duckdb
-*.duckdb.wal
 done_tables.txt
 done_transfers.txt

7
Caddyfile Normal file

@@ -0,0 +1,7 @@
:8080 {
    basic_auth /* {
        # Set BASIC_AUTH_HASH on the server: caddy hash-password --plaintext 'YOUR_PWD'
        admin {env.BASIC_AUTH_HASH}
    }
    reverse_proxy localhost:4213
}
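Because the hash is read from `{env.BASIC_AUTH_HASH}` at startup, rotating the password only needs a new hash in the environment and a container restart, never an image rebuild. A sketch of the rotation flow (assumes the `caddy` binary is available on the server and Haloy manages the env var):

```shell
# 1. generate a new bcrypt hash for the password
caddy hash-password --plaintext 'NEW_PASSWORD'
# 2. set BASIC_AUTH_HASH to the printed hash in the deployment environment
# 3. restart the container so Caddy re-reads the {env.BASIC_AUTH_HASH} placeholder
```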

22
Dockerfile Normal file

@@ -0,0 +1,22 @@
FROM debian:12-slim

RUN apt-get update -qq && \
    apt-get install -y --no-install-recommends \
        curl ca-certificates unzip && \
    curl -fsSL https://caddyserver.com/install.sh | bash && \
    curl -fsSL \
        "https://github.com/duckdb/duckdb/releases/latest/download/duckdb_cli-linux-amd64.zip" \
        -o /tmp/duckdb.zip && \
    unzip /tmp/duckdb.zip -d /usr/local/bin && \
    chmod +x /usr/local/bin/duckdb && \
    rm /tmp/duckdb.zip && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY basedosdados.duckdb Caddyfile start.sh ./
RUN chmod +x start.sh

EXPOSE 8080
ENTRYPOINT ["./start.sh"]
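A local smoke test of this image might look like the following (the tag and all values are illustrative; `BASIC_AUTH_HASH` must hold a bcrypt hash as the Caddyfile comment describes):

```shell
docker build -t basedosdados-ui .
docker run --rm -p 8080:8080 \
  -e BASIC_AUTH_HASH="$(caddy hash-password --plaintext 'test')" \
  -e HETZNER_S3_ENDPOINT='https://hel1.your-objectstorage.com' \
  -e AWS_ACCESS_KEY_ID=... -e AWS_SECRET_ACCESS_KEY=... \
  basedosdados-ui
# then browse to http://localhost:8080 (user: admin)
```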


@@ -101,6 +101,51 @@ duckdb --ui basedosdados.duckdb
python gera_schemas.py # generates schemas.json and file_tree.md (~21 MB of egress)
```
## Server with password-protected UI
To expose the DuckDB UI on a server with HTTPS and basic authentication, use [Caddy](https://caddyserver.com/) as a reverse proxy.
**Prerequisites on the server:** `caddy`, `htpasswd` (from the `apache2-utils` package), `duckdb`
**1. Install the DuckDB UI service**
Edit `duckdb-ui.service` with the correct user and paths, then copy it into systemd:
```bash
# edit User=, WorkingDirectory= and EnvironmentFile= in the file
cp duckdb-ui.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now duckdb-ui
```
**2. Configure Caddy**
Edit `Caddyfile`, replacing `your.domain.com` with the real domain, then:
```bash
cp Caddyfile /etc/caddy/Caddyfile
systemctl reload caddy
```
Caddy obtains the TLS certificate from Let's Encrypt automatically (ports 80 and 443 must be open in the firewall).
**Changing the password:**
```bash
htpasswd -nbB -C 10 admin NEW_PASSWORD | cut -d: -f2 | base64
# paste the result into the Caddyfile in place of the current hash, then:
systemctl reload caddy
```
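Taken apart, the pipeline works like this: `htpasswd -nbB` emits a `user:bcrypt-hash` line, `cut -d: -f2` keeps only the hash, and `base64` encodes it (early Caddy v2 releases expected basicauth hashes base64-encoded). A sketch with an invented hash, skipping the `htpasswd` step:

```shell
# invented stand-in for an htpasswd output line (user:hash)
line='admin:$2y$10$examplehashexamplehash'
encoded=$(printf '%s\n' "$line" | cut -d: -f2 | base64)
echo "$encoded"
```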
**Relevant files:**
| File | Purpose |
|---|---|
| `Caddyfile` | Caddy config: HTTPS + basicauth → proxy to localhost:4213 |
| `duckdb-ui.service` | systemd unit that runs the DuckDB UI in the background |
---
### `--gcloud-run`
Creates an `e2-standard-4` Debian 12 VM in `us-central1-a`, copies the script and the `.env`, installs the dependencies, and runs it via SSH. Optional variables:

BIN
basedosdados.duckdb Normal file

Binary file not shown.

16
duckdb-ui.service Normal file

@@ -0,0 +1,16 @@
[Unit]
Description=DuckDB UI - basedosdados explorer
After=network.target

[Service]
Type=simple
User=YOUR_USER
WorkingDirectory=/path/to/baseldosdados
ExecStartPre=/usr/bin/python3 prepara_db.py
ExecStart=/usr/bin/duckdb --ui basedosdados.duckdb
Restart=on-failure
RestartSec=5s
EnvironmentFile=/path/to/baseldosdados/.env

[Install]
WantedBy=multi-user.target

10
haloy.yml Normal file

@@ -0,0 +1,10 @@
name: basedosdados
server: 89.167.95.136
domains:
  - domain: db.xn--2dk.xyz
port: 8080
env:
  - HETZNER_S3_ENDPOINT
  - AWS_ACCESS_KEY_ID
  - AWS_SECRET_ACCESS_KEY
  - BASIC_AUTH_HASH

6
open_gui.sh Executable file

@@ -0,0 +1,6 @@
#!/bin/bash
cd "$(dirname "$0")"
INIT=$(mktemp /tmp/duckdb_init_XXXX)
printf "LOAD httpfs;\nATTACH 'basedosdados.duckdb' AS bd (READ_ONLY);\n" > "$INIT"
duckdb --ui ui.duckdb -init "$INIT"
rm -f "$INIT"

63
prepara_db.py Normal file

@@ -0,0 +1,63 @@
import os
import duckdb
import boto3
from dotenv import load_dotenv

load_dotenv()

BUCKET = os.environ['HETZNER_S3_BUCKET']
ENDPOINT_URL = os.environ['HETZNER_S3_ENDPOINT']
ACCESS_KEY = os.environ['AWS_ACCESS_KEY_ID']
SECRET_KEY = os.environ['AWS_SECRET_ACCESS_KEY']

# DuckDB expects the endpoint without scheme
s3_endpoint = ENDPOINT_URL.removeprefix('https://').removeprefix('http://')

# List all dataset/table prefixes in the bucket
s3 = boto3.client('s3',
                  endpoint_url=ENDPOINT_URL,
                  aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
paginator = s3.get_paginator('list_objects_v2')
datasets = {}
for page in paginator.paginate(Bucket=BUCKET, Delimiter='/'):
    for prefix in page.get('CommonPrefixes', []):
        dataset = prefix['Prefix'].rstrip('/')
        datasets[dataset] = []
        for page2 in paginator.paginate(Bucket=BUCKET,
                                        Prefix=dataset + '/',
                                        Delimiter='/'):
            for p in page2.get('CommonPrefixes', []):
                table = p['Prefix'].rstrip('/').split('/')[-1]
                datasets[dataset].append(table)

# Create the DuckDB connection and configure S3
con = duckdb.connect('basedosdados3.duckdb')
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute(f"""
    SET s3_endpoint='{s3_endpoint}';
    SET s3_access_key_id='{ACCESS_KEY}';
    SET s3_secret_access_key='{SECRET_KEY}';
    SET s3_url_style='path';
""")

# Create schemas and views
for dataset, tables in datasets.items():
    con.execute(f"CREATE SCHEMA IF NOT EXISTS {dataset}")
    for table in tables:
        path = f"s3://{BUCKET}/{dataset}/{table}/*.parquet"
        try:
            con.execute(f"""
                CREATE OR REPLACE VIEW {dataset}.{table} AS
                SELECT * FROM read_parquet('{path}', hive_partitioning=true)
            """)
            print(f"{dataset}.{table}")
        except Exception as e:
            msg = str(e).lower()
            if 'geoparquet' in msg or 'geometria' in msg or 'geometry' in msg:
                print(f"  skip (geoparquet) {dataset}.{table}")
            else:
                raise
con.close()
print("Done! Open with: duckdb --ui basedosdados3.duckdb")
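The double `Delimiter='/'` pagination in `prepara_db.py` turns the flat bucket layout `dataset/table/*.parquet` into a `{dataset: [tables]}` mapping. The grouping step can be exercised without touching S3; the prefix values below are invented stand-ins for the paginator's `CommonPrefixes` output:

```python
# Stand-ins for the two levels of CommonPrefixes a real bucket would return.
top_level = [{'Prefix': 'br_ibge_pib/'}, {'Prefix': 'br_inep_ideb/'}]
second_level = {
    'br_ibge_pib/': [{'Prefix': 'br_ibge_pib/municipio/'}],
    'br_inep_ideb/': [{'Prefix': 'br_inep_ideb/escola/'},
                      {'Prefix': 'br_inep_ideb/municipio/'}],
}

datasets = {}
for prefix in top_level:
    dataset = prefix['Prefix'].rstrip('/')
    datasets[dataset] = []
    for p in second_level[prefix['Prefix']]:
        # the table name is the last path component of the nested prefix
        table = p['Prefix'].rstrip('/').split('/')[-1]
        datasets[dataset].append(table)

print(datasets)
```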


@@ -1,62 +0,0 @@
import os
import duckdb
import boto3
from dotenv import load_dotenv

load_dotenv()

S3_ENDPOINT = os.environ["HETZNER_S3_ENDPOINT"]  # https://hel1.your-objectstorage.com
S3_BUCKET = os.environ["HETZNER_S3_BUCKET"]      # baseldosdados
ACCESS_KEY = os.environ["AWS_ACCESS_KEY_ID"]
SECRET_KEY = os.environ["AWS_SECRET_ACCESS_KEY"]

# Strip protocol for DuckDB httpfs (expects bare hostname)
s3_host = S3_ENDPOINT.removeprefix("https://").removeprefix("http://")

con = duckdb.connect('basedosdados.duckdb')
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute(f"""
    CREATE OR REPLACE PERSISTENT SECRET hetzner (
        TYPE S3,
        KEY_ID '{ACCESS_KEY}',
        SECRET '{SECRET_KEY}',
        ENDPOINT '{s3_host}',
        URL_STYLE 'path'
    );
""")

# List all dataset/table prefixes in the bucket
s3 = boto3.client(
    's3',
    endpoint_url=S3_ENDPOINT,
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
)
paginator = s3.get_paginator('list_objects_v2')
datasets = {}
for page in paginator.paginate(Bucket=S3_BUCKET, Delimiter='/'):
    for prefix in page.get('CommonPrefixes', []):
        dataset = prefix['Prefix'].rstrip('/')
        datasets[dataset] = []
        for page2 in paginator.paginate(Bucket=S3_BUCKET,
                                        Prefix=dataset + '/',
                                        Delimiter='/'):
            for p in page2.get('CommonPrefixes', []):
                table = p['Prefix'].rstrip('/').split('/')[-1]
                datasets[dataset].append(table)

# Create schemas and views
for dataset, tables in datasets.items():
    con.execute(f"CREATE SCHEMA IF NOT EXISTS {dataset}")
    for table in tables:
        path = f"s3://{S3_BUCKET}/{dataset}/{table}/*.parquet"
        con.execute(f"""
            CREATE OR REPLACE VIEW {dataset}.{table} AS
            SELECT * FROM '{path}'
        """)
        print(f"{dataset}.{table}")
con.close()
print("Done! Open with: duckdb --ui basedosdados.duckdb")

3
requirements.txt Normal file

@@ -0,0 +1,3 @@
duckdb
boto3
python-dotenv

20
start.sh Normal file

@@ -0,0 +1,20 @@
#!/bin/bash
set -euo pipefail
# DuckDB init: load S3 credentials from env at session start
INIT=$(mktemp /tmp/duckdb_init_XXXX.sql)
S3_ENDPOINT="${HETZNER_S3_ENDPOINT#https://}"
S3_ENDPOINT="${S3_ENDPOINT#http://}"
cat > "$INIT" <<SQL
INSTALL httpfs; LOAD httpfs;
SET s3_endpoint='${S3_ENDPOINT}';
SET s3_access_key_id='${AWS_ACCESS_KEY_ID}';
SET s3_secret_access_key='${AWS_SECRET_ACCESS_KEY}';
SET s3_url_style='path';
SQL
echo "[start] Starting Caddy..."
caddy start --config /app/Caddyfile --adapter caddyfile
echo "[start] Starting DuckDB UI..."
exec duckdb --ui -init "$INIT" basedosdados.duckdb
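The init-file trick in `start.sh` keeps credentials out of the image: the SQL is generated from env vars at container start, and `${VAR#pattern}` strips the URL scheme DuckDB does not want. The stripping can be verified on its own (the hostname is illustrative):

```shell
HETZNER_S3_ENDPOINT='https://hel1.your-objectstorage.com'
S3_ENDPOINT="${HETZNER_S3_ENDPOINT#https://}"
S3_ENDPOINT="${S3_ENDPOINT#http://}"
echo "$S3_ENDPOINT"
```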