Commit Graph

39 Commits

Author SHA1 Message Date
2446a0f78d docs: add census documentation and final research report
- Add 37 census documentation files for IBGE census datasets (1970-2010)
- Add dataviz wordcloud scripts and images
- Add relatorio_final.md with research findings on households and living conditions

New data from DuckDB queries:
- 90.7M households, 203M population
- 53.2% Black population
- 27.9% female-headed households
- 46.6% of urban sewage without collection/treatment
- 15,816 favela sectors (2010)
- 68% Black population in Fortaleza
2026-03-30 22:03:55 +02:00
e9b680379b docs: add full descriptions for 1,411 v* variables in br_ibge_censo_2022.setor_censitario from IBGE official dictionary 2026-03-30 19:04:17 +02:00
c9a777b5fb fix deploy 2026-03-30 18:30:36 +02:00
18e360c70a docs: add profile of prison population from 2010 census microdata
- Add profile of 6,126 people in collective dwellings (v4002=63)
  with demographics: gender, race, education, age, civil status
- Add detailed analysis of 503 minors: 349 likely prisoners (v0502=20),
  154 dependents of staff/prisoners
- Add breakdown of female prisoners: better educated and whiter than male prisoners
- Fix language inconsistencies (Spanish, Chinese, English terms)
- Add documentation for br_ibge_censo_2022 setor_censitario (v* variables)
- Add documentation for prison population identification across census datasets
2026-03-30 11:50:29 +02:00
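The record selection described in this commit can be sketched in Python. This is a minimal illustration, not the project's actual query. It assumes each microdata record is a dict keyed by lowercase variable names and uses the v4002=63 and v0502=20 codes exactly as quoted above. It also relies on a hypothetical `age` field and an assumed under-18 cutoff for minors. Minors not coded v0502=20 are counted as dependents, mirroring the 349 + 154 = 503 split.

```python
from dataclasses import dataclass

# Codes quoted in the commit message.
COLLECTIVE_DWELLING_CODE = 63  # v4002 value used to select collective dwellings
PRISONER_RELATION_CODE = 20    # v0502 value treated as "likely prisoner"
ADULT_AGE = 18                 # assumption: minors are people under 18


@dataclass
class MinorBreakdown:
    likely_prisoners: int
    dependents: int


def collective_dwelling_residents(records):
    """Keep only people living in dwellings coded v4002 == 63."""
    return [r for r in records if r.get("v4002") == COLLECTIVE_DWELLING_CODE]


def split_minors(residents):
    """Split residents under ADULT_AGE into likely prisoners and dependents.

    `age` is a hypothetical field standing in for the census age variable.
    """
    minors = [r for r in residents if r["age"] < ADULT_AGE]
    prisoners = sum(1 for r in minors if r.get("v0502") == PRISONER_RELATION_CODE)
    return MinorBreakdown(likely_prisoners=prisoners, dependents=len(minors) - prisoners)


sample = [
    {"v4002": 63, "v0502": 20, "age": 17},  # minor, likely prisoner
    {"v4002": 63, "v0502": 5, "age": 3},    # minor, dependent
    {"v4002": 63, "v0502": 20, "age": 30},  # adult
    {"v4002": 11, "v0502": 20, "age": 16},  # not a collective dwelling
]
residents = collective_dwelling_residents(sample)
print(len(residents), split_minors(residents))
```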
ab83e6be90 Update populacao_carceraria_2010_result.md 2026-03-30 01:14:34 +02:00
2fa85d8897 audit: female prison population 2026-03-30 00:43:25 +02:00
005de8a8af remove .cocoindex_code from git tracking and ignore it 2026-03-29 21:45:27 +02:00
86f9aa801f ignore it 2026-03-29 21:44:46 +02:00
775cd1aa47 refactor: flatten shell/ dir and add ask screenshot to README 2026-03-29 20:56:50 +02:00
ed5fa6756e refactor: reorganize project structure and fix broken references
- Move scripts to scripts/ directory (roda.sh, prepara_db.py, etc.)
- Move shell config to shell/ directory (Caddyfile, auth.py, haloy.yml)
- Move basedosdados.duckdb to data/ directory
- Update Dockerfile and start.sh with new file paths
- Update README.md with correct script paths
- Remove Python ask.py (replaced by Rust binary in ask/ask)
- Add Rust source files (schema_filter.rs, sql_generator.rs, table_selector.rs)
- Remove sentence-transformer dependencies from ask
- Move docs and context artifacts to their directories
2026-03-29 20:46:27 +02:00
02cb13362c audit: add Considerações (Considerations) 2026-03-29 17:42:27 +02:00
36acd1320c feat: add --sync to export BQ tables directly to S3 without GCS intermediary 2026-03-29 17:39:13 +02:00
43e5ae6723 docs: add data sources from mcp-brasil with auth and format metadata 2026-03-29 16:05:03 +02:00
ac175b35b4 wip: audit results 2026-03-29 01:48:26 +01:00
d24f76a18d wip: audit report plan 2026-03-29 01:03:35 +01:00
3788e2cc81 fix: use Python ask.py instead of Rust binary to avoid compilation 2026-03-28 15:46:48 +01:00
533b9d265e fix: add .dockerignore and debug binary in Dockerfile 2026-03-28 15:36:06 +01:00
0c1f09529a fix: build ask binary inside Docker for Linux x86_64 2026-03-28 15:09:17 +01:00
36eb480687 fix: add explicit TLS email for Caddy certificates 2026-03-28 13:03:28 +01:00
71cd4fd04d fix: use explicit HTTPS site blocks for each domain 2026-03-28 13:01:29 +01:00
e2670077da fix: simplify Caddyfile handle directives 2026-03-28 12:38:20 +01:00
f34cc15991 fix: correct Caddyfile syntax for domain-based routing 2026-03-28 12:20:36 +01:00
a6509d8b30 Add logging to ask app: save questions, generated SQL, success/error status, and timestamps to logs/log.json 2026-03-28 12:17:34 +01:00
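A minimal sketch of this logging behaviour, assuming logs/log.json holds a single JSON array that is rewritten on each call. The field names in the hypothetical `log_query` helper are assumptions. The commit only states that questions, SQL, success/error status and timestamps are saved.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("logs/log.json")  # path taken from the commit message


def log_query(question, sql, ok, error=None, path=LOG_PATH):
    """Append one entry to a JSON array on disk (read-modify-write).

    Field names are hypothetical; only the kinds of data come from the commit.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    entries.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "sql": sql,
        "status": "success" if ok else "error",
        "error": error,
    })
    # ensure_ascii=False keeps Portuguese questions readable in the file.
    path.write_text(json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8")
    return entries[-1]
```

A JSON Lines file would avoid rewriting the whole log on every call; the array form is assumed here only because the file is named log.json.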
8f62a79bbe feat: deploy ask TUI to ask.xn--2dk.xyz
- Add ttyd service for ask on port 7682
- Update haloy.yml with new domain and GEMINI_API_KEY
- Update Caddyfile to route ask.xn--2dk.xyz to ttyd
- Update Dockerfile to include ask binary
- Update README with ask section and schema files documentation
2026-03-28 12:12:31 +01:00
e1c2377343 feat(ask): add text wrapping for wide table columns
- Implement wrap_text function to handle long cell content
- Auto-wrap table columns when content exceeds available width
- Preserve original table rendering when all columns fit
- Remove sample_datasets project (no longer needed)
- Update .gitignore to use wildcard for target dirs
2026-03-28 11:59:02 +01:00
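The wrapping in this commit lives in the Rust ask binary; the Python sketch below only illustrates the same idea. `wrap_text` and `render_table` are illustrative names here. Splitting the available width evenly between columns is an assumption, not necessarily how the real tool allocates space.

```python
import textwrap


def wrap_text(text, width):
    """Break one cell into lines of at most `width` characters."""
    return textwrap.wrap(text, width=width, break_long_words=True) or [""]


def render_table(rows, max_width):
    """Render rows as ' | '-separated text, wrapping only when needed."""
    ncols = len(rows[0])
    natural = [max(len(str(r[c])) for r in rows) for c in range(ncols)]
    sep = " | "
    budget = max_width - len(sep) * (ncols - 1)
    if sum(natural) <= budget:
        widths = natural  # everything fits: keep the unwrapped layout
    else:
        # Assumption: split the budget evenly, never widening a column.
        widths = [max(1, min(w, budget // ncols)) for w in natural]
    out = []
    for row in rows:
        cells = [wrap_text(str(v), w) for v, w in zip(row, widths)]
        height = max(len(c) for c in cells)  # a row is as tall as its tallest cell
        for i in range(height):
            parts = [(c[i] if i < len(c) else "").ljust(w) for c, w in zip(cells, widths)]
            out.append(sep.join(parts).rstrip())
    return "\n".join(out)
```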
c142080a5d schema: revert Phase 3 to S3 bigquery_tables enrichment
GraphQL approach had broken pagination (totalCount key missing, crashes
silently). S3 approach at least completes cleanly even if the metadata
table currently lacks a description column.
2026-03-28 11:27:54 +01:00
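One way to make pagination fail loudly instead of silently is to validate each page before using it. The sketch assumes a hypothetical offset-based `fetch_page(offset, limit)` callable and a page shape of `{"totalCount": ..., "items": [...]}`. The actual Base dos Dados GraphQL schema is not described in this log.

```python
from typing import Callable, Iterator


class PaginationError(RuntimeError):
    """Raised when a page lacks the fields needed to continue paginating."""


def paginate(fetch_page: Callable[[int, int], dict], page_size: int = 100) -> Iterator:
    """Yield items from an offset-paginated API, raising on malformed pages.

    The page shape {"totalCount": int, "items": [...]} is an assumption.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if "totalCount" not in page or "items" not in page:
            raise PaginationError(f"page at offset {offset} missing keys: {sorted(page)}")
        items = page["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page or once every advertised item has been seen.
        if not items or offset >= page["totalCount"]:
            return
```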
b5d84e3556 feat: add LLM SQL query assistant and dataset sampler
- ask.py: Python script to query Base dos Dados via natural language using Gemini,
  generates and executes DuckDB SQL from Portuguese questions
- ask/ (Rust): CLI companion for the SQL query assistant with system prompt
- sample_datasets.py: samples parquet files from S3 into a local DuckDB for exploration
- sample_datasets/ (Rust): CLI for dataset sampling
- context/: LLM context bundle (schemas, join keys, file tree) for query generation
2026-03-28 11:23:51 +01:00
6801db427e schema: use BD GraphQL API for enrichment, add file tree and schema artifacts
- Replace S3 bigquery_tables metadata lookup with paginated GraphQL API call
  to fetch table and column descriptions from Base dos Dados
- Add gera_schemas.py for schema compilation and S3 inventory
- Add schemas.json and file_tree.md as generated reference artifacts
- Add websocket proxy in Caddyfile for ttyd on port 7681
- Ignore generated context/ artifacts in .gitignore
- Add openai to requirements.txt
2026-03-28 11:23:38 +01:00
ed81e52254 docs: reorder README for data users, remove unused files (xdg-open, gera_schemas.py, open_gui.sh, docs/) 2026-03-26 12:01:46 +01:00
5239a03ea8 docs: expand /query curl usage, remove outdated UI references 2026-03-26 11:58:14 +01:00
41e7f7a972 replace duckdb-ui with ttyd shell: add /query HTTP endpoint, fix utf-8/locale, region config
- swap DuckDB UI for ttyd web terminal (--writable, -readonly db)
- add POST /query endpoint with X-Password auth for curl-based SQL execution
- fix UTF-8 rendering: set LANG/LC_ALL=C.UTF-8 in container
- pass BUCKET_REGION env var for correct S3 signing region
- simplify start.sh: drop Xvfb, views.duckdb generation, blocking duckdb -ui
- add less, ncurses-bin to Dockerfile for proper pager/terminal support
- update Caddyfile: single route to ttyd with flush_interval -1 for websocket
- update README to reflect current architecture and document /query usage
- remove duckdb-ui.service, schemas.json, file_tree.md (generated artifacts)
2026-03-26 11:54:46 +01:00
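A Python equivalent of the curl-based usage might look like this. It assumes the endpoint is served at https://db.xn--2dk.xyz/query (the domain from the Haloy commit) and that the SQL is sent as the raw request body. It also reads the password from a hypothetical QUERY_PASSWORD environment variable. The README documents the actual request format.

```python
import os
import urllib.request

# Host from the haloy.yml commit; the raw-SQL body format is an assumption.
DEFAULT_URL = "https://db.xn--2dk.xyz/query"


def build_query_request(sql: str, password: str, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a POST /query request carrying the X-Password header."""
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),
        method="POST",
        headers={"X-Password": password, "Content-Type": "text/plain; charset=utf-8"},
    )


def run_query(sql: str) -> str:
    """Send the request; QUERY_PASSWORD is a hypothetical variable name."""
    req = build_query_request(sql, os.environ["QUERY_PASSWORD"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8")
```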
cd94603fac update haloy config: use hostname for server, fix env var format 2026-03-25 13:39:15 +01:00
8c22944bbb fix duckdb filename, add db file and gitignore cleanup
- Dockerfile + start.sh: use basedosdados.duckdb (not basedosdados3.duckdb)
- add basedosdados.duckdb (3.5 MB, needed for Docker build)
- add requirements.txt (local dev use)
- .gitignore: remove *.duckdb exclusion, add .DS_Store
2026-03-25 13:30:25 +01:00
0d77f83045 simplify container: skip db prep, password via env var, fixed server IP
- start.sh: remove prepara_db.py step; load S3 creds via DuckDB init file
- Caddyfile: switch to basic_auth with {env.BASIC_AUTH_HASH}, so rotating the password needs no rebuild
- Dockerfile: drop Python/pip layers (no longer needed at runtime)
- haloy.yml: set server to 89.167.95.136, add BASIC_AUTH_HASH to env
- remove requirements.txt (only needed for local prepara_db.py, not the container)
2026-03-25 13:27:51 +01:00
9eb2dee013 containerize with Haloy: Dockerfile, Caddy basicauth, haloy.yml for db.xn--2dk.xyz
- Dockerfile: Debian slim base, installs DuckDB CLI, Python deps, Caddy
- start.sh: runs prepara_db.py → starts Caddy (basicauth) → starts DuckDB UI
- Caddyfile: updated for container (no TLS, port 8080, Haloy handles HTTPS)
- haloy.yml: deploys to db.xn--2dk.xyz on port 8080
- requirements.txt: duckdb, boto3, python-dotenv
- prepara_db.py, open_gui.sh, duckdb-ui.service: add previously untracked files
- remove prepara_gui.py (replaced by prepara_db.py)
2026-03-25 13:23:59 +01:00
03758acdd9 add schema dump: parquet footer reader generating schemas.json and file_tree.md 2026-03-25 10:13:40 +01:00
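Reading a Parquet footer only requires the tail of the file: it ends with the Thrift-encoded FileMetaData, a 4-byte little-endian length, and the magic bytes PAR1. The sketch below extracts the raw metadata bytes from a local file. `read_footer_metadata` is a hypothetical name. Decoding the Thrift payload into a schema (for example with pyarrow's `pyarrow.parquet.read_metadata`) is left out.

```python
import struct

MAGIC = b"PAR1"


def read_footer_metadata(path: str) -> bytes:
    """Return the raw (Thrift-encoded) FileMetaData bytes of a Parquet file.

    File tail layout: <metadata> <4-byte little-endian length> "PAR1".
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        size = f.tell()
        if size < 12:  # leading magic + length + trailing magic
            raise ValueError("file too small to be Parquet")
        f.seek(size - 8)
        tail = f.read(8)
        if tail[4:] != MAGIC:
            raise ValueError("missing trailing PAR1 magic")
        (meta_len,) = struct.unpack("<I", tail[:4])
        if meta_len > size - 12:
            raise ValueError("footer length exceeds file size")
        f.seek(size - 8 - meta_len)
        return f.read(meta_len)
```

Against S3, the same approach becomes two ranged reads (the last 8 bytes, then the metadata block), which avoids downloading whole files. The commit does not say whether its reader works that way.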
4572fcb28e add DuckDB explorer: creates views over S3 parquets for local querying 2026-03-25 10:13:37 +01:00
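Views over S3 Parquet files can be declared with DuckDB's `read_parquet` table function. This sketch only generates the DDL. The `s3://<bucket>/<dataset>/<table>/*.parquet` layout and the function names are assumptions, not the explorer's actual code.

```python
def quote_ident(name: str) -> str:
    """Double-quote a DuckDB identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def view_statements(bucket: str, tables: list) -> list:
    """Build DuckDB DDL exposing each (dataset, table) prefix as a view.

    Assumes a hypothetical layout s3://<bucket>/<dataset>/<table>/*.parquet.
    """
    stmts = []
    # One schema per dataset, so views can be queried as dataset.table.
    for dataset in sorted({d for d, _ in tables}):
        stmts.append(f"CREATE SCHEMA IF NOT EXISTS {quote_ident(dataset)};")
    for dataset, table in tables:
        glob = f"s3://{bucket}/{dataset}/{table}/*.parquet".replace("'", "''")
        stmts.append(
            f"CREATE OR REPLACE VIEW {quote_ident(dataset)}.{quote_ident(table)} AS "
            f"SELECT * FROM read_parquet('{glob}');"
        )
    return stmts
```

The statements can then be run in a DuckDB session that already has S3 access configured (for example through the httpfs extension and credentials).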
dd221cff88 add export pipeline: BigQuery → GCS → Hetzner S3 (roda.sh) 2026-03-25 10:13:34 +01:00
335abbfa2f add project setup: gitignore, env sample, readme 2026-03-25 10:13:31 +01:00