baseldosdados

Author	SHA1	Message	Date
rafapolo	2446a0f78d	docs: add census documentation and final research report - Add 37 census documentation files for IBGE census datasets (1970-2010) - Add dataviz wordcloud scripts and images - Add relatorio_final.md with research findings on households and living conditions - New data from DuckDB queries: 90.7M households, 203M population; 53.2% Black population; 27.9% female-headed households; 46.6% urban sewage without collection/treatment; 15,816 favela sectors (2010); 68% Black population in Fortaleza	2026-03-30 22:03:55 +02:00
rafapolo	e9b680379b	docs: add full descriptions for 1,411 v* variables in br_ibge_censo_2022.setor_censitario from IBGE official dictionary	2026-03-30 19:04:17 +02:00
rafapolo	c9a777b5fb	fix deploy	2026-03-30 18:30:36 +02:00
rafapolo	18e360c70a	docs: add profile of prison population from 2010 census microdata - Add profile of 6,126 people in collective dwellings (v4002=63) with demographics: gender, race, education, age, civil status - Add detailed analysis of 503 minors: 349 likely prisoners (v0502=20), 154 dependents of staff/prisoners - Add breakdown of female prisoners: higher education and whiter than male prisoners - Fix language inconsistencies (Spanish, Chinese, English terms) - Add documentation for br_ibge_censo_2022 setor_censitario (v* variables) - Add documentation for prison population identification across census datasets	2026-03-30 11:50:29 +02:00
rafapolo	ab83e6be90	Update populacao_carceraria_2010_result.md	2026-03-30 01:14:34 +02:00
rafapolo	2fa85d8897	audit: female prison population	2026-03-30 00:43:25 +02:00
rafapolo	005de8a8af	remove .cocoindex_code from git tracking and ignore it	2026-03-29 21:45:27 +02:00
rafapolo	86f9aa801f	ignore it	2026-03-29 21:44:46 +02:00
rafapolo	775cd1aa47	refactor: flatten shell/ dir and add ask screenshot to README	2026-03-29 20:56:50 +02:00
rafapolo	ed5fa6756e	refactor: reorganize project structure and fix broken references - Move scripts to scripts/ directory (roda.sh, prepara_db.py, etc.) - Move shell config to shell/ directory (Caddyfile, auth.py, haloy.yml) - Move basedosdados.duckdb to data/ directory - Update Dockerfile and start.sh with new file paths - Update README.md with correct script paths - Remove Python ask.py (replaced by Rust binary in ask/ask) - Add Rust source files (schema_filter.rs, sql_generator.rs, table_selector.rs) - Remove sentence-transformer dependencies from ask - Move docs and context artifacts to their directories	2026-03-29 20:46:27 +02:00
rafapolo	02cb13362c	audit: Add Considerations	2026-03-29 17:42:27 +02:00
rafapolo	36acd1320c	feat: add --sync to export BQ tables directly to S3 without GCS intermediary	2026-03-29 17:39:13 +02:00
rafapolo	43e5ae6723	docs: add data sources from mcp-brasil with auth and format metadata	2026-03-29 16:05:03 +02:00
rafapolo	ac175b35b4	wip: audit results	2026-03-29 01:48:26 +01:00
rafapolo	d24f76a18d	wip: audit report plan	2026-03-29 01:03:35 +01:00
rafapolo	3788e2cc81	fix: use Python ask.py instead of Rust binary to avoid compilation	2026-03-28 15:46:48 +01:00
rafapolo	533b9d265e	fix: add .dockerignore and debug binary in Dockerfile	2026-03-28 15:36:06 +01:00
rafapolo	0c1f09529a	fix: build ask binary inside Docker for Linux x86_64	2026-03-28 15:09:17 +01:00
rafapolo	36eb480687	fix: add explicit TLS email for Caddy certificates	2026-03-28 13:03:28 +01:00
rafapolo	71cd4fd04d	fix: use explicit HTTPS site blocks for each domain	2026-03-28 13:01:29 +01:00
rafapolo	e2670077da	fix: simplify Caddyfile handle directives	2026-03-28 12:38:20 +01:00
rafapolo	f34cc15991	fix: correct Caddyfile syntax for domain-based routing	2026-03-28 12:20:36 +01:00
rafapolo	a6509d8b30	Add logging to ask app: save questions, SQLs, success/error status, and timestamps to logs/log.json	2026-03-28 12:17:34 +01:00
rafapolo	8f62a79bbe	feat: deploy ask TUI to ask.xn--2dk.xyz - Add ttyd service for ask on port 7682 - Update haloy.yml with new domain and GEMINI_API_KEY - Update Caddyfile to route ask.xn--2dk.xyz to ttyd - Update Dockerfile to include ask binary - Update README with ask section and schema files documentation	2026-03-28 12:12:31 +01:00
rafapolo	e1c2377343	feat(ask): add text wrapping for wide table columns - Implement wrap_text function to handle long cell content - Auto-wrap table columns when content exceeds available width - Preserve original table rendering for fits-all cases - Remove sample_datasets project (no longer needed) - Update .gitignore to use wildcard for target dirs	2026-03-28 11:59:02 +01:00
rafapolo	c142080a5d	schema: revert Phase 3 to S3 bigquery_tables enrichment - GraphQL approach had broken pagination (totalCount key missing, crashes silently). S3 approach at least completes cleanly even if the metadata table currently lacks a description column.	2026-03-28 11:27:54 +01:00
rafapolo	b5d84e3556	feat: add LLM SQL query assistant and dataset sampler - ask.py: Python script to query Base dos Dados via natural language using Gemini, generates and executes DuckDB SQL from Portuguese questions - ask/ (Rust): CLI companion for the SQL query assistant with system prompt - sample_datasets.py: samples parquet files from S3 into a local DuckDB for exploration - sample_datasets/ (Rust): CLI for dataset sampling - context/: LLM context bundle (schemas, join keys, file tree) for query generation	2026-03-28 11:23:51 +01:00
rafapolo	6801db427e	schema: use BD GraphQL API for enrichment, add file tree and schema artifacts - Replace S3 bigquery_tables metadata lookup with paginated GraphQL API call to fetch table and column descriptions from Base dos Dados - Add gera_schemas.py for schema compilation and S3 inventory - Add schemas.json and file_tree.md as generated reference artifacts - Add websocket proxy in Caddyfile for ttyd on port 7681 - Ignore generated context/ artifacts in .gitignore - Add openai to requirements.txt	2026-03-28 11:23:38 +01:00
rafapolo	ed81e52254	docs: reorder README for data users, remove unused files (xdg-open, gera_schemas.py, open_gui.sh, docs/)	2026-03-26 12:01:46 +01:00
rafapolo	5239a03ea8	docs: expand /query curl usage, remove outdated UI references	2026-03-26 11:58:14 +01:00
rafapolo	41e7f7a972	replace duckdb-ui with ttyd shell: add /query HTTP endpoint, fix utf-8/locale, region config - swap DuckDB UI for ttyd web terminal (--writable, -readonly db) - add POST /query endpoint with X-Password auth for curl-based SQL execution - fix UTF-8 rendering: set LANG/LC_ALL=C.UTF-8 in container - pass BUCKET_REGION env var for correct S3 signing region - simplify start.sh: drop Xvfb, views.duckdb generation, blocking duckdb -ui - add less, ncurses-bin to Dockerfile for proper pager/terminal support - update Caddyfile: single route to ttyd with flush_interval -1 for websocket - update README to reflect current architecture and document /query usage - remove duckdb-ui.service, schemas.json, file_tree.md (generated artifacts)	2026-03-26 11:54:46 +01:00
rafapolo	cd94603fac	update haloy config: use hostname for server, fix env var format	2026-03-25 13:39:15 +01:00
rafapolo	8c22944bbb	fix duckdb filename, add db file and gitignore cleanup - Dockerfile + start.sh: use basedosdados.duckdb (not basedosdados3.duckdb) - add basedosdados.duckdb (3.5 MB, needed for Docker build) - add requirements.txt (local dev use) - .gitignore: remove *.duckdb exclusion, add .DS_Store	2026-03-25 13:30:25 +01:00
rafapolo	0d77f83045	simplify container: skip db prep, password via env var, fixed server IP - start.sh: remove prepara_db.py step; load S3 creds via DuckDB init file - Caddyfile: switch to basic_auth with {env.BASIC_AUTH_HASH} — no rebuild to rotate password - Dockerfile: drop Python/pip layers (no longer needed at runtime) - haloy.yml: set server to 89.167.95.136, add BASIC_AUTH_HASH to env - remove requirements.txt (only needed for local prepara_db.py, not the container)	2026-03-25 13:27:51 +01:00
rafapolo	9eb2dee013	containerize with Haloy: Dockerfile, Caddy basicauth, haloy.yml for db.xn--2dk.xyz - Dockerfile: debian slim, installs DuckDB CLI, Python deps, Caddy - start.sh: runs prepara_db.py → starts Caddy (basicauth) → starts DuckDB UI - Caddyfile: updated for container (no TLS, port 8080, Haloy handles HTTPS) - haloy.yml: deploys to db.xn--2dk.xyz on port 8080 - requirements.txt: duckdb, boto3, python-dotenv - prepara_db.py, open_gui.sh, duckdb-ui.service: add previously untracked files - remove prepara_gui.py (replaced by prepara_db.py)	2026-03-25 13:23:59 +01:00
rafapolo	03758acdd9	add schema dump: parquet footer reader generating schemas.json and file_tree.md	2026-03-25 10:13:40 +01:00
rafapolo	4572fcb28e	add DuckDB explorer: creates views over S3 parquets for local querying	2026-03-25 10:13:37 +01:00
rafapolo	dd221cff88	add export pipeline: BigQuery → GCS → Hetzner S3 (roda.sh)	2026-03-25 10:13:34 +01:00
rafapolo	335abbfa2f	add project setup: gitignore, env sample, readme	2026-03-25 10:13:31 +01:00

39 Commits