Eric Guimarães · Belo Horizonte, BR · Available

Eric Guimarães

AI/ML engineer shipping production LLM systems. Building Magoquiz solo at magoquiz.com.

I put LLMs in production and build the full-stack product around them: serving, MLOps/LLMOps, evals, observability. I founded and run Magoquiz, a profitable B2B recommendation-quiz SaaS for LATAM e-commerce, built solo while finishing my CS degree. 46 paying customers, 128k quiz-takers.

Inside it I shipped a self-hosted LLM scoring service that runs about 200x cheaper per call than the hosted API. I distilled a Sonnet judge into a fine-tuned Llama 3.1 8B with LoRA/QLoRA on Modal, then compiled it to a TensorRT-LLM engine behind NVIDIA Triton on a scale-to-zero GPU, serving at roughly $0.0003 per score. Around the models I run the LLMOps that keeps them honest: a CI gate that blocks regressions with an LLM-as-judge, a W&B model registry, and a daily drift monitor over Langfuse traces.

I'm a full-stack engineer too. One integration (OAuth plus webhooks) drove 47x ROI and a 236% conversion lift in 10 days, and the platform has served 30k requests a day for 18 months with no customer-facing outages. Before Magoquiz I spent three years at SYDLE: multi-tenant rewrites, a 60k-page SEO platform that pulled over a million organic impressions a month, and a year as the technical point of contact between engineering, product, design, and execs.

/ Selected projects

Self-hosted LLM judge: distillation + serving

Distilled a Sonnet judge into a fine-tuned Llama 3.1 8B (QLoRA on Modal A100), compiled to a TensorRT-LLM engine behind NVIDIA Triton on a scale-to-zero L4 GPU, fronted by a typed FastAPI gateway, serving at ~$0.0003 per score (about 200x cheaper than the hosted API). LLMOps around it: CI eval gate, W&B model registry, Langfuse drift monitor.

QLoRA · TensorRT-LLM · Triton · FastAPI · Modal · W&B · Langfuse

Magoquiz · 2025

Multi-model LLM comparison for recommendation

Comparative study across providers (Gemini, GPT) and architectures (single-shot LangChain vs. multi-step LangGraph + RAG) applied to e-commerce recommendation quizzes, run as a controlled experiment with statistical analysis of the results.

LangGraph · LangChain · RAG · Gemini · GPT

Bachelor's thesis (TCC) · 2025

BlockFace

Tamper-proof surveillance system on a custom blockchain (Python + Go). Awarded Best of Semester with a monetary prize.

Blockchain · Python · Go · Computer Vision

PUC Minas · 2024 · Best of Semester

Liver Analysis

Medical-image classifier using a custom CNN with reinforcement-learning principles (Python + Streamlit). State-of-the-art results; submitted for academic publication.

PyTorch · CNN · Reinforcement Learning

PUC Minas · 2024 · Submitted for publication

Astral Intelligence

Personalized AI mobile app (Flutter) using RAG over a Supabase vector store, served through a custom reverse proxy.

Flutter · RAG · Supabase

PUC Minas · 2023

/ Experience

2024 — Present
Belo Horizonte, BR (Remote)

Founder & AI/ML Engineer · Magoquiz

  • Built a self-hosted LLM scoring service that runs about 200x cheaper per call than the hosted API. Distilled a Sonnet judge into a fine-tuned Llama 3.1 8B (LoRA/QLoRA on Modal), compiled it to a TensorRT-LLM engine behind NVIDIA Triton on a scale-to-zero L4 GPU, serving at ~$0.0003 per score (P50 1.3s).
  • Made the judge provider-portable across AWS Bedrock (Converse API, IAM, cross-region inference profiles) and a self-hosted GPU behind one interface, switchable with a single environment variable.
  • MLOps/LLMOps: a CI quality gate (GitHub Actions) that blocks AI quiz-generator regressions with an LLM-as-judge on every relevant PR, a Weights & Biases model registry, and a daily drift monitor over Langfuse traces that opens a ticket when quality drops.
  • Ran a pre-registered experiment over 13 model/prompt setups against an 88-quiz benchmark tied to 130k sessions. Scored each with an LLM-as-judge calibrated to real conversion, used paired significance tests, and confirmed the production setup was already cost-optimal. Published a public W&B report.
  • Built a hybrid-search RAG (dense + BM25 fused with Reciprocal Rank Fusion) in LlamaIndex, benchmarked on hit-rate@5 and MRR. Hybrid beat dense-only by 20pp.
  • Drove 47x ROI and a 236% conversion lift in 10 days for a strategic client, through an OAuth 2.0 (PKCE) + webhooks integration with Yampi, Resend, and PostHog.
  • Built the full-stack platform solo on a four-piece architecture (edge-deployed SvelteKit on Cloudflare Workers, a background-job worker, a token-refresh service, and a workflow runner) on PostgreSQL, serving 30k requests a day for 18 months with no customer-facing outages.
  • Shipped 9 e-commerce integrations on reusable OAuth + webhook abstractions (about 70% less time per integration), a programmatic-SEO pipeline of 2,500+ AI-reviewed posts (+300% organic impressions), and 230+ tests (unit, integration, E2E) behind a CI/CD pipeline.

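To illustrate the hybrid-search bullet above, here is a minimal sketch of Reciprocal Rank Fusion together with the two metrics it was benchmarked on (hit-rate@5 and MRR). It is not the production LlamaIndex code: the function names, the toy document ids, and the constant `k=60` (the value commonly used for RRF) are assumptions made for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranking.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hit_rate_at_k(results, relevant, k=5):
    """Fraction of queries whose relevant doc appears in the top k."""
    hits = sum(1 for ranked, rel in zip(results, relevant) if rel in ranked[:k])
    return hits / len(results)


def mean_reciprocal_rank(results, relevant):
    """Average of 1 / rank of the relevant doc (0 when it is missing)."""
    total = 0.0
    for ranked, rel in zip(results, relevant):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(results)


# Toy example: one query, hypothetical doc ids from each retriever.
dense_ranking = ["a", "b", "c"]
bm25_ranking = ["c", "a", "d"]
fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
```

In this toy case, documents found by both retrievers ("a" and "c") rise above those found by only one, which is the behaviour that lets a hybrid setup outperform a dense-only one on queries where exact keyword matches matter.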
2020 — 2023
Belo Horizonte, BR

Full-Stack Developer · SYDLE

  • Three years on a 6-person scrum team (PM, designer, QA, engineers). In the final year I was the primary technical point of contact between engineering, product, design, and execs.
  • Scoped and disambiguated 100+ tickets with product, design, and QA across legacy and greenfield codebases.
  • Mentored and onboarded 3 engineers; led code reviews and architectural guidance.
  • Established CI/CD across 3 environments with the DevOps team: fewer deploy errors, shorter delivery loop.
  • Sole engineer on Rede Doglife (pet-health aggregator), working in a tight loop with the CEO and design. Scaled it to 60k SEO-optimized pages across 5k cities for 1M+ monthly organic impressions.
  • Refactored a legacy Angular/TypeScript checkout from 80s to 30s (-62%) and killed recurring crashes. Tuned 30+ Elasticsearch queries for 50% lower latency.
  • Led full-stack SEO across 200+ legacy pages, lifting Lighthouse from 50 to 90 (+80%) in one month.
  • Refactored the backend (20+ queries, endpoints, functions) into multi-tenant / white-label, enabling launch of a second brand on the same codebase.

/ Toolbox

AI / LLMs
LLM integration (RAG, agents, MCP) · LangChain · LangGraph · LlamaIndex · Prompt engineering · LLM-as-judge evals · Fine-tuning (LoRA/QLoRA) · Llama 3.1 · Anthropic (Claude Code) · OpenAI · Embeddings / vector DBs (pgvector) · PyTorch · TensorFlow
LLMOps / Serving
NVIDIA Triton · TensorRT-LLM · Modal · AWS Bedrock · Weights & Biases · Langfuse / OpenTelemetry · Hugging Face · CI eval gates · Model registry & drift monitoring · Distillation / quantization · Scale-to-zero GPU
Languages
Python · TypeScript · JavaScript · Node.js · Go · Java
Backend
FastAPI · Pydantic v2 · REST · Serverless
Frontend
Svelte · SvelteKit · React · Next.js · Angular · TailwindCSS
Data
PostgreSQL · Supabase · pgvector · Elasticsearch
Cloud / DevOps
AWS · Cloudflare Workers · Vercel · Docker · CI/CD · n8n · Trigger.dev
Integrations
OAuth 2.0 (PKCE) · Webhooks · Stripe · PostHog · Google Analytics · GTM · Facebook Pixel
Quality
Cypress · Vitest · E2E · Storybook
Spoken languages
Portuguese (Native) · English (C1, Fluent) · Mandarin (B1) · Spanish (A2)
Education
BSc, Computer Science · PUC Minas · 2022–2025
Technical High School, Systems Development · COLTEC / UFMG · 2018–2021

/ Contact

I'm looking for senior roles that combine end-to-end product architecture, close cross-functional collaboration, and measurable business impact. The fastest way to reach me is email — happy to chat about LLM systems, SaaS, or anything in between.

Email
ericmfgui@gmail.com
LinkedIn
linkedin.com/in/ericmfgui
GitHub
github.com/mifegui
CV
PDF — eric-guimaraes-cv.pdf