Books 📚
Generative AI in Action
Manning Publications, 2024
A comprehensive guide to building transformative AI solutions with large language models. Topics include LLMs, prompt engineering, Retrieval-Augmented Generation (RAG), model adaptation and fine-tuning techniques (SFT, DPO, LoRA), production deployment patterns, and ethical AI practices.
Practical Weak Supervision: Doing More with Less Data
Co-authored with Wee Hyong Tok and Senja Filipi
A practical guide to weak supervision techniques for machine learning, focusing on building high-quality training datasets with limited labeled data.
Pro WCF: Practical Microsoft SOA Implementation
An earlier work covering enterprise-scale SOA architecture with Windows Communication Foundation (WCF).
Research & Papers 📰
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
arXiv:2404.14219
Co-authored paper introducing the Phi-3 family of small language models (SLMs), demonstrating that smaller models can rival much larger models in performance while remaining deployable on edge devices.
The work covers:
- Phi-3-mini: 3.8B parameters, 69% MMLU accuracy, deployable on smartphones
- Phi-3-small/medium: 7B and 14B parameter variants with 75-78% MMLU accuracy
- Phi-3.5 series: Enhanced variants including MoE and Vision models
- Focus on efficient training, inference, and edge deployment
- Extensive alignment for robustness, safety, and chat capabilities
This demonstrates the viability of small language models as a practical alternative to large foundation models, particularly important for privacy-preserving, on-device AI applications.
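To give a sense of how lightweight these models are in practice, here is a minimal sketch of running a Phi-3-mini checkpoint locally with Hugging Face `transformers`. The model id `microsoft/Phi-3-mini-4k-instruct` and the generation settings are illustrative assumptions, not details from the report itself.

```python
# Minimal sketch: local inference with a small language model (SLM).
# The model id below is an assumption for illustration; older transformers
# releases may additionally require trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half-precision weights keep the memory footprint small
    device_map="auto",            # falls back to CPU if no GPU is available
)

messages = [{"role": "user", "content": "Summarize why small language models matter."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```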
Open Source Models 🎉
I publish models on Hugging Face exploring various aspects of language model training and fine-tuning, from domain-specific SLMs to general-purpose architectures.
Blog Series: Building LLMs from Scratch
I’m currently publishing an in-depth, multi-part series on building production-grade Large Language Models from scratch, with complete working code and published models.
Series Overview
- Part 1: Getting started—architecture design, setup, and data preparation
- Part 2: Data collection, cleaning, and custom tokenizer development (see the tokenizer sketch after this list)
- Part 3: Multi-GPU training, precision tricks (fp16/bf16), checkpointing, and experiment tracking
- Part 4: Inference optimization, model publishing, and deployment strategies
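To give a flavor of the Part 2 material, here is a minimal sketch of training a custom byte-level BPE tokenizer with the Hugging Face `tokenizers` library. The corpus path, vocabulary size, and special tokens are placeholder assumptions, not the settings used in the series.

```python
# Minimal sketch: training a custom byte-level BPE tokenizer.
# The corpus file, vocab size, and special tokens are placeholders.
import os
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus/train.txt"],          # your cleaned text corpus
    vocab_size=32_000,                   # typical SLM vocabulary size
    min_frequency=2,
    special_tokens=["<s>", "</s>", "<pad>", "<unk>"],
)

os.makedirs("tokenizer", exist_ok=True)
tokenizer.save_model("tokenizer")        # writes vocab.json and merges.txt

# Quick round-trip check
ids = tokenizer.encode("Building LLMs from scratch").ids
print(ids, tokenizer.decode(ids))
```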
Each part includes working Python code, published model artifacts, real-world training challenges and solutions, and reproducible experiments with detailed hyperparameters.
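As a taste of Part 3, below is a minimal sketch of a bf16 mixed-precision training step with periodic checkpointing in PyTorch. The tiny stand-in model, random batches, and checkpoint path are assumptions for illustration; the series itself covers the multi-GPU and experiment-tracking pieces in full.

```python
# Minimal sketch: bf16 mixed-precision training with periodic checkpointing.
# The stand-in model and random data are placeholders; the series builds this
# out with multi-GPU training (DDP/FSDP) and experiment tracking.
import os
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024).to(device)        # stand-in for the LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
os.makedirs("checkpoints", exist_ok=True)

for step in range(100):
    batch = torch.randn(8, 1024, device=device)       # placeholder batch
    optimizer.zero_grad(set_to_none=True)
    # bf16 autocast: unlike fp16, no GradScaler is needed
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        loss = model(batch).pow(2).mean()              # stand-in loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()

    if step % 50 == 0:                                 # periodic checkpoint
        torch.save(
            {"step": step,
             "model": model.state_dict(),
             "optimizer": optimizer.state_dict()},
            f"checkpoints/step_{step}.pt",
        )
```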
If you’re interested in any of these topics, or would like to discuss AI, LLMs, or research, get in touch!