Books 📚

Generative AI in Action

Manning Publications, 2024

A comprehensive guide to building transformative AI solutions with large language models. Topics include LLMs, prompt engineering, Retrieval-Augmented Generation (RAG), model adaptation and fine-tuning techniques (SFT, DPO, LoRA), production deployment patterns, and ethical AI practices.

Practical Weak Supervision: Doing More with Less Data

Co-authored with Wee Hyong Tok and Senja Filipi

A practical guide to weak supervision techniques for machine learning, focusing on building high-quality training datasets with limited labeled data.
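
To give a flavor of the core idea behind weak supervision, here is a minimal, library-free sketch: a few noisy labeling functions vote on unlabeled examples and their votes are aggregated into training labels. The heuristics and labels here are purely illustrative, not code from the book.

```python
# Illustrative weak supervision: noisy labeling functions vote on
# unlabeled text, and votes are aggregated into training labels.
# (Hypothetical heuristics, not code from the book.)

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_mentions_refund(text: str) -> int:
    # Heuristic: asking for a refund suggests a negative example.
    return NEGATIVE if "refund" in text.lower() else ABSTAIN

def lf_says_thanks(text: str) -> int:
    # Heuristic: expressions of thanks suggest a positive example.
    return POSITIVE if "thank" in text.lower() else ABSTAIN

def lf_many_exclamations(text: str) -> int:
    # Heuristic: repeated exclamation marks often signal positive sentiment.
    return POSITIVE if text.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_refund, lf_says_thanks, lf_many_exclamations]

def weak_label(text: str) -> int:
    """Aggregate labeling-function votes by simple majority, ignoring abstains."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

unlabeled = [
    "Thank you, this worked perfectly!",
    "I want a refund, the product broke after a day.",
]
print([weak_label(t) for t in unlabeled])  # e.g. [1, 0]
```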

Pro WCF: Practical Microsoft SOA Implementation

An earlier work covering enterprise-scale service-oriented architecture (SOA) with Windows Communication Foundation (WCF).


Research & Papers 📰

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

arXiv:2404.14219

Co-authored paper introducing the Phi-3 family of small language models (SLMs), demonstrating that smaller models can rival much larger models in performance while remaining deployable on edge devices.

The work covers:

  • Phi-3-mini: 3.8B parameters, 69% MMLU accuracy, deployable on smartphones
  • Phi-3-small/medium: 7B and 14B parameter variants reaching 75% and 78% MMLU accuracy, respectively
  • Phi-3.5 series: Enhanced variants including MoE and Vision models
  • Focus on efficient training, inference, and edge deployment
  • Extensive alignment for robustness, safety, and chat capabilities

This demonstrates the viability of small language models as a practical alternative to large foundation models, particularly important for privacy-preserving, on-device AI applications.
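
As a quick illustration of that deployability, below is a minimal sketch of running Phi-3-mini locally with the Hugging Face transformers library. The model ID is the publicly released microsoft/Phi-3-mini-4k-instruct checkpoint; the prompt and generation settings are illustrative choices, not taken from the report.

```python
# Minimal sketch: run Phi-3-mini locally via Hugging Face transformers.
# Assumes the publicly released microsoft/Phi-3-mini-4k-instruct checkpoint;
# prompt and generation settings are illustrative, not from the report.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize why small language models matter."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```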


Open Source Models 🎉

I publish models on Hugging Face exploring various aspects of language model training and fine-tuning, from domain-specific SLMs to general-purpose architectures.


Blog Series: Building LLMs from Scratch

I’m currently publishing an in-depth, multi-part series on building production-grade Large Language Models from scratch, with complete working code and published models.

Series Overview

  • Part 1: Getting started—architecture design, setup, and data preparation
  • Part 2: Data collection, cleaning, and custom tokenizer development
  • Part 3: Multi-GPU training, precision tricks (fp16/bf16), checkpointing, and experiment tracking (see the sketch below)
  • Part 4: Inference optimization, model publishing, and deployment strategies

Each part includes working Python code, published model artifacts, real-world training challenges and solutions, and reproducible experiments with detailed hyperparameters.
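
For a flavor of Part 3, here is a minimal sketch of the kind of bf16 mixed-precision training step and checkpointing pattern the series walks through. The toy model, hyperparameters, and file paths are assumed placeholders, not the series' published training code.

```python
# Minimal sketch of a bf16 mixed-precision training step with checkpointing.
# The model, hyperparameters, and paths are illustrative placeholders,
# not the series' published training code.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

def train_step(batch: torch.Tensor) -> float:
    optimizer.zero_grad(set_to_none=True)
    # bf16 autocast: activations run in bfloat16, master weights stay in fp32.
    with torch.autocast(device_type=device, dtype=torch.bfloat16, enabled=(device == "cuda")):
        output = model(batch)
        loss = output.float().pow(2).mean()  # stand-in loss for illustration
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()

for step in range(1, 101):
    batch = torch.randn(8, 128, 256, device=device)  # (batch, seq_len, d_model)
    loss = train_step(batch)
    if step % 50 == 0:
        # Checkpoint both model and optimizer state so training can resume.
        torch.save(
            {"step": step, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
            f"checkpoint_step{step}.pt",
        )
```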

If you’re interested in any of these topics or would like to discuss AI, LLMs, or research, get in touch!