LLM | Amit Bahree's (useless?) insight!

Building LLMs from Scratch - Part 4: Evaluation & Deployment

TL;DR In this final part of our 4-part series on building language models from scratch, we explore the evaluation, testing, and deployment pipeline that transforms our trained historical language models into working systems. Part 1 showed you how to use the published models, Part 2 covered data collection and custom tokenization, and Part 3 detailed the model architecture and training infrastructure. Here, we complete the journey with evaluation frameworks, testing infrastructure, and deployment to Hugging Face Hub. ...

🏛️Building LLMs from Scratch - Part 3: Training Architecture & GPU Optimization

TL;DR In this third part of our 4-part series on building language models from scratch, I explore the complete training infrastructure that transforms our clean historical data and custom tokenizer into working language models. Part 1 (How to build a Large Language Model from Scratch) covered using the published model, and Part 2 (Building LLMs from Scratch - Part 2: Data Collection & Custom Tokenizers) detailed data collection and custom tokenizer development. Here, we build the complete training pipeline from a custom GPT architecture through deployment-ready checkpoints. ...

🏛️Building LLMs from Scratch - Part 2: Data Collection & Custom Tokenizers

TL;DR In this second part of our 4-part series on building language models from scratch, I explore the two foundational areas of LLM development: data collection and custom tokenizer creation. Part 1 (Building LLM from Scratch) covered using the published model; here, we build the complete pipeline from raw historical documents to a custom tokenizer that understands archaic English, London geography, and period-specific terminology. ...

🏛️How to build a Large Language Model from Scratch - Part 1

TL;DR In this post, I show how to build a working LLM from scratch, walking through a complete end-to-end pipeline from data gathering to training to deployment of a language model. For this project I concentrate on Old English related only to London, using historical London texts (1500-1850). To show the flexibility, I built two language models that are identical in architecture and differ only in size (117M vs 354M parameters). ...

Reasoning AI Models: An overview

TL;DR As part of my role at Microsoft’s AI Foundry Applied AI engineering team in CoreAI, I have participated in numerous detailed discussions about the evolving landscape of AI models. In conversations with many customers, from CxOs to engineers, one recurring topic is the rise of reasoning AI models. These models are designed to perform complex tasks by explicitly breaking down problems into logical steps, rather than just generating text in a single pass like traditional large language models (LLMs). This shift toward reasoning-centric AI marks a major evolution in how we develop and deploy AI systems—and it’s a key factor behind the rise of Agents and Agentic AI. ...