Amit Bahree’s (useless?) insight!

ÎÜñ|‹ø//ñ [ÐëÞrëçã†ëð] 👋🏽‍

When God asks what you've done with your life, try not to say:
"Didn't you read my tweets?" They are RPC free! 😎 🖬

🎉Announcing My New Book: Generative AI in Action📚

In today’s rapidly evolving tech world, mastering Generative AI isn’t just an advantage; it’s a necessity. Are you ready to harness its power to transform your business and solve real-world challenges? I’m excited to announce that my new book, Generative AI in Action, is now available in print and ebook formats from Manning Publications. 📖 Special Launch Offer 🌟 As a thank-you to my early supporters, I’m offering an exclusive discount. Use the code pbbahree at checkout to receive 45% off your purchase of Generative AI in Action in all formats (valid through Sept. 30, 2024)! ...

What is KV Cache in LLMs and How Does It Help?

TL;DR: KV cache is a memory optimization central to efficient LLM inference. It enables faster, longer, and more cost-effective generation by caching previously computed attention keys and values, unlocking the practical deployment of models like GPT-4o, Llama 3, etc. 1. Introduction Generative AI, powered largely today by large language models (LLMs) such as GPT-4o, Llama 3, etc., is transforming AI applications, from chatbots to code assistants and multimodal reasoning. As these models scale in size and context length, inference becomes a major computational and memory challenge. The Key-Value (KV) cache is a pivotal optimization that enables practical, high-performance inference in modern transformer architectures. ...
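To make the idea of "caching previously computed attention keys and values" concrete, here is a minimal sketch in plain Python. It is a toy, single-head attention layer, not how any production model implements it; the names `KVCache`, `decode_step`, and `decode_step_no_cache`, and the tiny weight matrices in the usage lines, are all assumptions made up for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in weights]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


class KVCache:
    """Hypothetical toy cache: keys and values of every token seen so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def decode_step(x, w_q, w_k, w_v, cache):
    """One generation step: project only the new token, reuse the rest."""
    cache.append(matvec(w_k, x), matvec(w_v, x))
    return attend(matvec(w_q, x), cache.keys, cache.values)


def decode_step_no_cache(prefix, w_q, w_k, w_v):
    """Baseline: recompute keys and values for the whole prefix each step."""
    keys = [matvec(w_k, x) for x in prefix]
    values = [matvec(w_v, x) for x in prefix]
    return attend(matvec(w_q, prefix[-1]), keys, values)


if __name__ == "__main__":
    # Made-up 2-dimensional weights and token embeddings, for illustration only.
    w_q = [[1.0, 0.0], [0.0, 1.0]]
    w_k = [[0.5, 0.1], [0.2, 0.3]]
    w_v = [[1.0, 2.0], [3.0, 4.0]]
    cache = KVCache()
    for token in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
        print(decode_step(token, w_q, w_k, w_v, cache))
```

Both functions produce the same attention output for the newest token. The difference is that the cached version projects one token per step instead of the whole prefix, at the cost of memory that grows linearly with the sequence length.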

RustySnake - Classic Snake game to learn Rust

1. Overview Rust has been gaining attention recently due to its unique combination of performance, safety, and modern programming features. Its strict ownership model eliminates common memory issues like null pointer dereferencing and data races, providing a secure environment for developers. At the same time, its expressive syntax and focus on developer productivity make it a strong contender for systems programming. Its growing ecosystem and community continue to expand its capabilities into areas like web development and embedded systems, ensuring a confident and secure learning experience for developers. ...

An introduction to Mixture of Experts (MoE)

AI is advancing at an unprecedented pace, with Mixture of Experts (MoE) models being one set of model architectures at the forefront of this revolution. These architectures enable breakthroughs in efficiency and scalability by leveraging a modular design where only a subset of specialized “expert” networks is activated for each input. MoE architectures have become a cornerstone in building ultra-large-scale models like GLaM and Switch Transformers. MoE is an advanced machine learning architecture that has lately gained significance, particularly in the realm of #LLMs (large language models) and NNs (neural networks). In talking with many people about AI, I’ve found that MoE as a topic comes up often, with many folks either not understanding what it is or having an incorrect understanding of it. ...
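The core idea, that only a subset of experts runs for each input, can be sketched with a toy top-k router in plain Python. This is a minimal sketch under stated assumptions, not the routing used by any specific model: the function names `top_k_route` and `moe_forward`, the choice to renormalize the gate scores over only the selected experts, and the made-up gate and experts in the usage lines are all illustrative.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and normalize their weights.

    Assumption: the softmax is taken over the selected logits only,
    so the k weights sum to 1.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, gate, experts, k=2):
    """Run only the routed experts and mix their outputs by gate weight.

    `gate` maps an input vector to one logit per expert; `experts` is a
    list of callables. Experts that are not selected are never called.
    """
    routes = top_k_route(gate(x), k)
    output = [0.0] * len(x)
    for index, weight in routes:
        expert_out = experts[index](x)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, [index for index, _ in routes]


if __name__ == "__main__":
    # Four made-up "experts" that just scale the input differently.
    experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
    # A fixed, made-up gate; a real gate is a small learned network.
    gate = lambda x: [0.1, 2.0, -1.0, 1.0]
    print(moe_forward([1.0, 2.0], gate, experts, k=2))
```

With `k=2` and four experts, half of the expert networks are skipped for this input, which is where the compute savings come from: the parameter count grows with the number of experts, while the per-input cost grows only with `k`.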

Automating Hugo Deployments

1. A little background I have been meaning to automate the deployment of my blog posts to a dev server (running locally) for a while, but I haven’t had the time to get around to it until now. Beyond the deployment itself, the dev server also has several constraints. It is one of the machines at home and is not exposed directly to the internet. I also don’t have any ports opened on the firewalls at home, which adds a few more interesting challenges. I have a server hosted online in one of the data centers, but I want to keep that for more ‘production’ things. ...

AI generated Podcast for my book: Generative AI in Action 🎧

The one thing I wanted to do after my book Generative AI in Action was complete was to create a summary in natural speech and possibly use TTS (Text-to-Speech) to create an audio summary. Think of it as a podcast that is easier for people to consume and get a quick sense of what the book is about. TTS (Text-to-Speech) or not to TTS? Initially, I was inclined towards using TTS for the audio summary. This technology, I thought, would be a convenient way to create a podcast-like summary. My journey began with TTS - using GPT-4o to create a summary after ingesting the book and then using that as input to the Azure AI Speech stack. However, I stumbled across something intriguing. Instead of TTS, I opted for NotebookLM from Google Labs to generate the audio overview - the podcast. This decision marked a significant shift in my approach, and I created two podcasts using the content from the book - one from multiple sources and another from a single source (the book). ...

Backing up TeslaMate data to OneDrive

I have been running a couple of instances of TeslaMate - one locally on a server at home and another in Azure in an Ubuntu VM (see 👉 this blog post for details). For the local instance, I have been backing up the data to a NAS and then to an offsite backup. For the Azure instance, I have been running various backups during the day and backing up the data to OneDrive. This allows me to have a data backup in case the VM crashes or the data gets corrupted. ...

SLMs - Running Phi-3 on an iPhone and locally

We released Phi-3 recently, which builds on Phi-2 (read more on that here), and it is a great model to use for various tasks. In this post, we will show how to run Phi-3 locally, including a demo of it running on a phone. 1. What are Small Language Models (SLMs)? Before diving into running Phi-2 locally, let’s take a moment to understand the concept of small language models (SLMs) and their significance in natural language processing (NLP). An SLM is a type of AI model that has been trained on a massive dataset of text but is limited in size and capabilities compared to a Large Language Model (LLM). SLMs are designed to be more lightweight and efficient, making them suitable for various applications, including chatbots, language translation, and content generation. SLMs are much smaller than LLMs, with fewer parameters and a smaller dataset, so they have a lower computational cost, making them more suitable for edge or resource-constrained devices. ...

SLMs - How to run Phi-2 Locally, and implement RAG

1. What are Small Language Models (SLMs)? Before diving into running Phi-2 locally, let’s take a moment to understand the concept of small language models (SLMs) and their significance in natural language processing (NLP). An SLM is a type of AI model that has been trained on a massive dataset of text but is limited in size and capabilities compared to a Large Language Model (LLM). SLMs are designed to be more lightweight and efficient, making them suitable for various applications, including chatbots, language translation, and content generation. SLMs are much smaller than LLMs, with fewer parameters and a smaller dataset, so they have a lower computational cost, making them more suitable for edge or resource-constrained devices. ...

Shedding Light on the Art of Prompt Engineering

How many prompt engineers does it take to change a light bulb? Just one, but first, they need to fine-tune the model to make sure the AI doesn’t end up writing a poem about darkness instead. DALL-E generated image of how many engineers it takes to change a light bulb

📚 My new book "Generative AI in Action"

🌐 As software continues to revolutionize the world, the advent of Generative AI is transforming the very fabric of software itself. My latest book, Generative AI in Action, delves into this transformative journey. I am thrilled to announce its early release, now available through the Manning Early Access Program (MEAP). This publication is a deep dive into the cutting-edge world of #GenerativeAI, #LLMs, #OpenAI, and #Azure #OpenAI, tailored specifically for enterprises. 🤘 ...

AI working with humans

What does AI think of humans and our humor? I asked #AI - #CodeInterpreter plugin for #ChatGPT to express its experience as an AI working with humans as a meme. It wrote this code to create the meme below.

    import matplotlib.pyplot as plt
    import numpy as np

    # Create a new figure with a white background
    fig, ax = plt.subplots(figsize=(6, 6), dpi=80)
    fig.patch.set_facecolor('white')

    # Set up the text and meme caption
    meme_caption = "When I, an AI, try to understand human humor:"
    meme_text = "Error 404: Humor not found"

    # Add the meme caption and text to the figure
    ax.text(0.5, 0.7, meme_caption, horizontalalignment='center', verticalalignment='center',
            transform=ax.transAxes, color='black', fontsize=14, fontweight='bold')
    ax.text(0.5, 0.3, meme_text, horizontalalignment='center', verticalalignment='center',
            transform=ax.transAxes, color='black', fontsize=16, fontweight='bold')

    # Remove the axes for a clean meme look
    plt.axis('off')

    # Show the meme
    plt.show()

AI's experience working with humans ...

OpenAI's Whisper speech model - an overview

What is Whisper from OpenAI? Whisper is a speech recognition model (ASR - automatic speech recognition) from OpenAI. It is a multi-task model, so in addition to speech recognition it can also do language identification and speech translation across a number of languages. The model is open source and comes in 5 sizes. Of these, 4 have an English-only variant, which seems to perform better if one only needs English. The model is also robust to accents, background noise, and technical language. Whisper achieves near-SOTA performance with zero-shot translation from multiple languages to English. ...

Hello New Bing 👋

Bing is getting a new look and feel, powered by Microsoft AI and OpenAI (ChatGPT); this was announced yesterday. There is a lot of buzz around this, and I thought I would share my thoughts as I got access today. What is the new Bing? Well, it is the thing that is making the 800-pound gorilla in the room, Google, come out and dance on its toes. 🦍 The new Bing is an overhauled version of the search engine that uses ChatGPT technology to understand questions and generate answers. It runs on the next generation of OpenAI’s language model, which is significantly more capable than the version of ChatGPT that has been available since November 2022. The new Bing provides more relevant results for simple things like sports scores, stock prices and weather, along with a new sidebar that shows more comprehensive answers if you want them. You can also chat and create with the new Bing, using its natural language and creative abilities. The new Bing is live starting today, with limited capabilities. ...

PFOaaS - Polite Fork Off As A Service

API Introduction Polite Fork Off As A Service (or PFOaaS) - https://pfoaas.desigeek.com/ is a modern REST API that solves the problem of politely telling people to fork off. 😇 There are days when we all need such a service for various reasons, and I think it is a great way to release some pent-up frustration. 🖤 It is also a great way to get some laughs. This is of course meant for hardcore engineers, writing RPC free code 😜 ...

Using CoPilot beyond code

In the last week or so, all the rage online has been #OpenAI’s new chatbot called #ChatGPT (you can read more details on ChatGPT here). This also got me thinking about how we can use #CoPilot for more than just code. GitHub CoPilot, as you might recall, is your #AI powered pair-programmer. And as we can see below, it is indeed possible to use Codex for more general-purpose tasks. I start with prompts showing how one might typically use CoPilot - a function to read a file and return its contents as a string - just to show there isn’t anything different I am doing in using this. And then for general-purpose usage, I used the prompts in VSCode. ...
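For reference, the warm-up prompt mentioned above (a function to read a file and return its contents as a string) might come out looking something like this. It is a hand-written sketch, not CoPilot's actual output; the function name `read_file_as_string` and the `encoding` parameter are my own assumptions.

```python
from pathlib import Path


def read_file_as_string(path, encoding="utf-8"):
    """Read the file at `path` and return its entire contents as one string.

    Assumption: the file is text and small enough to fit in memory.
    """
    return Path(path).read_text(encoding=encoding)
```

Calling `read_file_as_string("notes.txt")` returns the whole file as a single `str`; a missing file raises `FileNotFoundError`, which is left to the caller to handle.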

Hello ChatGPT

OpenAI recently released #ChatGPT, a GPT-3 based chatbot. ChatGPT is a fine-tuned version of GPT-3.5, trained using #RL (specifically a PPO algorithm), similar to the Instruct series. This post is about my experience using it. Blog post with ChatGPT What better place to start than asking it about itself? 😃 ...

Moving from WordPress to Hugo

I had been thinking for a while about moving the blog away from WordPress to something simpler and cleaner. WordPress was great for me when I first moved to it from another engine. However, over time, I found that things got slower as I added themes and add-ins. Some of these have been great, and others are not really needed. I also wanted to dogfood some of the things we built at work, though in my case that wasn’t the primary motive, just another nice-to-have. 😄 ...

AI generated text-to-video

Here is an example of how one can use a text prompt to generate a series of frames that are then stitched together into a video. The prompt I used was: “a man walking in the parking lot with a miniature poodle”. The final video generated is shown below. AI-generated video from a text prompt of a man walking in a parking lot with a miniature poodle ...

The rise of prompt engineering

I have said this before - with the advent of large AI models, Prompt Engineering is critical and is the next challenge for us to master. What is Prompt engineering? Prompt engineering is the process of tuning the prompts given to large models; a prompt is often written in natural language and outlines the intention of the user. Prompt engineering is a key element that allows the output to be accurate and reflect the needs of the user. Prompts should not be thought of as a single explicit input to the model; instead, they are multiple tasks for the model. ...

Nuget packages not found after installing Visual Studio 2022

I recently needed to install Visual Studio 2022 on one of my existing machines to debug a new zero-shot model that has a dependency on our Speech SDK. The Speech SDK is one of our key #AI services in Cognitive Services (as part of #AzureAI). I already had VSCode running, but in this case I needed the bigger brother. After installing Visual Studio, I could not get any nuget packages to install; I could not even fetch anything, and it didn’t matter what I used - the package manager console in Visual Studio, PowerShell, etc. ...

Podman error on Ubuntu - short-name did not resolve to an alias and no unqualified-search registries

I recently installed Ubuntu on one of the Pis at home and installed Podman, a container engine I hadn’t heard of until recently; it is similar to Docker but doesn’t have a daemon. When trying to get a basic alpine test image running, I got this error: Error: error creating build container: short-name "python:3.7-alpine" did not resolve to an alias and no unqualified-search registries are defined in "/etc/containers/registries.conf" ...

Developers mysterious life

The mysterious life of developers has evaded many of us, until now … The mysterious life of a developer (courtesy Spoon Norge)

How to run TeslaMate on Azure

If you have a Tesla, then you should absolutely check out TeslaMate, which is a self-hosted data logger for your car(s). It uses the car’s API and gets all kinds of telemetry on your drives, charging, battery condition, acceleration, braking, parking, etc. I personally prefer this over other online services (of which there are a few), as using those means giving away the keys to the kingdom - literally in this case (the tokens used to authenticate and log in). ...

AI writing AI code🤐

It is 2021. And we have #AI writing #AI code. 🤪 It is quite interesting, but it can also be quite boring once you get beyond the initial technology and just think of it as one of the tools in your arsenal. And getting to that point is a good thing. As part of a thing at work, I recently started playing with GitHub Copilot, which uses GPT3 to be your pair programmer - helping write code. GPT3 has multiple models (called engines), and Copilot uses one of these families of engines called Codex. Codex is a derivative of the base GPT3 engine that is trained on billions of lines of code. ...