Local ML Experiment
Running AI models locally in your browser
This experiment explores running AI models entirely in the browser, using WebGPU to accelerate inference.
What I'm Exploring
- Model Performance: Testing various open-source models (Llama, Mistral, Phi) on consumer hardware
- Quantization: Exploring how model size affects performance and quality
- Use Cases: Building practical applications with local AI
- Privacy Benefits: Keeping all data processing local and secure
What This Demo Is (and Isn't)
This is an educational experiment exploring the possibilities of running AI models locally in the browser. Here's what you should expect:
- ✅ It is: A demonstration of browser-based AI using small, efficient models
- ✅ It is: Completely private - all processing happens on your device
- ✅ It is: A proof of concept for local AI capabilities
This isn't trying to be:
- ❌ It's not: A way for you to replace ChatGPT or other large-scale AI services
- ❌ It's not: Optimized for complex reasoning or professional tasks
- ❌ It's not: As capable as models with billions of parameters; don't be surprised by the occasionally goofy responses
The models used here are intentionally small (0.5-1.5B parameters) so they run efficiently within browser constraints. Think of this as a glimpse of a future where personal devices run AI locally, not as a complete solution for all AI needs.
Some of the use cases I'm thinking about are:
- Autocomplete for search queries without complex client- or server-side code
- Personalized content generation with minimal code
- Personalized recommendations with minimal code
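As a sketch of the autocomplete idea: the only real client-side logic needed is debouncing and prompt construction. Here, `generate` is a placeholder for whatever local inference call gets wired up (e.g. a Transformers.js pipeline), not a real API:

```javascript
// Sketch: search-box autocomplete backed by a local model.
// `generate` is hypothetical; the debounce and prompt logic are the point.

function debounce(fn, ms) {
  let timer;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

function buildPrompt(query) {
  // Keep the prompt tiny and concrete: small models do better with narrow tasks.
  return `Complete this search query: "${query}"`;
}

// Usage in the browser (assumes an <input id="search"> and a local `generate`):
// const suggest = debounce(async (q) => {
//   const completion = await generate(buildPrompt(q));
//   showSuggestion(completion);
// }, 300);
// document.querySelector('#search').addEventListener('input', (e) => suggest(e.target.value));
```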
Live Demo: Model Switcher
Try this interactive demo that lets you switch between different AI models right in your browser! Experience sentiment analysis and text generation running entirely locally.
This experiment requires a modern device with sufficient resources to run AI models locally.
Recommended requirements:
- WebGPU support
- 4GB+ RAM
- 4+ CPU cores
- Desktop/laptop (not mobile)
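A rough capability check like the one this page performs can be written against the requirements above. `navigator.deviceMemory` and `navigator.hardwareConcurrency` are real browser APIs, though `deviceMemory` is Chromium-only; the function below takes the values as arguments so the logic works outside a browser too (a sketch, not the page's actual implementation):

```javascript
// Rough client-side capability check mirroring the recommended requirements.
// deviceMemory may be undefined in non-Chromium browsers, so treat a missing
// value as "unknown" rather than as a failure.

function checkRequirements({ deviceMemory, hardwareConcurrency, hasWebGPU }) {
  const issues = [];
  if (!hasWebGPU) issues.push('WebGPU not supported');
  if (deviceMemory !== undefined && deviceMemory < 4) {
    issues.push('Low device memory (less than 4GB)');
  }
  if (hardwareConcurrency !== undefined && hardwareConcurrency < 4) {
    issues.push('Low CPU core count (less than 4 cores)');
  }
  return { ok: issues.length === 0, issues };
}

// In the browser:
// const result = checkRequirements({
//   deviceMemory: navigator.deviceMemory,
//   hardwareConcurrency: navigator.hardwareConcurrency,
//   hasWebGPU: 'gpu' in navigator,
// });
```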
Current Models
Qwen2.5-0.5B-Instruct
Size: 380MB | Type: Instruction-tuned
A capable model with strong reasoning and instruction-following abilities. Use it for complex tasks, detailed responses, and when you need higher-quality output.
Best for: Chat, reasoning, instruction following
📖 View on Hugging Face
DistilGPT2
Size: 82MB | Type: Text generation
A lightweight, fast model perfect for quick demos and experimentation. While less capable than Qwen, it loads much faster and is great for testing the interface.
Best for: Quick demos, creative writing, testing
📖 View on Hugging Face
How It Works
This demo uses 🤗 Transformers.js to run LLMs directly in your browser using WebGPU. The models are quantized to 4-bit precision
to keep them lightweight while maintaining good performance.
The Process:
- Model Selection: When you select a model, the browser checks if it's already cached locally
- Download (if needed): If not cached, the model files (80-400MB) are downloaded from Hugging Face and stored in your browser's cache
- Initialization: The model is loaded into your GPU memory using WebGPU for accelerated inference
- Inference: When you type a message, it's tokenized and fed through the neural network entirely on your device
- Response Generation: The model generates text token by token, which is displayed in real-time as it's created
- Caching: The model stays in memory for quick subsequent responses, and remains cached for future visits
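The steps above map fairly directly onto the Transformers.js v3 API. A minimal sketch, assuming the `@huggingface/transformers` package (the model ID and generation parameters are illustrative, and the dynamic imports just keep the file loadable where the package isn't installed):

```javascript
// Sketch of the load-and-stream flow with @huggingface/transformers (v3).

async function createGenerator(modelId = 'onnx-community/Qwen2.5-0.5B-Instruct') {
  const { pipeline } = await import('@huggingface/transformers');
  // Steps 1-3: download (or read from the browser cache), then initialize.
  return pipeline('text-generation', modelId, {
    dtype: 'q4',      // 4-bit quantized weights to keep the download small
    device: 'webgpu', // accelerated inference via WebGPU
  });
}

async function streamReply(generator, userMessage, onToken) {
  const { TextStreamer } = await import('@huggingface/transformers');
  // Steps 4-5: tokenize, run inference, and surface tokens as they're produced.
  const streamer = new TextStreamer(generator.tokenizer, {
    skip_prompt: true,
    callback_function: onToken, // called with each decoded chunk of text
  });
  const messages = [{ role: 'user', content: userMessage }];
  return generator(messages, { max_new_tokens: 128, streamer });
}
```

In a real app, `onToken` would append each chunk to the chat UI, which is what produces the token-by-token streaming effect described above.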
Key Technologies:
- WebGPU Acceleration: Uses your GPU for faster inference (📖 learn more)
- On-the-Fly Loading: Models are downloaded and cached as needed
- Privacy First: Everything runs locally in your browser
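Detecting WebGPU up front avoids kicking off a multi-hundred-megabyte model download on an unsupported device. A small sketch (the `navigator` object is passed in as a parameter so the logic is testable outside a browser):

```javascript
// WebGPU feature detection. navigator.gpu.requestAdapter() can resolve to
// null even when the API exists (blocked by policy, or no suitable GPU),
// so checking for the property alone isn't enough.

async function detectWebGPU(nav) {
  if (!nav || !('gpu' in nav)) {
    return { supported: false, reason: 'navigator.gpu missing' };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: 'no adapter available' };
  }
  return { supported: true };
}

// In the browser: const { supported } = await detectWebGPU(navigator);
```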
Example
Here's a variation of this experiment on CodeSandbox (note: it won't run inside the sandbox due to restrictions on service workers and WebGPU, but you can copy the code and run it locally).
Last updated: 2026-03-20