Tracked Repositories
117 repositories across 44 organizations.
huggingface - 3 repos
ollama - 1 repo: User-friendly local LLM runner built on llama.cpp (~167K stars)
ggml - 2 repos
open-webui - 1 repo: Self-hosted ChatGPT alternative with built-in RAG, offline-capable (~104K stars)
nomic-ai - 1 repo
vllm-project - 2 repos
(organization name missing):
  Cross-platform ML pipeline framework (vision, audio, NLP)
  Highly optimized neural network operators library (ARM, x86, WASM)
  Google's Lite Runtime (successor to TensorFlow Lite)
  Model visualization and exploration tool
  AI Edge APIs — upstream repo deleted (404), local copy retained
apple - 9 repos
oobabooga - 1 repo: Gradio web UI for LLMs — multi-backend (llama.cpp, ExLlamaV2, transformers) (~43K stars)
mudler - 1 repo: Free, open-source OpenAI drop-in replacement — runs locally, no GPU required (~36K stars)
exo-explore - 1 repo: Run LLMs distributed across heterogeneous devices (Mac, iPhone, etc.)
deepspeedai - 1 repo: Microsoft DeepSpeed — distributed training and inference (ZeRO, MII, FastGen)
microsoft - 2 repos: Microsoft's cross-platform, high-performance ONNX inference engine
lm-sys - 1 repo
miscellaneous - 5 repos
tencent - 2 repos: High-performance neural network inference for mobile (Android/iOS)
nvidia - 2 repos
sgl-project - 1 repo: High-throughput LLM/VLM serving with RadixAttention and structured generation
mozilla-ai - 1 repo: Single-file LLM executables via Cosmopolitan Libc — zero install, all platforms (~21K stars)
mlc-ai - 1 repo: High-performance LLM inference in web browsers via WebGPU
alibaba - 1 repo: Alibaba's neural network inference framework for mobile & edge
apache - 2 repos
blaizzy - 5 repos
k2-fsa - 1 repo: ONNX-based runtime for ASR, TTS, VAD, and keyword spotting
triton-inference-server - 1 repo: NVIDIA Triton — production multi-model inference server (HTTP/gRPC, multi-backend)
openvinotoolkit - 2 repos
dusty-nv - 1 repo
intel - 1 repo: Intel IPEX-LLM — local LLM acceleration on Intel hardware (archived Jan 2026, read-only)
nexa-ai - 1 repo
internlm - 1 repo: High-throughput LLM serving with TurboMind engine (C++/CUDA)
paddlepaddle - 1 repo: Lightweight inference engine for mobile & embedded from PaddlePaddle
argmax - 4 repos
cactus-compute - 13 repos
meta - 1 repo: PyTorch's portable execution framework for on-device inference
turboderp-org - 1 repo: High-performance EXL2-quantized inference for consumer NVIDIA GPUs