Tracked Repositories
117 repositories across 44 organizations.
huggingface - 3 repos
ollama - 1 repo: User-friendly local LLM runner built on llama.cpp (~167K stars)
ggml - 2 repos
open-webui - 1 repo: Self-hosted ChatGPT alternative with built-in RAG, offline-capable (~104K stars)
nomic-ai - 1 repo
vllm-project - 2 repos
(organization name missing):
  Cross-platform ML pipeline framework (vision, audio, NLP)
  Highly optimized neural network operators library (ARM, x86, WASM)
  Google's Lite Runtime (successor to TensorFlow Lite)
  Model visualization and exploration tool
  AI Edge APIs — upstream repo deleted (404), local copy retained
apple - 9 repos
oobabooga - 1 repo: Gradio web UI for LLMs — multi-backend (llama.cpp, ExLlamaV2, transformers) (~43K stars)
mudler - 1 repo: Free, open-source OpenAI drop-in replacement — runs locally, no GPU required (~36K stars)
exo-explore - 1 repo: Run LLMs distributed across heterogeneous devices (Mac, iPhone, etc.)
deepspeedai - 1 repo: Microsoft DeepSpeed — distributed training and inference (ZeRO, MII, FastGen)
microsoft - 2 repos: Microsoft's cross-platform, high-performance ONNX inference engine
lm-sys - 1 repo
miscellaneous - 5 repos
tencent - 2 repos: High-performance neural network inference for mobile (Android/iOS)
nvidia - 2 repos
sgl-project - 1 repo: High-throughput LLM/VLM serving with RadixAttention and structured generation
mozilla-ai - 1 repo: Single-file LLM executables via Cosmopolitan Libc — zero install, all platforms (~21K stars)
mlc-ai - 1 repo: High-performance LLM inference in web browsers via WebGPU
alibaba - 1 repo: Alibaba's neural network inference framework for mobile & edge
apache - 2 repos
blaizzy - 5 repos
k2-fsa - 1 repo: ONNX-based runtime for ASR, TTS, VAD, and keyword spotting
triton-inference-server - 1 repo: NVIDIA Triton — production multi-model inference server (HTTP/gRPC, multi-backend)
openvinotoolkit - 2 repos
dusty-nv - 1 repo
intel - 1 repo: Intel IPEX-LLM — local LLM acceleration on Intel hardware (archived Jan 2026, read-only)
nexa-ai - 1 repo
internlm - 1 repo: High-throughput LLM serving with TurboMind engine (C++/CUDA)
paddlepaddle - 1 repo: Lightweight inference engine for mobile & embedded from PaddlePaddle
argmax - 4 repos
cactus-compute - 13 repos
meta - 1 repo: PyTorch's portable execution framework for on-device inference
turboderp-org - 1 repo: High-performance EXL2-quantized inference for consumer NVIDIA GPUs