Efficient multimodal large language models
Unclaimed project
Are you a maintainer of MiniCPM-o? Claim this project to take control of your public changelog and roadmap.
Changelog
MiniCPM-o 4.5: A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Mulitmodal Live Streaming on Your Phone
Efficient multimodal large language models
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Stable Diffusion web UI
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
A feature-rich command-line audio/video downloader