Alibaba Group has released its new Qwen 3.5 small model series, and developers are already running it fully offline on the iPhone 17 Pro.
In demo videos shared online, the 2B-parameter model runs directly on the device without using the cloud. The model is quantized to 6 bits per weight and runs on Apple’s MLX framework, which is designed for Apple Silicon.
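To give a rough sense of why 6-bit quantization matters on a phone, the sketch below estimates how much storage the weights of a 2B-parameter model need at different precisions. It is back-of-the-envelope arithmetic only: the parameter count is taken as a nominal 2 billion, the 16-bit baseline is an assumption used for comparison, and the helper name `weight_memory_gb` is hypothetical. Real quantized files also carry per-group scales and other metadata, and inference needs extra memory for activations and caches, so actual figures will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Ignores quantization metadata (scales, zero points) and runtime
    memory such as activations, so this is a lower bound.
    """
    total_bits = num_params * bits_per_weight
    total_bytes = total_bits / 8
    return total_bytes / 1e9


# Nominal parameter count for the 2B model (assumed to be exactly 2e9).
PARAMS_2B = 2e9

# 16-bit weights are an assumed baseline for comparison.
fp16_gb = weight_memory_gb(PARAMS_2B, 16)
# 6-bit weights, as described for the on-device demo.
q6_gb = weight_memory_gb(PARAMS_2B, 6)

print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f" 6-bit weights: {q6_gb:.2f} GB")
print(f"Reduction: {100 * (1 - q6_gb / fp16_gb):.1f}%")
```

Under these assumptions, the weights shrink from about 4 GB at 16 bits to about 1.5 GB at 6 bits, which is the kind of reduction that lets a model of this size fit in a phone's memory.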
One viral demo from iOS developer Adrien Grondin shows the model answering questions quickly and analysing images in real time. Users can switch a reasoning mode on or off, allowing either deeper analysis or faster replies.
The Qwen 3.5 small series includes 0.8B, 2B, 4B and 9B parameter models. According to the Qwen team, the smaller models focus on efficiency while still offering strong reasoning and multimodal skills, including text and image understanding.
Alibaba says the models are designed for “more intelligence, less compute.” The 2B version is aimed at edge devices like smartphones. The 9B version reportedly performs close to much larger systems on some benchmarks.
Running AI fully on a phone offers key benefits. Data stays on the device, which improves privacy. There is also no network round trip, so responses do not wait on a server connection, which makes the approach well suited to real-time apps.
The Qwen 3.5 small models were released on March 2, 2026. Developers say the powerful chip inside the iPhone 17 Pro helps deliver smooth performance.
The breakthrough highlights a wider shift in AI. More companies are pushing tools that work locally instead of relying on remote servers. Experts say this could lead to a new wave of private, fast and portable AI applications.