Describing images with local LLMs, and a shortcut to do it

AppleVis [Unofficial] May 11, 2026
Hey guys! I would like to try using the iPhone to play games. Since Gemini uses up a lot of tokens quickly, I would like to ask how good local models are at describing images, and whether it is possible to make a shortcut for this. The idea would be the following: I press a button on the controller and the option changes. I make a gesture on the iPhone screen with VoiceOver. Silently, it takes a screenshot of the screen, sends it to an LLM with a specific prompt, speaks the response, and deletes the prompt. Do you think it would work? The goal is to find out which option is in focus, the player's status, and so on.
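
To make the flow described above concrete, here is a rough sketch of the "send the screenshot to an LLM with a specific prompt" step. It is only an illustration under several assumptions that are not in the post: that the model runs on an Ollama server reachable on the local network (for example on a Mac), that a vision-capable model such as llava has been pulled there, and that the URL, model name, and prompt text are placeholders. It says nothing about how good the descriptions would be.

```python
import base64
import json
import urllib.request

# Assumptions (not from the post): an Ollama server on the local network,
# with a vision-capable model already pulled. Adjust host, port, and model.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llava"
PROMPT = (
    "This is a screenshot from a game. In one or two short sentences, "
    "say which menu option is in focus and what the player's status is."
)


def build_request(image_bytes, prompt=PROMPT, model=MODEL):
    """Build the JSON body for one non-streaming image description request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON reply instead of a stream of chunks.
        "stream": False,
    }


def describe(image_bytes, url=OLLAMA_URL):
    """Send the screenshot to the server and return the description text."""
    body = json.dumps(build_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

In Shortcuts, the same request could probably be assembled from built-in actions: Take Screenshot, Base64 Encode, Get Contents of URL (POST with a JSON body), Get Dictionary Value for the "response" key, then Speak Text. Since the screenshot stays in a variable and is never saved to Photos, there would be nothing to delete afterwards.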
