Visual and Multimodal Texts

New Apple model combines vision understanding and image generation with impressive results

Manzano combines visual understanding and text-to-image generation, while significantly reducing performance or quality trade-offs.

VentureBeat

Meta introduces Chameleon, a state-of-the-art multimodal model

As competition in the generative AI field ...

Nature

Multimodal Argumentation and Visual Rhetoric

Multimodal argumentation and visual rhetoric encompass an emergent field that explores how diverse communicative modes—including images, diagrams and other visual representations—contribute to the ...

InfoQ

Mistral AI Releases Pixtral Large: a Multimodal Model for Advanced Image and Text Analysis

VentureBeat

Salesforce releases ‘xGen-MM’ open-source multimodal AI models to advance visual language understanding

Salesforce, the enterprise software giant, ...

MSN

Language shapes visual processing in both human brains and AI models, study finds

Neuroscientists have been trying to understand how the brain processes visual information for over a century. The development ...

GIGAZINE

DeepSeek releases 'DeepSeek-OCR,' a multimodal AI model that uses visual information to compress text input

DeepSeek has released a new multimodal AI model called 'DeepSeek-OCR.' 'OCR' stands for Optical Character Recognition, which is used for document scanning and other purposes. The model is said to be ...

Tech Times

Apple Unveils New 'MM1' Multimodal AI Model Capable of Interpreting Images, Text Data

Apple has revealed its latest development in artificial intelligence (AI) large language models (LLMs), introducing the MM1 family of multimodal models capable of interpreting both images and text data.

InfoQ

Microsoft Open-Sources Multimodal Chatbot Visual ChatGPT

Vivek Yadav, an engineering manager from ...
