What Is Visual ChatGPT?
Visual ChatGPT
Microsoft has introduced Visual ChatGPT, an upgraded version of the chatbot that can produce images from text and process images uploaded by users. While OpenAI's DALL-E 2 system has already experimented with AI image generation, Visual ChatGPT represents a step toward multimodal AI, which Microsoft revealed it was aiming for with the GPT-4 upgrade coming to Bing with ChatGPT. This means that image processing could soon be joined by AI-powered video and sound tools.
AI Art Generator
Visual ChatGPT is built on OpenAI's GPT Large Language Model (LLM) and Microsoft's Prometheus model. The majority of AI art generators use a Visual Foundation Model (VFM) like Stable Diffusion to create images. To create Visual ChatGPT, Microsoft bolted several VFMs onto the flexible GPT model by building a "Prompt Manager." This lets ChatGPT call the VFMs and take in their feedback iteratively until the result meets the user's requirements or an ending condition is reached.
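To make the iterative loop described above more concrete, here is a minimal Python sketch. It is not Microsoft's implementation: the tool functions, the `choose_next_step` helper (a stand-in for the language model's decision), and the `max_steps` budget are all hypothetical names invented for illustration, and the "images" are plain strings rather than real image data.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each "tool" is a hypothetical stand-in for a Visual Foundation Model (VFM).
# It takes the current image name and one argument, and returns a new image name.
Tool = Callable[[str, str], str]


def remove_object(image: str, target: str) -> str:
    return f"{image}+removed({target})"


def recolor(image: str, color: str) -> str:
    return f"{image}+recolored({color})"


TOOLS: Dict[str, Tool] = {"remove": remove_object, "recolor": recolor}


def choose_next_step(pending: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Stand-in for the language model: return the next (tool, argument), or None when done."""
    return pending[0] if pending else None


def run_prompt_manager(
    image: str, requests: List[Tuple[str, str]], max_steps: int = 5
) -> Tuple[str, List[str]]:
    """Call tools one at a time until every request is handled or the step budget runs out."""
    pending = list(requests)
    log: List[str] = []
    for _ in range(max_steps):  # ending condition: step budget exhausted
        step = choose_next_step(pending)
        if step is None:  # all user requirements have been met
            break
        name, arg = pending.pop(0)
        tool = TOOLS.get(name)
        if tool is None:
            log.append(f"unknown tool: {name}")
            continue
        image = tool(image, arg)  # feed the tool's output into the next step
        log.append(f"{name}({arg}) -> {image}")
    return image, log


final_image, steps = run_prompt_manager(
    "photo.png", [("remove", "car"), ("recolor", "blue")]
)
print(final_image)
for line in steps:
    print(line)
```

The point of the sketch is the shape of the loop: each tool's output becomes the input for the next step, and the loop stops either when nothing is left to do or when a fixed limit is hit.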
Images from Text and Image Prompts
Visual ChatGPT is different from standard AI image generators because it can generate images from text and image prompts, handle complex requests that span multiple processes, and even offer input and feedback on images uploaded or generated. For instance, users can ask the AI about the color of a motorbike or the contents of a picture and edit an image multiple times within the same session.
Various Uses in Business
Visual ChatGPT could be used for various purposes, such as refining an image that may not exist online, removing an object from an image, or changing the color of a background. These tasks can be expensive and complicated in photo editing software like Photoshop. Professionals such as architects and interior designers could use Visual ChatGPT to show clients what a painting would look like or how removing a wall would change the look of a space. Additionally, visually impaired users could receive accurate AI descriptions of uploaded images.
AI Tools Are Still in Infancy
However, AI tools are still in their infancy, and Bing and Google Bard have made high-profile errors and battled quirks. Therefore, it is likely that Visual ChatGPT will also face similar issues. Moreover, there will always be safety concerns when it comes to the internet. Inappropriate content is bound to make its way to Visual ChatGPT, and Microsoft will need to handle explicit content with its image and video AI tools carefully. Even with content filters, there may be ways to bypass them, as was seen with the jailbroken ChatGPT "alter-ego" DAN.
Might Be Used for Deepfakes
The rise of photo edits and tweaks may also raise questions about the authenticity of any image or video we see online. Social media is already rife with heavily idealized snapshots of life, and it is easy to see how people could use these tools to deceive. Video and audio deepfakes are already a problem when it comes to spreading disinformation, and this issue will need to be monitored carefully.
Will Redefine Our Internet Search
In conclusion, Visual ChatGPT has the potential to redefine the way we search the internet by combining image processing with AI-powered video and sound tools. While there are concerns about the safety and authenticity of the images and videos generated, Visual ChatGPT could be a valuable tool for professionals and visually impaired users. Microsoft will need to handle explicit content with care and monitor the use of this tool to prevent disinformation from spreading.
Source: Tom's Guide: What is Visual ChatGPT?
Image by Tumisu from Pixabay
Image from GitHub