OpenAI, the company behind the popular chatbot ChatGPT, has unveiled a new AI tool that can generate realistic videos from text prompts. The tool, named Sora, can produce videos up to 60 seconds long, featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
How does Sora work?
Sora is a generative AI model, meaning it learns patterns from existing data in order to create new content. It was trained on a large corpus of videos, both publicly available and licensed, to learn how to simulate the physical world in motion. It can also generate video from a still image or extend existing footage with new material.
To use Sora, users simply need to provide a text prompt describing what they want to see in the video. The prompt can include details such as the subject, the style, the mood, the setting, and the camera angle. Sora then interprets the prompt and renders a video that matches the description.
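Since Sora is not publicly available, there is no real API to call. Purely as an illustration of the prompt structure just described, the short Python sketch below assembles those optional details (subject, setting, mood, camera angle, style) into a single descriptive prompt string; the `build_prompt` function is hypothetical and not part of any OpenAI product.

```python
def build_prompt(subject, setting, mood=None, camera=None, style=None):
    """Join a subject, a setting, and any optional descriptive details
    into one comma-separated text prompt (illustrative only)."""
    parts = [subject, setting]
    for detail in (mood, camera, style):
        if detail:  # skip details the user did not supply
            parts.append(detail)
    return ", ".join(parts)

# Roughly mirroring the sample prompt quoted below:
prompt = build_prompt(
    subject="a 30-year-old space man in a red wool knitted motorcycle helmet",
    setting="salt desert, blue sky",
    style="cinematic style, shot on 35mm film, vivid colors",
)
print(prompt)
```

Unsupplied details are simply omitted, so the same helper works for terse prompts ("two golden retrievers podcasting on top of a mountain") and richly specified ones alike.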
For example, one of the sample videos that OpenAI shared on its website was based on the prompt: “A movie trailer featuring the adventures of the 30-year-old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” The resulting video shows a man riding a motorcycle across a desert landscape, encountering various obstacles and enemies, and ending with a dramatic cliffhanger.
What are the applications and implications of Sora?
Sora is not yet available to the public. For now, OpenAI is only granting access to red teamers, who will assess the potential risks associated with the model’s release, and a limited number of visual artists, designers, and filmmakers, who will provide feedback on how to improve the model for creative professionals.
OpenAI CEO Sam Altman announced the model’s creation on X on Thursday, and invited users to suggest prompts from which it would generate videos. He later shared some of the videos, such as “two golden retrievers podcasting on top of a mountain” and “a bicycle race on ocean with different animals as athletes riding the bicycles with drone camera view.”
Many users expressed their excitement and admiration for the new technology, which could transform a range of creative industries, such as filmmaking, advertising, graphic design, and game development. Sora could also enable new forms of expression and communication, such as storytelling, education, and social media.
However, some users also raised concerns and questions about the ethical and societal implications of Sora. Some of the issues include:
- The potential misuse of Sora to create fake or misleading videos, such as deepfakes, propaganda, or misinformation, that could harm individuals or groups, or influence public opinion or elections.
- The impact of Sora on the originality, authenticity, and value of human-generated content, such as art, journalism, or entertainment, and the rights and compensation of the content creators.
- The transparency and accountability of OpenAI and Sora, such as the sources and quality of the training data, the limitations and biases of the model, and the terms and conditions of the use of the tool.
OpenAI said it is aware of these challenges and is working with policymakers, researchers, and civil society to ensure the safe and beneficial use of Sora and other AI models.
How does Sora compare to other text-to-video models?
Sora is not the first text-to-video model, but it appears to be the most advanced and realistic one demonstrated so far. Other AI companies, such as Meta, Google, and Runway, have also developed text-to-video models, but their systems have so far produced only short, low-quality, and often distorted videos.
Runway, a Brooklyn-based startup, released its most advanced model, Gen-2, in March 2023. The model can produce videos up to 10 seconds long, with a resolution of 256×256 pixels. The videos are often choppy, blurry, and surreal, and sometimes bear little resemblance to the prompts.
Sora, on the other hand, can produce videos up to 60 seconds long, with a resolution of 1024×1024 pixels. The videos are smooth, sharp, and natural, and closely match the prompts. Sora also demonstrates a better understanding of the 3D structure, motion, and occlusion of objects, as well as the emotions and expressions of characters.
Runway CEO and co-founder Cristóbal Valenzuela posted “game on” on X in response to OpenAI’s announcement, indicating the competition and innovation in the field of text-to-video generation.