Are AI video creators dreaming of San Pedro? Madonna is an early adopter of the next wave of AI

March 4, 2024

When Madonna sings her 1980s hit “La Isla Bonita” on her concert tour, moving images of swirling, sunset-colored clouds play on giant arena screens behind her.

To achieve this ethereal look, the pop legend embraced a still-emerging branch of generative AI: the text-to-video tool. Type a few words – for example, “surreal cloud sunset” or “waterfall in the forest at dawn” – and a short video is generated almost instantly.

Following in the footsteps of AI chatbots and still-image generators, some AI video enthusiasts say the emerging technology could one day upend entertainment, letting viewers pick their own movies with customizable plots and endings. But there is a long way to go before it can do that, and plenty of ethical pitfalls along the way.

For early adopters like Madonna, who has long pushed the boundaries of art, it was more of an experiment. The AI-generated imagery replaced an earlier version of the “La Isla Bonita” concert visuals that used more traditional computer graphics to evoke a tropical mood.

“We tried CGI. It looked pretty bland and cheesy, and she didn’t like it,” said Sasha Kasiuha, content director for Madonna’s Celebration Tour, which runs through the end of April. “Then we decided to try artificial intelligence.”

ChatGPT creator OpenAI gave a glimpse of what sophisticated text-to-video technology could look like when it recently introduced Sora, a new tool that is not yet publicly available. Madonna’s team tried a different product from New York-based startup Runway, which helped pioneer the technology by releasing the first publicly available text-to-video model last March. The company released a more advanced “Gen-2” version in June.

While some view these tools as “a magic device where you type a word and it somehow brings to life exactly what’s in your head,” Runway CEO Cristóbal Valenzuela said, the most effective uses so far come from creative professionals looking for an upgrade to the decades-old digital editing software they currently use.

He said Runway can’t make a full-length documentary yet. But it can help fill in some background video or b-roll (supporting shots and scenes that help tell the story).

“This saves you maybe a week of work,” Valenzuela said. “What many use cases have in common is that people use it as a way to augment or speed up something they could do before.”

Runway’s target customers are “major broadcast companies, production companies, post-production companies, visual effects companies, marketing teams, advertising companies. A lot of people create content for a living,” Valenzuela said.

Dangers await. Without effective safeguards, AI video creators could threaten democracies with convincing “deepfake” videos of things that never happened, or — as has already happened with AI image creators — flood the internet with fake pornographic scenes depicting what appear to be real people with recognizable faces. Under pressure from regulators, major tech companies have pledged to watermark AI-generated output to help identify what is real.

Copyright disputes are also brewing over the video and image collections the AI systems are trained on (neither Runway nor OpenAI discloses its data sources) and over the extent to which the tools unfairly copy trademarked works. And there are fears that, at some point, video-generating machines could replace human jobs and artistry.

For now, the longest AI-generated video clips are still measured in seconds and can contain noticeable glitches like jerky movements and crooked hands and fingers. Alexander Waibel, a professor of computer science at Carnegie Mellon University who has researched artificial intelligence since the 1970s, said fixing this is “just a matter of more data and more training” and the computing power on which that training relies.

“Now I can say, ‘Make me a video of a rabbit dressed as Napoleon walking through New York City,'” Waibel said. “It knows what New York City looks like, what a rabbit looks like, what Napoleon looks like.”

That is impressive, he said, but still far from crafting a compelling storyline.

Before launching its first-generation model last year, Runway’s claim to AI fame was as a co-developer of the image generator Stable Diffusion. Another company, London-based Stability AI, has since taken over Stable Diffusion’s development.

The “diffusion model” technology behind many of the leading AI image and video generators works by mapping noise, or random data, onto images, effectively destroying the original image and then predicting what a new one should look like. It borrows an idea from physics that can be used to describe, for example, how gas diffuses outward.

“What diffusion models do is reverse this process,” said Phillip Isola, an associate professor of computer science at the Massachusetts Institute of Technology. “They kind of take the randomness and squeeze it back into the volume. This is the way to go from randomness to content. This is how you can create random videos.”
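To make Isola’s description concrete, here is a minimal Python sketch of that reverse process. It is illustrative only: the toy_denoiser below is a hypothetical stand-in for the large trained neural network a real generator would use, and the arithmetic is arbitrary; real systems learn the denoising step from data.

```python
# A toy sketch of the diffusion idea: destroy data with noise, then
# reverse the process by repeatedly "denoising" pure randomness.
# toy_denoiser is a hypothetical placeholder, not a real trained model.
import numpy as np

rng = np.random.default_rng(0)
NUM_STEPS = 100

def forward_noise(image, t):
    """The 'physics' direction: blend the image toward pure noise at step t."""
    alpha = 1.0 - t / NUM_STEPS             # how much of the original survives
    return alpha * image + (1.0 - alpha) * rng.standard_normal(image.shape)

def toy_denoiser(noisy, t):
    """Stand-in for a trained network that predicts a slightly cleaner image.
    Here it just shrinks values; a real model would predict actual content."""
    return noisy * (1.0 - 1.0 / (NUM_STEPS - t + 1))

def generate(shape):
    """The reverse direction: start from randomness and refine step by step."""
    x = rng.standard_normal(shape)          # "the randomness"
    for t in reversed(range(NUM_STEPS)):
        x = toy_denoiser(x, t)              # squeeze randomness back toward content
    return x

frame = generate((64, 64))                  # one 64x64 "image"
print(frame.shape)                          # (64, 64)
```

A video generator runs the same kind of loop, but must also keep consecutive frames consistent with one another, which is where the extra complexity described below comes in.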

Daniela Rus, another MIT professor who directs the Computer Science and Artificial Intelligence Laboratory, said creating video is more complex than still images because it must take into account temporal dynamics, or how elements within the video change over time and between frame sequences.

Rus said the computing resources required are “significantly higher than for still image generation” because video “involves processing and generating multiple frames for each second.”
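A back-of-envelope calculation shows why. The numbers below are illustrative assumptions, not figures from Runway, OpenAI or MIT:

```python
# Rough arithmetic behind Rus's point: a video model must produce many
# frames where an image model produces one. Frame rate and clip length
# here are illustrative assumptions.
frames_per_second = 24                  # a common cinema frame rate
clip_seconds = 4                        # a short AI-generated clip
frames = frames_per_second * clip_seconds
print(frames)                           # 96 frames, versus 1 for a still image
```

And that undercounts the true cost, since the model must also keep those 96 frames consistent with one another over time.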

That’s not stopping some wealthy tech companies from trying to outdo each other in delivering high-quality AI video production over longer periods of time. Needing written descriptions to create an image was just the beginning. Google recently introduced a new project called Genie that can be directed to turn a photo or even a sketch into an “infinite variety” of explorable video game worlds.

Aditi Singh, a researcher at Cleveland State University who studies text-to-video models, said AI-generated videos will likely feature in marketing and educational content in the near term, offering a cheaper alternative to producing original footage or obtaining stock videos.

When Madonna first talked to her team about AI, “her main intention wasn’t, ‘Oh, look, this is an AI video,'” Kasiuha said.

“She asked me: ‘Can you use one of those AI tools to make the image clearer and make it look up-to-date and high-resolution?’” Kasiuha said. “She likes it when you bring in new technology and new types of visual elements.”

Longer AI-generated films are already being made. Runway hosts an AI film festival every year to showcase such work. But it remains to be seen whether this is what human viewers will choose to watch.

“I still believe in people,” said Waibel, the CMU professor. “I still believe this will result in a symbiosis where an AI suggests something and a human improves or directs it. Or humans will do it and the AI will fix it.”

————

Associated Press journalists Joseph B. Frederick and Rodrique Ngowi contributed to this report.
