Phenaki - Text to Video Generator.

Phenaki is a model capable of producing realistic videos from strange scenarios. To convert text (such as words or sentences) into video tokens, Phenaki uses a transformer, a sort of deep learning model.

How Phenaki works?

It works by taking a series of written prompts and compressing videos into tokens using the C-ViViT encoder. These tokens are then de-tokenized to make a video of any duration that is driven by time-varying text or stories. To maximize data, they perform collaborative training on image-text pairings and fewer video-text instances from datasets such as LAION5B and JFT4B, so that generalization goes beyond what is accessible in existing datasets.

Who is behind Phenaki?

As stated in the official paper published on the Openreview website, the identity of creators is kept anonymous.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
Statement from paper

A pdf copy of the paper is available for download

Prompts used to generate Video clips

The Phenaki team provided some examples of prompts used to generate videos. Here are some examples:

1 – The water is magical:

Prompts used:
A photorealistic teddy bear is swimming in the ocean at San Francisco
The teddy bear goes under water
The teddy bear keeps swimming under the water with colorful fishes
A panda bear is swimming under water

2 – Chilling on the beach

Prompts used:
A teddy bear diving in the ocean
A teddy bear emerges from the water
A teddy bear walks on the beach
Camera zooms out to the teddy bear in the campfire by the beach

3 – Fireworks on the spacewalk

Prompts used:
Side view of an astronaut is walking through a puddle on mars
The astronaut is dancing on mars
The astronaut walks his dog on mars
The astronaut and his dog watch fireworks

Can Phenaki generate video from images?

Phenaki can generate clips from a still image by inputting the first frame followed by a prompt. Some examples of this feature have been published on the website

“Camera zooms quickly into the eye of the cat”

“A white cat touches the camera with the paw“

“A white cat yawns loudly“

A huge step in the Video creation industry

Phenaki models cleared the way for a new era of video technology, where creative people can bring their ideas to life. These models show deep learning’s power and ability to handle massive volumes of data, producing high video with amazing accuracy and quality. However, with such technology comes responsibility, as the model’s designers acknowledge the danger of misuse and the inclusion of biases in the training data. Nonetheless, this is a huge step toward reaching generative models’ full potential, and it will be intriguing to see how this technology progresses in the future.