Although deep learning has revolutionized computer vision, current approaches have several shortcomings: typical vision datasets are labor-intensive and expensive to create and maintain, and common vision models are limited to a single task, requiring significant time and effort to adapt to different tasks.
OpenAI acknowledges these problems and has therefore developed a neural network called CLIP (Contrastive Language–Image Pre-training). CLIP does not generate images itself; rather, it scores how well an image matches a text description, which tools such as Deep Daze use to steer image generation without heavy task-specific training. CLIP matches the performance of the original ResNet50 on ImageNet "zero-shot", without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.
What is Deep Daze?
Deep Daze is a small command-line tool that leverages the power of OpenAI's CLIP and SIREN (an implicit neural representation network) to generate an image from any given text.
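Conceptually, the generation loop is simple: SIREN renders an image from its current weights, CLIP scores how well that image matches the text, and gradient steps on SIREN's weights push the score up. Below is a minimal numeric sketch of that loop with one-dimensional stand-ins for both networks; it is an illustration of the optimization idea, not the real models.

```python
# Toy sketch of the CLIP-guided optimization loop behind Deep Daze.
# A single parameter `w` stands in for SIREN's weights, and the "CLIP score"
# is just negative squared distance to a target value standing in for the
# text embedding; the real system optimizes millions of weights against CLIP.

def render(w):
    """Stand-in for SIREN: map weights to an 'image' (here, a single number)."""
    return 2.0 * w

def clip_score(image, target=10.0):
    """Stand-in for CLIP: higher when the image matches the text better."""
    return -(image - target) ** 2

def score_gradient(w, target=10.0, eps=1e-5):
    """Numerical gradient of the score with respect to the weights."""
    return (clip_score(render(w + eps), target)
            - clip_score(render(w - eps), target)) / (2 * eps)

w = 0.0
for step in range(200):
    w += 0.01 * score_gradient(w)  # gradient ascent on the CLIP score

print(round(render(w), 2))  # converges toward the target "image": 10.0
```

Each iteration nudges the "weights" so the rendered output scores higher under the matching function, which is exactly the role CLIP's image–text similarity plays in Deep Daze.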
CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet for a given image, without directly optimizing for the task, similar to the zero-shot capabilities of GPT-2 and GPT-3. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized.
Zero-shot learning aims to recognize objects whose instances may not have been seen during training. In other words, zero-shot learning allows the machine to recognize things it has never seen before based on similar things it already knows. For example, someone who has never seen a zebra, but has seen a horse and knows that a zebra is a kind of striped horse, can recognize a zebra immediately.
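The idea can be sketched in a few lines of plain Python: embed both the class names and the input as vectors, and pick the class whose embedding is most similar to the input's. The vectors below are toy stand-ins; real CLIP produces its embeddings with learned text and image encoders.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_embedding, class_embeddings):
    """Return the class name whose text embedding is closest to the image's."""
    return max(class_embeddings,
               key=lambda name: cosine_similarity(image_embedding,
                                                  class_embeddings[name]))

# Toy embeddings: the "zebra" direction combines horse-like shape and stripes,
# so a zebra photo is recognized even if "zebra" never appeared in training.
class_embeddings = {
    "horse": [0.9, 0.1, 0.0],
    "zebra": [0.7, 0.7, 0.1],   # horse-like shape + stripes
    "car":   [0.0, 0.1, 0.9],
}
image = [0.68, 0.72, 0.05]      # embedding of an unseen zebra photo
print(zero_shot_classify(image, class_embeddings))  # zebra
```

Because classification reduces to comparing embeddings, adding a new category only requires embedding its name, not retraining the model.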
Deep Daze demo
Below are a few images generated by Deep Daze, along with the text prompts that produced them.
mist over green hills
shattered plates on the grass
cosmic love and attention
a time traveler in the crowd
life during the plague
meditative peace in a sunlit forest
a man painting a completely red image
a psychedelic experience on LSD
How to install Deep Daze
Deep Daze is available for Colab users as well as on PyPI. One can simply open the tool on Colab using one of the following links.
Deep Daze requires an Nvidia or AMD GPU with at least 4 GB of VRAM to work at the lowest possible settings; 16 GB of VRAM is recommended to produce usable results. To install it from PyPI, run:
pip install deep-daze
How to run Deep Daze
To generate an image from the prompt "a house in the forest", run the following command in a terminal:
imagine "a house in the forest"
For Windows users, the command needs to be run in an elevated Command Prompt (Run as administrator).
Adding the --deeper flag to the command allocates more memory to the program, producing a higher-quality image.
In addition to generating an image from a text prompt, Deep Daze can also use a specific image as the starting point before processing the text (priming), as well as create a short paragraph or poem describing an image.
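Priming is done by passing the starting image on the command line. The sketch below assumes the `--start-image-path` option shown in the project's README; flag names may change between releases, so verify against the repository before use.

```shell
# Prime generation with an existing photo, then optimize toward the prompt.
# (--start-image-path is assumed from the Deep Daze README; check the repo.)
imagine "a house in the forest" --start-image-path ./forest-photo.jpg
```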
Details about advanced usage of Deep Daze can be found in its official GitHub repository.
A notable alternative to Deep Daze that you can check out is Big Sleep, which combines OpenAI's CLIP with BigGAN.