Editorial Note: We’re happy to have a featured post from our collaborator Yingtao Tian, a member of the Google Brain Tokyo team. Here he’ll discuss his most recent work on the intersection of non-differentiable programs and visual creativity. You can learn more about the project in the original paper and its online animated version.

Inspiration

Inspired by Magenta’s work, much of which has focused on the role of machine learning in music, we recently began to wonder how similar approaches could inform forms of modern visual art, such as abstract painting. To give better context for our approach, a brief review of the area is helpful.

In painting, starting in the early 20th century and in the wider context of Modernism, a series of avant-garde movements advocated revolutionary, abstract points of view in place of the traditional rules of perspective. These movements, including Cubism, Geometric Abstraction, and Abstract Expressionism, have had a far-reaching impact on Minimalist Art and Minimalist Architecture. Interestingly, such art forms have also been explored in computer art, such as Low-complexity Art and Algorithmic Art, which, in a broader sense, also includes the use of Genetic Algorithms, where the artist determines the rules governing art generation. This approach has gained popularity over the years in the creative coding community, resulting in a number of sophisticated extensions.

Genetic Algorithms have served artists as an inspiration in various forms, and one of their branches, Evolution Strategies (ES), has recently seen a resurgence of interest in the machine learning community. We therefore chose to explore the use of ES for creative painting applications.

Goals

We propose to synthesize paintings by placing transparent triangles using an Evolution Strategy (ES) so that they fit a concrete image or an abstract concept. The artist can control the fitting process to some degree, and multiple results can be presented for the artist to pick from.

For fitting an image with triangles, we explore modern ES algorithms. As shown in the figure below, we find that they provide good quality and efficiency.

Figure: Our method fits the painting "Mona Lisa".
The target image (left) is followed by the evolution process that fits it (right)

We go one step further and explore fitting an abstract concept represented as a natural language sentence. As shown in the following figure, we find that the method can produce diverse, distinct geometric abstractions that make sense when we consider how humans may interpret the language.

Figure: Our method fits the concept represented as text. Each concept is shown below its evolution process: "Self", "Walt Disney World", and "Google located at 1600 Amphitheatre Parkway in Mountain View, California."

Finally and interestingly, the results produced by our method (to some degree) resemble Abstract Expressionism and Minimalist Art.

About the ES-based Creativity

One benefit of ES-based creativity is that our proposed method can fit any target image or concept and, thanks to the efficiency of the ES algorithm, can handle a wide range of triangle counts. We show this in the two figures below. The ES algorithm can also treat the number of triangles as a “computational budget”, where extra triangles can always be used to gain fitness. This lets a human artist use the number of triangles to find the right balance between abstractness and detail in the produced art.

Figure: Fitting several target images with different numbers of triangles. Columns show the target image followed by fits with 10, 25, 50, and 200 triangles. Images are "Darwin", "Mona Lisa" (both from Here), "Anime Face" (generated by Waifu Labs), "Landscape" (from Wikipedia), "Impressionism" (A May Morning in Moret by Alfred Sisley, compiled here).
Figure: Fitting several abstract concepts with different numbers of triangles. Columns show fits with 10, 25, 50, and 200 triangles. The concepts are "Self", "Human", "Walt Disney World", "A picture of Tokyo", "Google located at 1600 Amphitheatre Parkway in Mountain View, California.", and "The United States of America commonly known as the United States or America is a country primarily located in North America." The concept can be a single word ("Self" and "Human"), a phrase ("Walt Disney World" and "A picture of Tokyo"), or a long sentence (the last two examples).

Another benefit specific to fitting abstract concepts is that our method has much more freedom in arranging the triangles and can produce different solutions, as shown below. This is desirable for computer-assisted art creation, since human creators can be put “in the loop”, not only experimenting with the text prompt but also picking from multiple candidates produced by our method.

Figure: Fitting several abstract concepts over 4 individual runs with the same number of triangles. The concepts are "Self", "Human", "Walt Disney World", "A picture of Tokyo", "The corporate headquarters complex of Google located at 1600 Amphitheatre Parkway in Mountain View, California.", and "The United States of America commonly known as the United States or America is a country primarily located in North America."

Technical Details

The architecture of our proposed method is shown below. We represent a configuration of triangles in a parameter space consisting of the positions and colors of the triangles, render the configuration onto a canvas, and calculate its fitness based on how well the rendered canvas fits a target image or a concept given as a text prompt. The ES algorithm (we use PGPE with the ClipUp optimizer) keeps a pool of candidate configurations and uses mutations to evolve better ones as measured by this fitness. When fitting a concrete image, we use the pixel-wise L2 loss between the canvas and the target image as the fitness. When fitting a concept, we first represent the concept as text, embed the text prompt using the text encoder in CLIP, embed the rendered canvas using the image encoder also available in CLIP, and use the cosine distance between the two embeddings as the fitness.
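
To make the rendering and fitness steps more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the code released with the paper: the 10-number layout per triangle (three vertices plus RGBA), the white background, the function names `render`, `image_fitness`, and `concept_fitness`, and the use of a mean over squared pixel differences are all choices made for this example. The concept fitness takes plain vectors as input; in the method described above they would come from the CLIP text and image encoders, which are not called here.

```python
import numpy as np

def render(params, size=32):
    """Alpha-composite triangles onto a white canvas (illustrative layout).

    Each row of params holds 10 numbers in [0, 1]: x0, y0, x1, y1, x2, y2
    followed by r, g, b, a. This layout is an assumption of the sketch.
    """
    params = np.clip(np.asarray(params, dtype=float).reshape(-1, 10), 0.0, 1.0)
    canvas = np.ones((size, size, 3))
    ys, xs = np.mgrid[0:size, 0:size]
    px = (xs + 0.5) / size  # pixel-centre x coordinates in [0, 1]
    py = (ys + 0.5) / size  # pixel-centre y coordinates in [0, 1]
    for x0, y0, x1, y1, x2, y2, r, g, b, a in params:
        # Signed (doubled) area; zero means a degenerate triangle, which is skipped.
        area = (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)
        if area == 0:
            continue
        s = np.sign(area)
        # Edge functions: a pixel is inside when it lies on the inner side of all edges.
        e0 = (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
        e1 = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        e2 = (x0 - x2) * (py - y2) - (y0 - y2) * (px - x2)
        inside = (s * e0 >= 0) & (s * e1 >= 0) & (s * e2 >= 0)
        mask = inside[..., None] * a  # per-pixel opacity of this triangle
        canvas = canvas * (1.0 - mask) + np.array([r, g, b]) * mask
    return canvas

def image_fitness(canvas, target):
    """Negative pixel-wise L2 loss (mean squared difference); higher is better."""
    return -np.mean((canvas - target) ** 2)

def concept_fitness(image_embedding, text_embedding):
    """Negative cosine distance between two embedding vectors; higher is better."""
    cos = np.dot(image_embedding, text_embedding) / (
        np.linalg.norm(image_embedding) * np.linalg.norm(text_embedding))
    return -(1.0 - cos)

# Usage: score one random configuration of 50 triangles against a gray target.
rng = np.random.default_rng(0)
candidate = rng.uniform(size=(50, 10))
score = image_fitness(render(candidate), np.full((32, 32, 3), 0.5))
```

Both fitness functions are negated so that a larger value always means a better fit, which lets the same optimizer maximize either one.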

Figure: The architecture of our method.
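
The optimization loop in this architecture can be sketched as follows. This is a simplified, PGPE-style update with a ClipUp-style step, written for illustration only: the function name `pgpe_clipup_sketch`, all hyperparameter values, and the zero initialization are assumptions, and the exploration scale `sigma` is held fixed for brevity, whereas PGPE itself also adapts it. The fitness function is treated as a black box, so no gradient of the renderer is needed.

```python
import numpy as np

def pgpe_clipup_sketch(fitness_fn, dim, iterations=300, pairs=16, sigma=0.1,
                       step_size=0.05, momentum=0.9, max_speed=0.1, seed=0):
    """Maximize fitness_fn over a dim-dimensional parameter vector."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)      # centre of the search distribution
    velocity = np.zeros(dim)  # momentum buffer for the ClipUp-style step
    for _ in range(iterations):
        # Symmetric sampling: evaluate mean + eps and mean - eps for each eps.
        eps = rng.normal(0.0, sigma, size=(pairs, dim))
        f_plus = np.array([fitness_fn(mean + e) for e in eps])
        f_minus = np.array([fitness_fn(mean - e) for e in eps])
        # Gradient estimate for the mean from the fitness differences.
        grad = ((f_plus - f_minus) / 2.0) @ eps / pairs
        norm = np.linalg.norm(grad)
        # ClipUp-style update: only the direction of the gradient is used.
        direction = grad / norm if norm > 0 else np.zeros(dim)
        velocity = momentum * velocity + step_size * direction
        speed = np.linalg.norm(velocity)
        if speed > max_speed:
            velocity = velocity * (max_speed / speed)  # clip the velocity norm
        mean = mean + velocity
    return mean

# Usage on a toy objective (a stand-in for "render, then score the canvas").
toy_target = np.full(5, 2.0)

def toy_fitness(x):
    return -np.sum((x - toy_target) ** 2)

solution = pgpe_clipup_sketch(toy_fitness, dim=5)
```

To fit triangles instead of the toy objective, `fitness_fn` would reshape the parameter vector into triangles, render them, and return the image or concept fitness of the resulting canvas.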

Closing words

It’s interesting to see that we can leverage ES to produce high-quality results and geometric abstractions aligned with how humans perceive language and images. It also produces a distinct art style.

In our opinion, however, this work opens more questions than it answers. For example, future work could investigate a broader spectrum of art forms beyond the minimalism explored here. Also, since ES is agnostic to the domain, i.e., to how the renderer works, ES-inspired approaches could potentially unify various domains with significantly less adoption effort.

How to Cite / Use this Work

The code for this work is open source. Please let us know if you find it useful or build your work on top of it; we will be very happy to hear about it!

If you use the proposed technique, please consider citing the paper where it was introduced:

Yingtao Tian and David Ha. "Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Concepts." 2021. arXiv:2109.08857.

You can also use the following BibTeX entry:

@misc{tian2021modern,
    title={Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Concepts},
    author={Yingtao Tian and David Ha},
    year={2021},
    eprint={2109.08857},
    archivePrefix={arXiv},
    primaryClass={cs.NE}
}