AI generates art from complexity and suffering

Yesterday I visited the lab of Professor Takashi Ikegami of the University of Tokyo, a leading specialist in artificial life.

Over the past few years, Professor Ikegami has teamed up with composer and pianist Keiichiro Shibuya to create an android (robot) opera. Their latest work, “mirror,” was unveiled to the world for the first time the other day at the Dubai World Expo (link).

Given a few keywords, the android sang poems generated by GPT-3, a natural-language AI, and it shocked the world.

The title “mirror” means that the android is “a mirror that reflects you.”

Professor Ikegami also said:

“Actually, the output of GPT-2 may be more interesting.”

According to the professor, GPT-3 has been trained on so much text that the words it produces are commonplace. By comparison, the earlier GPT-2 seems to spin out poems that contain the unexpected.

Hearing this, I thought, “For now, what makes AI output interesting is simply its absurdity.”

Being absurd means being very surprising.
But surprise alone is not enough; there has to be something relatable in it, something surprising yet compelling. This is also why AI-generated results can feel so irresistibly charming.

Perhaps the biggest change that AI development will bring is that “being smart will no longer pay off in life.”
When facing an AI that exceeds human thinking capacity, the one who benefits is not the person who doubts the AI’s judgment, but the straightforward person who follows it without question, whether or not they are book-smart.

Then I wondered what value would remain to human beings at that point: perhaps emotion, positive feelings like honesty and sincerity, and negative ones like anger, anxiety, and resentment.

After all, a creator’s story always contains negative events. It is true that emotions such as the loss of a loved one, or resentment and anger toward the world, make people creative.

On the other hand, many creators who seem brimming with talent suddenly stop creating once they are financially satisfied.

Looking at the creators around me, many of them are thirsty: thirst for success, jealousy of young talent, impatience at not keeping up with the world, and a reactionary resentment and hatred of the world and of the young.

I think it is those negative emotions that make a creation a creation.

The past year, for example, has been a turbulent one for research into “AI that draws pictures,” starting with DALL-E, announced by OpenAI in January.

DALL-E greatly fueled people’s expectations, yet OpenAI never released its implementation. So projects sprang up here and there in which fired-up young researchers around the world collaborated to replicate DALL-E.

Conversely, if OpenAI had simply released DALL-E, it would never have been this exciting.

This, too, is an example of seekers who felt that “thirst” pooling their energy to reach a goal.

The excitement peaked when a Russian research team released ruDALL-E, a Russian version of DALL-E built from Russian versions of GPT-3 and CLIP.
The team claims the project was the largest artificial-intelligence project in Russia.

Feed the words “Emma Watson” into ruDALL-E, and it draws woman after woman who looks like Emma Watson.
It can depict, without restriction, the things OpenAI has refrained from allowing.

However, as I played with it, I soon found myself fed up.
Sure, ruDALL-E is amazing precisely because it is unrestricted, but when you think about it, using it feels like running a search, and that is not fun.

For example, if you only want to generate face images, you can tell an AI trained on a face dataset such as FFHQ to produce an “Asian middle-aged man’s face,” and a face like this is generated.

Because a feature space trained on FFHQ is closed to faces alone, wherever you sample a feature vector, the result is a decent face.
That is undeniably an achievement of learning, but the absence of essentially “impossible” intermediate states is an obstacle to art.

Searching the FFHQ feature space is almost the same as searching a stock collection.
It never becomes “interesting.”
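To make the “it’s just search” point concrete, here is a minimal sketch, and only a sketch, not the code actually used: sample random latents from a face generator pretrained on FFHQ (the loader `load_ffhq_generator` is hypothetical) and rank the results against a text prompt with OpenAI’s public CLIP package. Since every latent decodes to a decent face, the whole exercise amounts to ranking a stock collection.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI's CLIP package: github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# CLIP's input normalization constants, as published in the CLIP repo.
MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def clip_image_embed(img):
    """img: (N, 3, H, W) in [0, 1] -> unit-norm CLIP image embeddings."""
    img = F.interpolate(img, size=(224, 224), mode="bilinear", align_corners=False)
    emb = model.encode_image((img - MEAN) / STD)
    return emb / emb.norm(dim=-1, keepdim=True)

G = load_ffhq_generator().to(device)  # hypothetical: any GAN pretrained on FFHQ

text = clip.tokenize(["an Asian middle-aged man's face"]).to(device)
with torch.no_grad():
    text_emb = model.encode_text(text)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

    # Every random latent decodes to a plausible face, so "searching" the
    # space reduces to sampling candidates and ranking them by similarity.
    z = torch.randn(64, G.z_dim, device=device)
    imgs = (G(z).clamp(-1, 1) + 1) / 2           # map output to [0, 1]
    scores = (clip_image_embed(imgs) @ text_emb.T).squeeze(1)
    best = imgs[scores.argmax()]                 # the closest match
```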

Rather than an AI that instantly produces a beautiful image, I wondered whether art might lie in an AI that craves and struggles toward its goal.

So, as a first step, I used the AI Bridging Cloud Infrastructure (ABCI), a supercomputer operated under the Ministry of Economy, Trade and Industry, to train on a large collection of photographs I had gathered in the past, expanding the feature space.

This feature space is tentatively called “shi3z”.

This space mixes photographs I have taken over the years with photographs collected at random from the Internet.

Where it overlaps FFHQ is in face-related images, but even those include the faces of people I know personally.

For example, search this feature space for a “smiling man.” The following result is output.

It is very conceptual, but it certainly feels like a search for a “smiling man.”
Unlike FFHQ, which learned only faces, it extracts things that seem to smile from the sky, from trees, and from things I cannot even identify.

This feature space is so vast that ordinary optimization functions cannot find anything in it.
So I designed a new search algorithm for the purpose; with this custom optimization function, it became possible to run a wide variety of searches simultaneously in a short time.
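The article does not spell out the new algorithm, so the following is only a plausible sketch of the idea of many simultaneous searches: a population of candidate vectors is mutated in batch, scored by CLIP cosine similarity, and culled at each step. Here `decode` and `clip_embed` are hypothetical stand-ins for rendering a vector in the expanded space as an image and embedding it with CLIP.

```python
import torch

def batched_search(text_emb, decode, clip_embed, dim,
                   pop=256, steps=200, sigma=0.5, device="cuda"):
    """Evolve `pop` candidate vectors in parallel toward a CLIP text target.

    text_emb:   unit-norm CLIP text embedding, shape (D,)
    decode:     maps a batch of vectors (N, dim) to images (N, 3, H, W)
    clip_embed: maps a batch of images to unit-norm CLIP embeddings (N, D)
    """
    x = torch.randn(pop, dim, device=device)                 # initial population
    for _ in range(steps):
        cand = torch.cat([x, x + torch.randn_like(x) * sigma])  # mutate every candidate
        with torch.no_grad():
            score = clip_embed(decode(cand)) @ text_emb      # cosine similarity
        x = cand[score.topk(pop).indices]                    # keep the best half
        sigma *= 0.99                                        # anneal the mutation size
    return x[0]                                              # best survivor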

As a result of expanding the feature space, a certain creativity has been added to the images the AI produces.
It was quite surprising.

Of course, what you see here is the trace of a lonely journey: an AI thrown into an overly vast feature space, searching for a “smiling man” with nothing but a compass called cosine similarity.
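The compass itself is nothing exotic. With OpenAI’s public `clip` package, scoring one candidate image against the prompt looks like this (the file name is just a placeholder):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("candidate.png")).unsqueeze(0).to(device)  # placeholder file
text = clip.tokenize(["a smiling man"]).to(device)

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)

# Normalize, and the dot product becomes the cosine similarity the search follows.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
print((img_emb @ txt_emb.T).item())
```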

Whether this can be called art, and how it strikes you, is of course up to each person. But to me, an AI that suffers and struggles in its “thirst” comes through far more than one that simply pulls beautiful pictures out of words.

Reading from left to right, this swirl pattern is probably the result of the AI struggling to find the distortion at the corner of the mouth that makes a “smile.”

After all that struggle, it found an expression, at the far right, that appears to have a raised corner of the mouth.

Looking at a more human example, the image at the right end seems to show more teeth than the one at the left, though the difference is slight.

The search index here was OpenAI’s pretrained model CLIP, but it should be noted that CLIP’s training data was created in Europe and America, so the people it imagines look European and American.

In other words, what yardstick the AI uses to judge matters even more than what space it learned.

So I am currently running an experiment to make CLIP itself Japanese. Retraining CLIP is not that difficult; all you need is the data.
The hard part is deciding what data to prepare and what procedure to use.
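To show why the retraining itself is simple, here is a minimal sketch of the standard CLIP objective, a symmetric contrastive loss over matched image–caption pairs, as it would apply to Japanese data. The encoders and the `japanese_pairs_loader` named below are placeholders, not the actual setup; a Japanese text encoder would replace CLIP’s English text tower.

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-caption pairs."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature   # all-pairs similarity
    labels = torch.arange(len(logits), device=logits.device)
    # Each image should match its own caption, and each caption its own image.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

# Hypothetical training loop over Japanese image-caption pairs:
# for images, captions in japanese_pairs_loader:
#     loss = clip_loss(image_encoder(images), japanese_text_encoder(captions))
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```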

There seems to be plenty of room for ingenuity here, too.
