Let’s return to the 2020 NVIDIA launch for a moment: the announcement was lifelike enough to pass as real. The emerging Omniverse technology was used to create the simulated presentation. The new techniques focus mainly on the animation stage; the other stages, such as modelling, remain much the same as before. To build the model of the speaker, thousands of photographs were taken from different directions with a large array of cameras, and the familiar workflow of Maya and ZBrush was then used to create the digital character.

Then comes the technical climax. Building on its earlier audio-driven animation technology, NVIDIA adopted a new face video-to-video technique. In simple terms, it maps captured real footage onto the virtual model, and the realistic, real-time textures help reduce the uncanny valley effect. The resulting problem is that the light source baked into the mapping is fixed, so the highlights do not move correctly with the camera when the view is rotated in 3D software. NVIDIA addressed this with deep learning: instead of computing light changes through a physical renderer, multiple videos were shot in advance so the computer could learn the distribution of highlights at different viewing angles. Although this approach cannot compute lighting in real time the way a global-illumination renderer does, results that simply look realistic are ultimately what we are aiming for.

A similar method was used to capture body motion. The speakers were asked to perform several takes, and videos of their previous presentations were used to learn their movement habits. Once the computer has learned these habits, capturing the speaker’s voice is enough to select a matching action: each syllable corresponds to a number of candidate motions, and the computer picks different actions to suit different scenes. Once NVIDIA has enough of these character action models, the film, television, and gaming industries will be able to use motions that directly match a character’s personality, which would significantly shorten the production cycle.
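The keynote materials do not spell out how the view-dependent highlights are learned, but the core idea of replacing a physical renderer with a model trained on pre-recorded footage can be sketched roughly. The Python example below is a minimal, hypothetical sketch: a small neural network that takes a texel’s UV coordinate and the current camera view direction and predicts a highlight intensity, trained on frames shot from known angles. Every name here (HighlightMLP, train_step, the data shapes) is an illustrative assumption, not NVIDIA’s implementation.

```python
# Minimal sketch, assuming the task is framed as regression from viewing
# direction to per-texel highlight intensity. This is not NVIDIA's pipeline;
# it only illustrates learning view-dependent shading from pre-recorded
# footage instead of computing it with a physical renderer.
import torch
import torch.nn as nn

class HighlightMLP(nn.Module):
    """Predicts a highlight (specular) intensity for a surface point
    given the texel's UV coordinate and the camera view direction."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 + 3, hidden),  # 2 UV coords + 3D view direction
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),      # scalar highlight intensity
            nn.Sigmoid(),
        )

    def forward(self, uv, view_dir):
        return self.net(torch.cat([uv, view_dir], dim=-1))

def train_step(model, optimizer, uv, view_dir, target_intensity):
    """One gradient step on highlight intensities extracted from a frame
    shot at a known camera angle (the 'videos shot in advance')."""
    optimizer.zero_grad()
    pred = model(uv, view_dir)
    loss = nn.functional.mse_loss(pred, target_intensity)
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage: batches of (uv, view_dir, intensity) samples would be
# extracted from the calibration footage; at runtime the trained model is
# queried with the current camera direction to re-light the mapped texture.
model = HighlightMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
uv = torch.rand(256, 2)
view_dir = torch.nn.functional.normalize(torch.randn(256, 3), dim=-1)
target = torch.rand(256, 1)
print(train_step(model, optimizer, uv, view_dir, target))
```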
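The description of body motion works the same way at a high level: each syllable maps to several candidate motions learned from past performances, and the final choice depends on the scene. A toy version of that selection logic might look like the sketch below; the syllable labels, clip names, and scene-weighting heuristic are all invented for illustration and are not the actual NVIDIA system.

```python
# Toy sketch of syllable-driven action selection, assuming a pre-built
# library of motion clips learned offline from the speaker's presentations.
import random

# Each syllable maps to several candidate motion clips (hypothetical names).
ACTION_LIBRARY = {
    "da": ["nod", "open_palm_gesture", "step_forward"],
    "ta": ["point_at_screen", "raise_hand"],
    "mo": ["lean_in", "spread_arms"],
}

# Rough preference of each scene for certain kinds of motion.
SCENE_WEIGHTS = {
    "product_reveal": {"spread_arms": 2.0, "point_at_screen": 1.5},
    "technical_detail": {"point_at_screen": 2.0, "nod": 1.2},
}

def select_action(syllable: str, scene: str, rng: random.Random) -> str:
    """Pick one candidate motion for a syllable, weighted by how well
    each candidate fits the current scene."""
    candidates = ACTION_LIBRARY.get(syllable, ["idle"])
    weights = [SCENE_WEIGHTS.get(scene, {}).get(c, 1.0) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(0)
for syl in ["da", "ta", "mo"]:
    print(syl, "->", select_action(syl, "product_reveal", rng))
```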
Such technology of course has limitations. Humans often use a particular vocabulary of movement to express emotion, and the system currently cannot reproduce this. The deep-learned light mapping is also quite limited: if other, unusual light sources appear in the scene, the computer produces incorrect results, because the output is not derived from a real lighting calculation. Still, I believe the cutting-edge technology of the future will better help artists create their work.