Here comes the metaverse: what kind of virtual digital human do you want?

Wen Xiaolei, Sun Xiaolei, Cui Shifeng

Virtual digital human development

The hype around the metaverse has sparked boundless imagination about the future, and interest in virtual digital humans, which are closely tied to the metaverse, is rising as well. Virtual humans are not unique to the metaverse, but the metaverse has, to a certain extent, fueled the current boom in virtual humans. As for the relationship between the two: in the metaverse, the virtual human serves as the user interaction interface, and interacting with virtual humans becomes a way of interacting with the virtual world. Virtual humans may therefore become the way roles are embodied in the metaverse.

Every step forward for the metaverse may trigger an explosion in the digital industry. Building on the internet and the mobile internet, the metaverse integrates the virtual and real worlds, and its long-term impact will be even more far-reaching. In terms of industrial scale, it could be an order of magnitude larger than the mobile internet. In the future metaverse, virtual humans may outnumber real people, possibly exceeding 10 billion. Such a large population of virtual humans represents a huge market and opportunity.

A virtual digital human is a “human” that exists in the digital world. It uses information technology and artificial intelligence to digitize and visually “copy” the human body in every respect, including its overall appearance and limb movements, ultimately achieving an accurate simulation and reconstruction of real people in the digital world. In short, it uses advanced technology to build a “digital twin” of a real person. Unlike physical robots, virtual digital humans depend on display devices: most virtual humans we know can only be shown on devices such as mobile phones, computers or smart screens. On the market they are also called virtual images, virtual humans, digital humans and so on. Representative sub-applications include virtual assistants, virtual customer service agents, and virtual idols/anchors.

Specifically, a virtual digital human has three core elements: “virtual”, “digital” and “human”, of which “human” is the most important. Whether it is highly anthropomorphic and delivers a natural, realistic experience determines whether the virtual human can be deployed in a given scenario. Anthropomorphism has three key aspects. First, whether the appearance is anthropomorphic: the facial features and overall image of the virtual digital human, which depend on its category, level of production detail, rendering quality, design aesthetics and so on. Second, whether the behavior is anthropomorphic: its facial expressions, body language and voice, whose realism varies with the driving method, the type of driving model, the training data and the accuracy of the driving model. Third, whether the interaction is anthropomorphic: the level at which the virtual digital human interacts with the real world reflects its “mind”, that is, its ability to perceive the external environment and to communicate with people, including both the content of its answers and its body reactions. This depends on speech recognition, natural language understanding and processing, knowledge graphs, the preset knowledge base and so on.

1.1 Development of virtual digital humans: maturing technology drives the industry, and application scenarios keep expanding

The concept of the virtual human is not new. Since the 1980s, people have tried to bring virtual humans into the real world. Early virtual human technology relied mainly on hand drawing. After the Japanese anime “The Super Dimension Fortress Macross” aired, its producers released an idol album under the name of the heroine, Lynn Minmay, who came to be regarded as the world’s first “virtual singer”. The British-created virtual human Max Headroom also appeared in a number of films and commercials. However, these virtual humans were isolated cases and, after a brief stir, did not make a lasting impact on society, so virtual humans at this stage were still in their embryonic phase.

At the beginning of the 21st century, as CG and motion capture technology advanced, virtual humans entered an exploratory period, with deployments mainly in film, television and entertainment. In film and television, motion capture and computer processing are used to create digital avatars of characters. Among virtual idols, Japan developed the first widely recognized virtual human, “Hatsune Miku”. Her phenomenal commercial success pushed virtual idols toward industrialization and professionalization and opened up more niche segments. However, although virtual humans at this stage had become practical, costs were high and the presentation was relatively crude.

Over the last five years, breakthroughs in artificial intelligence have simplified virtual human production and made virtual humans more interactive, and the field has entered a fast lane of development. The precision of modeling, motion capture and AI interaction keeps improving; virtual humans now look realistic and can express emotions and communicate. In recent years they have been deployed in more scenarios, have been successfully applied in banking, healthcare, education, government services and telecommunications, and have gradually entered the mainstream market. Among IP-type virtual humans, VTubers from overseas platforms and VUPs from the domestic platform Bilibili have flooded into the market. Among non-IP virtual humans, some technology companies have begun to move into the space: for example, Xinhua News Agency and Sogou jointly released the “AI synthetic anchor”, and Shanghai Pudong Development Bank and Baidu jointly launched “Xiaopu”, a digital employee providing financial services.

Since 2020, more and more virtual digital humans have emerged and their commercial value has become evident, riding the wave of the metaverse boom. Besides entertainment companies and MCN agencies, major tech companies have also entered the field one after another, building different types of virtual human IP. Virtual digital humans are becoming idols followed by “Generation Z” on major social platforms. Virtual digital humans are now developing toward greater intelligence, convenience, refinement and diversity, and the industry has entered a growth period.

1.2 Technical architecture: real-person-driven and computation-driven, two technical routes advancing together

Virtual digital humans are still developing rapidly, and no unified system framework has yet formed. Following the framework of the Artificial Intelligence Industry Development Alliance, we divide the general virtual digital human system into five parts: character image, speech generation, animation generation, audio/video synthesis and display, and interaction. The interaction module is an extension: it recognizes the user’s intent, determines the digital human’s subsequent speech and actions, and drives the next round of interaction. Depending on whether an interaction module is present, virtual digital humans are divided into non-interactive and interactive types, and interactive ones are further divided by driving method into real-person-driven and computation-driven. Non-interactive virtual humans have a relatively simple workflow, developed earlier and are offered by many vendors today, but the overall future trend still centers on interactive virtual humans.
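To make the five-part division concrete, the following is a minimal sketch, assuming each module can be reduced to a single function. The class and method names (`DigitalHuman`, `generate_speech`, `step`, etc.) are hypothetical, and the speech and animation steps are string placeholders rather than real TTS or rendering calls. The only point it illustrates is structural: the interaction module is optional, and its presence is what separates an interactive digital human from a non-interactive one that simply plays a script.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Frame:
    """One unit of output: what the digital human says and how it moves."""
    speech: str
    animation: str


@dataclass
class DigitalHuman:
    # Hypothetical stand-ins for the modules named in the text.
    character_image: str  # e.g. "2D" or "3D"
    # Optional interaction module: maps user input to the next line to say.
    # None means a non-interactive digital human that only plays a script.
    interaction: Optional[Callable[[str], str]] = None
    history: List[Frame] = field(default_factory=list)

    @property
    def is_interactive(self) -> bool:
        return self.interaction is not None

    def generate_speech(self, text: str) -> str:
        return f"<tts:{text}>"  # placeholder for a speech-synthesis call

    def generate_animation(self, speech: str) -> str:
        return f"<{self.character_image} animation for {speech}>"  # placeholder

    def synthesize_and_display(self, text: str) -> Frame:
        speech = self.generate_speech(text)
        frame = Frame(speech=speech, animation=self.generate_animation(speech))
        self.history.append(frame)
        return frame

    def step(self, user_input: Optional[str] = None, script: str = "") -> Frame:
        """Run one round: the interaction module (if any) picks the reply,
        then speech, animation and audio/video synthesis produce the output."""
        if self.interaction is not None and user_input is not None:
            text = self.interaction(user_input)
        else:
            text = script
        return self.synthesize_and_display(text)


# Usage: an interactive assistant and a non-interactive news reader.
assistant = DigitalHuman("3D", interaction=lambda q: "Hello!" if "hi" in q.lower() else "Could you repeat that?")
anchor = DigitalHuman("2D")
print(assistant.step("Hi there").speech)                  # <tts:Hello!>
print(anchor.step(script="Today's headlines").animation)  # <2D animation for <tts:Today's headlines>>
```

In a real system the interaction callable would wrap intent recognition and dialogue management, and whether it is fed by a human operator or by a model corresponds to the real-person-driven versus computation-driven split.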

By dimension, character images are divided into 2D and 3D. 2D digital humans are relatively simple, whereas 3D digital humans require 3D modeling to generate the image and therefore more computation. At present it is still difficult to use low-cost 3D avatars in everyday applications. For example, virtual live streaming, which became popular this year, is in the process of upgrading from 2D to 3D. Because the cost of 3D live streaming remains hard to bring down, most virtual anchors still appear in 2D/Live2D, which objectively limits their performance and their interaction with the audience in everyday streams.

Real-person-driven technology has low cost and a strong sense of interaction and is mostly used in pan-entertainment, including virtual idols, virtual anchors, and film and television. Computation-driven (intelligent-driven) virtual humans have higher technical requirements and higher up-front investment, and their interaction ability is weaker than that of real-person-driven ones; they are generally used in enterprise services to improve operational efficiency.

A real-person-driven digital human is driven by a real person. The basic principle is that the real person communicates with users in real time based on the user video transmitted from the video monitoring system, while a motion capture system transfers the real person’s expressions and movements onto the virtual digital human’s image, so that it can interact with users. This approach can be seen as an extension of CG technology in traditional film and television production. In recent years the main technical breakthrough has been in motion capture: expensive motion capture equipment is no longer necessary, and an ordinary camera combined with a good recognition algorithm can achieve fairly accurate driving, significantly lowering the barrier to producing refined virtual content. Because a real person participates and operates it, a real-person-driven virtual human offers better flexibility and interaction. Given these characteristics, it can be used on the one hand in film and television content creation to reduce production costs, and on the other hand in virtual idols and live streaming to deliver highly interactive activities in short, fragmented time slots.
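The camera-plus-algorithm capture step described above ultimately produces per-frame expression parameters that are mapped onto the avatar. The sketch below assumes, hypothetically, that the tracker outputs blendshape weights in [0, 1] keyed by name (`"jawOpen"` is used only as an illustrative channel name), and shows two common post-processing steps: exponential smoothing to suppress frame-to-frame jitter from a cheap camera, and a per-channel gain with clamping to fit the avatar’s rig. It is not any vendor’s pipeline.

```python
from typing import Dict, Iterable, Iterator

Weights = Dict[str, float]  # blendshape name -> activation in [0, 1]


def retarget(frames: Iterable[Weights], gains: Weights, alpha: float = 0.5) -> Iterator[Weights]:
    """Smooth captured expression weights and map them onto the avatar's rig.

    frames: per-frame weights from a (hypothetical) camera-based face tracker
    gains:  per-channel scale factors tuned for the avatar (hypothetical values)
    alpha:  exponential-smoothing factor in (0, 1]; higher = more responsive
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    state: Weights = {}
    for frame in frames:
        for name, value in frame.items():
            prev = state.get(name, value)  # first sighting: no smoothing history
            state[name] = alpha * value + (1 - alpha) * prev
        # Apply gains and clamp so the avatar never receives out-of-range weights.
        yield {name: min(1.0, max(0.0, v * gains.get(name, 1.0))) for name, v in state.items()}


captured = [{"jawOpen": 0.0}, {"jawOpen": 1.0}, {"jawOpen": 1.0}]
smoothed = [f["jawOpen"] for f in retarget(captured, gains={"jawOpen": 1.0})]
print(smoothed)  # [0.0, 0.5, 0.75]
```

The smoothing factor trades responsiveness for stability: a lower `alpha` hides tracker noise but makes the avatar lag behind the performer, which matters in the highly interactive live scenarios mentioned above.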

In the computation-driven model, the virtual digital human’s speech, facial expressions and specific actions are driven in real time or offline, mainly by the output of deep learning models, and the final effect is achieved after rendering. The final presentation of a computation-driven virtual human depends on a combination of AI technologies, including speech synthesis, NLP and speech recognition.
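A simplified view of the computation-driven chain is: text goes to speech synthesis, which yields phonemes with timings, and those timings drive mouth shapes (visemes) that are then rendered. In practice the mapping from speech to facial motion is a learned deep learning model; the sketch below deliberately swaps in a rule-based phoneme-to-viseme lookup table to show the data flow only. The table, the viseme labels and the function name are hypothetical, and the `(phoneme, duration)` input is assumed to come from a TTS engine.

```python
from typing import List, Tuple

# Hypothetical, heavily simplified phoneme -> viseme (mouth shape) table.
PHONEME_TO_VISEME = {
    "AA": "open", "AE": "open", "AH": "open",
    "B": "closed", "M": "closed", "P": "closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "UW": "round", "OW": "round", "W": "round",
}


def viseme_track(phonemes: List[Tuple[str, float]]) -> List[Tuple[float, float, str]]:
    """Turn (phoneme, duration_s) pairs into (start_s, end_s, viseme) keyframes.

    Unknown phonemes fall back to "neutral"; consecutive identical visemes are
    merged into one keyframe so the mouth does not re-trigger needlessly.
    """
    track: List[Tuple[float, float, str]] = []
    t = 0.0
    for ph, dur in phonemes:
        vis = PHONEME_TO_VISEME.get(ph, "neutral")
        if track and track[-1][2] == vis:
            start, _, _ = track[-1]
            track[-1] = (start, t + dur, vis)  # extend the previous keyframe
        else:
            track.append((t, t + dur, vis))
        t += dur
    return track


print(viseme_track([("M", 0.25), ("AA", 0.5), ("B", 0.25), ("P", 0.25)]))
# [(0.0, 0.25, 'closed'), (0.25, 0.75, 'open'), (0.75, 1.25, 'closed')]
```

A learned model replaces the lookup table with a function of the audio itself, which is why, as the text notes, data volume and model accuracy largely determine how natural the resulting expressions look.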

To make the virtual human more anthropomorphic and dynamic, and to support a realistic character image and a convincing interactive experience, three core technologies are required at the technical level:

CG modeling / image transfer technology: determines how the character’s appearance is presented, i.e., how human-like the virtual digital human looks. Differences in this technology between China and overseas partly explain why domestic and overseas players focus on different market segments and development paths.

Natural language processing (NLP) interaction technology: supports the interactive experience. NLP centers on dialogue ability. Having powered text chat assistants and voice AI assistants, it continues to play a core role in virtual digital humans and is regarded as their “brain”. It has already achieved good results in AI interactive assistants.

Deep learning models, including computer vision (CV): model performance depends heavily on the amount of data, the computing framework, key feature points and other factors. The quality of the speech-driven deep learning model largely determines whether the virtual human can show natural changes in facial expression and body movement. Whether emotion and similar factors can be specifically designed into the model also has an important impact.

1.3 Industrial chain: overseas players lead in technology, while China focuses on application-side innovation

The technology and theory behind virtual digital humans are maturing, application scenarios keep expanding, and virtual humans can be combined with a wide range of industries; the monetization path and market potential are clear. Although business models are not yet fully mature, they are diversifying. Across the industrial chain, overseas and domestic markets differ in upstream technology, product models and downstream deployment scenarios. In underlying technology, overseas companies started earlier and have stronger overall capabilities, allowing them to better enter service scenarios that demand high interaction capability. In China, the mobile internet has developed rapidly and in many forms, giving domestic players relatively strong innovation at the application end. In particular, live streaming, a scene with distinctly Chinese characteristics, is a major mode of commercialization for virtual digital humans.

In the current industrial structure, the upstream of the chain supplies the factors of virtual human production: technology and content. On the technology side, some companies provide full-stack technical services, while others start from vertical technology modules such as artificial intelligence, motion capture, and modeling and rendering. Upstream technology vendors have worked in the industry for many years and have built relatively high technical barriers. On the content side, both content IP holders and artist IP holders can supply the persona of a virtual human.

The midstream of the chain is the platform layer, including software and hardware systems, production technology service platforms and AI capability platforms, which provide the technical capabilities for producing and developing virtual digital humans. AI capability platforms provide computer vision, intelligent speech and natural language processing capabilities, and AI is the core driving force of the virtual digital human industry. Globally, AI technology has reached a relatively mature stage, and the industry is accelerating its deep integration with the real economy and helping industries transform and upgrade. Many companies operate at the platform layer: large integrated internet companies such as Tencent, Baidu and Sogou, as well as vertical vendors such as Mofa Technology and Xiangxin Technology, all offer digital human technology service platforms.

The application layer covers the actual application scenarios of virtual digital humans, which currently include media, gaming, film and television, finance, culture and tourism, healthcare and other industries, where they form end-to-end industry solutions and support development in each field. As virtual digital human technology and products combine with different industries and with production and daily life, their scalable, replicable and customizable characteristics can significantly improve traditional processes, enhancing user experience while raising business efficiency and bringing change to traditional fields.

1.4 Business model: service and identity applications work together, with prominent business value and a high industry ceiling

By application scenario, virtual digital humans can be divided into service-oriented and identity-oriented types, which differ clearly in core functions, output positioning, representative applications and industrial value. Thanks to the great potential of virtual IP and the emergence of virtual “second selves”, identity-oriented virtual humans may dominate in the future, while service-oriented virtual humans will develop relatively steadily. Multimodal AI assistants are expected to gradually expand in finance, government services, and culture and tourism.

Because of the technology gap between China and overseas, overseas companies have an advantage in CG and can create virtual digital humans that convey a strong sense of care. Overseas service-oriented virtual humans have led in deployment in healthcare and similar scenarios, developing virtual companion assistants and psychological counselors; domestic service-oriented virtual humans are mainly used to replace real people in broadcasting and other content generation, and for simple Q&A interactions.

Service-oriented virtual humans: these focus on providing services. A single service-oriented virtual human generates less economic value than an identity-oriented one, but as a basic element of the virtual world, service-oriented virtual humans can collectively create a very large market. By core service function and underlying technology, they can be divided into real-person-replacement services and multimodal AI assistants.

Identity-oriented virtual humans: unlike service-oriented virtual humans, which lack a distinctive personality, identity-oriented digital humans emphasize their own identity. They take two main forms. One is virtual IPs and virtual idols with independent personas in the real world, which can carry out activities and generate revenue through photos, videos, advertisements, concerts, live streams and other channels. The other is the second virtual identity that people will create in the future metaverse as a means of interacting with the virtual world. Virtual IPs and virtual idols have rapidly gone mainstream in recent years; they are currently the category attracting the most attention, and their market is already considerable.

The industry ceiling is high, and the market is expected to exceed US$10 billion in the future. We gauge the scale of the virtual human industry using conversational AI, which includes voice assistants, customer service chatbots and virtual digital humans. According to Adroit Market Research, the global conversational AI platform market is expected to grow from US$4 billion in 2019 to US$17 billion in 2025. This forecast was published before the COVID-19 outbreak, so given the boost from the stay-at-home economy, the market may grow even faster than expected. The virtual digital human market now has a high ceiling, a clear path to commercialization, and a number of highly competitive technology companies. Over the past two years, various industries have begun to recognize its technical feasibility and commercial value; with the further push from the metaverse, the industry is expected to enter a period of rapid growth.
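The Adroit Market Research figures quoted above imply a compound annual growth rate, which the short calculation below makes explicit using the standard formula CAGR = (end / start)^(1 / years) - 1. The inputs are only the two numbers cited in the text; the resulting rate is our own arithmetic, not a figure stated in the report.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# US$ billions, conversational AI platform market, as quoted above.
rate = cagr(4.0, 17.0, 2025 - 2019)
print(f"Implied CAGR 2019-2025: {rate:.1%}")  # Implied CAGR 2019-2025: 27.3%
```

In other words, the forecast assumes the market grows by roughly 27% per year for six consecutive years.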

We believe virtual digital humans have many application scenarios and can be combined with a wide range of industries; the monetization path and market potential are clear, and the industry’s potential is large. The main future drivers are:

1) Generational change among users: “Generation Z” is more accepting of the virtual world and more eager for new forms of content consumption;

2) AI, CG modeling and related technologies are maturing, lowering both the industry’s entry barrier and its costs;

3) Demand: first, people want services with warmth, and compared with traditional forms such as voice and text, the visual image and interaction of a virtual human can feel closer to users; second, virtual humans can greatly improve efficiency and reduce labor costs;

4) Integration with various industries: virtual digital humans have substantial room for innovation and application and can create new value. As the main interactive carrier of the metaverse, virtual humans have clear and large growth potential, with further room for extension based on NFTs and VR in the future.

Application scenarios: where will virtual humans be used in the future, and what value will they bring?

In general, we believe the value of virtual humans lies mainly in the following:

• Virtual humans can replace real people in providing services, reducing the need for live performers, lowering the production cost of standardized content, and improving efficiency and quality. As enterprises grow quickly, rapid business changes raise the cost and difficulty of training service staff and lead to uneven service quality. In the 5G era, customers’ demand for content is shifting from text and images to video, and traditional modes of information exchange convey information poorly.

• Virtual humans can stand in for real people in scenarios that would otherwise be impossible. They break the limits of time and space, widening the range of possible application scenarios.

• Virtual humans have the advantage of anthropomorphism and provide a stronger sense of care and realism. A virtual human based on a real person’s image can build empathy with users, strengthen their trust and sense of security, and provide consulting, care, companionship, transaction processing and other services in general or specialized care scenarios.

Virtual humans can effectively reduce costs and increase efficiency by replacing real-person services, for example in scenario-specific customer service, virtual anchors and news broadcasting.

Content production platforms for virtual digital humans are becoming richer. Users only need to enter the content to be broadcast and select the host’s parameters to quickly generate a broadcast video. Overseas, content platforms such as Hour One and Synthesia offer a variety of virtual avatars and standardized templates that support automatic content generation, such as generating a virtual digital human video from text. In China, vendors such as iFLYTEK, Xiangxin Technology and Volcano Engine have released content platform products aimed mainly at news broadcasting. The maturity of these platforms further lowers the barrier to adoption.

Take virtual live streaming as an example. It is currently the most mature and commercially successful scenario, thanks to advantages such as freedom from time limits and controllable costs and pricing. With strong demand for e-commerce live streaming, virtual humans are increasingly used on major e-commerce platforms. Virtual anchors can interact with the audience through AI and stream 24/7, saving labor costs and expanding live-stream sales. Competition in the domestic market is fierce, and some vendors have begun to offer one-stop services. As the application matures, the usage and price barriers for virtual live streaming have dropped significantly, with prices falling from the tens of thousands to the thousands.

In physical settings, virtual customer service agents provide interactive services through terminals such as all-in-one large screens, and online they serve customers through apps, reducing enterprises’ labor costs. Because customer service has relatively clear, standardized service requirements and business processes, it is an ideal scenario for deploying virtual humans within the limits of current technology. JD’s virtual “digital human” customer service agent has entered internal testing; it explores a new mode of human-computer interaction, works 24 hours a day, and reduces the time and labor spent on repetitive work.

Virtual humans can reach scenes that real people cannot, breaking the limits of time and space.

In June 2021, Tencent Interactive Entertainment’s NExT Studios and Xinhua News Agency jointly created the first digital reporter, “Xiaozheng”, who introduced the space station facilities and the astronauts’ daily life to the audience during the Shenzhou-12 crewed mission. Compared with professionally training a real reporter to enter the space station, a virtual human faces fewer external constraints, requires less investment, and can be combined organically with digital content and scenes. For example, in one video Xiaozheng presents Xinhua’s “Mars bureau” and shows the audience what Mars looks like. In the future, virtual humans may become an alternative in scenes that are difficult or very costly for real people.

Virtual humans can also break spatial limits. In the future they may be combined with shopping, using VR/AR technology to build realistic shopping scenes so that consumers get an experience equal to, or even better than, offline shopping.

Take car shopping as an example. Retailers can use light field technology to quickly create complex real scenes and realistic 3D views of dynamic products. For example, VR panoramic virtual car viewing gives customers a novel online 360-degree panoramic test-drive experience. Using VR panoramic technology, a 3D interactive display system is built for the car: the car is finely reconstructed from 3D orbital photography so that its details are faithfully reproduced through 360 degrees. Customers can zoom in and out on any part with the mouse and try opening and closing the doors, turning the lights on and off, and switching body colors. These rich interactive features let customers enjoy viewing new cars from home. A virtual host can also provide explanations, improving the salesperson’s image and professionalism and increasing customer confidence and purchase intent.

In car display, virtual viewing can better highlight a car’s design concept and features. Compared with a traditional salesperson’s introduction, a VR experience can convey the car’s performance more realistically. For example, virtual scenes can show how the vehicle behaves under different road conditions and what special capabilities it uses to cope with ice and snow, which cannot be experienced in a real showroom. Showing these advantages to customers both serves as publicity and can improve the sales conversion rate. In car selection, customers can wear VR equipment to enter a virtual scene, choose seat leather colors, interior styles, car audio and many other configurable options, and build their own personalized car.

In the future, with further advances in virtual reality technology, a virtual human image is expected to provide a realistic test-drive experience in the virtual world.

Besides virtual car viewing, virtual R&D, virtual assembly, virtual training and virtual driving have also become popular. Panoramic car viewing has changed consumers’ buying habits and automakers’ production modes, and has greatly shortened product cycles and reduced costs. Automotive virtual reality also offers the industry a new growth point. In virtual R&D, for example, BMW’s car designers use VR technology together with Unreal Engine to build digital product models in a VR environment, visualizing the car’s interior and exterior, helping designers quickly revise design drafts, and simulating test-drive scenarios. Virtual reality can also remove the limits of time and space: design team members in different regions need not gather in one place, and by wearing VR headsets they can view the prototype together and share feedback. This improves design efficiency and reduces costs. Even during the pandemic, designers could continue working as a team while maintaining safe social distancing.

Virtual human AI assistants have the advantage of anthropomorphism and can provide a sense of care and warmth.

Overseas virtual human companies focus on AI assistants that provide emotional value in specific scenarios. Key deployment scenarios include medical consultants, daily companionship, personalized financial advisers, psychological counselors and shopping assistants. These emotionally charged scenarios demand a very high degree of anthropomorphism, so overseas companies have invested heavily up front in the appearance and interaction of prebuilt avatars to build user trust. One example is the health consultant launched by UneeQ during COVID-19.

Voice AI assistants are still at an early stage, and the transition to multimodal digital humans with concrete images will take time. Advances in XR or holographic projection may accelerate the arrival of embodied virtual assistants. Compared with voice assistants, AI assistants with a human form can provide a stronger sense of companionship and emotional connection. Some companies in China and overseas have begun to explore this field. Hybri, a company working with AR, has launched the first AI virtual assistant app: users can generate an avatar from photos, which then persists in AR form and supports simple interactions. In China, well-known AI assistants such as Xiaobing and Xiao Ai have been customizing their own virtual digital human images, but these have not been widely deployed for technical reasons. With breakthroughs in and wide adoption of VR devices, or breakthroughs in glasses-free 3D, lifelike virtual assistants of the kind Hybri envisions may appear in our daily lives.

Future development trends of virtual digital humans: more natural, lower barriers and safer

Make virtual humans’ anthropomorphic performance more natural and diversify driving methods. Delicate, smooth facial expressions, eye movements and muscle motion in particular require not only more real data and better algorithms but also cross-disciplinary support from biology, computer graphics and the film and television industry. Improving body movement is another future direction. Digital humans already look very realistic, but there is still much room for improvement in emotional expression, emotion recognition and body movement; the goal is for them to show rich expressions and matching actions like real people, and to respond appropriately to the actions and emotions of the people they interact with.

The barrier to creating digital humans with a realistic image is still relatively high. Wider adoption will require further simplifying the production process and lowering the barrier to use so that more enterprises can use the service. At present, customizing a digital human image places very high demands on raw data: models must be photographed and recorded in a high-standard recording environment. In the future, as video-based techniques develop, users may be able to easily customize digital humans that look exactly like themselves.

Strengthen the responsible use of face data and AI technology, and promote “technology for good”. Technology can already generate virtual humans quickly from photos and videos. For now, 3D virtual humans blend poorly with their surroundings, which makes it difficult to use them to spoof face recognition results; nevertheless, responsible use and risk prevention should be strengthened as the technology develops.

Risk analysis

Slower-than-expected development of virtual human technology: virtual digital humans depend heavily on technological progress. If CG modeling, motion capture, AI and other technologies cannot be further upgraded to meet higher interaction requirements, some scenarios will not be truly deployed. In addition, if the technical barrier cannot be lowered, costs will remain high, which may prevent large-scale commercial application.

Unclear regulation: as a cutting-edge technology, virtual digital humans are not yet subject to clear policies. If regulatory policies are introduced and refined in the future, the industry may need to adjust.

Legal issues: the legal framework is incomplete, and there is still much room for improvement regarding virtual assets, digital assets and copyright.

Ethical issues: the application of digital human technology will also raise ethical challenges, such as security and privacy protection. Corresponding anti-counterfeiting and detection technologies need to be developed, and attention must be paid to the social problems that may arise as the boundary between humans and machines becomes increasingly blurred.