Business introduction

The company provides customers with complete central processing algorithm services built on the tight integration of hardware and software. As digital applications come under pressure to cut costs, save energy, and improve efficiency in pursuit of an intensive and sustainable digital world, jointly optimizing hardware and software for central processing algorithms will be an inevitable trend and a necessary condition for the long-term development of digital applications.

Our central processing algorithm services combine hardware and software optimization of computing power and algorithms to give customers more efficient, energy-saving, and lower-cost digital services in the cloud. The company's technology and solutions mainly serve two fields: Internet multimedia video advertising, and Internet games and entertainment. These are the two fields with the highest demand for algorithms and computing power on the network, and they are also the two Internet sectors with the largest market share. Both have room for stable growth, a large customer base, and broad demand for central processing algorithm services.

Hardware architecture of central processing algorithm service

With the rapid advance of information technology and intelligence in modern society, more and more devices are connected to the Internet, creating a huge demand for computing. However, the growth of computing power is constrained by two major limits: power consumption and cooling. To meet the rapidly growing demand for computing power in the smart world, multi-core architecture has become the most important direction of evolution.

Traditional multi-core solutions use SMP (Symmetric Multi-Processing) technology, as shown in Figure 1. Under the SMP architecture, all processors have equal status and the same access to the memory space. Any program, process, or thread can be assigned to run on any processor, and with operating system support, very good load balancing can be achieved, greatly increasing the performance and throughput of the whole system. However, because all cores share the same bus to access memory, the bus becomes a bottleneck as the number of cores grows, limiting the scalability and performance of the system.


The company's central processing algorithm service also supports the NUMA (Non-Uniform Memory Access) architecture, which groups multiple cores into nodes. Each node is equivalent to a symmetric multiprocessor (SMP), and the nodes within a processor communicate with each other over an on-chip network. Under the NUMA architecture, memory is physically distributed across the nodes, and the set of all these memory regions forms the global memory of the whole system. The time a core takes to access memory depends on where that memory is located relative to the core, and local access (within the same node) is faster. Current systems also provide rich tools and interfaces for optimizing and configuring proximity access. With proper performance tuning, computer systems built on the company's central processing algorithm service can therefore achieve good performance while offering greater multi-core scalability and more flexible computing power.
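
As a minimal sketch of proximity access on Linux, the following Python reads the NUMA node layout from sysfs and can pin the current process to the cores of one node. The sysfs path and os.sched_setaffinity are Linux-specific, and the helper names (parse_cpulist, numa_nodes, pin_to_node) are illustrative rather than part of the company's service. Pinning only restricts where threads run; memory locality then follows from Linux's default local allocation policy.

```python
import os
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")  # Linux sysfs location of NUMA nodes


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(root=NODE_ROOT):
    """Return {node_id: [cpu, ...]}; empty if the system exposes no NUMA info."""
    nodes = {}
    if not root.is_dir():
        return nodes
    for entry in root.glob("node[0-9]*"):
        cpulist = entry / "cpulist"
        if cpulist.is_file():
            nodes[int(entry.name[4:])] = parse_cpulist(cpulist.read_text())
    return nodes


def pin_to_node(node_id, nodes):
    """Restrict the current process to the cores of one node (Linux only)."""
    os.sched_setaffinity(0, set(nodes[node_id]))


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: cpus {cpus}")
```

Calling pin_to_node(0, numa_nodes()) before allocating working data would keep both the threads and, by default, their memory on node 0.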


Software services for central processing algorithm services

The development of computers is driven mainly by the development of their core component, the microprocessor. Each new microprocessor prompts corresponding advances in the other components of the computer system: further optimization of the computer architecture, larger memory capacity and faster memory access, steady improvement of peripheral devices, and the emergence of new devices. The central processor is the core component of a computing power service.

Massive applications, tens of billions of connections, and ubiquitous intelligence will generate massive amounts of data, drive large-scale data analysis and processing, and create value around that data. A new computing platform must be capable of massive data processing and analysis, of artificial intelligence training and inference across application scenarios, and of edge computing, IoT security, and real-time processing in large-scale connectivity scenarios.

The future of computing applications will inevitably be the coexistence of multiple computing architectures, and the spread of cloud services will accelerate this process. The cloud management platform uniformly schedules the heterogeneous and diverse computing resources within the data center, deploys the underlying resources for the best processing efficiency, and lets the most suitable resources handle each workload, thus matching computing resources to workloads and maximizing utilization.

Memory is the transit station for data movement, and memory access performance determines, to a certain extent, the application processing efficiency of the whole system. Data-intensive applications such as scientific research, deep learning, and in-memory databases move, read, and write data frequently during processing, so they depend heavily on memory access bandwidth. If the memory access bandwidth is lower than the rate at which the computing cores can consume data, the cores spend much of their time waiting for data and system performance drops significantly. Software optimization and support are needed to get the most out of the available computing power.
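
The relationship between memory bandwidth and core throughput can be illustrated with a roofline-style estimate. This is a standard model that we bring in for illustration, not something the service itself defines, and the machine figures below are hypothetical.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: performance is capped by compute or by memory traffic."""
    memory_bound = mem_bandwidth_gbs * flops_per_byte
    return min(peak_gflops, memory_bound)


def core_idle_fraction(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Share of peak compute left unused while cores wait for data."""
    achieved = attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte)
    return 1.0 - achieved / peak_gflops


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s memory bandwidth.
    for intensity in (0.5, 5.0, 20.0):
        perf = attainable_gflops(1000.0, 100.0, intensity)
        idle = core_idle_fraction(1000.0, 100.0, intensity)
        print(f"{intensity:5.1f} FLOP/byte -> {perf:7.1f} GFLOP/s, idle {idle:.0%}")
```

Under these assumed figures, a workload that performs only 0.5 floating-point operations per byte moved is limited to 50 GFLOP/s, so the cores sit idle most of the time. This is the situation that wider memory interfaces and software optimization aim to avoid.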

1. Establish a service benchmark

Before optimizing or monitoring, we first establish a benchmark and the optimization goals. The benchmark covers the hardware configuration, networking, test models, and system operation data. We evaluate and monitor the system comprehensively so that customers can better analyze the optimization and the change in system performance after optimization measures are implemented. The optimization goal is the performance expected of the system given its current hardware and software architecture. Performance tuning is a long-term process. Early on, bottlenecks are easy to identify, effective measures are easy to implement, and the gains are often significant. Later, measures become harder to find and the gains grow smaller and smaller. We therefore suggest settling on a reasonable balance point between tuning effort and benefit.
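
A benchmark record of this kind might be sketched as follows. The field names, the use of throughput as the single metric, and the 2% stopping threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Snapshot taken before tuning; field names are illustrative."""
    hardware: str             # e.g. CPU model, core count, memory size
    network: str              # e.g. topology or link speed
    test_model: str           # workload used for the measurement
    throughput: float         # measured operations per second
    target_throughput: float  # optimization goal for the same workload


def improvement(baseline, measured_throughput):
    """Relative gain over the baseline; 0.25 means 25% faster."""
    return measured_throughput / baseline.throughput - 1.0


def goal_reached(baseline, measured_throughput):
    """True once the measured result meets the optimization goal."""
    return measured_throughput >= baseline.target_throughput


def worth_continuing(round_gains, min_gain=0.02):
    """Balance point: stop once the latest round gained less than min_gain.

    min_gain is a hypothetical threshold (2%); an empty history means
    tuning has not started yet, so the answer is True.
    """
    return not round_gains or round_gains[-1] >= min_gain
```

Keeping the baseline immutable (frozen=True) is a deliberate choice here, so that later measurements are always compared against the same reference.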

2. Optimization testing

Stress test the system with peak workloads or professional stress testing tools, and observe the system status with performance monitoring tools. During the stress test, we recommend recording the running status of the system and programs in detail. An accurate history makes it easier to analyze bottlenecks and confirm whether the optimization measures are effective.
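
As a rough illustration of keeping such a history, the sketch below times repeated calls to a workload and reduces the samples to a few comparable figures. The workload callable and the choice of statistics are assumptions; a real stress test would rely on a dedicated load tool and system-level monitoring.

```python
import statistics
import time


def run_stress(workload, iterations):
    """Call workload() repeatedly and record each call's latency in seconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce the raw history to figures that can be compared between runs."""
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_rank = (95 * len(ordered) + 99) // 100
    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_rank - 1],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    samples = run_stress(lambda: sum(range(10_000)), iterations=200)
    print(summarize(samples))
```

Storing the summary of every run alongside the configuration that produced it gives the before-and-after record needed to judge each optimization measure.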

3. Determine the optimization plan

The purpose of stress testing and monitoring the system is to identify bottlenecks. Bottlenecks usually appear as overloaded data processing, IO waits, network waits, and the like. Note that identifying bottlenecks means analyzing the entire test setup, including the test tool, the link between the test tool and the system under test, and so on. Many "performance crisis" projects are actually caused by such easily overlooked aspects of the test setup, so during performance optimization a little time should first be spent ruling them out.
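
A first-pass triage of this kind could look like the sketch below. The inputs are utilization fractions that we assume have already been collected by monitoring tools, and the 0.8 threshold is a hypothetical cut-off rather than a recommended value.

```python
def classify_bottleneck(cpu_busy, io_wait, net_wait, client_cpu_busy, threshold=0.8):
    """Name the most likely bottleneck from utilization fractions in [0, 1].

    threshold is a hypothetical cut-off; tune it for the system at hand.
    """
    # Check the load generator first: a saturated test tool hides the real limit.
    if client_cpu_busy >= threshold:
        return "test tool"
    candidates = {"cpu": cpu_busy, "io": io_wait, "network": net_wait}
    name, value = max(candidates.items(), key=lambda item: item[1])
    return name if value >= threshold else "none identified"


if __name__ == "__main__":
    print(classify_bottleneck(0.92, 0.03, 0.02, client_cpu_busy=0.30))
```

The test tool is checked before the system under test because, as noted above, a limit in the test setup is easily mistaken for a limit in the system.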

4. Implement the optimization

Once the bottlenecks are identified, they should be optimized. Most are standard system bottlenecks with standard optimization measures. While preparing the optimization measures, we also prepare operational guidance for rolling them back. This avoids implementing irreversible measures that force the environment to be rebuilt, wasting a lot of time and effort.
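
One way to make every measure reversible is to record the previous values before changing anything. The sketch below does this for an in-memory settings dictionary. The setting names are hypothetical, and real measures (kernel parameters, service configuration) would need their own apply and restore steps.

```python
from contextlib import contextmanager


@contextmanager
def trial_settings(settings, changes):
    """Apply changes to a settings dict and undo them unless explicitly kept.

    Yields a keep() callback; if it is never called, the old values return.
    """
    missing = object()  # marks keys that did not exist before the trial
    previous = {key: settings.get(key, missing) for key in changes}
    state = {"keep": False}
    settings.update(changes)
    try:
        yield lambda: state.update(keep=True)
    finally:
        if not state["keep"]:
            for key, old in previous.items():
                if old is missing:
                    settings.pop(key, None)
                else:
                    settings[key] = old


if __name__ == "__main__":
    config = {"worker_threads": 8}  # hypothetical setting names
    with trial_settings(config, {"worker_threads": 16, "huge_pages": True}):
        print("during trial:", config)
    print("after rollback:", config)
```

Rolling back by default, and keeping a change only on request, means that an interrupted or failed trial leaves the environment as it was.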

5. Confirm the effect of optimization

After implementing the optimization measures, confirm their effect. Measures with a negative effect should be rolled back promptly and the optimization plan adjusted. If the effect is positive but the optimization target has not been reached, repeat step 2, "Optimization testing". If the target has been reached, summarize and archive all effective optimization measures and parameters, then move on to follow-up work such as preparing the production system release.
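
The decision in this step can be written down as a small function. The sketch assumes a single "higher is better" metric such as throughput, and it treats an unchanged result as not negative, which the text above leaves open.

```python
def next_action(before, after, target):
    """Decide what to do after a tuning round.

    before: result prior to the change; after: result with the change applied;
    target: the optimization goal. Higher values are assumed to be better.
    """
    if after < before:
        return "roll back and adjust the plan"
    if after < target:
        return "repeat step 2 (optimization testing)"
    return "archive measures and prepare release"


if __name__ == "__main__":
    for before, after in ((100, 90), (100, 120), (100, 160)):
        print(before, "->", after, ":", next_action(before, after, target=150))
```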

As network bandwidth increases, the volume of customer data in the cloud has grown dramatically, and massive data processing has become the main data center workload for central processing algorithm services. Computing platforms have begun to evolve toward large memory, adopting high clock frequencies, wide buses, and multi-channel memory interfaces in the processor to improve memory access performance and meet the needs of data-intensive applications.

As customers' digitalization advances rapidly, different customers place different requirements on IT infrastructure and computing power at different stages. New applications, new technologies, and new architectures are where customer demand is heading and are key to customers' future digital transformation, so innovation in central processing algorithm services is the foundation of that transformation.

We have reached a stage in which technology applications built around the central processing algorithm are multiplying, and we are moving quickly toward the stage of intelligent technology applications. Intelligent technology applications will drive changes in customers' cloud services, data management, and business models, reshape their business capabilities and cloud service efficiency, and ultimately strengthen the algorithmic capabilities of their cloud services.

With the rise of cloud gaming, VR/AR, and similar applications, and with growing demand for IoT, mobile applications, personal entertainment, and artificial intelligence, customers' applications are becoming more scenario-specific and diverse, and customers' expectations for application experience keep rising. A traditional single architecture can hardly meet these requirements. This places new demands on the computing platform and drives our computing architecture for central processing algorithm services toward diversity.