
Principles of Server Virtualization



Virtualization technology originated on mainframes and can be traced back to the virtual partitioning technology of the 1960s and 1970s, which allowed multiple operating systems to run on a single host so that users could make the most of expensive mainframe resources. As technology developed and market competition demanded it, virtualization was ported to minicomputers and UNIX servers. However, because relatively few users actually ran mainframes or minicomputers, and because the products and technologies of different vendors were incompatible, virtualization attracted little public attention. (Note: since the x86 architecture was not designed with virtualization in mind, its structure and complexity made it very difficult to virtualize, so the early x86 platform did not benefit from virtualization technology.)


In the 1990s, virtualization software vendors adopted a pure software approach, centered on the VMM (Virtual Machine Monitor), to virtualize the x86 server platform. In this software-only "full virtualization" model, every key platform resource acquired by a Guest OS (guest operating system) has to be controlled and allocated by the VMM, and binary translation has to be used; the overhead of binary translation greatly reduces the performance of full virtualization. To solve the performance problem, a new technique called "paravirtualization" emerged. It does not require binary translation; instead, the guest operating system is modified at the code level, and the customized Guest OS gains additional performance and scalability. However, modifying the Guest OS also introduces system-level conflicts and operational efficiency problems that require a great deal of optimization work. Virtualization has now reached the stage of hardware support. "Chip-assisted virtualization" implements in hardware circuits the functions that pure software virtualization provides; it reduces the overhead of running the VMM, satisfies the requirements of both CPU paravirtualization and binary translation, and simplifies the design of the VMM so that it can be written to common standards. Chip-assisted virtualization not only integrates virtualization instructions into the processor but also provides virtualization support for I/O, ultimately enabling virtualization of the entire platform. The realization and development of virtualization technology have shown the broad prospects of virtualization applications.


The virtualization of non-x86 servers is a closed system provided by the original server vendors and is not covered here. In what follows, virtualization and virtualization management refer to the virtualization of x86 servers. Figure 3.4 is a schematic diagram of the principle of x86 server virtualization.


Virtualization is a logical representation of resources that is not constrained by physical limitations. It is implemented by adding a virtualization layer to the system that abstracts the resources of the layer below into another form of resource and provides them to the layer above.


Server virtualization decouples software from hardware: a virtualization software layer, the VMM, is inserted between the operating system and the hardware. Through space partitioning, time sharing, and emulation, the physical resources of the server are abstracted into logical resources, and the upper-layer operating system is presented with a VM (Virtual Machine), a server hardware environment consistent with what it expects, so that it can run directly in the virtual environment. Multiple virtual machines running different operating systems can thus run concurrently on the same physical machine while remaining isolated from one another, which provides higher IT resource utilization and flexibility. Figure 3.5 is a schematic diagram of the x86 virtualization architecture.




The virtualization software layer of server virtualization is the Virtual Machine Monitor (VMM), also called the Hypervisor. Common hypervisors fall into two categories:


Type-I (bare-metal): the VMM runs directly on the bare hardware and uses and manages the underlying hardware resources. The Guest OS's access to real hardware resources must go through the VMM, which, as the direct operator of the underlying hardware, owns the hardware drivers.


Type-II (hosted): a host operating system sits beneath the VMM. Since the Guest OS must access the hardware through the host operating system, there is additional performance overhead, but the VMM can make full use of the device drivers and underlying services that the host operating system provides for memory management, process scheduling and resource management.


The huge difference before and after server virtualization stems from the essential difference between virtual machines and physical servers.


A virtual machine is an efficient, isolated virtual computer system provided by the virtualization layer. Each virtual machine is a complete system with its own processor, memory, network devices, storage devices and BIOS, so an operating system and its applications run in a virtual machine just as they would on a physical server.


Unlike a physical server, a virtual machine is not built from real electronic components; it consists of a set of virtual components (files) that have nothing to do with the hardware configuration of the physical server. Compared with physical servers, virtual machines have the following advantages:


  • Abstract decoupling
      • Can run on any server of the same architecture;
      • The upper-layer operating system and applications run without modification.
  • Partition isolation
      • Can run concurrently with other virtual machines on the same host;
      • Data processing, network connections and data storage are securely isolated from one another.
  • Encapsulation and mobility
      • The virtual machine can be encapsulated in files, so rapid deployment, backup and restoration can be achieved simply by copying files;
      • The entire system (including virtual hardware, operating system and configured applications) can easily be migrated between different physical servers, even while the virtual machine is running.
  • Flexible scaling
      • The virtual resources (virtual CPUs, virtual NICs, etc.) on a single physical server can be expanded dynamically on demand, without downtime;
      • Virtual machines can be built and distributed as plug-and-play virtual appliances and scaled dynamically according to the cluster's elastic resource allocation mechanism.


(2) Server virtualization implementation


The VMM's virtualization of physical resources can be divided into three parts: CPU virtualization, memory virtualization, and I/O device virtualization, of which CPU virtualization is the most critical.


① CPU virtualization


Classic virtualization method


Modern computer architectures generally provide at least two privilege levels (user mode and kernel mode; x86 has four privilege levels, Ring 0 to Ring 3) to separate system software from application software. Instructions that can only be executed at the processor's highest privilege level (kernel mode) are called privileged instructions. In general, most instructions that can read or write key system resources (sensitive instructions) are privileged instructions (on x86, however, a few sensitive instructions are not privileged instructions). If the processor is not in kernel mode when a privileged instruction is executed, an exception is normally raised and control is handed to the system software to deal with the illegal access; this is called trapping. The classic virtualization method uses "deprivileging" and "trap-and-emulate": the Guest OS runs at a non-privileged level while the VMM runs at the highest privilege level with full control of system resources. After the Guest OS is deprivileged, most of its instructions still run directly on the hardware; only when it executes a privileged instruction does it trap into the VMM, which emulates the instruction ("trap-and-emulate"). The essence of trap-and-emulate is to ensure that instructions that could affect the correct operation of the VMM are emulated by the VMM, while most non-sensitive instructions run as usual.
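The control flow of trap-and-emulate can be illustrated with a minimal sketch. The instruction names and the helper functions below are hypothetical simplifications for illustration, not any real hypervisor's code:

```python
# Minimal sketch of a trap-and-emulate dispatch loop.
# Instruction names and VMM state handling are hypothetical.

PRIVILEGED = {"load_cr3", "hlt", "out"}   # assumed privileged instructions

def run_guest(instruction_stream, vmm_state):
    for insn in instruction_stream:
        if insn in PRIVILEGED:
            # Executing a privileged instruction at a deprivileged level
            # raises a fault; control transfers ("traps") to the VMM.
            emulate_in_vmm(insn, vmm_state)
        else:
            # Non-sensitive instructions run directly on the hardware.
            execute_natively(insn)

def emulate_in_vmm(insn, vmm_state):
    # The VMM reproduces the effect of the instruction on the virtual
    # machine's state instead of letting it touch the real hardware.
    vmm_state[insn] = vmm_state.get(insn, 0) + 1

def execute_natively(insn):
    pass  # placeholder for direct execution on the CPU

run_guest(["add", "load_cr3", "mov", "hlt"], vmm_state={})
```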


Virtualization vulnerabilities in x86


In the x86 instruction set there are several instructions that are sensitive and should be caught by the VMM but are not privileged instructions (these are called critical instructions). Deprivileging therefore does not make them trap: executing them does not automatically trap into the VMM, which obstructs the virtualization of those instructions.


The sensitive instructions on x86 can be classified roughly as follows:
  • Instructions that access or modify the state of the machine or virtual machine;
  • Instructions that access or modify sensitive registers or memory locations, such as the clock register and interrupt registers;
  • Instructions that access the memory protection system or the memory and address allocation system (segmentation, paging, etc.);
  • All I/O instructions.


Instructions in the first and fourth categories are all privileged instructions; when executed outside kernel mode they automatically trap and are caught by the VMM. Those in the second and third categories, however, are not privileged instructions but critical instructions. Some critical instructions fail when executed by the deprivileged Guest OS yet do not raise an exception, so they cannot be caught; the VERW instruction in the third category is an example.


Figure 3.6 is a schematic diagram of x86 virtualization technology classification.

Since there are more than a dozen sensitive instructions in the x86 instruction set that are not privileged instructions, x86 cannot be fully virtualized using the classic method. Given this limitation of the x86 instruction set, x86 virtualization long split into two camps: full virtualization and paravirtualization. The two camps differ mainly in how they handle non-privileged sensitive instructions. The full-virtualization camp takes a dynamic approach, monitoring at run time and emulating instructions in the VMM after capturing them; the paravirtualization camp takes the initiative and replaces in advance all the non-privileged sensitive instructions that are used, which eliminates much of the "trap - context switch - emulate - context switch" process and yields a substantial performance improvement.


The characteristics of each category of virtualization technology are as follows:


Full virtualization


Full virtualization means that the abstracted VM has the complete characteristics of a physical machine, and the operating system runs on it without any modification. The full-virtualization camp adheres to the principle of running guests unmodified and optimizes the "monitor at run time, emulate after capture" process; implementations within the camp differ in their details. Full virtualization based on binary translation (BT) is representative. Its main idea is to translate, at execution time, the Guest OS instructions running on the VM into a subset of the x86 instruction set, replacing sensitive instructions with instructions that trap. Translation and execution are interleaved, and user-mode code that contains no sensitive instructions is executed directly without translation.
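As a rough illustration of the idea (not VMware's or any other vendor's actual translator), the sketch below rewrites a guest basic block before it runs, replacing instructions marked as sensitive with explicit traps into the VMM and passing everything else through unchanged:

```python
# Sketch of binary translation: sensitive instructions in a guest basic
# block are rewritten into trapping stubs before the block executes.
# The "instructions" are plain strings, not decoded x86 machine code.

SENSITIVE = {"popf", "sgdt", "smsw"}   # examples of sensitive but unprivileged x86 instructions

def translate_block(block):
    translated = []
    for insn in block:
        if insn in SENSITIVE:
            translated.append(("trap_to_vmm", insn))  # replaced with a trapping stub
        else:
            translated.append(("native", insn))       # executed directly
    return translated

guest_block = ["mov", "popf", "add", "smsw"]
print(translate_block(guest_block))
```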


Paravirtualization


Paravirtualization refers to virtualization that requires the cooperation of the operating system: the operating system running on the VM must be modified.


The basic idea of the paravirtualization camp is to modify the Guest OS code so that operations containing sensitive instructions are replaced with hypercalls to the VMM. A hypercall is similar to an OS system call and transfers control to the VMM. The technique became widely known through the Xen project. Its advantage is that VM performance can approach that of a physical machine; its disadvantages are that the Guest OS must be modified (Windows, for example, cannot be modified) and that maintenance costs increase. Modifying the Guest OS also ties the operating system to a specific hypervisor, so many virtualization vendors have abandoned Linux paravirtualization in some of the products they developed on Xen and now focus on chip-assisted full virtualization, which supports unmodified operating systems.
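A hedged sketch of the idea: instead of executing a sensitive operation and hoping it traps, the modified guest kernel explicitly calls into the VMM, much like a system call. The hypercall name and classes below are invented for illustration:

```python
# Sketch of paravirtualization: the guest kernel is modified to call the
# hypervisor directly (a "hypercall") instead of issuing a sensitive
# instruction. Names are illustrative only.

class Hypervisor:
    def __init__(self):
        self.guest_page_tables = {}

    def hypercall(self, op, *args):
        # Control transfers to the VMM, analogous to a system call.
        if op == "update_page_table":
            guest_id, entry, value = args
            self.guest_page_tables.setdefault(guest_id, {})[entry] = value
        else:
            raise ValueError(f"unknown hypercall {op}")

class ParavirtGuestKernel:
    def __init__(self, guest_id, hv):
        self.guest_id, self.hv = guest_id, hv

    def set_page_table_entry(self, entry, value):
        # A classic kernel would perform a direct (sensitive) write here;
        # the paravirtualized kernel asks the VMM to do it instead.
        self.hv.hypercall("update_page_table", self.guest_id, entry, value)

hv = Hypervisor()
ParavirtGuestKernel("vm0", hv).set_page_table_entry(0x10, 0xABC)
print(hv.guest_page_tables)
```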


Chip-assisted virtualization


Chip-assisted virtualization refers to the realization of efficient full virtualization with the support of processor hardware.


The basic idea is to introduce new processor operating modes and new instructions so that the VMM and the Guest OS run in different modes. The Guest OS runs in a controlled mode in which the formerly problematic sensitive instructions all trap into the VMM, solving the trap-and-emulate problem for non-privileged sensitive instructions. Moreover, saving and restoring the context on a mode switch is done in hardware, which greatly improves the efficiency of the context switches involved in trap-and-emulate. Take Intel VT-x chip-assisted virtualization as an example. It adds two processor operating modes: root mode and non-root mode. Xen runs in root mode, while the Guest OS runs in non-root mode. Each mode has its own set of privilege rings, so Xen and an unmodified Guest OS kernel both run in ring 0 of their respective modes; the Guest OS can stay in ring 0 without being modified. Switching between root mode and non-root mode is performed by newly added CPU instructions (VMXON, VMXOFF, etc.).
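On a Linux host, whether the processor exposes these extensions can be checked from the CPU flags in /proc/cpuinfo ("vmx" indicates Intel VT-x, "svm" indicates AMD-V). The following is a small sketch under that assumption:

```python
# Check whether the host CPU advertises hardware virtualization support.
# Assumes a Linux host exposing /proc/cpuinfo; "vmx" marks Intel VT-x,
# "svm" marks AMD-V.

def cpu_flags():
    flags = set()
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    flags.update(line.split(":", 1)[1].split())
    except OSError:
        pass  # not Linux, or /proc not mounted
    return flags

flags = cpu_flags()
if "vmx" in flags:
    print("Intel VT-x available")
elif "svm" in flags:
    print("AMD-V available")
else:
    print("no hardware virtualization flags found")
```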

Chip-assisted virtualization eliminates the operating system's ring-transition problem, lowers the barrier to virtualization, supports the virtualization of any operating system without modifying the OS kernel, and is backed by virtualization software vendors. It has gradually erased the differences between software virtualization techniques and has become the trend for future development. Figure 3.7 shows the vCPU scheduling and allocation mechanism.


From the structure and functional division of a virtual machine system, it can be seen that the guest operating system and the virtual machine monitor together form a two-level scheduling framework. Figure 3.7 shows this two-level scheduling framework in a multi-core environment. The guest operating system is responsible for the second level of scheduling, that is, scheduling threads or processes onto the vCPUs (kernel threads are mapped to the corresponding virtual CPUs). The virtual machine monitor is responsible for the first level of scheduling, that is, scheduling vCPUs onto the physical processing units. The scheduling policies and mechanisms of the two levels are independent of each other. The vCPU scheduler allocates and schedules physical processor resources among the virtual machines; essentially, it places the vCPUs of each virtual machine onto the physical processing units according to certain policies and mechanisms, and any policy may be used to allocate physical resources to meet the different needs of the virtual machines. A vCPU may be scheduled to execute on one or more physical processing units (time-multiplexing or space-multiplexing them), or it may be bound one-to-one to a physical processing unit (restricting it to a designated physical processing unit).
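A minimal sketch of the first scheduling level only: vCPUs from several virtual machines are time-sliced onto the physical CPUs round-robin. Real hypervisor schedulers (credit-based, fair-share, with pinning) are far more elaborate; the names here are illustrative.

```python
# Sketch of the first scheduling level: the VMM time-slices vCPUs from
# all virtual machines onto the physical CPUs round-robin. The second
# level (the guest OS scheduling threads onto its vCPUs) is not shown.

from collections import deque

def schedule(vcpus, physical_cpus, time_slices):
    """Yield (slice, physical_cpu, vcpu) assignments."""
    ready = deque(vcpus)                   # runnable vCPUs from all VMs
    for t in range(time_slices):
        for pcpu in physical_cpus:
            if not ready:
                break
            vcpu = ready.popleft()         # pick the next runnable vCPU
            yield t, pcpu, vcpu
            ready.append(vcpu)             # requeue it at the tail

vcpus = ["vm0-vcpu0", "vm0-vcpu1", "vm1-vcpu0"]
for t, pcpu, vcpu in schedule(vcpus, ["pcpu0", "pcpu1"], time_slices=2):
    print(f"slice {t}: {vcpu} runs on {pcpu}")
```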




② Memory virtualization


Because the VMM controls all system resources, it holds the entire memory, is responsible for paged memory management, and maintains the mapping between virtual addresses and machine addresses. Because the Guest OS also has its own paged memory management mechanism, a system with a VMM has one more layer of mapping than an ordinary system. Figure 3.8 shows the three-layer model of memory virtualization.


Virtual address (VA): Refers to the linear address space that Guest OS provides to its applications.


Physical address (PA): The pseudo physical address abstracted by the VMM and seen by the virtual machine.


Machine address (MA): The real machine address, that is, the address signal appearing on the address bus.




The mapping relationship is as follows: Guest OS: PA = f(VA), VMM: MA = g(PA).


The VMM maintains a set of page tables responsible for the mapping from PA to MA, and the Guest OS maintains a set of page tables responsible for the mapping from VA to PA. In actual operation, when a user program accesses VA1, the Guest OS page tables translate it to PA1, and then the VMM intervenes and uses its own page tables to translate PA1 to MA1.


An ordinary MMU (Memory Management Unit) can perform only a single translation, from a virtual address to a physical address. In a virtual machine environment, the "physical address" produced by the MMU is not a real machine address; to obtain the real machine address, the VMM must intervene and perform another mapping before the address can be used on the bus. If every memory access of the virtual machine required VMM intervention and software-simulated address translation, efficiency would be so low that the system would be practically unusable. To translate virtual addresses to machine addresses efficiently, the usual approach is for the VMM to compute the composite mapping g∘f from the mappings f and g and write this mapping directly into the MMU. The page table virtualization methods currently used are mainly MMU paravirtualization and shadow page tables; the latter has been superseded by memory chip-assisted virtualization technology.
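The composite mapping can be shown with a toy example: given the guest mapping f (VA → PA) and the VMM mapping g (PA → MA), the VMM precomputes g∘f and installs it so the MMU translates VA directly to MA, which is the essence of the shadow page table. The page numbers below are made up for illustration.

```python
# Toy illustration of the shadow-page-table idea: compose the guest
# mapping f (VA -> PA) with the VMM mapping g (PA -> MA) so the MMU can
# translate VA -> MA in a single step. Page numbers are made up.

f = {0x1000: 0x4000, 0x2000: 0x5000}   # guest page table: VA -> PA
g = {0x4000: 0x9000, 0x5000: 0x7000}   # VMM mapping:      PA -> MA

# Shadow table the VMM installs into the real MMU: MA = g(f(VA))
shadow = {va: g[pa] for va, pa in f.items() if pa in g}

print({hex(va): hex(ma) for va, ma in shadow.items()})
assert shadow[0x1000] == g[f[0x1000]]
```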


MMU Paravirtualization


The basic principle is as follows: when the Guest OS creates a new page table, it allocates a page from the free memory it maintains and registers the page with Xen. Xen then revokes the Guest OS's write permission on that page table, so every write the Guest OS makes to it traps into Xen and is validated. Xen checks each entry in the page table to ensure that it maps only machine pages belonging to that virtual machine and contains no writable mapping of a page-table page. Xen then replaces the physical addresses in the page table entries with the corresponding machine addresses according to the mapping it maintains, and finally loads the modified page table into the MMU. The MMU can then translate virtual addresses directly into machine addresses using the modified page table.


Memory chip-assisted virtualization


Memory chip-assisted virtualization is a hardware technique that replaces the software-implemented "shadow page table". Its basic principle is that the two address translations, GVA (guest virtual address) → GPA (guest physical address) → HPA (host physical address), are completed automatically by the CPU hardware (doing this in software incurs high memory overhead and poor performance). Take the Extended Page Table (EPT) technology of VT-x as an example. First, the VMM pre-installs in the CPU the EPT page table that maps guest physical addresses to machine addresses; second, the guest can modify its own page tables without VMM intervention; finally, during address translation the CPU automatically walks both page tables to complete the translation from guest virtual address to machine address. With memory chip-assisted virtualization, the guest needs no VMM intervention while it runs, which saves a great deal of software overhead, and memory access performance approaches that of a physical machine.


Figure 3.9 is a schematic diagram of memory chip assisted virtualization technology.
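In contrast to the shadow-table composition sketched earlier, with extended/nested page tables the hardware walks both tables at translation time and no composite table is maintained by software. A conceptual sketch, again with made-up page numbers:

```python
# Conceptual sketch of hardware two-stage translation (EPT style): the
# MMU walks the guest page table (GVA -> GPA) and then the EPT
# (GPA -> HPA) on every translation; the VMM is not involved in the
# common case. Page numbers are made up.

guest_page_table = {0x1000: 0x4000}    # GVA -> GPA, owned by the guest
ept = {0x4000: 0x9000}                 # GPA -> HPA, owned by the VMM

def translate(gva):
    gpa = guest_page_table[gva]        # first stage, guest-controlled
    hpa = ept[gpa]                     # second stage, hardware-walked EPT
    return hpa

print(hex(translate(0x1000)))          # 0x9000
```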

③ I/O device virtualization


Through I/O virtualization the VMM multiplexes the limited peripheral resources: it intercepts the Guest OS's access requests to I/O devices and then simulates the real hardware in software. At present there are three main approaches to I/O device virtualization: full device-interface emulation, front-end/back-end emulation, and direct assignment.


Full device-interface emulation


With full device-interface emulation, software precisely emulates the same interface as the physical device, so the driver in the Guest OS can drive the virtual device without modification. The advantage is that no additional hardware is required and existing drivers can be reused. The disadvantage is that completing a single operation involves multiple register accesses, each of which the VMM must intercept and emulate, causing many context switches and low performance.


Front-end/back-end emulation


The VMM provides a simplified driver, the back end (BE), while the driver in the Guest OS is the front end (FE). The front-end driver forwards I/O requests from other guest modules to the back-end driver through a special communication mechanism between the Guest OS and the VMM; after processing a request, the back-end driver sends a notification back to the front end (Xen uses this approach). The advantage is that the transaction-based communication mechanism can greatly reduce context-switch overhead without requiring additional hardware; the disadvantages are that the Guest OS must be equipped with the front-end driver and that the back-end driver may become a bottleneck.
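A minimal sketch of this split-driver pattern: the front end in the guest places requests on a shared ring and the back end services them and posts completions. The deque below stands in for the shared-memory ring; real implementations use shared memory pages and event channels, and the function names are illustrative.

```python
# Sketch of the front-end/back-end (split driver) model: the guest's
# front-end driver posts requests to a shared ring; the back-end driver
# on the VMM side services them against the real device and posts
# responses. A deque stands in for the shared-memory ring.

from collections import deque

request_ring, response_ring = deque(), deque()

def frontend_submit(op, data):
    request_ring.append({"op": op, "data": data})   # guest (front-end) side

def backend_process():
    while request_ring:
        req = request_ring.popleft()
        # Here the back end would drive the real hardware; we fake it.
        response_ring.append({"op": req["op"], "status": "ok"})

frontend_submit("write_block", b"hello")
backend_process()
print(response_ring.popleft())
```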


Direct assignment


Direct assignment means that a physical device is assigned directly to a Guest OS, which then accesses the I/O device directly, without going through the VMM. Related technologies include the IOMMU (Intel VT-d), PCI-SIG's SR-IOV, and so on, all aimed at establishing efficient direct channels for I/O virtualization. The advantage is that direct access reduces virtualization overhead; the disadvantage is that additional hardware must be purchased.
