System architecture design notes (19)-network storage technology

At present, there are three mainstream network storage technologies: Direct Attached Storage (DAS), Network Attached Storage (NAS), and Storage Area Network (SAN).

1 Direct Attached Storage (DAS)

DAS connects the storage device directly to the server through a SCSI (Small Computer System Interface) cable. It is essentially a stack of hardware: the device has no storage operating system of its own, and every storage operation depends on the server. For this reason, some documents also refer to DAS as SAS (Server Attached Storage).

DAS is suitable when: (1) the servers are geographically dispersed, so interconnecting them through a SAN or NAS is very difficult; (2) the storage system must be connected directly to the application server (for example, Microsoft Cluster Server, or the "raw partitions" used by some databases); (3) applications, including many database applications and application servers, need to be directly attached to the storage.

Because DAS connects storage devices directly to the server, it is limited in transmission distance, number of connections, and transmission rate. As storage capacity grows, DAS is therefore difficult to expand, which makes it a serious bottleneck for capacity upgrades. In addition, every data read must pass through the server, so the load on the server inevitably rises and its data processing and transmission capacity drops sharply. Finally, when the server fails (for example, during downtime), the stored data is affected as well and becomes unavailable. Today DAS has largely been replaced by NAS.

2 Network Attached Storage (NAS)

Storage devices using NAS technology are no longer attached to a specific server through the I/O bus, but are directly connected to the network through a network interface, and are accessed by users through the network. The structure of the NAS storage system is shown in Figure 1.

A NAS storage device is similar to a dedicated file server. It drops most of the computing functions of a general-purpose server and provides only file system functions, which reduces the cost of the device. Its hardware and software architecture is also optimized specifically to move data between the storage device and the network as efficiently as possible. NAS is data-centric and separates the storage device from the server, so the storage device is functionally independent of the main servers on the network. Data access between a client and the storage device no longer requires a file server as intermediary: the client accesses the storage device directly, so response is fast and the data transfer rate is high.

NAS supports a variety of TCP/IP-based network protocols, mainly NFS (Network File System) and CIFS (Common Internet File System), for file access. Its characteristic strength is therefore file-level shared access, particularly for small files. In practice, a NAS device is usually configured as a file server, and system resource configuration, user management, and user login are handled through a web-based management interface.

NAS storage supports plug and play, and storage can be established anywhere on the network. Web-based management makes the installation, use and management of the equipment easier. NAS can economically solve the problem of insufficient storage capacity, but it is difficult to obtain satisfactory performance.

3 Storage Area Network (SAN)

A SAN is a dedicated high-speed subnet that connects one or more network storage devices, such as disk arrays, to servers through dedicated switches. Instead of file-sharing access, it provides block-level storage. Its most distinctive feature is that it moves the storage devices off the traditional Ethernet network and onto an independent storage area network. The system structure of a SAN is shown in Figure 2.
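The difference between block-level access (SAN) and file-level access (NAS) can be illustrated with a small in-memory sketch. Everything here is hypothetical and chosen only for illustration: the class names BlockDevice and FileService, the 512-byte block size, and the trivial append-only allocation scheme. It is a toy model, not how any real array or NAS head is implemented.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed, typical sector size


class BlockDevice:
    """Block-level view (SAN style): numbered fixed-size blocks, no notion of files."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self._data = bytearray(num_blocks * BLOCK_SIZE)

    def _check(self, lba: int) -> None:
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"block {lba} is out of range")

    def read_block(self, lba: int) -> bytes:
        self._check(lba)
        start = lba * BLOCK_SIZE
        return bytes(self._data[start:start + BLOCK_SIZE])

    def write_block(self, lba: int, payload: bytes) -> None:
        self._check(lba)
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        start = lba * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = payload


class FileService:
    """File-level view (NAS style): clients name files, the service owns the block layout."""

    def __init__(self, device: BlockDevice) -> None:
        self._device = device
        self._table = {}      # file name -> (list of block numbers, length in bytes)
        self._next_free = 0   # toy allocator: blocks are never reused

    def write_file(self, name: str, content: bytes) -> None:
        blocks = []
        for offset in range(0, len(content), BLOCK_SIZE):
            # Pad the last chunk with zero bytes so it fills a whole block.
            chunk = content[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            self._device.write_block(self._next_free, chunk)
            blocks.append(self._next_free)
            self._next_free += 1
        self._table[name] = (blocks, len(content))

    def read_file(self, name: str) -> bytes:
        blocks, length = self._table[name]
        raw = b"".join(self._device.read_block(b) for b in blocks)
        return raw[:length]  # drop the padding added on write
```

A SAN client talks to something shaped like BlockDevice and must run its own file system on top, whereas a NAS client only ever sees the FileService interface and never learns which blocks hold its data.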

Here, FC switch stands for Fibre Channel switch.

According to the protocol used for data transmission, SAN technology is divided into FC SAN and IP SAN. In addition, there is an emerging IB SAN technology.

3.1 FC SAN

Like SCSI, FC (Fibre Channel) was not originally designed as an interface for hard disks; it was designed specifically for network systems. As storage systems demanded higher speed, it was gradually adopted for hard disk systems as well.

The main characteristics of Fibre Channel are: hot pluggability, high-speed bandwidth, remote connection, and a large number of connected devices. It is today's most expensive and complex storage architecture, requiring significant investment in hardware, software, and personnel training.

FC SAN is composed of three basic components, namely interfaces (SCSI, FC), connecting devices (switches, routers), and protocols (IP, SCSI). These three components plus additional storage devices and servers form a SAN system. It is a dedicated, high-speed, and highly reliable network that allows independent and dynamic addition of storage devices, which simplifies management and centralized control.

FC SAN has two major shortcomings, high cost and complexity, and both stem from the use of FC. Deploying a SAN on Fibre Channel requires an FC adapter in each server, dedicated FC switches, and a separate cabling infrastructure. These greatly increase the cost, not to mention the cost of training personnel proficient in the FC protocol.

3.2 IP SAN

IP SAN is a storage network that implements block-level storage based on IP networks. Due to the low cost of the equipment, the simple configuration technology, and the ability to share and use large-capacity storage space, it has gradually been widely used.

In practice, IP storage mainly refers to iSCSI (Internet SCSI). As an emerging storage technology, iSCSI implements a SAN architecture on an IP network, combining the simple configuration and management of IP networks with the powerful functions and scalability of the SAN architecture. An iSCSI device is a directly addressable storage repository connected to a TCP/IP network. SCSI commands are encapsulated in TCP/IP so that they can be carried over the IP network, and the process is completely independent of location.
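The idea of wrapping a SCSI command so that it can travel over a TCP/IP connection can be sketched as follows. The READ(10) command descriptor block (CDB) built here follows the standard 10-byte layout, but the 7-byte wrapper header (TOY_HEADER, with a LUN, a task tag, and a CDB length) is purely hypothetical and much simpler than a real iSCSI PDU, which carries a 48-byte basic header segment. Treat this as an illustration of encapsulation, not as the iSCSI wire format.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) operation code


def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 10-byte SCSI READ(10) command descriptor block."""
    # Fields: opcode, flags, 32-bit logical block address,
    # group number, 16-bit transfer length (in blocks), control byte.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, num_blocks, 0)


# Hypothetical toy header (NOT the iSCSI format):
# 2-byte LUN, 4-byte task tag, 1-byte CDB length, all big-endian.
TOY_HEADER = struct.Struct(">HIB")


def encapsulate(lun: int, task_tag: int, cdb: bytes) -> bytes:
    """Wrap a CDB in the toy header so it can be sent over a byte stream."""
    return TOY_HEADER.pack(lun, task_tag, len(cdb)) + cdb


def decapsulate(message: bytes):
    """Recover (lun, task_tag, cdb) from a message built by encapsulate()."""
    lun, task_tag, cdb_len = TOY_HEADER.unpack_from(message)
    cdb = message[TOY_HEADER.size:TOY_HEADER.size + cdb_len]
    if len(cdb) != cdb_len:
        raise ValueError("truncated message")
    return lun, task_tag, cdb


if __name__ == "__main__":
    cdb = build_read10_cdb(lba=2048, num_blocks=8)
    message = encapsulate(lun=0, task_tag=1, cdb=cdb)
    print(message.hex())
```

In a real deployment the resulting bytes would be written to a TCP connection opened to the storage target. The target is identified only by its network address, which is why the process does not depend on where the storage physically sits.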

The advantages of iSCSI are threefold. First, it is based on stable and familiar standards such as SCSI and TCP/IP, so installation and maintenance costs are low. Second, iSCSI runs on ordinary Ethernet switches rather than special Fibre Channel switches, which reduces the number of heterogeneous networks and cables. Finally, iSCSI transmits storage commands over IP, so they can travel across the Internet without distance limits.

The disadvantage of iSCSI is that storage traffic and network traffic share the same physical interface, and the protocol itself carries significant overhead: SCSI commands must constantly be encapsulated into IP packets and parsed back out of them. Both factors consume bandwidth and burden the main processor. However, with the development of chips dedicated to processing iSCSI commands (relieving the main processor) and the spread of 10G Ethernet (relieving the bandwidth problem), iSCSI is expected to develop further.
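The bandwidth side of this overhead can be estimated with simple arithmetic. The sketch below rests on several assumptions that are not from the notes above: IPv4 and TCP headers of 20 bytes each with no options, 38 bytes of Ethernet framing per frame (header, FCS, preamble, and inter-frame gap), one 48-byte iSCSI basic header segment per PDU, no digests, and no acknowledgement traffic. The function name wire_efficiency is hypothetical. It says nothing about the CPU cost of encapsulation and parsing.

```python
import math

ETH_OVERHEAD = 14 + 4 + 8 + 12  # header, FCS, preamble/SFD, inter-frame gap
IP_HEADER = 20                  # IPv4 header without options (assumed)
TCP_HEADER = 20                 # TCP header without options (assumed)
ISCSI_BHS = 48                  # iSCSI basic header segment, one per PDU


def wire_efficiency(data_len: int, mtu: int = 1500) -> float:
    """Fraction of bytes on the wire that are SCSI data, for one PDU."""
    if data_len <= 0:
        raise ValueError("data_len must be positive")
    segment_payload = mtu - IP_HEADER - TCP_HEADER  # TCP payload per full frame
    tcp_bytes = ISCSI_BHS + data_len                # bytes handed to TCP
    frames = math.ceil(tcp_bytes / segment_payload)
    wire_bytes = tcp_bytes + frames * (IP_HEADER + TCP_HEADER + ETH_OVERHEAD)
    return data_len / wire_bytes


if __name__ == "__main__":
    for size in (512, 8192, 65536):
        print(size, round(wire_efficiency(size), 4),
              round(wire_efficiency(size, mtu=9000), 4))
```

Under these assumptions a single 512-byte transfer is roughly 80% efficient on the wire, while an 8 KiB transfer reaches roughly 94%, and a larger MTU raises the figure further. Small transfers are where the per-command overhead hurts most.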

3.3 IB SAN

IB (InfiniBand, literally "unlimited bandwidth") is a switched-fabric I/O technology. Its design idea is to establish a single connection link among remote storage, networks, and servers through a central fabric of IB switches, with the IB switches directing the traffic. The structure is tightly designed, which greatly improves the performance, reliability, and effectiveness of the system and relieves data traffic congestion between hardware devices. Many shared-bus technologies have not solved this problem well, because in a shared-bus environment every connection between devices has to be set up as a separate link through a designated port.
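Why a switched fabric relieves congestion compared with a shared medium can be shown with a toy scheduling model. The assumptions are deliberately crude and are not taken from the InfiniBand specification: every transfer takes one time slot, a shared bus carries one transfer per slot, and an idealized non-blocking switch lets any number of transfers proceed in the same slot as long as no port is used twice. The function names and the greedy scheduler are hypothetical.

```python
def bus_rounds(transfers):
    """Shared bus: only one transfer can use the medium per time slot."""
    return len(transfers)


def fabric_rounds(transfers):
    """Idealized switch: transfers on disjoint ports share a time slot.

    Uses a simple greedy schedule, which is valid but not always optimal.
    """
    pending = list(transfers)
    rounds = 0
    while pending:
        busy = set()    # ports already in use during this slot
        deferred = []   # transfers that must wait for a later slot
        for src, dst in pending:
            if src in busy or dst in busy:
                deferred.append((src, dst))
            else:
                busy.update((src, dst))
        pending = deferred
        rounds += 1
    return rounds


if __name__ == "__main__":
    demo = [("host1", "raid1"), ("host2", "raid2"), ("host3", "raid1")]
    print(bus_rounds(demo), fabric_rounds(demo))
```

In the demo, the bus needs three slots, while the switch needs two: the first two transfers run in parallel, and only the third waits because it targets a port that is already busy.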

IB mainly supports two environments: module-to-module interconnection inside a computer system (supporting add-in slots for I/O modules), and chassis-to-chassis interconnection in a data center, including external storage systems and external LAN and WAN access devices. IB supports higher bandwidth than current mainstream I/O carriers such as SCSI and FC. In addition, because it uses IPv6 headers, IB also connects effectively with traditional Internet/intranet facilities. The most important change brought by replacing the bus structure with IB is a flexible and efficient data center that eliminates the complicated I/O portion of the server.

IB SAN adopts a layered structure that separates the structure of the system from the functional definition of the access devices. Hosts access the IB SAN through an HCA (Host Channel Adapter), while network storage devices such as RAID arrays access it through a TCA (Target Channel Adapter).

IB SAN mainly has the following characteristics:

  1. A scalable switched-fabric interconnect structure;
  2. An efficient and reliable transport layer interconnection implemented in hardware;
  3. Support for multiple virtual channels, with automatic path switching implemented in hardware;
  4. High bandwidth, with total bandwidth growing as the IB switch fabric scales up;
  5. Support for the SCSI remote DMA (remote direct memory access) protocol;
  6. High fault tolerance and survivability;
  7. Support for hot swapping.

The purpose of network storage technology is to expand storage capacity and improve storage performance. These storage technologies all provide centralized data storage and effective access to files; they all support multiple operating systems and allow users on different operating systems to use the data at the same time; they can be separated from the application servers and provide high data availability; and they can reduce long-term operating costs through centralized storage management.

Therefore, from the point of view of the essence of storage, their functions are the same. In fact, the difference between them is becoming blurred, and all technologies are challenged by users' storage requirements. In practical applications, it is necessary to choose according to the business characteristics and requirements of the system (for example, environmental requirements, performance requirements, price requirements, etc.).