Detailed Explanation and Application of HTTP Protocol Principles

What is the HTTP protocol

  1. HTTP (HyperText Transfer Protocol) is a transfer protocol used to transfer hypertext from a World Wide Web (WWW) server to a local browser. It is the most widely used application protocol on the Internet, and all WWW documents must comply with this standard.

  2. HTTP is an application-layer communication protocol that runs on top of TCP/IP and transfers data (HTML files, image files, query results, etc.).

How the HTTP protocol works

  1. The HTTP protocol works on a client-server architecture. The browser, acting as the HTTP client, sends requests to the HTTP server (the web server) for resources identified by URLs.

  2. Web servers include: Apache server, IIS server (Internet Information Services), etc.

  3. The web server sends response information to the client according to the received request.

  4. The default HTTP port number is 80, but you can also change it to 8080 or other ports.
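The client-server exchange described above can be tried locally with Python's standard library. This is only a minimal sketch under some assumptions: `HelloHandler` and `fetch_once` are hypothetical names, and the server listens on a free port chosen by the operating system rather than on port 80, which usually requires elevated privileges.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET request with a small HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello from the server</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(path="/index.html"):
    """Start a local server, send one GET request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)          # the client sends the request
        response = conn.getresponse()      # the server answers with a response
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = fetch_once()
    print(status, body.decode())
```

In a real deployment the server role is played by software such as Apache or IIS, and the client role by the browser.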

The following diagram shows the communication process of the HTTP protocol:

Points to note about the HTTP protocol:

  • HTTP is connectionless: connectionless here means that each connection handles only one request. After the server has processed the client's request and the client has received the response, the connection is closed. In this way, the server does not spend resources on idle connections. (This describes HTTP/1.0; HTTP/1.1 keeps connections open by default.)
  • HTTP is media independent: any type of data can be sent via HTTP as long as the client and server know how to process the content. The client and server indicate the type of the data with the appropriate MIME content type.
  • HTTP is stateless: statelessness means that the protocol keeps no memory of earlier transactions. If information from a previous request is needed for later processing, it must be retransmitted, which may increase the amount of data transmitted per connection. On the other hand, when the server does not need previous information, it can respond faster.
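As a small illustration of media independence, the sketch below guesses the MIME type that a server might place in the Content-Type header for a given file name. The helper name `content_type_for` and the fallback to `application/octet-stream` are assumptions made for this example, not part of the protocol description above.

```python
import mimetypes


def content_type_for(filename):
    """Guess the MIME type a server might send in the Content-Type header."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to the generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    for name in ("index.html", "photo.png", "archive.unknownext"):
        print(name, "->", content_type_for(name))
```

The receiver reads the same header to decide how to handle the body, for example whether to render it as HTML or display it as an image.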

HTTP protocol workflow

1. Request process

The working process of an HTTP request is roughly as follows:

  1. The user enters the URL of the webpage to be accessed in the browser or clicks on a link in a webpage;

  2. The browser resolves the IP address of the target webpage through DNS according to the domain name in the URL;

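     Example URL: http://hackr.ip/index.html
     DNS resolves the domain name hackr.ip to the IP address 20X.189.105.112
     The browser then contacts that address using the http scheme given in the URL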
     
  3. Before HTTP starts to work, the client first establishes a connection with the server over TCP/IP (the TCP three-way handshake).

  4. After the connection is established, the client sends a request to the server. The request consists of the request method, the Uniform Resource Identifier (URI), and the protocol version number, followed by MIME-like information including request modifiers, client information, and possibly body content.

  5. After receiving the request, the server returns the corresponding response. The response consists of a status line, which includes the protocol version number and a success or error code, followed by MIME-like information including server information, entity information, and possibly body content.

  6. Under HTTP/1.0, once the web server has sent the response data to the browser, it generally closes the TCP connection. If the browser or server includes the header line Connection: keep-alive, the TCP connection remains open after the response, so the browser can continue to send requests over the same connection (HTTP/1.1 keeps the connection open by default). Keeping the connection open saves the time required to establish a new connection for each request and also saves network bandwidth.
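Steps 2 to 6 can be walked through by hand with a plain TCP socket. The following is a rough sketch rather than a complete client: `raw_get`, `run_demo`, and `DemoHandler` are hypothetical names, the server is a local stand-in started only for the demonstration, and the request asks for the connection to be closed so that the end of the response is easy to detect.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server used only so the sketch needs no network."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def raw_get(host, port, path="/index.html"):
    """Resolve the host, connect over TCP, send a GET, return (status_line, body)."""
    ip = socket.gethostbyname(host)                      # step 2: name resolution
    with socket.create_connection((ip, port), timeout=5) as sock:  # step 3: TCP connection
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n"                      # ask the server to close afterwards
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))            # step 4: send the request
        chunks = []
        while True:                                      # step 5: read until the server closes
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0].decode("ascii")
    return status_line, body


def run_demo(host="localhost"):
    """Start the local server on a free port and fetch one page from it."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return raw_get(host, server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` returns the status line and the body of the response. Replacing `Connection: close` with `Connection: keep-alive` would leave the socket usable for further requests, provided the server supports persistent connections.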

    The process is as follows

2. Message structure

HTTP is based on the client/server (C/S) architecture model, which exchanges information through a reliable link, and is a stateless request/response protocol. HTTP uses Uniform Resource Identifiers (URI) to transmit data and establish connections. Once the connection is established, the data message is transmitted in a format similar to that used by Internet mail [RFC5322] and Multipurpose Internet Mail Extensions (MIME) [RFC2045].

2.1 URI and URL

  • URI

A URI (Uniform Resource Identifier) uniquely identifies a resource according to a defined set of rules, much as an ID number identifies a person.

  1. Uniform: a uniform format, so resource-specific access methods do not have to be worked out from context
  2. Resource: anything that can be identified
  3. Identifier: a token that refers to the identified object
  • URL

A URL (Uniform Resource Locator) indicates the location of a resource. It is the address entered in the browser to access a web page.

  1. Uniform: a uniform format, so resource-specific access methods do not have to be worked out from context
  2. Resource: anything that can be identified
  3. Locator: the location of the resource
  • URL format
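The components of a URL can be inspected with `urllib.parse.urlsplit` from the Python standard library. This is a small sketch: the helper name `describe_url` and the second example URL (with credentials, port, query, and fragment) are assumptions chosen to show every component.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into its main components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # protocol, e.g. http
        "host": parts.hostname,      # server name or IP address
        "port": parts.port,          # None when the default port is implied
        "path": parts.path,          # location of the resource on the server
        "query": parts.query,        # parameters after "?"
        "fragment": parts.fragment,  # position inside the resource, after "#"
    }


if __name__ == "__main__":
    print(describe_url("http://hackr.ip/index.html"))
    print(describe_url("http://user:pass@www.example.com:8080/dir/index.html?uid=1#ch1"))
```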

3. Request message

The request message that a client sends to the server consists of four parts: request line, request headers, blank line, and request data. The following figure shows the general format of the request message.

1. Request line

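  • Request method
    GET      retrieve a resource
    POST     submit data to be processed by the resource
    PUT      store the enclosed data at the request URL
    HEAD     same as GET, but only the headers are returned
    DELETE   delete the resource
    OPTIONS  ask which methods the resource supports
    TRACE    echo the received request back, for diagnostics
  • Request URL
  • Protocol/version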
     
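  • Request header
     General header fields (General Header)
     Request header fields (Request Header)
     Response header fields (Response Header), used in response messages only
     Entity header fields (Entity Header Fields)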
     
  • Request body
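The layout of a request message (request line, headers, blank line, body) can be made concrete by assembling one by hand. The sketch below is illustrative only: `build_request` is a hypothetical helper, and adding Content-Length automatically is a convenience assumed for this example.

```python
def build_request(method, target, headers, body=b"", version="HTTP/1.1"):
    """Assemble request line, header lines, blank line, and body into bytes."""
    lines = [f"{method} {target} {version}"]           # request line
    headers = dict(headers)
    if body:
        # Tell the server how many body bytes follow the blank line.
        headers.setdefault("Content-Length", str(len(body)))
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"             # blank line ends the headers
    return head.encode("ascii") + body


if __name__ == "__main__":
    message = build_request(
        "POST",
        "/form",
        {"Host": "hackr.ip", "Content-Type": "application/x-www-form-urlencoded"},
        b"name=a",
    )
    print(message.decode("ascii"))
```

Every line of the head ends with CRLF, and the empty line is what separates the headers from the request data.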

4. Response message

An HTTP response also consists of four parts: status line, message headers, blank line, and response body.
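To show how the four parts fit together, the sketch below splits a raw response into status line, headers, and body. `parse_response` is a hypothetical helper and the sample message is invented for illustration. The sketch ignores details a real client must handle, such as chunked transfer encoding and repeated header names.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line fields, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")     # the blank line separates head and body
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)                 # status line: version, code, reason
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": parts[0],
        "status": int(parts[1]),
        "reason": reason,
        "headers": headers,
        "body": body,
    }


if __name__ == "__main__":
    sample = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"Content-Length: 12\r\n"
        b"\r\n"
        b"<p>hello</p>"
    )
    print(parse_response(sample))
```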

HTTP protocol status code

The status code indicates the result of the client's request: whether the server processed it normally or an error occurred.

Status code category

Category  Reason phrase
1XX       Informational (informational status code)
2XX       Success (success status code)
3XX       Redirection (redirection status code)
4XX       Client Error (client error status code)
5XX       Server Error (server error status code)
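The category of a status code follows directly from its first digit. The sketch below maps a code to its category and looks up the standard reason phrase with `http.HTTPStatus` from the Python standard library. `describe_status` and the `CATEGORIES` table are hypothetical names for this example.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code):
    """Return (category, reason phrase) for a numeric status code."""
    category = CATEGORIES.get(code // 100, "Unknown")   # first digit decides the class
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = ""                                     # code not known to the library
    return category, phrase


if __name__ == "__main__":
    for code in (200, 204, 301, 404, 503):
        print(code, describe_status(code))
```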

2XX success

  • 200 (OK) The data sent by the client was processed normally
  • 204 (No Content) Normal response, but no entity body is returned
  • 206 (Partial Content) Response to a range request: only part of the data is returned, namely the range of the entity specified by Content-Range in the response message

3XX redirect

  • 301 (Moved Permanently) Permanent redirect
  • 302 (Found) Temporary redirect. The specification requires the request method to remain unchanged, but in practice many clients change it (typically POST to GET)
  • 303 (See Other) Similar to 302, but the redirected request must use the GET method
  • 304 (Not Modified) The resource has not been modified. Used with conditional request headers (If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since)
  • 307 (Temporary Redirect) Temporary redirection, the request method should not be changed
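The difference between these redirect codes lies mainly in which method the client uses for the follow-up request. The function below is a simplified sketch of that decision as described in the list above. `method_after_redirect` is a hypothetical name, and real clients differ in details (for example, some keep HEAD as HEAD on a 303).

```python
def method_after_redirect(status, method):
    """Return the method a typical client would use for the redirected request."""
    if status == 303:
        return "GET"      # 303: the follow-up request uses GET
    if status in (301, 302) and method == "POST":
        return "GET"      # common client behaviour, although the method was meant to stay
    return method         # 307 and the remaining cases keep the original method


if __name__ == "__main__":
    for status in (301, 302, 303, 307):
        print(status, "POST ->", method_after_redirect(status, "POST"))
```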

4XX client error

  • 400 (Bad Request) Request message syntax error
  • 401 (Unauthorized) Authentication is required
  • 403 (Forbidden) The server refused to access the corresponding resource
  • 404 (Not Found) The resource cannot be found on the server

5XX server-side error

  • 500 (Internal Server Error) server failure
  • 503 (Service Unavailable) The server is overloaded or is shutting down for maintenance

HTTP request method

According to the HTTP standard, HTTP requests can use multiple request methods.

HTTP/1.0 defines three request methods: GET, POST, and HEAD. HTTP/1.1 adds five more: OPTIONS, PUT, DELETE, TRACE, and CONNECT.
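How the method changes the exchange can be seen by sending GET and HEAD to the same resource. The sketch below uses a local server from Python's standard library. `MethodHandler` and `compare_methods` are hypothetical names, and the page content is invented for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>demo page</body></html>"


class MethodHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that supports GET and HEAD for every path."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        self._send_headers()   # same headers as GET, but no body is written

    def log_message(self, format, *args):
        pass


def compare_methods():
    """Send GET and HEAD to a local server; return {method: (status, length, body)}."""
    server = HTTPServer(("127.0.0.1", 0), MethodHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method in ("GET", "HEAD"):
            conn = http.client.HTTPConnection(
                "127.0.0.1", server.server_address[1], timeout=5
            )
            conn.request(method, "/index.html")
            response = conn.getresponse()
            results[method] = (
                response.status,
                response.getheader("Content-Length"),
                response.read(),
            )
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results


if __name__ == "__main__":
    for method, result in compare_methods().items():
        print(method, result)
```

Both responses carry the same status and headers, while only the GET response includes the body.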

Header fields

General header field

Request header field

Response header field

Entity header field