Best practices for Kubernetes Ingress log analysis and monitoring

Ingress mainly provides routing at the HTTP layer (layer 7) and is currently the mainstream way to expose HTTP/HTTPS services in K8s. To lower the barrier for users to analyze and monitor Ingress logs, Alibaba Cloud Container Service and Log Service have integrated Ingress logs: applying a single YAML resource deploys a complete Ingress log solution covering log collection, analysis, and visualization.

Preface

Kubernetes (K8s) now dominates the container orchestration market and has become the default cloud-independent compute abstraction. More and more enterprises are building services on K8s clusters. In K8s, components are exposed to the outside through Services of type NodePort or LoadBalancer, or through Ingress. Among these, Ingress mainly provides routing at the HTTP layer (layer 7). Compared with TCP (layer 4) load balancing, it has many advantages: more flexible routing rules; support for canary, blue-green, and A/B test releases; SSL support; logging; monitoring; and custom extensions. As a result, it is currently the mainstream way to expose HTTP/HTTPS services in K8s.

Introduction to Ingress

Ingress in K8s is only an API resource declaration. The actual implementation requires installing a corresponding Ingress Controller, which takes over the Ingress definitions and forwards traffic to the matching Service. There are many Ingress Controller implementations (for details, see the official Ingress Controller documentation). Popular ones include Nginx, Traefik, Istio, and Kong; the most widely adopted in China is the Nginx Ingress Controller.
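To make the "forwards traffic to the matching Service" step concrete, here is a minimal sketch of host and path-prefix routing. It is an illustration only, not how any particular Ingress Controller is implemented: the rule table, the host name, and the Service names are all hypothetical, and real controllers support more matching modes than this.

```python
# Hypothetical rule table: (host, path prefix, backend Service name).
# These values are made up for illustration.
RULES = [
    ("shop.example.com", "/api", "api-service"),
    ("shop.example.com", "/", "web-service"),
]


def prefix_matches(prefix, path):
    """Match whole path segments, so "/api" matches "/api/x" but not "/apiary"."""
    if path == prefix:
        return True
    return path.startswith(prefix.rstrip("/") + "/")


def route(host, path, rules=RULES):
    """Return the Service with the longest matching prefix, or None."""
    best = None
    for rule_host, prefix, service in rules:
        if rule_host != host or not prefix_matches(prefix, path):
            continue
        if best is None or len(prefix) > len(best[0]):
            best = (prefix, service)
    return best[1] if best else None


print(route("shop.example.com", "/api/orders"))
print(route("shop.example.com", "/index.html"))
```

The longest-prefix rule is what lets a catch-all "/" rule coexist with more specific paths on the same host.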

Logging and monitoring

Logging and monitoring are basic functions provided by every Ingress Controller. Logs generally include the access log, the controller log, and the error log. Monitoring mainly extracts metrics from the logs and from the Controller. Of these data, the access log has the largest volume, carries the most information, and is the most valuable. A layer-7 access log typically includes the URL, source IP, User-Agent, status code, inbound traffic, outbound traffic, response time, and so on. An Ingress Controller's forwarding log additionally records information such as the name of the Service the request was forwarded to and that Service's response time. From this information we can analyze a great deal, for example:

  1. Website PV and UV;
  2. Geographic and device distribution of visits;
  3. Error rate of website requests;
  4. Response latency of back-end services;
  5. Distribution of visits across URLs.
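Several of these figures can be derived from access log records with very little code. The sketch below assumes the log lines have already been parsed into dictionaries; the field names (`client_ip`, `url`, `status`, `upstream_time`) and the sample values are assumptions made for illustration, not the actual schema of any Ingress Controller. UV is approximated here by the number of distinct client IPs.

```python
from collections import Counter

# Hypothetical, already-parsed access log records (field names are assumed).
records = [
    {"client_ip": "10.0.0.1", "url": "/home", "status": 200, "upstream_time": 0.012},
    {"client_ip": "10.0.0.2", "url": "/home", "status": 200, "upstream_time": 0.020},
    {"client_ip": "10.0.0.1", "url": "/cart", "status": 502, "upstream_time": 1.500},
    {"client_ip": "10.0.0.3", "url": "/cart", "status": 404, "upstream_time": 0.008},
]


def summarize(records):
    """Compute PV, UV, 5XX ratio, average back-end latency, and per-URL hits."""
    pv = len(records)
    uv = len({r["client_ip"] for r in records})  # distinct IPs as a UV proxy
    errors = sum(1 for r in records if 500 <= r["status"] <= 599)
    total_time = sum(r["upstream_time"] for r in records)
    return {
        "pv": pv,
        "uv": uv,
        "error_ratio": errors / pv if pv else 0.0,
        "avg_upstream_time": total_time / pv if pv else 0.0,
        "url_hits": Counter(r["url"] for r in records),
    }


summary = summarize(records)
print(summary)
```

Geographic and device distribution would additionally need an IP-to-region lookup and User-Agent parsing, which are left out of this sketch.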

Development, O&M, business operations, and security staff can each meet their own needs based on this information, for example:

  1. Comparing metrics before and after a new version is released;
  2. Monitoring website quality and cluster status;
  3. Detecting malicious attacks and cheating;
  4. Measuring website traffic and advertising conversion rates.

However, building and operating a complete Ingress log analysis and monitoring system by hand is complicated. Such a system requires the following steps:

  1. Deploy a log collection agent and configure collection and parsing rules;
  2. Because traffic in a K8s cluster is relatively high, build a buffer queue such as Redis or Kafka;
  3. Deploy a real-time data analysis engine such as Elasticsearch or ClickHouse;
  4. Deploy visualization components such as Grafana or Kibana and build reports;
  5. Deploy an alerting module such as ElastAlert or Alertmanager and configure alert rules.

Alibaba Cloud Log Service Ingress Solution

To lower the barrier for users to analyze and monitor Ingress logs, Alibaba Cloud Container Service and Log Service have integrated Ingress logs (official document: help.aliyun.com/document_de). A single YAML resource is all you need to deploy a complete Ingress log solution covering log collection, analysis, and visualization.

Ingress visual analysis

By default, Log Service creates 5 reports for Ingress: Ingress Overview, Ingress Access Center, Ingress Monitoring Center, Ingress Blue-Green Release Monitoring Center, and Ingress Anomaly Detection Center. People in different roles can use different reports according to their needs. Each report also provides a filter box for narrowing the view to specific Services, URLs, status codes, and so on. All reports are built on the basic visualization components provided by Log Service and can be customized to fit your company's actual scenarios.

Ingress Overview

The Ingress Overview report shows the overall status of the current Ingress and includes the following types of information:

  1. Overall architecture status (1 day): PV, UV, traffic, response latency, share of mobile clients, error ratio, etc.;
  2. Real-time website status (1 minute): PV, UV, success rate, 5XX ratio, average latency, P95/P99 latency, etc.;
  3. User request information (1 day): 1-day/7-day PV comparison, geographic distribution of visits, top visiting provinces/cities, share of mobile clients, Android/iOS ratio, etc.;
  4. Top URL statistics (1 hour): top 10 URLs by visits, by latency, by 5XX errors, and by 404 errors.
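For readers unfamiliar with the P95/P99 and top 10 figures above, the following sketch shows one common way to compute them. It uses the nearest-rank percentile definition, which is an assumption on our part; the source does not say which percentile method Log Service uses. The latency samples and URLs are made up.

```python
import math
from collections import Counter


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def top_urls(urls, n=10):
    """Return the n most requested URLs with their hit counts."""
    return Counter(urls).most_common(n)


# Made-up latency samples in milliseconds: 10, 20, ..., 200.
latencies_ms = [10 * i for i in range(1, 21)]
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p95, p99)

top = top_urls(["/home", "/cart", "/home", "/pay", "/home", "/cart"], n=2)
print(top)
```

Percentiles are preferred over the average for latency because a small number of very slow requests can hide behind a healthy-looking mean.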

Ingress Access Center

The Ingress Access Center focuses on statistics about access requests and is generally used for operational analysis. It includes UV/PV, UV/PV distribution, UV/PV trends, top visiting provinces/cities, top browsers, top client IPs, share of mobile clients, Android/iOS ratio, etc.

Ingress Monitoring Center

The Ingress Monitoring Center focuses on real-time website monitoring data and is generally used for real-time monitoring and alerting. It includes request success rate, error rate, 5XX ratio, ratio of requests that were not forwarded, average latency, P95/P99/P9999 latency, status code distribution, Ingress load distribution, and the top 10 Services by visits, errors, latency, and traffic.

Ingress Blue-Green Release Monitoring Center

The Ingress Blue-Green Release Monitoring Center is used for real-time monitoring and comparison during a release (both before versus after the release and blue versus green at the current moment), so that anomalies can be detected quickly and the release rolled back. In this report you select the blue and green versions (ServiceA and ServiceB) to compare. The report then dynamically displays the relevant metrics for the two versions, including PV, 5XX ratio, success rate, average latency, P95/P99/P9999 latency, traffic, etc.
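The comparison this report performs can be approximated as grouping access records by Service and computing the same metrics for each side. The sketch below is only an illustration under assumed inputs: the field names (`service`, `status`, `latency_ms`) and the sample records are hypothetical, and only the ServiceA/ServiceB names come from the text above.

```python
def service_metrics(records, service):
    """PV, 5XX ratio, and average latency for one Service."""
    rows = [r for r in records if r["service"] == service]
    pv = len(rows)
    errors = sum(1 for r in rows if 500 <= r["status"] <= 599)
    total_latency = sum(r["latency_ms"] for r in rows)
    return {
        "pv": pv,
        "ratio_5xx": errors / pv if pv else 0.0,
        "avg_latency_ms": total_latency / pv if pv else 0.0,
    }


def compare(records, blue, green):
    """Return the metrics of both versions side by side."""
    return {blue: service_metrics(records, blue),
            green: service_metrics(records, green)}


# Hypothetical parsed records; field names and values are assumptions.
records = [
    {"service": "ServiceA", "status": 200, "latency_ms": 20},
    {"service": "ServiceA", "status": 200, "latency_ms": 40},
    {"service": "ServiceB", "status": 200, "latency_ms": 30},
    {"service": "ServiceB", "status": 503, "latency_ms": 90},
]

result = compare(records, "ServiceA", "ServiceB")
print(result)
```

A noticeably higher 5XX ratio or latency on the new version is the kind of signal that would prompt a rollback.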

Ingress Anomaly Detection Center

Built on the machine learning algorithms provided by Log Service, the Ingress Anomaly Detection Center automatically detects anomalous points in Ingress metrics using a variety of time-series analysis algorithms, making problems faster to discover.
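The source does not describe which algorithms Log Service uses, so the sketch below swaps in a deliberately simple stand-in, a rolling z-score, just to show what "detecting anomalous points in a metric series" means in practice. The window size, threshold, and sample series are all assumptions.

```python
from statistics import mean, pstdev


def zscore_anomalies(series, window=5, threshold=3.0):
    """Return indexes whose value deviates strongly from the preceding window.

    A simple rolling z-score, used here only as an illustrative stand-in.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up per-minute request counts with one spike at index 6.
pv_per_minute = [100, 102, 98, 101, 99, 100, 500, 101, 99, 100]
spikes = zscore_anomalies(pv_per_minute)
print(spikes)
```

Note that the spike inflates the standard deviation of the windows that follow it, which is one reason production systems use more robust time-series models than this.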

Real-time monitoring and warning

Ingress is the main entry point for website requests in K8s, so real-time monitoring and alerting on it is an indispensable O&M practice. In Log Service, an alarm can be created from the reports above in just 3 simple steps. As an example, consider a 5XX ratio alarm for Ingress: the alarm is evaluated every 5 minutes and triggers when the 5XX ratio exceeds 1%.
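The rule in this example (evaluate every 5 minutes, trigger above 1%) can be expressed as a small function. This is a sketch of the rule's logic only, not of how Log Service alarms are configured; the input shape, a list of `(unix_timestamp, status_code)` pairs, is an assumption.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60   # evaluation interval from the example
THRESHOLD = 0.01          # trigger when the 5XX ratio exceeds 1%


def windows_over_threshold(records, window=WINDOW_SECONDS, threshold=THRESHOLD):
    """Return the start timestamps of windows whose 5XX ratio exceeds the threshold.

    `records` is an iterable of (unix_timestamp, status_code) pairs.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for timestamp, status in records:
        bucket = timestamp - timestamp % window  # align to the window start
        totals[bucket] += 1
        if 500 <= status <= 599:
            errors[bucket] += 1
    return [b for b in sorted(totals) if errors[b] / totals[b] > threshold]


# Made-up traffic: window 0 has 1 error in 200 requests (0.5%),
# window 300 has 2 errors in 100 requests (2%).
sample = [(i, 200) for i in range(199)] + [(10, 502)]
sample += [(300 + i, 200) for i in range(98)] + [(310, 500), (320, 503)]

firing = windows_over_threshold(sample)
print(firing)
```

Using a ratio rather than an absolute error count keeps the rule meaningful as traffic volume changes over the day.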

Beyond general alarm functions, Log Service also supports:

  1. Multi-dimensional data association: alarms can be decided by cross-checking the results of multiple SQL queries, which improves alarm accuracy;
  2. Notification channels: in addition to SMS, voice, notification center, and email, DingTalk robot notifications and custom WebHook extensions are supported;
  3. Alarm records are themselves stored as logs, which provides a second layer of protection when an alarm fails.

Subscribe to reports

In addition to alarm notifications, Log Service supports report subscriptions, which periodically render reports as images and send them by email, to DingTalk groups, and so on. For example, you can send yesterday's website traffic to the operations group at 10 o'clock every morning, send a weekly report to a mailing list for archiving, or send a monitoring report every 5 minutes while a new version is being released.

Custom analysis

If the default reports provided by Container Service for Kubernetes do not meet your analysis needs, you can use Log Service SQL, dashboards, and other functions directly for custom analysis and visualization.

Early adopters

To let everyone try out the Kubernetes audit log function, we have opened an experience center. You can enter it through promotion.aliyun.com/ntms/act/lo... The page provides many Kubernetes-related reports.
