Front-end monitoring SDK development sharing


1. Introduction

As front-end development has matured and received more attention, the industry has placed growing emphasis on front-end monitoring systems. I will not explain here why monitoring is needed; let's start with the requirements.

Small and medium-sized companies can use third-party monitoring directly. For example, you can build a free setup with sentry to capture exceptions and report events, or use Alibaba Cloud ARMS, which is more comprehensive and not too expensive. There are many similar open-source or paid systems that can meet part of our needs.

As the company grows into a medium-to-large one, the number of users, the business services, and the overall company structure all scale up, and a third-party monitoring system may gradually fail to meet some needs. For example, the various internal systems are too independent and scattered from one another, the internal unified login cannot be used, the systems cannot link to each other, and requests to collect additional fields cannot be supported quickly. These problems keep efficiency from meeting the requirements of the company's development. An internally controlled front-end monitoring system that can respond quickly to the company's needs becomes very necessary.

We have invested a fair amount of time and energy in our internal front-end monitoring system. Today we will share the SDK part of front-end monitoring, mainly in three aspects:

  • What data is collected
  • Client SDK (probe) and principle
  • Write test cases

2. What data to collect

The core of a front-end monitoring system is collecting client-related data. The client-side probes we currently support are: web, WeChat Mini Program, android, and ios. They mainly collect the following information, as shown in the figure:

2.1 Performance

Collect performance information for page loading, static resources, ajax interfaces, and so on. The metrics include loading time, http protocol version, response body size, etc. This provides data to support the overall quality of the business and to troubleshoot slow requests.

2.2 Error

Collect js errors, static resource loading errors, and ajax interface errors. These general error types are easy to understand. The following mainly explains "business interface errors" (business):

The client sends an ajax request to a back-end business interface, and the interface returns a json data structure that usually contains two fields, errorcode and message. errorcode is a status code defined by the business interface. For a normal response, the business agrees internally on a value such as errorcode == 0. If the value is not 0, some unexpected or foreseeable abnormal problem has occurred, and this kind of error data needs to be collected.

Since different teams or interfaces may follow different conventions, we only provide a preset hook. The hook is called after an ajax request receives its response, and the business side writes the judgment logic in it according to its own convention and the json response data, to control whether to report. For example:

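// Preset hook written by the business side; called with the parsed json response
function errcodeReport(res) {
  const isPlainObject = Object.prototype.toString.call(res) === '[object Object]';
  if (isPlainObject && Object.prototype.hasOwnProperty.call(res, 'errcode') && res.errcode !== 0) {
    // Non-zero errcode: ask the SDK to report a business interface error
    return { isReport: true, errMsg: res.errmsg, code: res.errcode };
  }
  return { isReport: false };
}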
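
To make the calling side concrete, here is a minimal sketch of how an SDK might invoke such a hook once a response body has been parsed. The names `handleBusinessResponse`, `sampleHook`, `report`, and the record fields are assumptions for illustration; they are not the actual SDK API.

```javascript
// Hypothetical business-side hook following the convention described above.
const sampleHook = (res) =>
  res && typeof res === 'object' && 'errcode' in res && res.errcode !== 0
    ? { isReport: true, errMsg: res.errmsg, code: res.errcode }
    : { isReport: false };

// Hypothetical SDK-side helper: runs the hook on a parsed json body and
// reports a "business" error record only when the hook asks for it.
function handleBusinessResponse(url, body, hook, report) {
  let verdict;
  try {
    verdict = hook(body);
  } catch (e) {
    // A faulty hook must never break the business request.
    return false;
  }
  if (!verdict || !verdict.isReport) return false;
  report({ type: 'business', url, code: verdict.code, message: verdict.errMsg });
  return true;
}

const sent = [];
const collect = (record) => sent.push(record);
handleBusinessResponse('/api/getData', { errcode: 1001, errmsg: 'no permission' }, sampleHook, collect);
handleBusinessResponse('/api/getData', { errcode: 0, data: [] }, sampleHook, collect);
console.log(sent); // only the first response produces a record
```

Wrapping the hook call in try/catch is a design choice: the hook is business code running inside the SDK's request path, so its failures should be contained.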
 

2.3 Auxiliary information

In addition to the two types of hard metrics above, we also need a lot of other information, such as the user's access track, user click behavior, user ID, device version, device model, UV/UA identification, traceId, and so on. In many cases the problem we want to solve is not simple or straightforward to track down, and sometimes front-end monitoring even needs to be correlated with other systems, so this soft-metric information is also very important.

A specific note on traceId:

Today's back-end services commonly use an APM (application performance management) system. At the start of a complete request call, the APM tool generates a unique id, usually called traceId, and records the server-side link details for the entire request under it. If the front end can obtain this id, it can be used to query the log information of a specific request in the back-end APM system. As long as the back end is configured properly, the back-end interface can return the traceId in its http response, and the SDK can collect the traceId of each ajax request, so that front-end and back-end monitoring can be associated.
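
As a rough illustration, the sketch below reads a trace id from response headers and attaches it to a request record. The header name `x-trace-id` and the helper names `extractTraceId` and `withTraceId` are assumptions; the real header depends on the APM tool and the back-end configuration.

```javascript
// Hypothetical helper: read the trace id from response headers.
// `headerName` is an assumption; use whatever your back end actually returns.
function extractTraceId(headers, headerName = 'x-trace-id') {
  if (!headers) return null;
  // fetch's Headers object exposes a case-insensitive get().
  if (typeof headers.get === 'function') return headers.get(headerName) || null;
  // Plain-object fallback, e.g. headers parsed from an XMLHttpRequest.
  const wanted = headerName.toLowerCase();
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) return headers[key];
  }
  return null;
}

// Attach the id to the record the SDK reports for this request.
function withTraceId(record, headers) {
  return { ...record, traceId: extractTraceId(headers) };
}

const tracedRecord = withTraceId({ url: '/api/getData', status: 200 }, { 'X-Trace-Id': 'abc123' });
console.log(tracedRecord.traceId);
```

For cross-origin requests, the browser only lets scripts read a custom response header if the server lists it in Access-Control-Expose-Headers, so that is part of "configuring the back end properly".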

2.4 Summary

Collecting the information above and building a management console lets us monitor front-end performance and errors. Imagine a scenario: when we receive an alarm from the monitoring system or a problem report from a colleague, we can open the management console and first check the real-time errors. If the problem is caused by js code, we can quickly locate the faulty front-end code. If it is not a front-end error, the collected business interface errors can show that the back-end interface is at fault; we can promptly tell our back-end colleagues at what time which interface reported an errorcode of xx, and we can use the traceId to look up the back-end link monitoring data for that ajax request directly. If the problem is not obvious, we can analyze and reconstruct the user's scenario at the time from the collected user track, device information, and network request data, which helps us troubleshoot hard-to-reproduce bugs or compatibility issues.

In this scenario, we improve the front end's troubleshooting ability and can even assist our back-end colleagues. Most of the time when a bug appears, the front end is the first to receive the feedback and the first line of troubleshooting. With such a front-end monitoring system, we are no longer at a loss each time a problem arises, and problems get solved much faster.

[List of specific fields]

After determining what information to collect, you need to implement the client SDK, which can automatically collect data in the business project and report it to the server.

3. Client SDK (probe): principles and APIs

It is called a probe because the SDK relies on the APIs of the runtime environment of the monitored front-end project and adds probe functions at the bottom layer of that environment to collect information. The main principles and APIs used by the web and WeChat Mini Program SDKs are shared below.

3.1 WEB

The following figure shows the Web APIs the SDK mainly uses. Through these APIs we can obtain page performance information, resource performance information, ajax information, and error information.

3.1.1 Performance

Through performance.timing we can get performance data for the first page load, such as dns time, tcp time, and white screen time. However, performance.timing is deprecated in the latest standard, so we switched to performance.getEntriesByType('navigation'). The white screen time measured here may differ from what the user actually perceives and is for reference only.
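
A minimal sketch of deriving these numbers from a navigation entry follows. The function name `summarizeNavigation` and the choice of responseStart as a rough proxy for white screen time are assumptions; the SDK's actual definitions may differ.

```javascript
// Derive a few page-load metrics from a PerformanceNavigationTiming-like entry.
// "whiteScreen" uses responseStart as a rough proxy; this definition is an assumption.
function summarizeNavigation(entry) {
  return {
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    tcp: entry.connectEnd - entry.connectStart,
    whiteScreen: entry.responseStart - entry.startTime,
    load: entry.loadEventEnd - entry.startTime,
  };
}

// In a browser, the entry comes from the Performance Timeline API.
if (typeof window !== 'undefined' && window.performance &&
    typeof window.performance.getEntriesByType === 'function') {
  const [nav] = window.performance.getEntriesByType('navigation');
  if (nav) console.log(summarizeNavigation(nav));
}

// Hand-written sample entry, used here so the sketch also runs outside a browser.
const sampleNavigation = {
  startTime: 0,
  domainLookupStart: 5,
  domainLookupEnd: 25,
  connectStart: 25,
  connectEnd: 60,
  responseStart: 180,
  loadEventEnd: 900,
};
const navigationMetrics = summarizeNavigation(sampleNavigation);
console.log(navigationMetrics);
```

Note that loadEventEnd is 0 until the load event has finished, so in a real page the entry should be read after load completes.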

With a new PerformanceObserver listener, we can observe the loading performance data of all resources (css, script, img, ajax, etc.): load time, response size, http protocol version (http1.1/http2), and so on. We then manage the resource performance data in an array and clear the array after the data has been reported.
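
The array management described above could look roughly like the sketch below. `createResourceBuffer` and the stored field names are hypothetical; the entry properties read from (name, initiatorType, duration, transferSize, nextHopProtocol) are standard resource timing fields.

```javascript
// Hypothetical buffer: holds resource entries until the next report.
function createResourceBuffer() {
  let items = [];
  return {
    add(entries) {
      for (const e of entries) {
        items.push({
          name: e.name,
          type: e.initiatorType,
          duration: e.duration,
          size: e.transferSize,
          protocol: e.nextHopProtocol,
        });
      }
    },
    // Swaps in a fresh array and hands the buffered batch to `send`.
    flush(send) {
      if (items.length === 0) return 0;
      const batch = items;
      items = [];
      send(batch);
      return batch.length;
    },
    size() {
      return items.length;
    },
  };
}

const resourceBuffer = createResourceBuffer();

// Browser-only wiring; skipped in other environments.
if (typeof window !== 'undefined' && typeof window.PerformanceObserver === 'function') {
  const observer = new window.PerformanceObserver((list) => resourceBuffer.add(list.getEntries()));
  observer.observe({ entryTypes: ['resource'] });
}

// Simulated entry and report, so the sketch can be exercised anywhere.
resourceBuffer.add([
  { name: 'https://example.com/app.js', initiatorType: 'script', duration: 42, transferSize: 1024, nextHopProtocol: 'h2' },
]);
const reportedResources = [];
resourceBuffer.flush((batch) => reportedResources.push(...batch));
```

Replacing the array before calling `send` means entries that arrive while a report is in flight go into the next batch rather than being cleared by accident.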

3.1.2 fetch/XMLHttpRequest

The browser does not provide a unified API for collecting ajax request and response data, and whether we use axios or another http request library, they are all implemented on top of fetch and XMLHttpRequest. Therefore the only way to collect this data is to rewrite fetch and XMLHttpRequest and insert custom code into the corresponding logic. There are many related articles, so I won't go into details here.

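// Keep a reference to the native implementation
const _fetch = window.fetch;
window.fetch = function () {
  // custom code: runs before the request is sent
  return _fetch
    .apply(this, arguments)
    .then((res) => {
      // custom code: runs after the response arrives
      return res;
    });
};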
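
The same wrapping idea applies to XMLHttpRequest. The sketch below is one possible shape, not the SDK's implementation: `patchXhr`, the `__monitor` property, and the `onDone` callback are hypothetical names, and `FakeXhr` is a tiny stand-in used only so the patch can be exercised outside a browser.

```javascript
// Patch open/send on an XMLHttpRequest-like constructor.
// `onDone` is a hypothetical callback that receives the collected record.
function patchXhr(XhrCtor, onDone) {
  const proto = XhrCtor.prototype;
  const originalOpen = proto.open;
  const originalSend = proto.send;

  proto.open = function (method, url) {
    // Remember request details on the instance for later.
    this.__monitor = { method, url };
    return originalOpen.apply(this, arguments);
  };

  proto.send = function () {
    const info = this.__monitor || {};
    const start = Date.now();
    // loadend fires after success, error, abort, or timeout.
    this.addEventListener('loadend', () => {
      onDone({ method: info.method, url: info.url, status: this.status, duration: Date.now() - start });
    });
    return originalSend.apply(this, arguments);
  };
}

// Browser wiring (skipped elsewhere).
if (typeof window !== 'undefined' && typeof window.XMLHttpRequest === 'function') {
  patchXhr(window.XMLHttpRequest, (record) => console.log(record));
}

// Stand-in class that completes synchronously; not a real XMLHttpRequest.
class FakeXhr {
  constructor() {
    this.status = 0;
    this.listeners = {};
  }
  addEventListener(type, fn) {
    (this.listeners[type] = this.listeners[type] || []).push(fn);
  }
  open() {}
  send() {
    this.status = 200;
    (this.listeners.loadend || []).forEach((fn) => fn());
  }
}

const xhrRecords = [];
patchXhr(FakeXhr, (record) => xhrRecords.push(record));
const xhr = new FakeXhr();
xhr.open('GET', '/api/getData');
xhr.send();
```

Patching the prototype rather than replacing the constructor keeps `instanceof` checks and static properties intact for libraries built on XMLHttpRequest.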
 

3.1.3 window.onerror | unhandledrejection | console.error | and the monitoring function that comes with the framework

Finally, these APIs collect js-related error information. Two issues need attention:

1. onerror cannot obtain the details of errors thrown by cross-domain scripts. The solution is simple: set the crossorigin attribute on cross-domain script tags, and have the static server set CORS response headers for those resources.

2. Error information from minified code needs to be parsed with the sourceMap file to find the line, column, and error message in the source code. A sourceMap is a data structure that stores the mapping between source code and minified code, and it can be converted easily with a parsing library. How to automate the management and use of sourceMap files, however, is a core problem that a front-end monitoring system has to solve. Here we need to combine the internal static resource publishing system with the front-end monitoring system, to avoid inefficient manual packaging and uploading.
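
As a sketch of the collection side, the listeners for these error sources can be funneled into one record shape whose file, line, and column fields are what a sourceMap lookup would later consume. The function names and record fields here are assumptions for illustration.

```javascript
// Normalize different error sources into one record shape (field names are assumptions).
function normalizeError(source, payload) {
  if (source === 'onerror') {
    // payload: ErrorEvent-like object from the window "error" event
    const err = payload.error;
    return {
      type: 'js',
      message: payload.message || (err && err.message) || '',
      file: payload.filename || '',
      line: payload.lineno || 0,
      column: payload.colno || 0,
      stack: (err && err.stack) || '',
    };
  }
  if (source === 'unhandledrejection') {
    // payload: PromiseRejectionEvent-like object; reason may be any value
    const reason = payload.reason;
    const isError = reason instanceof Error;
    return {
      type: 'promise',
      message: isError ? reason.message : String(reason),
      file: '',
      line: 0,
      column: 0,
      stack: isError ? reason.stack || '' : '',
    };
  }
  return { type: source, message: String(payload), file: '', line: 0, column: 0, stack: '' };
}

function installErrorListeners(target, report) {
  target.addEventListener('error', (e) => report(normalizeError('onerror', e)));
  target.addEventListener('unhandledrejection', (e) => report(normalizeError('unhandledrejection', e)));
}

// Browser wiring (skipped elsewhere).
if (typeof window !== 'undefined' && typeof window.addEventListener === 'function') {
  installErrorListeners(window, (record) => console.log(record));
}

const sampleErrorRecord = normalizeError('onerror', {
  message: 'x is not defined',
  filename: 'https://example.com/app.min.js',
  lineno: 1,
  colno: 2048,
});
```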

3.2 WeChat Mini Program

The js of a WeChat Mini Program has its own life cycle and is given a set of global APIs. By rewriting some of its global functions and related APIs we can obtain network requests, error information, device and version information, and so on. Because the loading process of a Mini Program is controlled by the WeChat app, and js and other resources are hosted inside WeChat, we cannot, unlike on the web, obtain the page and resource loading information that performance provides on the web. (We later found that the Mini Program added a new API in v2.11.0 (2020-04-24) that provides performance metrics, which can be used in the future.) The figure below shows the APIs the SDK mainly uses.

3.2.1 App and Component

By rewriting the global App function, we bind the onError method to listen for errors and rewrite its onShow method to run the logic the SDK needs when the Mini Program starts. By rewriting the onShow method of Component, we can collect the page path and run other reporting logic when the page switches.

// SDK initialization (a method of the SDK class): wrap the global App and Component
init() {
  this.appMethod = App;
  this.componentMethod = Component;
  const ctx = this;
  // Wrap Component so every definition passes through the SDK first
  Component = (opts) => {
    overrideComponent(opts, ctx);
    return ctx.componentMethod(opts);
  };
  // Wrap App in the same way
  App = (app) => {
    overrideApp(app, ctx);
    return ctx.appMethod(app);
  };
}

// ctx is the SDK instance (the sdk's this)
function overrideComponent(opts, ctx) {
  opts.methods = opts.methods || {};
  const compOnShow = opts.methods.onShow || function () {};
  opts.methods.onShow = function () {
    // do something
    // `this` here is the component instance
    return compOnShow.apply(this, arguments);
  };
}

function overrideApp(app, ctx) {
  const _onError = app.onError || function () {};
  const _onShow = app.onShow || function () {};
  app.onError = function (err) {
    reportError(err, ctx);
    return _onError.apply(this, arguments);
  };
  app.onShow = function () {
    // do something
    return _onShow.apply(this, arguments);
  };
}
 

3.2.2 Rewrite wx.request

As with fetch/XMLHttpRequest, there is no global API that lets us capture request information, so we can only monitor and collect it by rewriting wx.request.

const originRequest = wx.request;
const ctx = this;
// Rewrite wx.request
Object.defineProperty(wx, 'request', {
  value: function (config) {
    // sdk code
    const _complete = config.complete || function (data) {};
    config.complete = function (data) {
      // sdk code
      return _complete.apply(this, arguments);
    };

    return originRequest.apply(this, arguments);
  },
});
 

During or after implementing the SDK, we need to write test code. Let's talk about writing test cases.

4. Writing test cases

The SDK is an independent library that needs long-term maintenance and updates. It is used in many business projects and requires high stability. When a problem occurs, updating it is costly. The process is: update the code -> release a new version -> business side updates the dependency version. If another SDK problem is fixed along the way, the cycle restarts, and business colleagues will certainly be bothered. As more projects connected to the monitoring system, changing any code during iteration started to make people nervous, because there is a lot of process-related logic and a fix might break something else. So during one round of code refactoring and optimization, we decided to improve the unit tests and process tests.

4.1 Unit test

Unit tests are typically used for methods with clear inputs and outputs, such as the SDK's utils helper methods and its parameter configuration methods. For a monitoring SDK, most of the test code focuses on process testing, so I won't elaborate on unit testing here.

4.2 Process test

After the monitoring SDK is initialized in a business project, it mainly collects and uploads information by adding probes that monitor the running state of the project. In most cases the business side does not call it actively. For example, when a page loads for the first time, the SDK collects information about the first load and uploads it at a suitable time. We need to simulate this process in test code to ensure that the reported data is as expected.

Our SDK runs in a browser environment, and the node environment does not support the related Web APIs. Therefore we need either to run our test code in a browser or to provide the related API support. Below we introduce two different ways to let our test code run normally.

4.2.1 Providing a Web environment

If we use mocha or jest as the test framework, we can write and execute our test code in an html file with mocha's built-in mocha.run method, then open and run it in the browser. jest-lite likewise supports running jest in the browser.

Sometimes, though, we don't want to open a browser and would rather run the test code in the terminal. We can use a headless browser, such as phantomjs or puppeteer, to load a browser environment from node. There are related tools for them; for example, mocha-phantomjs can run the html test process directly in the terminal.

Based on the html test file already written, we use mocha-phantomjs and phantomjs. The following is the command configuration in package.json.

"scripts": {
  "test": "mocha-phantomjs -p ./node_modules/.bin/phantomjs /test/unit/index.html"
}
 

phantomjs has been deprecated and is not recommended. puppeteer is recommended instead; it supports the related functions and has similar tools.

For example:

An earlier code base used this approach with WebSocket. Because the code depended on the Web API WebSocket, it needed new WebSocket() to complete the test process, but the node environment did not have this API. So the test cases were written with mocha in html. If you want to run the tests entirely in the terminal, you can combine this with mocha-phantomjs so that the test html file executes in the terminal without opening a local web page.

Of course, you can also open the html directly in the browser to view the test results; the phantomjs dependency packages are very large and slow to install. But at that time we used the continuous integration service travis. When our code was pushed to the remote repository, travis started multiple independent containers and executed our test files in the terminal. Without mocha-phantomjs running the tests in the terminal, there would be no way for them to pass in travis.

4.2.2 Mock Web API

While improving the tests for the monitoring SDK, we tried another approach: using Mock throughout.

The Web environment approach above requires a browser or a headless browser. But the code we actually need to test is not the Web API itself; we merely use it. We assume the Web APIs are stable and only care about their input and output. If they have an internal bug, we cannot control it; that is the browser developers' business. So all we have to do is simulate the relevant Web APIs in the node environment.

Take the WebSocket example mentioned earlier. Because node does not support WebSocket, we cannot call new WebSocket. But if there is a third-party node library that fully simulates WebSocket, we can make the node execution environment support it directly: const WebSocket = require('WebSocket'). This way we don't need to run in a browser or headless browser environment.

Next, take fetch in our monitoring SDK as a concrete example of how to simulate the process test. In general, three things need to be supported:

  1. Start an http server to provide interface services
  2. Introduce a third-party library so that node supports fetch
  3. Manually simulate part of the performance API in node

First, the normal fetch process in the SDK: after our SDK is initialized in a business project, the SDK rewrites fetch. When the business project then actually uses fetch to request a business interface, the SDK can obtain the http request and response information through the rewritten logic, and it also obtains the performance information of the fetch request through performance and reports it. The test code we want to write verifies that this process completes smoothly.

(1) http server

Because we want to verify the complete fetch process, we need to start an http server and provide an interface that receives and responds to this fetch request.
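
A minimal sketch of such a server, using node's built-in http module, is shown here. The /api/getData path comes from the test code later in this section; the function names and the response body shape are assumptions for illustration.

```javascript
const http = require('node:http');

// Request handler for the mock business interface.
// The response body shape is an assumption for illustration.
function handleRequest(req, res) {
  if (req.method === 'GET' && req.url === '/api/getData') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ errcode: 0, data: [] }));
    return;
  }
  res.writeHead(404, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ errcode: 404, errmsg: 'not found' }));
}

// Starts the server on the given port (0 lets the OS pick a free one)
// and resolves with the server once it is listening.
function startMockServer(port = 0) {
  return new Promise((resolve, reject) => {
    const server = http.createServer(handleRequest);
    server.once('error', reject);
    server.listen(port, () => resolve(server));
  });
}
```

In a test setup hook one would await startMockServer(), read the chosen port from server.address().port to build the request address, and call server.close() in the teardown hook so the test process can exit.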

(2) mock fetch

To support fetch in the node environment, we can directly use the third-party library node-fetch. At the top of the execution environment, we define fetch in advance.

/** MockFetch.js */
import fetch from 'node-fetch';
// Provide a minimal window object and expose fetch on both window and global
global.window = {};
window.fetch = fetch;
global.fetch = fetch;
 
(3) mock performance

performance is more special, though, and no third-party library can support it. For the fetch process, if we want to simulate performance, we only need to simulate the part we use, PerformanceObserver, and even for its inputs and return values we only simulate what we need. The following code is an example of using PerformanceObserver. In the SDK, we mainly use this piece of code.

/** PerformanceObserver usage */
var observer = new PerformanceObserver(function(list, obj) {
  var entries = list.getEntriesByType('resource');
  for (var i=0; i < entries.length; i++) {
   //Process "resource" events
  }
});
observer.observe({entryTypes: ['resource']});
 

The performance layer inside the browser automatically monitors resource requests, and we just supply a PerformanceObserver to collect its data. In essence, the probe that actively collects data is implemented inside performance.

Below we simulate part of PerformanceObserver to support the testing process we need. window.PerformanceObserver is defined as a constructor that adds the incoming callback parameter fn to an array. mockPerformanceEntriesAdd is a method we need to call manually: each time we initiate a fetch, we manually call this method to pass the mock data to the registered listener functions, so that the PerformanceObserver instance receives our mock data, simulating the internal behavior of the browser's performance.

/** MockPerformance.js */
let observerCallbacks = [];
// Mock PerformanceObserver: only store the callback
window.PerformanceObserver = function (fn) {
  this.observe = function () {};
  observerCallbacks.push(fn);
};

// Manually push a mock performance entry to every registered callback
window.mockPerformanceEntriesAdd = (resource) => {
  observerCallbacks.forEach((cb) => {
    cb({
      getEntriesByType() {
        return [resource];
      },
    });
  });
};
 

Put plainly: suppose a company pays wages into a worker's bank card on the 10th, and the mortgage is deducted from that card the next day. What the worker cares about most is that the mortgage is deducted normally, otherwise his credit record suffers. Normally the worker only needs to check whether the bank completes the deduction. But he recently lost his job, and the company no longer pays into the salary card, so he has to transfer money from his savings card into the deduction card himself so that the bank can still deduct the mortgage payment. The company is the browser's underlying performance layer. When the worker transfers money to himself, that is mockPerformanceEntriesAdd: he replaces the company's salary payment with his own transfer, changing from passive reception to active execution. Think it over~

mockPerformanceEntriesAdd simulates the browser's active behavior. Its input parameter is performance information, which we can simply hard-code (mockData below). Here is the test code:

/** test/fetch.js */
import 'MockFetch.js';
import 'MockPerformance.js';
import webReportSdk from '../dist/monitorSDK';
// Initialize the sdk; the sdk rewrites fetch
const monitor = webReportSdk({
  appId: 'appid_test',
});
const mockData = {
  name: 'http://localhost:xx/api/getData',
  entryType: 'resource',
  startTime: 90427.23999964073,
  duration: 272.06500014290214,
  initiatorType: 'fetch',
  nextHopProtocol: 'h2',
  // ...
};
test('web api: fetch', () => {
  // Send a GET request
  const requestAddress = mockData.name;
  fetch(requestAddress, {
    method: 'GET',
  });

  // Manually feed the mock performance entry to the observers
  window.mockPerformanceEntriesAdd(mockData);
});
 

When mockPerformanceEntriesAdd is executed, the PerformanceObserver inside the SDK can collect the mock performance information. (Note that we also need to start an http server here, which provides the http://localhost:xx/api/getData interface.)

When the test code above runs, the SDK obtains the request, response, and performance information of the fetch call to http://localhost:xx/api/getData, and the SDK also sends a fetch request to report the collected data to the back-end service. We can rewrite window.fetch once more to intercept the SDK's report request, get the request content, and use it to make the expected test assertions.

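// Rewrite fetch again to intercept the sdk's report request
const monitorFetch = window.fetch;
let reportData;
window.fetch = function () {
  // The sdk's report request carries a type field, used here to recognize it
  if (arguments[1] && arguments[1].type === 'report-data') {
    // Capture the reported payload instead of sending it
    reportData = JSON.parse(arguments[1].body);
    return Promise.resolve();
  }
  return monitorFetch.apply(this, arguments);
};

// Assert on the reported data
expect(reportData.resourceList[0].name).toEqual(mockData.name);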


 

Combined test code

/** test/fetch.js */
import 'MockFetch.js';
import 'MockPerformance.js';
import webReportSdk from '../dist/monitorSDK';
// Initialize the sdk; the sdk rewrites fetch
const monitor = webReportSdk({
  appId: 'appid_test',
});

// Rewrite fetch again to intercept the sdk's report request
const monitorFetch = window.fetch;
let reportData;
window.fetch = function () {
  // The sdk's report request carries a type field, used here to recognize it
  if (arguments[1] && arguments[1].type === 'report-data') {
    // Capture the reported payload instead of sending it
    reportData = JSON.parse(arguments[1].body);
    return Promise.resolve();
  }
  return monitorFetch.apply(this, arguments);
};

const mockData = {
  name: 'xxx.com/api/getData',
  entryType: 'resource',
  startTime: 90427.23999964073,
  duration: 272.06500014290214,
  initiatorType: 'fetch',
  nextHopProtocol: 'h2',
  // ...
};
test('web api: fetch', (done) => {
  // Send a GET request
  const requestAddress = mockData.name;
  fetch(requestAddress, {
    method: 'GET',
  });

  // Manually feed the mock performance entry to the observers
  window.mockPerformanceEntriesAdd(mockData);

  // Wait for the sdk to report, then assert on the reported data
  setTimeout(() => {
    expect(reportData.resourceList[0].name).toEqual(mockData.name);
    // more expect...
    done();
  }, 3000);
});
 

As shown above, we mainly write the SDK's process tests in this mode. With test code in place, the stability and controllability of code maintenance and iteration can be largely guaranteed, and a lot of later testing cost can be saved.

5. Conclusion

What we have shared above covers three core aspects of our monitoring SDK. There are many other details and implementations, such as throttling, report timing, data merging, and initial configuration. During iterative development, it is necessary to avoid SDK compatibility problems caused by iterations of the client or of back-end services. It is also important to consider the needs of later database storage and querying. Only collection, storage, and querying together make up a complete front-end monitoring system.
