Summary of Java off-heap memory troubleshooting

Introduction

JVM off-heap memory is difficult to troubleshoot, yet problems with it come up often. This may be the most comprehensive walkthrough of JVM off-heap memory troubleshooting currently available.

By the end of this article, you should know:

  • The pmap command
  • The gdb command
  • The perf command
  • The difference between RSS and VSZ memory
  • Java NMT (Native Memory Tracking)

Background

I ran into a strange problem recently and feel it is worth sharing. One of our services runs in Docker. Starting with a certain release, its memory usage kept growing until it hit the memory limit allocated to the Docker container, yet it never threw an OOM. The changes in that release were:

  • Upgraded the version of the base libraries
  • Extended the Docker memory limit from 4 GB to 8 GB
  • Unlike the previous release, EhCache is used with its on-heap cache
  • No file reads and no mmap operations

Using jps to view the startup parameters, we found that about 3 GB of heap memory is allocated:

[root]$ jps -v
75 Bootstrap -Xmx3000m -Xms3000m  -verbose:gc -Xloggc:/home/logs/gc.log -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSCompactAtFullCollection -XX:MaxTenuringThreshold=10 -XX:MaxPermSize=128M -XX:SurvivorRatio=3 -XX:NewRatio=2 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 

Using ps to view the resident and virtual memory of the process: apart from the virtual memory (VSZ) being a relatively high 17 GB, the resident memory (RSS) actually in use has reached a staggering 7 GB, far beyond the -Xmx setting.

[root]$ ps -p 75 -o rss,vsz  

   RSS      VSZ
7152568 17485844
 

Investigation process

Clearly, off-heap memory is in use, and it is unlikely to be caused by EhCache (since we use its on-heap mode). We learned that the base-library upgrade included a Netty version upgrade, and Netty allocates some DirectByteBuffer instances. For the first round of investigation, we tried the following:

  • Dump the heap with jmap -dump:format=b,file=75.dump 75 and analyze the DirectByteBuffer references and their sizes (a JMX-based sketch for cross-checking the direct buffer pools follows below)
  • Deploy a build from before the base-library upgrade and keep observing
  • Deploy another build with EhCache limited to 1024 MB
  • Considering that Docker's memory allocation mechanism might be the cause, deploy an instance on a physical machine

As a result, the services in all four environments, without exception, showed the same memory over-consumption. The problem was strange enough to keep us up at night.
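
As a complementary check (not something the original investigation used), the JVM's direct and mapped buffer pools can also be inspected from inside the process via JMX. Below is a minimal sketch; the class name is illustrative, and it assumes you can run such a snippet in (or alongside) the service. If these pool sizes stay small while RSS keeps growing, DirectByteBuffer is probably not the culprit.

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative helper: prints the size of the "direct" and "mapped" buffer pools.
public class DirectBufferCheck {
    public static void main(String[] args) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}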

pmap

To dig further, we used pmap to view the memory map of the process, sorted by RSS in ascending order. It turned out that, in addition to the 3 GB heap allocated at address 000000073c800000, there were a very large number of 64 MB memory regions: a huge number of small physical memory blocks mapped into different virtual memory ranges. At this point we had no idea what was inside them or what had created them.

[root]$ pmap -x 75  | sort -n -k3

..... (N lines omitted)

0000000040626000   55488   55484   55484 rwx--    [ anon ]

00007fa07c000000   65536   55820   55820 rwx--    [ anon ]

00007fa044000000   65536   55896   55896 rwx--    [ anon ]

00007fa0c0000000   65536   56304   56304 rwx--    [ anon ]

00007f9db8000000   65536   56360   56360 rwx--    [ anon ]

00007fa0b8000000   65536   56836   56836 rwx--    [ anon ]

00007fa084000000   65536   57916   57916 rwx--    [ anon ]

00007f9ec4000000   65532   59752   59752 rwx--    [ anon ]

00007fa008000000   65536   60012   60012 rwx--    [ anon ]

00007f9e58000000   65536   61608   61608 rwx--    [ anon ]

00007f9f18000000   65532   61732   61732 rwx--    [ anon ]

00007fa018000000   65532   61928   61928 rwx--    [ anon ]

00007fa088000000   65536   62336   62336 rwx--    [ anon ]

00007fa020000000   65536   62428   62428 rwx--    [ anon ]

00007f9e44000000   65536   64352   64352 rwx--    [ anon ]

00007f9ec0000000   65528   64928   64928 rwx--    [ anon ]

00007fa050000000   65532   65424   65424 rwx--    [ anon ]

00007f9e08000000   65512   65472   65472 rwx--    [ anon ]

00007f9de0000000   65524   65512   65512 rwx--    [ anon ]

00007f9dec000000   65532   65532   65532 rwx--    [ anon ]

00007f9dac000000   65536   65536   65536 rwx--    [ anon ]

00007f9dc8000000   65536   65536   65536 rwx--    [ anon ]

00007f9e30000000   65536   65536   65536 rwx--    [ anon ]

00007f9eb4000000   65536   65536   65536 rwx--    [ anon ]

00007fa030000000   65536   65536   65536 rwx--    [ anon ]

00007fa0b0000000   65536   65536   65536 rwx--    [ anon ]

000000073c800000 3119140 2488596 2487228 rwx--    [ anon ]

total kB        17629516 7384476 7377520
 

Googling turned up the following: Linux glibc >= 2.10 (RHEL 6) malloc may show excessive virtual memory usage.

That article points out that the large number of 64 MB allocations is caused by a change in glibc's malloc (per-thread arenas introduced in that version), and that setting export MALLOC_ARENA_MAX=4 can fix excessive VSZ usage. That is a real issue, but not ours: what kept growing in our case was physical memory (RSS), not virtual memory.

NMT

Fortunately, JDK 1.8 ships with the Native Memory Tracker, which can help narrow things down. It is enabled by adding -XX:NativeMemoryTracking=detail to the startup parameters; then run jcmd on the command line to view the memory allocation.

[root]$ jcmd 75 VM.native_memory summary

Native Memory Tracking:

Total: reserved=5074027KB, committed=3798707KB
-                 Java Heap (reserved=3072000KB, committed=3072000KB)
                            (mmap: reserved=3072000KB, committed=3072000KB)
-                     Class (reserved=1075949KB, committed=28973KB)
                            (classes #4819)
                            (malloc=749KB #13158)
                            (mmap: reserved=1075200KB, committed=28224KB)
-                    Thread (reserved=484222KB, committed=484222KB)
                            (thread #470)
                            (stack: reserved=482132KB, committed=482132KB)
                            (malloc=1541KB #2371)
                            (arena=550KB #938)
-                      Code (reserved=253414KB, committed=25070KB)
                            (malloc=3814KB #5593)
                            (mmap: reserved=249600KB, committed=21256KB)
-                        GC (reserved=64102KB, committed=64102KB)
                            (malloc=54094KB #255)
                            (mmap: reserved=10008KB, committed=10008KB)
-                  Compiler (reserved=542KB, committed=542KB)
                            (malloc=411KB #543)
                            (arena=131KB #3)
-                  Internal (reserved=50582KB, committed=50582KB)
                            (malloc=50550KB #13713)
                            (mmap: reserved=32KB, committed=32KB)
-                    Symbol (reserved=6384KB, committed=6384KB)
                            (malloc=4266KB #31727)
                            (arena=2118KB #1)
-    Native Memory Tracking (reserved=1325KB, committed=1325KB)
                            (malloc=208KB #3083)
                            (tracking overhead=1117KB)
-               Arena Chunk (reserved=231KB, committed=231KB)
                            (malloc=231KB)
-                   Unknown (reserved=65276KB, committed=65276KB)
                            (mmap: reserved=65276KB, committed=65276KB)
 

The numbers reported by NMT roughly match the heap region seen in pmap, but the destination of a lot of the remaining memory was still a mystery. NMT is a good tool, but it could not solve this problem.

gdb

We were very curious about what was inside the 64 MB (and other smaller) memory blocks, so we dumped them with gdb. Reading the maps file under /proc tells you exactly how the current process's memory is laid out.

The following script dumps all of the process's writable memory regions to files, given the process id (it will affect the service; use with caution).

grep rw-p /proc/$1/maps | sed -n 's/^\([0-9a-f]*\)-\([0-9a-f]*\) .*$/\1 \2/p' | while read start stop; do gdb --batch --pid $1 -ex "dump memory $1-$start-$stop.dump 0x$start 0x$stop"; done
 

More often, it is advisable to dump only part of the memory. (Again, this operation affects the service; pay attention to the size of the memory block being dumped, and use with caution.)

gdb --batch --pid 75 -ex "dump memory a.dump 0x7f2bceda1000 0x7f2bcef2b000"
 
[root]$ du -h *
dump 4.0K
55-00600000-00601000.dump 400K
55-00eb7000-00f1b000.dump 0
55-704800000-7c0352000.dump 47M
55-7f2840000000-7f2842eb8000.dump 53M
55-7f2848000000-7f284b467000.dump 64M
55-7f284c000000-7f284fffa000.dump 64M
55-7f2854000000-7f2857fff000.dump 64M
55-7f285c000000-7f2860000000.dump 64M
55-7f2864000000-7f2867ffd000.dump 1016K
55-7f286a024000-7f286a122000.dump 1016K
55-7f286a62a000-7f286a728000.dump 1016K
55-7f286d559000-7f286d657000.dump
 

It's time to check the contents

[root]$ view 55-7f284c000000-7f284fffa000.dump
^@^@X+^?^@^@^@^@^@d(^?^@^@^@  ^C^@^@^@^@^@  ^C^@^@^@^@^@^@^@^@^@^@^@^@ <97>p^C^@^@^@^@ 8^^Z+^?^@^@ ^@^@d(^?^@^@ 8^^Z+^?^@^@ ^@^@d(^?^@^@
achine":524993642,"timeSecond":1460272569,"inc":2145712868,"new":false},"device":{"client":"android","uid":"xxxxx","version":881},"
device_android":{"BootSerialno":"xxxxx","CpuInfo":"0-7","MacInfo":"2c:5b:b8:b0:d5:10","RAMSize":"4027212","SdcardInfo":"xxxx","Serialno":"xxxx",
"android_id":"488aedba19097476","buildnumber":"KTU84P/1416486236","device_ip":"0.0.0.0","mac":"2c:5b:b8:b0:d5:10","market_source":"12","model":"OPPO ...more
 

What? Shouldn't this content live in the heap? Why is extra memory being allocated for it? We had already investigated Netty's DirectByteBuffer allocations above, so where else was off-heap memory being allocated?

perf

The conventional tools were getting us nowhere; we had just about run out of tricks. Time to bring out the big gun: perf.

Run perf record -g -p 55 to start sampling call stacks for the process. After it has run for a while, stop it with Ctrl+C; a perf.data file will be generated. Then run perf report -i perf.data to view the report.

The report showed the process spending a great deal of time in zip-related functions. Searching the report for "zip" gave the following results:

-.-!

The process was calling Java_java_util_zip_Inflater_inflateBytes() to allocate memory, and only a small fraction of the calls released it again. Cross-checking against the addresses from pmap confirmed that zlib was indeed the troublemaker.


Solution

Searching the Java project for "zip" to locate the code, we found that there were indeed gzip compression and decompression operations, and that in one place a GZIPInputStream was never closed.

GZIPInputStream uses an Inflater internally, and the Inflater allocates off-heap (native) memory through zlib. That memory is released by Inflater.end(), which GZIPInputStream.close() calls, so it must be released explicitly by closing the stream. If you forget to close it, the native memory is only freed once the Inflater object is finalized after a later GC, and in the meantime the off-heap memory keeps growing.

Original code:

public byte[] decompress(byte[] input) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    IOUtils.copy(new GZIPInputStream(new ByteArrayInputStream(input)), out);
    return out.toByteArray();
}
 

After modification:

 public byte[] decompress(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(input));
        IOUtils.copy(gzip, out);
        gzip.close();
        return out.toByteArray();
    }
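
A further hardening step (not part of the original fix, just a suggestion): try-with-resources guarantees that close() runs even if the copy throws, so the Inflater's native memory cannot leak on the error path. A minimal sketch of the same method, assuming the same GZIPInputStream/ByteArray streams and Apache Commons IO's IOUtils as above:

 public byte[] decompress(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the stream here also calls Inflater.end(), releasing the native memory.
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(input))) {
            IOUtils.copy(gzip, out);
        }
        return out.toByteArray();
    }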

 

After observation, the problem was solved.