Read the difference between hashmap, hashtable, concurrenthashmap of JDK7, 8, and 9 in one article

Read the difference between hashmap, hashtable, concurrenthashmap of JDK7, 8, and 9 in one article


The content is as long as the title, it's been written for a long time. Unless otherwise specified, the source code corresponding to the content is jdk1.7 (compared with 1.8 later)

1: Introduction to hashmap (as follows, array-linked list form)

HashMap storage structure

In the figure, the purple part represents the hash table, also known as the hash array (the default array size is 16, each key-value key-value pair is actually stored in the internal class entry of the map), and each element of the array is It is the head node of a singly linked list, and the following green linked list is used to resolve conflicts. If different keys are mapped to the same position in the array, they will be put into the singly linked list using the head interpolation method.

2: The principle of hashmap (that is, the principle of put and get)

2.1 Put principle

1. Obtain the corresponding hash value according to the key: int hash = hash(key.hash.hashcode())

2. According to the hash value and the length of the array, determine the corresponding array citation int i = indexFor(hash, table.length); The simple understanding is that i = hash value% modulo the array length (in fact, bitwise AND operation). If different keys are mapped to the same position in the array, put them in the singly linked list. And the newcomer is placed at the head node.

2.2 get principle

1. Obtain the corresponding array position through hash, and traverse the linked list where the array is located (key.equals())

3.1: The hashcode is the same, what should I do if there is a conflict?

"Head insertion method", put it at the head of the corresponding linked list.

3.2: Why is the head insertion method (why is it designed this way)?

Because the inventor of HashMap believes that the entry inserted later is more likely to be searched, it is placed at the head (because the get() query will traverse the entire linked list).

4.1: What is the default array length of hashmap?

The default is 16

4.2: Why?

The reason for choosing 16 is to serve the hash algorithm that maps from key to index (see below).

5.1: What should I do if the hashmap reaches the default load factor (0.75)?

Automatic double expansion, and recalculate the position of each key-value pair after expansion. And the length must be 16 or a power of 2

5.2: Why do we need a power of 16 or 2?

If it is not a power of 16 or 2, the result of bit operation is not evenly distributed, which obviously does not conform to the principle of uniform distribution of Hash algorithm.

In contrast, the length of 16 or other powers of 2, the value of Length-1 is that all binary digits are all 1. In this case, the result of index is equal to the value of the last few digits of HashCode. As long as the input HashCode itself is evenly distributed, the result of the Hash algorithm is uniform.

6.1: Is hashmap thread safe?

It's not.

6.2: Why?

Because there is no lock

6.3: What problems will it cause during concurrency?

When the hashmap is close to the critical point, if two or more threads perform the put operation at this time, both resize (expansion) and ReHash (recalculate the location of the key), and ReHash may form a linked list ring in the case of concurrency. When executing get, an infinite loop will be triggered, causing 100% CPU problems.

Note: jdk8 has fixed the hashmap problem, and the order in the original linked list is maintained during expansion in jdk8. However, HashMap is still non-concurrently safe. Under concurrency, ConcurrentHashMap must be used.

6.4: How to judge that there is a ring table?

Optimal: First create two pointers A and B (two object references in java), and at the same time point to the head node of this linked list. Then start a big loop. In the body of the loop, let the pointer A move down one node at a time, and let the pointer B move down two nodes at a time, and then compare whether the nodes pointed to by the two pointers are the same. If they are the same, it is judged that the linked list has a ring; if they are different, the next cycle continues.

Understanding example: On a circular track, two athletes start at the same place, one athlete is fast and the other is slow. When the two run for a period of time, the fast athlete will inevitably catch up and overtake the slow athlete again. The reason is simple, because the track is circular.

7: What is the difference between hashmap and hashtable?

The difference between the two is thread efficiency. The default value of the array is the null value. The hashmap is not safe. The key-value is allowed. The hashtable is slightly safer. 11 It is not allowed (throwing an exception).

8.0: The hashmap is insecure and the hashtable performance is low, what should I do?

Use concurrenthashmap to ensure safety and performance.

8.1: What exactly is concurrenthashmap?

The structure of the entire ConcurrentHashMap is as follows:

Understanding: hashmap is composed of entry arrays, while concurrenthashmap is composed of segment arrays. And what is Segment? Segment itself is equivalent to a HashMap.

Like HashMap, Segment contains an array of HashEntry. Each HashEntry in the array is not only a key-value pair, but also the head node of a linked list.

The structure of a single segment is as follows (whether it looks like a hashmap):

How many Segment objects like this are in the ConcurrentHashMap collection? There are 2 to the Nth power, and they are stored together in an array named segments.

It can be said that ConcurrentHashMap is a secondary hash table. Below a total hash table, there are several sub-hash tables. (This analogy understands multiple hashmaps to form a cmap)

8.2: What about his put and get methods?

Put method:

1. Do Hash operation for the input Key to get the hash value.

2. Locate the corresponding Segment object through the hash value

3. Obtain a reentrant lock

4. Through the hash value again, locate the specific position of the array in the Segment.

5. Insert or overwrite the HashEntry object.

6. Release the lock.

Get method:

1. Do Hash operation for the input Key to get the hash value.

2. Locate the corresponding Segment object through the hash value

3. Through the hash value again, locate the specific position of the array in the Segment.

It can be seen that, compared with hashmap, ConcurrentHashMap requires secondary positioning when reading and writing. First locate the segment, and then locate the specific array subscript in the segment.

9: What is the difference between hashmap and concurrenthashmap?

Thread: unsafe and safe

10.1: Why concurrenthashmap and hashtable are both thread-safe, but the former has higher performance

Because the former is a segment lock, the corresponding Segment object is locked according to the hash value. When the hash value is different, it can be inserted in parallel, which is more efficient, while the hashtable will lock the entire map.

How to understand parallel insertion: When cmap needs a put element, it does not lock the entire map, but first knows which segment (Segment object) he wants to put in through the hashcode, and then adds this segment Lock, so when multi-threaded puts, as long as they are not placed in the same segment, true parallel insertion is realized.

However, when counting the size, that is, when obtaining the global information of the concurrenthashmap, it is necessary to obtain all the segment locks for statistics (that is, the efficiency is slightly lower).

10.2: What problem does the design of the segmented lock solve?

The design purpose of the segmented lock is to refine the granularity of the lock. When the operation does not need to update the entire array, it only locks a part of the rows in the array.

11: What is the difference between the hashmap of JDK1.7 and the hashmap of JDK1.8 (that is, what optimizations have been made in 1.8)?

1. In order to speed up query efficiency, the hashmap of java8 introduces a red-black tree structure. When the length of the array is greater than the default threshold of 64, and when the element of a linked list is> 8, the linked list will be converted to a red-black tree structure. higher. (The question is, what is a red-black tree? What is a B+ tree? (mysql index has a B+ tree index) What is a B tree? What is a binary search tree?) The knowledge points of data structure will be updated in [Data Structure Topics , do not expand here.

Here is just a brief introduction to the red-black tree:

The red-black tree is a self-balancing binary tree with excellent query and insert/delete performance and is widely used in associative arrays. Compared with AVL trees, AVL requires that the absolute value of the difference between the height of the left and right subtrees of each node (balance factor) is at most 1, and the red-black tree can lower this condition by appropriately (the red-black tree limits the distance from root to leaf) The longest possible path is not more than twice as long as the shortest possible path, and the result is that the tree is roughly balanced), in order to reduce the time-consuming adjustment of the balance during insert/delete, so as to obtain better performance, and Although this will cause red-black tree queries to be slightly slower than AVL, compared to the time obtained during insert/delete, this effort is obviously worthwhile in most cases. Okay, I know you guys are dizzy, so move on to take a look at my [Data Structure Topic].

2. Optimize the expansion method, keep the original order in the linked list during expansion, and avoid endless loops

12: What is the difference between concurrenthashmap of JDK1.7 and JDK1.8?

The 1.8 implementation has abandoned the Segment lock mechanism, and uses the Node array + CAS + Synchronized to ensure the safety of concurrent updates. The bottom layer uses the storage structure of array + linked list + red-black tree.

Java has brought us ConcurrentHashMap which is concurrency safe. Its implementation relies on the Java memory model, so we must understand some underlying knowledge before understanding ConcurrentHashMap:

java memory model

Unsafe in java

CAS in java

java synchronizer AQS

ReentrantLock

So here I am not going to explain ConcurrentHashMap in depth. I will explain the basics of concurrency step by step in the topic of [Concurrent Programming], from java memory model, synchronized, volatile, Unsafe to CAS, AQS, various locks to JUC concurrent package related.

Let's put a mind map of the java memory model to seduce a wave. There is so much to talk about in a single point of the java memory model.

13: So the question is, what is CAS?

Regarding the knowledge points of CAS, it will also involve ABA issues, and can also involve optimistic locking, pessimistic locking, lock programming, AQS, etc. The relevant content will be updated in [Concurrent Programming Topics], not expanded here

14: What about 1.9?

Glancing at it, it seems to be no different from 1.8, so I won t expand here... (don t slap your face)