HashMap Source Code Analysis

Published: 2022-10-18 15:54:49 Author: iii
Source: 亿速云


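Introduction

HashMap is one of the most widely used data structures in the Java Collections Framework. It stores and retrieves key-value pairs on top of a hash table, so lookups, insertions, and removals take constant time on average. This article walks through the HashMap source code (the listings appear to match the JDK 8 implementation) and covers its internal mechanics, performance considerations, and common questions.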

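HashMap Overview

HashMap is an important class in the Java Collections Framework that implements the Map interface. Its main characteristics are:

Built on a hash table, with fast lookup, insertion, and removal.
Permits one null key and any number of null values.
Makes no guarantee about iteration order.
Not thread-safe.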

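Core Data Structures of HashMap

At its core, HashMap is an array in which each element is the head node of a linked list or a red-black tree. The array is called the bucket array, and each bucket corresponds to one array index rather than to one hash value. Keys whose hashes map to the same index are stored in the same bucket, where they form a linked list or a red-black tree.

Bucket Array

The bucket array is an array of Node objects; each Node holds a hash, a key, a value, and a reference to the next node. Its length is always a power of two, so the index can be computed with a bit mask, (n - 1) & hash, instead of a modulo operation.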

transient Node<K,V>[] table;
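
As a small illustration of why a power-of-two length matters, the sketch below compares the bit mask with Math.floorMod. The class name BucketIndexDemo and the method indexFor are made up for this example and are not part of the JDK.

```java
final class BucketIndexDemo {
    // Same expression HashMap uses to pick a bucket; valid only when n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] {0, 5, 21, -7, Integer.MIN_VALUE}) {
            // Both columns agree, including for negative hashes.
            System.out.println(hash + " -> " + indexFor(hash, n)
                    + " (floorMod: " + Math.floorMod(hash, n) + ")");
        }
    }
}
```

The mask never yields a negative index, which a plain % operator would for negative hashes.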

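The Node Class

Node is a static nested class of HashMap that represents one entry in a bucket. It stores the cached hash, the key, the value, and a reference to the next node in the same bucket.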

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }
    public final int hashCode()    { return Objects.hashCode(key) ^ Objects.hashCode(value); }
    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
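
Because Node implements Map.Entry, the entries returned by entrySet() follow the Map.Entry contract for equals and hashCode. The sketch below checks this against AbstractMap.SimpleEntry; the class EntryContractDemo and its helper onlyEntry are hypothetical names used only for illustration.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;

final class EntryContractDemo {
    // Returns the single entry of a one-element HashMap.
    static Map.Entry<String, Integer> onlyEntry(String key, Integer value) {
        Map<String, Integer> map = new HashMap<>();
        map.put(key, value);
        return map.entrySet().iterator().next();
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> fromMap = onlyEntry("a", 1);
        Map.Entry<String, Integer> standalone = new AbstractMap.SimpleEntry<>("a", 1);
        // Equal key and value means equal entries, regardless of the concrete class.
        System.out.println(fromMap.equals(standalone));
        // The entry hash is the XOR of the key hash and the value hash.
        System.out.println(fromMap.hashCode() == ("a".hashCode() ^ Integer.valueOf(1).hashCode()));
        System.out.println(fromMap);
    }
}
```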

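Red-Black Tree

When a bucket's linked list grows beyond TREEIFY_THRESHOLD (8) nodes and the table has at least MIN_TREEIFY_CAPACITY (64) buckets, HashMap converts the list into a red-black tree; with a smaller table it resizes instead. A red-black tree is a self-balancing binary search tree, so lookups within a heavily loaded bucket take O(log n) rather than O(n).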

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // parent node
    TreeNode<K,V> left;    // left child
    TreeNode<K,V> right;   // right child
    TreeNode<K,V> prev;    // previous node; needed to unlink next upon deletion
    boolean red;           // color flag: true if the node is red

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    // other methods...
}
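
To make the collision scenario concrete, here is a minimal sketch with a deliberately bad key type. BadKey and CollisionDemo are hypothetical names; BadKey returns a constant hashCode so that every key lands in the same bucket. Whether that bucket has been converted to a tree is not observable through the public API, so the example only shows that the map stays correct. BadKey implements Comparable on the assumption that tree bins can use the ordering to search more efficiently.

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {
    // Hypothetical key whose hashCode is constant, so all instances share one bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
    }

    // Inserts count colliding keys, each mapped to its own id.
    static Map<BadKey, Integer> build(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(1_000);
        System.out.println(map.size());
        System.out.println(map.get(new BadKey(777)));
    }
}
```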

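HashMap Initialization

Constructing a HashMap only records its parameters:

Set the load factor, and store the requested initial capacity (rounded up to a power of two) in threshold.
Defer creation of the bucket array until the first insertion, when resize() allocates it.

Constructors

HashMap provides several constructors that let the caller specify the initial capacity and the load factor. The default initial capacity is 16 and the default load factor is 0.75.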

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

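The tableSizeFor Method

tableSizeFor returns the smallest power of two that is greater than or equal to the given capacity. It works by smearing the highest set bit of cap - 1 into every lower bit position and then adding one, so it needs only a handful of shifts and ORs.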

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
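
A few worked values may help here. The sketch below copies the method quoted above into a standalone class so it can be called directly; TableSizeDemo is a made-up name, and MAXIMUM_CAPACITY is set to 1 << 30, its value in the JDK.

```java
final class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of the method quoted above, so it can be called outside HashMap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 10, 16, 17, 1000}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

For example, 10 becomes 16, 16 stays 16, and 17 becomes 32. Subtracting one first is what keeps exact powers of two unchanged.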

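The putMapEntries Method

putMapEntries copies the entries of another Map into the current HashMap. It is called from the copy constructor and from putAll. If the table has not been allocated yet, it first pre-sizes threshold from the source map's size; otherwise it resizes once up front when the source map is larger than the current threshold.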

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        if (table == null) { // pre-size
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        else if (s > threshold)
            resize();
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}
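
The pre-size arithmetic is easy to misread, so here is a rough sketch of it in isolation. PresizeDemo and presizedCapacity are hypothetical names; the method mirrors the "pre-size" branch above for a freshly constructed map whose threshold is still 0.

```java
final class PresizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same rounding as HashMap.tableSizeFor in the listing above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Capacity an empty map would be pre-sized to before copying s entries.
    static int presizedCapacity(int s, float loadFactor) {
        float ft = ((float) s / loadFactor) + 1.0F;
        int t = (ft < (float) MAXIMUM_CAPACITY) ? (int) ft : MAXIMUM_CAPACITY;
        return tableSizeFor(t);
    }

    public static void main(String[] args) {
        for (int s : new int[] {6, 12, 100}) {
            System.out.println(s + " entries -> capacity " + presizedCapacity(s, 0.75f));
        }
    }
}
```

Note that copying 12 entries yields a capacity of 32 rather than 16, because of the extra + 1.0F before rounding up.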

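The put Method

put is one of HashMap's core methods; it inserts a key-value pair into the map. Its implementation proceeds as follows:

Compute the hash of the key.
Derive the bucket index from the hash.
If the bucket is empty, insert the new node directly.
If the bucket is not empty, traverse its linked list or red-black tree to look for an equal key.
If an equal key exists, update its value; otherwise, insert a new node.
If the linked list grows beyond TREEIFY_THRESHOLD, convert it to a red-black tree.
If the number of entries exceeds the threshold (capacity times load factor), resize the table.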
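
The steps above can be sketched with a toy map. This is a simplified illustration, not HashMap itself: TinyChainedMap is a hypothetical class with a fixed 16-bucket table, plain linked lists, no treeification, and no resizing.

```java
import java.util.Objects;

final class TinyChainedMap<K, V> {
    // One node of a bucket's singly linked list.
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed size: this sketch never resizes.
    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];
    private int size;

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    V put(K key, V value) {
        int hash = hash(key);                  // step 1: hash the key
        int i = (table.length - 1) & hash;     // step 2: bucket index
        for (Entry<K, V> e = table[i]; e != null; e = e.next) { // step 4: scan the bucket
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;               // step 5: key exists, replace the value
                return old;
            }
        }
        // Steps 3 and 5: empty bucket or key not found, so link in a new node.
        // Head insertion keeps the sketch short; HashMap appends at the tail.
        table[i] = new Entry<>(hash, key, value, table[i]);
        size++;
        return null;
    }

    V get(Object key) {
        int hash = hash(key);
        for (Entry<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() { return size; }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>();
        System.out.println(map.put("a", 1));
        System.out.println(map.put("a", 2));
        System.out.println(map.get("a"));
        System.out.println(map.size());
    }
}
```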

Source of the put Method

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
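
The return values and the onlyIfAbsent flag are easiest to see from the caller's side. The sketch below uses only the public put, putIfAbsent, and get methods; PutReturnDemo and results are names invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutReturnDemo {
    static List<Integer> results() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        out.add(map.put("a", 1));          // no previous mapping, so null
        out.add(map.put("a", 2));          // previous value is returned and replaced
        out.add(map.putIfAbsent("a", 3));  // existing non-null value is kept
        out.add(map.get("a"));
        map.put("n", null);
        out.add(map.putIfAbsent("n", 5));  // a null value is treated as absent
        out.add(map.get("n"));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(results()); // prints [null, 1, 2, 2, null, 5]
    }
}
```

The last two results follow from the check !onlyIfAbsent || oldValue == null in putVal: an existing mapping to null is overwritten even by putIfAbsent.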

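The hash Method

hash computes the hash used to place a key. It XORs the key's hashCode with its own upper 16 bits shifted down, so that the high bits also influence the bucket index, which otherwise depends only on the low bits. This reduces collisions for hash codes that differ mainly in their upper bits. A null key always hashes to 0.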

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
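
A small worked example shows the effect of the XOR. In the sketch below, HashSpreadDemo, spread, and index are illustrative names, and the two input values are chosen so that they differ only in their upper 16 bits.

```java
final class HashSpreadDemo {
    // The same mixing step as HashMap.hash, applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x10000;
        int h2 = 0x20000;
        // Without mixing, both values fall into bucket 0.
        System.out.println(index(h1, n) + " " + index(h2, n));
        // With mixing, they land in buckets 1 and 2.
        System.out.println(index(spread(h1), n) + " " + index(spread(h2), n));
    }
}
```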

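The resize Method

resize allocates or grows the bucket array. It is called on the first insertion to create the table and again whenever the number of entries exceeds the threshold. Its implementation proceeds as follows:

Compute the new capacity and the new threshold.
Allocate the new bucket array.
Redistribute the nodes of the old array into the new one; because the capacity doubles, each node either stays at index j or moves to j + oldCap.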

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null)
                        loTail.next = null;
                    newTab[j] = loHead;
                    if (hiTail != null)
                        hiTail.next = null;
                    newTab[j + oldCap] = hiHead;
                }
            }
        }
    }
    return newTab;
}
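
The lo/hi split in the loop above relies on a single bit of the hash. The following sketch isolates that rule; ResizeSplitDemo and indexAfterDoubling are hypothetical names used only here.

```java
final class ResizeSplitDemo {
    // New index of a node after the table doubles, following the lo/hi split in resize().
    static int indexAfterDoubling(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        // The bit at position oldCap decides: 0 keeps the index, 1 moves it up by oldCap.
        return ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // Hashes 5 and 21 share bucket 5 in a 16-slot table.
        System.out.println(indexAfterDoubling(5, oldCap));   // stays at 5
        System.out.println(indexAfterDoubling(21, oldCap));  // moves to 21
    }
}
```

The result always equals hash & (2 * oldCap - 1), so no hash needs to be recomputed during a resize.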

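The get Method

get looks up the value mapped to a key. Its implementation proceeds as follows:

Compute the hash of the key.
Derive the bucket index from the hash.
Traverse the linked list or red-black tree in that bucket to find the key.
If the key is found, return its value; otherwise, return null.

Source of the get Method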

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
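
Since HashMap permits null values, a null result from get is ambiguous: the key may be absent, or it may be mapped to null. A minimal sketch of telling the two apart with containsKey follows; GetNullDemo and describe are names made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class GetNullDemo {
    // Distinguishes "mapped to null" from "not mapped at all".
    static String describe(Map<String, String> map, String key) {
        if (map.containsKey(key)) {
            return "present:" + map.get(key);
        }
        return "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("k", null);
        System.out.println(map.get("k"));          // null
        System.out.println(map.get("missing"));    // null as well
        System.out.println(describe(map, "k"));
        System.out.println(describe(map, "missing"));
    }
}
```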

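The remove Method

remove deletes the key-value pair for a given key. Its implementation proceeds as follows:

Compute the hash of the key.
Derive the bucket index from the hash.
Traverse the linked list or red-black tree in that bucket to find the key.
If the key is found, unlink its node and return the value; otherwise, return null.

Source of the remove Method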

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
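
The matchValue parameter corresponds to the two-argument remove(key, value) on the Map interface, which removes the entry only when the current value matches. The sketch below exercises both overloads through the public API; RemoveDemo and results are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RemoveDemo {
    static List<Object> results() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        List<Object> out = new ArrayList<>();
        out.add(map.remove("a", 2));  // value does not match, nothing removed
        out.add(map.remove("a"));     // removes the entry and returns its value
        out.add(map.remove("a"));     // key is gone, so null
        out.add(map.isEmpty());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(results()); // prints [false, 1, null, true]
    }
}
```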

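Concurrency Issues with HashMap

HashMap is not thread-safe. Using it from multiple threads without external synchronization can lead to inconsistent data and other concurrency problems. Common problems include:

Lost updates: when several threads insert at the same time, some entries may be lost.
Infinite loops: in JDK 7 and earlier, concurrent resizing could link a bucket's list into a cycle, so that a later lookup spun forever. The resize code shown above preserves node order and avoids that particular cycle, but concurrent modification remains unsafe.

Solutions

The following approaches avoid these problems:

Use Collections.synchronizedMap, which wraps a HashMap in a thread-safe Map.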

Map<K,V> m = Collections.synchronizedMap(new HashMap<K,V>());

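Use ConcurrentHashMap, a thread-safe hash map. In JDK 7 it relied on segment locking; since JDK 8 it uses CAS operations plus synchronization on individual bins, which allows higher concurrency than a single map-wide lock.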

Map<K,V> m = new ConcurrentHashMap<K,V>();
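
As a rough illustration of the second option, the sketch below counts events from several threads with ConcurrentHashMap.merge, which performs the read-modify-write atomically. ConcurrentCountDemo, countWithThreads, and the key "hits" are assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ConcurrentCountDemo {
    // Starts the given number of threads, each incrementing one shared counter perThread times.
    static int countWithThreads(int threads, int perThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic per key
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000));
    }
}
```

With a plain HashMap in place of ConcurrentHashMap, the same loop could lose increments. A synchronizedMap wrapper makes individual calls safe, but a separate get followed by put would still need external locking.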

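Performance Optimization for HashMap

The following strategies help HashMap perform well:

Choose a suitable initial capacity and load factor: sizing the map for the actual workload reduces the number of resizes.
Avoid repeated resizing: when inserting a large number of entries, set a sufficiently large initial capacity up front.
Use a good hash function: a hashCode that distributes keys evenly reduces collisions and speeds up lookups.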
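
A brief sketch of the last two points: a key class with consistent equals and hashCode, stored in a map that is sized up front. PointKeyDemo, Point, and build are hypothetical names, and the sizing expression is one common convention rather than anything HashMap requires.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class PointKeyDemo {
    // Immutable key: equals and hashCode are derived from the same fields.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Builds a map sized so that the expected number of entries fits without a resize.
    static Map<Point, String> build(int expected) {
        Map<Point, String> map = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++) {
            map.put(new Point(i, -i), "p" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Point, String> map = build(100);
        // A distinct but equal key finds the entry.
        System.out.println(map.get(new Point(7, -7)));
    }
}
```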

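Frequently Asked Questions about HashMap

1. How should the initial capacity and load factor be chosen?

Both parameters affect performance. An initial capacity that is too small causes frequent resizing, while one that is too large wastes memory. A load factor that is too high increases the likelihood of collisions, while one that is too low makes the map resize more often. As a rule of thumb, set the initial capacity to about 1.5 times the expected number of entries, which stays above the expected count divided by 0.75 that is needed to avoid a resize, and keep the default load factor of 0.75.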
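
To check a sizing choice on paper, the arithmetic can be sketched as below. CapacityPlanDemo and its methods are hypothetical helpers; roundUpToPowerOfTwo is a simplified stand-in for HashMap's internal rounding and is only meant for small positive inputs.

```java
final class CapacityPlanDemo {
    // Smallest initial capacity whose threshold covers the expected entry count.
    static int initialCapacityFor(int expected, float loadFactor) {
        return (int) (expected / loadFactor) + 1;
    }

    // Simplified stand-in for tableSizeFor, valid for 1 <= cap <= 2^30.
    static int roundUpToPowerOfTwo(int cap) {
        int n = 1;
        while (n < cap) {
            n <<= 1;
        }
        return n;
    }

    // True if the expected number of entries stays at or below the resulting threshold.
    static boolean fitsWithoutResize(int expected, int initialCapacity, float loadFactor) {
        int table = roundUpToPowerOfTwo(initialCapacity);
        return expected <= (int) (table * loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1000;
        System.out.println(initialCapacityFor(expected, 0.75f));
        System.out.println(fitsWithoutResize(expected, 1000, 0.75f)); // capacity equal to the count is too small
        System.out.println(fitsWithoutResize(expected, 1334, 0.75f));
        System.out.println(fitsWithoutResize(expected, 1500, 0.75f)); // the 1.5x rule of thumb
    }
}
```

For 1000 entries, passing 1000 rounds up to a 1024-slot table with a threshold of 768, so the map still resizes once; 1334 or 1500 both round up to 2048 with a threshold of 1536.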

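2. How does HashMap resize?

Resizing is implemented by the resize method. When the number of entries exceeds the threshold, HashMap calls resize, which doubles the length of the bucket array and, in the common case, doubles the threshold as well.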
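
The doubling can be traced with a small simulation. GrowthDemo and capacityAfter are made-up names; the sketch assumes a map created with the defaults (capacity 16, load factor 0.75), at least one insertion, and entry counts far below the maximum capacity.

```java
final class GrowthDemo {
    // Simulated table length of a default HashMap after n distinct insertions (n >= 1).
    static int capacityAfter(int n) {
        int capacity = 16;
        int threshold = 12;
        for (int size = 1; size <= n; size++) {
            if (size > threshold) { // same test as "++size > threshold" in putVal
                capacity <<= 1;
                threshold <<= 1;
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 25, 49, 100}) {
            System.out.println(n + " entries -> capacity " + capacityAfter(n));
        }
    }
}
```

Under these assumptions the table grows on the 13th, 25th, 49th, and 97th insertion, so 100 entries end up in a 256-slot table.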

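3. How does HashMap handle hash collisions?

HashMap handles hash collisions with linked lists and red-black trees. When several keys map to the same bucket, they are chained in a linked list, which is converted to a red-black tree once it grows beyond the treeify threshold.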
