Java Data Structures: Source Code Analysis of HashMap and HashSet

Published: 2023-03-25 10:05:30  Author: iii
Source: 亿速云

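Introduction

HashMap and HashSet are two of the most commonly used data structures in Java. Both are built on a hash table, which gives them efficient lookup, insertion, and deletion. This article walks through the source code of HashMap and HashSet (the listings appear to be taken from JDK 8), covering how the classes work, their core data structures, and how the common operations are implemented.

HashMap Source Code Analysis

2.1 HashMap Overview

HashMap is one of the core classes of the Java Collections Framework. It implements the Map interface and stores and retrieves key-value pairs. It permits one null key and any number of null values, and it is not thread-safe.

2.2 Core Data Structure of HashMap

At the heart of HashMap is an array in which each element is the head node of either a linked list or a red-black tree. The array is called table, and each of its slots is called a bucket.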

transient Node<K,V>[] table;

Node is a static nested class of HashMap that represents a single key-value entry:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }
    public final int hashCode()    { return Objects.hashCode(key) ^ Objects.hashCode(value); }
    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
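Because Node implements Map.Entry with the equals and hashCode shown above, an entry obtained from a HashMap should compare equal to any other Map.Entry that holds the same key and value. The following is a minimal sketch of that behaviour using only standard java.util classes; the class name EntryEqualitySketch and the helper onlyEntry are made up for illustration.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class; not part of the JDK.
class EntryEqualitySketch {
    // Returns the single entry of a one-element HashMap.
    static Map.Entry<String, Integer> onlyEntry(String key, Integer value) {
        Map<String, Integer> map = new HashMap<>();
        map.put(key, value);
        return map.entrySet().iterator().next();
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> fromMap = onlyEntry("a", 1);
        Map.Entry<String, Integer> standalone = new AbstractMap.SimpleEntry<>("a", 1);

        // Equality depends on key and value, not on the concrete entry class.
        System.out.println(fromMap.equals(standalone)); // true

        // The entry hash is the XOR of the key hash and the value hash.
        int expected = Objects.hashCode("a") ^ Objects.hashCode(1);
        System.out.println(fromMap.hashCode() == expected); // true
    }
}
```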

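2.3 HashMap Constructors

HashMap provides several constructors. The two most commonly used are the no-argument constructor and the one that takes an initial capacity.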

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

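DEFAULT_LOAD_FACTOR is the default load factor, 0.75. The load factor determines the resize threshold: once the number of entries exceeds capacity × load factor, the table is resized.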
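To make the threshold arithmetic concrete, here is a small sketch. It assumes the JDK defaults of capacity 16 and load factor 0.75, and it assumes that a requested capacity is rounded up to a power of two before use. The class ThresholdSketch and both helper methods are hypothetical; in particular roundUpToPowerOfTwo is a simplified stand-in, not the JDK's own method.

```java
// Hypothetical helper class for illustration; these methods are not JDK APIs.
class ThresholdSketch {
    static final int DEFAULT_INITIAL_CAPACITY = 16;  // default table length
    static final float DEFAULT_LOAD_FACTOR = 0.75f;  // default load factor

    // Entry count above which a table of the given capacity is resized.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Smallest power of two >= requested, valid for 1 <= requested <= 2^30.
    static int roundUpToPowerOfTwo(int requested) {
        return requested <= 1 ? 1 : Integer.highestOneBit(requested - 1) << 1;
    }

    public static void main(String[] args) {
        // Default map: resize once the 13th entry is inserted.
        System.out.println(thresholdFor(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR)); // 12

        // A requested capacity of 10 becomes a table of length 16.
        int capacity = roundUpToPowerOfTwo(10);
        System.out.println(capacity);                                    // 16
        System.out.println(thresholdFor(capacity, DEFAULT_LOAD_FACTOR)); // 12
    }
}
```

A practical consequence: if the number of entries is known in advance, passing roughly expectedEntries / 0.75 as the initial capacity avoids intermediate resizes.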

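2.4 The put Method of HashMap

put is one of the most important methods in HashMap. It inserts a key-value pair into the map, replacing the value if the key is already present.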

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

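putVal proceeds as follows:

1. If table is null or empty, initialize it by calling resize().
2. Use the hash computed by hash(key) to locate the bucket at index (n - 1) & hash.
3. If the bucket is empty, place a new node there directly.
4. Otherwise, search the bucket's linked list or red-black tree for a node with an equal key.
5. If such a node exists, overwrite its value (unless onlyIfAbsent is true and the existing value is non-null) and return the old value; otherwise append a new node.
6. If appending makes the list length exceed TREEIFY_THRESHOLD, call treeifyBin, which converts the list into a red-black tree (or resizes the table instead when the table is still small).
7. If size now exceeds threshold, call resize().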
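The bucket lookup in step 2 can be tried out in isolation. The listing above calls hash(key) without showing it; the sketch below assumes the JDK 8 definition, which XORs the upper 16 bits of hashCode() into the lower 16 bits. The class BucketIndexSketch and its method names are hypothetical.

```java
// Hypothetical demo class; spread() mirrors what HashMap.hash() is assumed to do in JDK 8.
class BucketIndexSketch {
    // Mix the high bits of the hash code into the low bits; null keys hash to 0.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Same index expression as putVal: (n - 1) & hash. tableLength must be a power of two.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        System.out.println(indexFor("a", 16));  // 1  ("a".hashCode() is 97)
        System.out.println(indexFor(17, 16));   // 1  (collides with "a" in a 16-slot table)
        System.out.println(indexFor(17, 32));   // 17 (no longer collides after doubling)
        System.out.println(indexFor(null, 16)); // 0  (the null key always lands in bucket 0)
    }
}
```

Masking with (n - 1) only works as a modulo because the table length is a power of two, which is why the capacity is always kept in that form.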

2.5 The get Method of HashMap

get returns the value mapped to the given key, or null if there is no such mapping.

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

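getNode proceeds as follows:

1. Use the key's hash to locate the bucket.
2. If the table or the bucket is empty, return null.
3. If the first node in the bucket matches the key, return it immediately.
4. Otherwise, search the rest of the linked list or the red-black tree for a matching node.
5. Return the matching node if one is found, or null otherwise; get then returns that node's value.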
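One consequence of this flow is that get returns null both when the key is absent and when the key is mapped to null, since HashMap permits null values. A short usage sketch follows; the class GetNullSketch and its sample() helper are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; uses only the public HashMap API.
class GetNullSketch {
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("present", "value");
        map.put("nullValue", null); // a key explicitly mapped to null
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        System.out.println(map.get("present"));   // value
        System.out.println(map.get("nullValue")); // null (key exists, value is null)
        System.out.println(map.get("missing"));   // null (key does not exist)

        // containsKey tells the two null cases apart.
        System.out.println(map.containsKey("nullValue")); // true
        System.out.println(map.containsKey("missing"));   // false
    }
}
```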

2.6 The resize Method of HashMap

resize initializes the table on first use and doubles its capacity afterwards.

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

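resize proceeds as follows:

1. Compute the new capacity and threshold.
2. Allocate the new table.
3. Move the nodes from the old table into the new one. Because the capacity doubles, each node either stays at index j or moves to index j + oldCap, depending on whether (e.hash & oldCap) is zero.
4. Return the new table.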
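The lo/hi split used when moving nodes can be checked with plain integer arithmetic. The sketch below is a hypothetical helper (ResizeSplitSketch is not a JDK class) that applies the same (hash & oldCap) test as the listing and compares the result with recomputing the index from scratch against the doubled table.

```java
// Hypothetical demo class illustrating the index split performed during resize.
class ResizeSplitSketch {
    // Index after doubling, decided by the single bit that oldCap represents.
    static int indexAfterDoubling(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    // Index obtained by masking against the doubled table length directly.
    static int indexRecomputed(int hash, int oldCap) {
        return hash & (2 * oldCap - 1);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // All four hashes share bucket 5 in a 16-slot table.
        for (int hash : new int[] {5, 21, 37, 53}) {
            System.out.println(hash + " -> " + indexAfterDoubling(hash, oldCap)
                    + " (recomputed: " + indexRecomputed(hash, oldCap) + ")");
        }
        // 5 -> 5, 21 -> 21, 37 -> 5, 53 -> 21
    }
}
```

Both methods agree, which is why resize can split a bucket into two lists without rehashing any key.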

2.7 The remove Method of HashMap

remove deletes the mapping for the given key and returns the value that was associated with it.

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

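removeNode proceeds as follows:

1. Use the key's hash to locate the bucket.
2. If the table or the bucket is empty, return null.
3. If the first node in the bucket matches the key, select it for removal.
4. Otherwise, search the linked list or the red-black tree for a matching node.
5. If a matching node is found (and, when matchValue is true, its value also matches), unlink it and return the node, from which remove takes the value; otherwise return null.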
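The matchValue parameter is easiest to understand from the caller's side. remove(key) passes false, so the value is ignored. The two-argument Map.remove(key, value), available since Java 8, removes the entry only when the value matches as well; that it reaches removeNode with matchValue set to true is an assumption here, since the listing does not show that overload. RemoveSketch is a hypothetical demo class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; uses only the public Map API.
class RemoveSketch {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();

        // remove(key) returns the previous value, or null if the key was absent.
        System.out.println(map.remove("a")); // 1
        System.out.println(map.remove("a")); // null

        // remove(key, value) only removes when the current value matches.
        System.out.println(map.remove("b", 99)); // false, entry is kept
        System.out.println(map.remove("b", 2));  // true, entry is removed
        System.out.println(map.isEmpty());       // true
    }
}
```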

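2.8 Concurrency Issues with HashMap

HashMap is not thread-safe: concurrent modification from multiple threads can leave it in an inconsistent state. To share a map between threads, either wrap the HashMap with Collections.synchronizedMap or use ConcurrentHashMap.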
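As a rough illustration of the two options, the sketch below increments one counter from several threads through Map.merge. It relies on merge being atomic in ConcurrentHashMap and being performed under the wrapper's lock in the map returned by Collections.synchronizedMap. The class ConcurrentCountSketch, the method countWith, and the key "hits" are made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; not a benchmark, only a correctness illustration.
class ConcurrentCountSketch {
    // Increments the "hits" counter from several threads and returns the final count.
    // Expects threads and incrementsPerThread to be positive.
    static int countWith(Map<String, Integer> map, int threads, int incrementsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    map.merge("hits", 1, Integer::sum); // read-modify-write in one call
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWith(new ConcurrentHashMap<>(), 4, 1000));                    // 4000
        System.out.println(countWith(Collections.synchronizedMap(new HashMap<>()), 4, 1000)); // 4000
    }
}
```

Passing a plain HashMap to the same method is not safe and may lose updates or fail in other ways. Between the two safe options, ConcurrentHashMap generally scales better under contention, because the synchronized wrapper serializes every call on a single lock.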

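HashSet Source Code Analysis

3.1 HashSet Overview

HashSet is another core class of the Java Collections Framework. It implements the Set interface and stores a collection of unique elements. It permits the null element and is not thread-safe.

3.2 Core Data Structure of HashSet

HashSet is backed by a HashMap, which it uses to store its elements.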

private transient HashMap<E,Object> map;

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

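Each element of the HashSet is stored as a key in the backing HashMap; the value is always the same shared PRESENT object.

3.3 HashSet Constructors

HashSet provides several constructors. The two most commonly used are the no-argument constructor and the one that takes an initial capacity.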

public HashSet() {
    map = new HashMap<>();
}

public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}

3.4 The add Method of HashSet

add inserts an element into the HashSet.

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

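add proceeds as follows:

1. Call put on the backing HashMap with the element as the key and PRESENT as the value.
2. If put returns null, the element was not previously in the set, so add returns true; otherwise the element was already present and add returns false.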
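The return value of add is therefore a cheap way to detect duplicates. A short usage sketch follows; AddReturnSketch and addTwice are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; uses only the public HashSet API.
class AddReturnSketch {
    // Adds the same element twice and reports both return values in order.
    static boolean[] addTwice(Set<String> set, String element) {
        return new boolean[] { set.add(element), set.add(element) };
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();

        boolean[] forX = addTwice(set, "x");
        System.out.println(forX[0] + " " + forX[1]); // true false

        // null is a legal element and follows the same rule.
        boolean[] forNull = addTwice(set, null);
        System.out.println(forNull[0] + " " + forNull[1]); // true false

        System.out.println(set.size()); // 2
    }
}
```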

3.5 The remove Method of HashSet

remove deletes an element from the HashSet.

public boolean remove(Object o) {
    return map.remove(o)==PRESENT;
}

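remove proceeds as follows:

1. Call remove on the backing HashMap to delete the given key.
2. If that call returns PRESENT, the element was in the set and has been removed, so remove returns true; otherwise it returns false.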

3.6 The contains Method of HashSet

contains reports whether the HashSet holds the given element.

public boolean contains(Object o) {
    return map.containsKey(o);
}

contains proceeds as follows:

1. Call containsKey on the backing HashMap to check whether the given key exists.
2. Return true if it does, false otherwise.
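Since contains is a hash lookup on the backing map, it depends on the element's hashCode staying the same while the element is in the set. The sketch below shows what can go wrong otherwise, using a mutable ArrayList as the element; the class MutatedElementSketch is made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class showing a lookup miss after an element is mutated.
class MutatedElementSketch {
    static boolean containsAfterMutation() {
        Set<List<Integer>> set = new HashSet<>();
        List<Integer> element = new ArrayList<>();
        element.add(1);
        set.add(element);   // stored under the hash of [1]

        element.add(2);     // the hash of [1, 2] differs from the stored one
        return set.contains(element);
    }

    public static void main(String[] args) {
        // The element is still physically in the set, but the lookup misses it.
        System.out.println(containsAfterMutation()); // false
    }
}
```

For this reason it is usually safest to use immutable objects as HashSet elements and HashMap keys.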

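Relationship Between HashMap and HashSet

HashSet is implemented on top of HashMap: it stores its elements as the map's keys and ignores the values. As a result, most HashSet operations simply delegate to the corresponding HashMap methods.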
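The delegation pattern is small enough to reproduce in a few lines. The following is a minimal sketch of a set built on a HashMap in the same way, for illustration only: TinyHashSet is a hypothetical class, and it omits iteration, the Set interface, serialization, and everything else the real HashSet provides.

```java
import java.util.HashMap;

// Hypothetical, stripped-down set that delegates to a HashMap like HashSet does.
class TinyHashSet<E> {
    // Shared dummy value stored for every key.
    private static final Object PRESENT = new Object();

    private final HashMap<E, Object> map = new HashMap<>();

    // True if the element was not already present.
    boolean add(E e) { return map.put(e, PRESENT) == null; }

    // True if the element was present and has been removed.
    boolean remove(Object o) { return map.remove(o) == PRESENT; }

    boolean contains(Object o) { return map.containsKey(o); }

    int size() { return map.size(); }

    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<>();
        System.out.println(set.add("a"));      // true
        System.out.println(set.add("a"));      // false
        System.out.println(set.contains("a")); // true
        System.out.println(set.remove("a"));   // true
        System.out.println(set.size());        // 0
    }
}
```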

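Summary

HashMap and HashSet are among the most widely used data structures in Java. Both are built on a hash table and offer efficient lookup, insertion, and deletion. Reading their source code makes it easier to understand how they work and when to use them. In practice, choose the data structure that fits the requirements, and remember that neither class is thread-safe.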
