Date: Tue, 13 Apr 2021 00:56:26 -0600
In-Reply-To: <20210413065633.2782273-1-yuzhao@google.com>
Message-Id: <20210413065633.2782273-10-yuzhao@google.com>
Mime-Version: 1.0
References: <20210413065633.2782273-1-yuzhao@google.com>
X-Mailer: git-send-email 2.31.1.295.g9ea45b61b8-goog
Subject: [PATCH v2 09/16] mm: multigenerational lru: activation
From: Yu Zhao <yuzhao@google.com>
To: linux-mm@kvack.org
Cc: Alex Shi <alexs@kernel.org>, Andi Kleen <ak@linux.intel.com>,
    Andrew Morton <akpm@linux-foundation.org>,
    Benjamin Manes <ben.manes@gmail.com>,
    Dave Chinner <david@fromorbit.com>,
    Dave Hansen <dave.hansen@linux.intel.com>,
    Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>,
    Johannes Weiner <hannes@cmpxchg.org>,
    Jonathan Corbet <corbet@lwn.net>,
    Joonsoo Kim <iamjoonsoo.kim@lge.com>,
    Matthew Wilcox <willy@infradead.org>,
    Mel Gorman <mgorman@suse.de>,
    Miaohe Lin <linmiaohe@huawei.com>,
    Michael Larabel <michael@michaellarabel.com>,
    Michal Hocko <mhocko@suse.com>,
    Michel Lespinasse <michel@lespinasse.org>,
    Rik van Riel <riel@surriel.com>,
    Roman Gushchin <guro@fb.com>,
    Rong Chen <rong.a.chen@intel.com>,
    SeongJae Park <sjpark@amazon.de>,
    Tim Chen <tim.c.chen@linux.intel.com>,
    Vlastimil Babka <vbabka@suse.cz>,
    Yang Shi <shy828301@gmail.com>,
    Ying Huang <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>,
    linux-kernel@vger.kernel.org, lkp@lists.01.org,
    page-reclaim@google.com, Yu Zhao <yuzhao@google.com>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Archived-At: <https://lore.kernel.org/lkml/20210413065633.2782273-10-yuzhao@google.com/>
List-Archive: <https://lore.kernel.org/lkml/>
List-Post: <mailto:linux-kernel@vger.kernel.org>

For pages accessed multiple times via file descriptors, instead of
activating them upon the second access, we activate them based on the
refault rates of their tiers. Pages accessed N times via file
descriptors belong to tier order_base_2(N) (illustrated below). Pages
from tier 0, i.e., those read ahead, accessed once via file
descriptors, or accessed only via page tables, are evicted regardless
of the refault rate. Pages from other tiers are moved to the next
generation, i.e., activated, if the refault rates of their tiers are
higher than that of tier 0. Each generation contains at most
MAX_NR_TIERS tiers, and they require an additional MAX_NR_TIERS-2 bits
in page->flags. This feedback model has a few advantages over the
current feedforward model:
  1) It has a negligible overhead in the access path because
     activations are done in the reclaim path.
  2) It takes mapped pages into account and avoids overprotecting
     pages accessed multiple times via file descriptors.
  3) More tiers offer better protection to pages accessed more than
     twice when buffered-I/O-intensive workloads are under memory
     pressure.
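
As a minimal illustration of the tier mapping above (not part of the
patch; tier_of() is made up and simply mimics order_base_2()):

	/* Illustration only: the tier of a page accessed N times via file
	   descriptors is ceil(log2(N)), i.e. order_base_2(N). */
	static int tier_of(unsigned int accesses)
	{
		int tier = 0;

		while ((1U << tier) < accesses)
			tier++;
		return tier;
	}

	/* tier_of(1) == 0, tier_of(2) == 1, tier_of(3) == tier_of(4) == 2,
	   tier_of(5) == ... == tier_of(8) == 3 */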

For pages mapped upon page faults, the accessed bit is set and they
must be properly aged. We add them to the per-zone lists indexed by
max_seq, i.e., the youngest generation. For pages not in page cache
or swap cache, this can be done easily in the page fault path: we
rename lru_cache_add_inactive_or_unevictable() to
lru_cache_add_page_vma() and add a new parameter, which is set to true
for pages mapped upon page faults. For pages in page cache or swap
cache, we cannot differentiate the page fault path from the readahead
path at the time we call lru_cache_add() in add_to_page_cache_lru()
and __read_swap_cache_async(). So we add a new function
lru_gen_activation(), which is essentially activate_page(), to move
pages to the per-zone lists indexed by max_seq at a later time.
Hopefully we can find those pages in lru_pvecs.lru_add and simply set
PageActive() on them without having to actually move them.
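
To illustrate the calling convention described above (not part of the
patch; the wrapper below is hypothetical), fault handlers pass true
while other paths, e.g. fork or migration, pass false:

	/* Hypothetical helper: map a new anonymous page and add it to the lru. */
	static void example_add_new_anon_page(struct page *page,
					      struct vm_area_struct *vma,
					      unsigned long addr, bool faulting)
	{
		page_add_new_anon_rmap(page, vma, addr, false);
		/* true only when called from a page fault path */
		lru_cache_add_page_vma(page, vma, faulting);
	}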

Finally, we need to be compatible with the existing notion of active
and inactive. We cannot use PageActive() because it is not set on
active pages unless they are isolated, which spares the aging the
trouble of clearing it when an active generation becomes inactive. A
new function page_is_active() compares the generation number of a page
with max_seq and max_seq-1 (modulo MAX_NR_GENS); these two generations
are considered active and protected from eviction. Other generations,
which may or may not exist, are considered inactive.
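
As a sketch of that comparison (not part of the patch; MAX_NR_GENS is
assumed to be 4 here, and the real page_is_active() reads the
generation number from page->flags and the lruvec):

	#define MAX_NR_GENS 4

	/* Illustration only: generations max_seq and max_seq-1 are active. */
	static int gen_is_active(unsigned long page_seq, unsigned long max_seq)
	{
		unsigned long gen = page_seq % MAX_NR_GENS;

		return gen == max_seq % MAX_NR_GENS ||
		       gen == (max_seq - 1) % MAX_NR_GENS;
	}

	/* With max_seq == 7: sequences 7 and 6 are active; 5 and 4 are not. */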

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
fs/proc/task_mmu.c | 3 +-
include/linux/mm_inline.h | 101 +++++++++++++++++++++
include/linux/swap.h | 4 +-
kernel/events/uprobes.c | 2 +-
mm/huge_memory.c | 2 +-
mm/khugepaged.c | 2 +-
mm/memory.c | 14 +--
mm/migrate.c | 2 +-
mm/swap.c | 26 +++---
mm/swapfile.c | 2 +-
mm/userfaultfd.c | 2 +-
mm/vmscan.c | 91 ++++++++++++++++++-
mm/workingset.c | 179 +++++++++++++++++++++++++++++++-------
13 files changed, 371 insertions(+), 59 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e862cab69583..d292f20c4e3d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -19,6 +19,7 @@
#include <linux/shmem_fs.h>
#include <linux/uaccess.h>
#include <linux/pkeys.h>
+#include <linux/mm_inline.h>

#include <asm/elf.h>
#include <asm/tlb.h>
@@ -1718,7 +1719,7 @@ static void gather_stats(struct page *page, struct numa_maps *md, int pte_dirty,
if (PageSwapCache(page))
md->swapcache += nr_pages;

- if (PageActive(page) || PageUnevictable(page))
+ if (PageUnevictable(page) || page_is_active(compound_head(page), NULL))
md->active += nr_pages;

if (PageWriteback(page))
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 2bf910eb3dd7..5eb4b12972ec 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -95,6 +95,12 @@ static inline int lru_gen_from_seq(unsigned long seq)
return seq % MAX_NR_GENS;
}

+/* Convert the level of usage to a tier. See the comment on MAX_NR_TIERS. */
+static inline int lru_tier_from_usage(int usage)
+{
+ return order_base_2(usage + 1);
+}
+
/* Return a proper index regardless whether we keep a full history of stats. */
static inline int sid_from_seq_or_gen(int seq_or_gen)
{
@@ -238,12 +244,93 @@ static inline bool lru_gen_deletion(struct page *page, struct lruvec *lruvec)
return true;
}

+/* Activate a page from page cache or swap cache after it's mapped. */
+static inline void lru_gen_activation(struct page *page, struct vm_area_struct *vma)
+{
+ if (!lru_gen_enabled())
+ return;
+
+ if (PageActive(page) || PageUnevictable(page) || vma_is_dax(vma) ||
+ (vma->vm_flags & (VM_LOCKED | VM_SPECIAL)))
+ return;
+ /*
+ * TODO: pass vm_fault to add_to_page_cache_lru() and
+ * __read_swap_cache_async() so they can activate pages directly when in
+ * the page fault path.
+ */
+ activate_page(page);
+}
+
/* Return -1 when a page is not on a list of the multigenerational lru. */
static inline int page_lru_gen(struct page *page)
{
return ((READ_ONCE(page->flags) & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
}

+/* This function works regardless whether the multigenerational lru is enabled. */
+static inline bool page_is_active(struct page *page, struct lruvec *lruvec)
+{
+ struct mem_cgroup *memcg;
+ int gen = page_lru_gen(page);
+ bool active = false;
+
+ VM_BUG_ON_PAGE(PageTail(page), page);
+
+ if (gen < 0)
+ return PageActive(page);
+
+ if (lruvec) {
+ VM_BUG_ON_PAGE(PageUnevictable(page), page);
+ VM_BUG_ON_PAGE(PageActive(page), page);
+ lockdep_assert_held(&lruvec->lru_lock);
+
+ return lru_gen_is_active(lruvec, gen);
+ }
+
+ rcu_read_lock();
+
+ memcg = page_memcg_rcu(page);
+ lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
+ active = lru_gen_is_active(lruvec, gen);
+
+ rcu_read_unlock();
+
+ return active;
+}
+
+/* Return the level of usage of a page. See the comment on MAX_NR_TIERS. */
+static inline int page_tier_usage(struct page *page)
+{
+ unsigned long flags = READ_ONCE(page->flags);
+
+ return flags & BIT(PG_workingset) ?
+ ((flags & LRU_USAGE_MASK) >> LRU_USAGE_PGOFF) + 1 : 0;
+}
+
+/* Increment the usage counter after a page is accessed via file descriptors. */
+static inline bool page_inc_usage(struct page *page)
+{
+ unsigned long old_flags, new_flags;
+
+ if (!lru_gen_enabled())
+ return PageActive(page);
+
+ do {
+ old_flags = READ_ONCE(page->flags);
+
+ if (!(old_flags & BIT(PG_workingset)))
+ new_flags = old_flags | BIT(PG_workingset);
+ else
+ new_flags = (old_flags & ~LRU_USAGE_MASK) | min(LRU_USAGE_MASK,
+ (old_flags & LRU_USAGE_MASK) + BIT(LRU_USAGE_PGOFF));
+
+ if (old_flags == new_flags)
+ break;
+ } while (cmpxchg(&page->flags, old_flags, new_flags) != old_flags);
+
+ return true;
+}
+
#else /* CONFIG_LRU_GEN */

static inline bool lru_gen_enabled(void)
@@ -261,6 +348,20 @@ static inline bool lru_gen_deletion(struct page *page, struct lruvec *lruvec)
return false;
}

+static inline void lru_gen_activation(struct page *page, struct vm_area_struct *vma)
+{
+}
+
+static inline bool page_is_active(struct page *page, struct lruvec *lruvec)
+{
+ return PageActive(page);
+}
+
+static inline bool page_inc_usage(struct page *page)
+{
+ return PageActive(page);
+}
+
#endif /* CONFIG_LRU_GEN */

static __always_inline void add_page_to_lru_list(struct page *page,
diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2bbbf181ba..0e7532c7db22 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -350,8 +350,8 @@ extern void deactivate_page(struct page *page);
extern void mark_page_lazyfree(struct page *page);
extern void swap_setup(void);

-extern void lru_cache_add_inactive_or_unevictable(struct page *page,
- struct vm_area_struct *vma);
+extern void lru_cache_add_page_vma(struct page *page, struct vm_area_struct *vma,
+ bool faulting);

/* linux/mm/vmscan.c */
extern unsigned long zone_reclaimable_pages(struct zone *zone);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 6addc9780319..4e93e5602723 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -184,7 +184,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
if (new_page) {
get_page(new_page);
page_add_new_anon_rmap(new_page, vma, addr, false);
- lru_cache_add_inactive_or_unevictable(new_page, vma);
+ lru_cache_add_page_vma(new_page, vma, false);
} else
/* no new page, just dec_mm_counter for old_page */
dec_mm_counter(mm, MM_ANONPAGES);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 26d3cc4a7a0b..2cf46270c84b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -637,7 +637,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
page_add_new_anon_rmap(page, vma, haddr, true);
- lru_cache_add_inactive_or_unevictable(page, vma);
+ lru_cache_add_page_vma(page, vma, true);
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a7d6cb912b05..08a43910f232 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1199,7 +1199,7 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(new_page, vma, address, true);
- lru_cache_add_inactive_or_unevictable(new_page, vma);
+ lru_cache_add_page_vma(new_page, vma, true);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, address, pmd, _pmd);
update_mmu_cache_pmd(vma, address, pmd);
diff --git a/mm/memory.c b/mm/memory.c
index 550405fc3b5e..9a6cb6d31430 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -73,6 +73,7 @@
#include <linux/perf_event.h>
#include <linux/ptrace.h>
#include <linux/vmalloc.h>
+#include <linux/mm_inline.h>

#include <trace/events/kmem.h>

@@ -839,7 +840,7 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
copy_user_highpage(new_page, page, addr, src_vma);
__SetPageUptodate(new_page);
page_add_new_anon_rmap(new_page, dst_vma, addr, false);
- lru_cache_add_inactive_or_unevictable(new_page, dst_vma);
+ lru_cache_add_page_vma(new_page, dst_vma, false);
rss[mm_counter(new_page)]++;

/* All done, just insert the new page copy in the child */
@@ -2907,7 +2908,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
*/
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
page_add_new_anon_rmap(new_page, vma, vmf->address, false);
- lru_cache_add_inactive_or_unevictable(new_page, vma);
+ lru_cache_add_page_vma(new_page, vma, true);
/*
* We call the notify macro here because, when using secondary
* mmu page tables (such as kvm shadow page tables), we want the
@@ -3438,9 +3439,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
/* ksm created a completely new copy */
if (unlikely(page != swapcache && swapcache)) {
page_add_new_anon_rmap(page, vma, vmf->address, false);
- lru_cache_add_inactive_or_unevictable(page, vma);
+ lru_cache_add_page_vma(page, vma, true);
} else {
do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
+ lru_gen_activation(page, vma);
}

swap_free(entry);
@@ -3584,7 +3586,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)

inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, vmf->address, false);
- lru_cache_add_inactive_or_unevictable(page, vma);
+ lru_cache_add_page_vma(page, vma, true);
setpte:
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);

@@ -3709,6 +3711,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)

add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR);
page_add_file_rmap(page, true);
+ lru_gen_activation(page, vma);
/*
* deposit and withdraw with pmd lock held
*/
@@ -3752,10 +3755,11 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
if (write && !(vma->vm_flags & VM_SHARED)) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, addr, false);
- lru_cache_add_inactive_or_unevictable(page, vma);
+ lru_cache_add_page_vma(page, vma, true);
} else {
inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
page_add_file_rmap(page, false);
+ lru_gen_activation(page, vma);
}
set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 62b81d5257aa..1064b03cac33 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -3004,7 +3004,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
inc_mm_counter(mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, addr, false);
if (!is_zone_device_page(page))
- lru_cache_add_inactive_or_unevictable(page, vma);
+ lru_cache_add_page_vma(page, vma, false);
get_page(page);

if (flush) {
diff --git a/mm/swap.c b/mm/swap.c
index f20ed56ebbbf..d6458ee1e9f8 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -306,7 +306,7 @@ void lru_note_cost_page(struct page *page)

static void __activate_page(struct page *page, struct lruvec *lruvec)
{
- if (!PageActive(page) && !PageUnevictable(page)) {
+ if (!PageUnevictable(page) && !page_is_active(page, lruvec)) {
int nr_pages = thp_nr_pages(page);

del_page_from_lru_list(page, lruvec);
@@ -337,7 +337,7 @@ static bool need_activate_page_drain(int cpu)
static void activate_page_on_lru(struct page *page)
{
page = compound_head(page);
- if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
+ if (PageLRU(page) && !PageUnevictable(page) && !page_is_active(page, NULL)) {
struct pagevec *pvec;

local_lock(&lru_pvecs.lock);
@@ -431,7 +431,7 @@ void mark_page_accessed(struct page *page)
* this list is never rotated or maintained, so marking an
* evictable page accessed has no effect.
*/
- } else if (!PageActive(page)) {
+ } else if (!page_inc_usage(page)) {
activate_page(page);
ClearPageReferenced(page);
workingset_activation(page);
@@ -467,15 +467,14 @@ void lru_cache_add(struct page *page)
EXPORT_SYMBOL(lru_cache_add);

/**
- * lru_cache_add_inactive_or_unevictable
+ * lru_cache_add_page_vma
* @page: the page to be added to LRU
* @vma: vma in which page is mapped for determining reclaimability
*
- * Place @page on the inactive or unevictable LRU list, depending on its
- * evictability.
+ * Place @page on an LRU list, depending on its evictability.
*/
-void lru_cache_add_inactive_or_unevictable(struct page *page,
- struct vm_area_struct *vma)
+void lru_cache_add_page_vma(struct page *page, struct vm_area_struct *vma,
+ bool faulting)
{
bool unevictable;

@@ -492,6 +491,11 @@ void lru_cache_add_inactive_or_unevictable(struct page *page,
__mod_zone_page_state(page_zone(page), NR_MLOCK, nr_pages);
count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
}
+
+ /* tell the multigenerational lru that the page is being faulted in */
+ if (lru_gen_enabled() && !unevictable && faulting)
+ SetPageActive(page);
+
lru_cache_add(page);
}

@@ -518,7 +522,7 @@ void lru_cache_add_inactive_or_unevictable(struct page *page,
*/
static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec)
{
- bool active = PageActive(page);
+ bool active = page_is_active(page, lruvec);
int nr_pages = thp_nr_pages(page);

if (PageUnevictable(page))
@@ -558,7 +562,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec)

static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec)
{
- if (PageActive(page) && !PageUnevictable(page)) {
+ if (!PageUnevictable(page) && page_is_active(page, lruvec)) {
int nr_pages = thp_nr_pages(page);

del_page_from_lru_list(page, lruvec);
@@ -672,7 +676,7 @@ void deactivate_file_page(struct page *page)
*/
void deactivate_page(struct page *page)
{
- if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+ if (PageLRU(page) && !PageUnevictable(page) && page_is_active(page, NULL)) {
struct pagevec *pvec;

local_lock(&lru_pvecs.lock);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index c6041d10a73a..ab3b5ca404fd 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1936,7 +1936,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
page_add_anon_rmap(page, vma, addr, false);
} else { /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, addr, false);
- lru_cache_add_inactive_or_unevictable(page, vma);
+ lru_cache_add_page_vma(page, vma, false);
}
swap_free(entry);
out:
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9a3d451402d7..e1d4cd3103b8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -123,7 +123,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,

inc_mm_counter(dst_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
- lru_cache_add_inactive_or_unevictable(page, dst_vma);
+ lru_cache_add_page_vma(page, dst_vma, true);

set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8559bb94d452..c74ebe2039f7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -898,9 +898,11 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,

if (PageSwapCache(page)) {
swp_entry_t swap = { .val = page_private(page) };
- mem_cgroup_swapout(page, swap);
+
+ /* get a shadow entry before page_memcg() is cleared */
if (reclaimed && !mapping_exiting(mapping))
shadow = workingset_eviction(page, target_memcg);
+ mem_cgroup_swapout(page, swap);
__delete_from_swap_cache(page, swap, shadow);
xa_unlock_irqrestore(&mapping->i_pages, flags);
put_swap_page(page, swap);
@@ -4375,6 +4377,93 @@ static bool __maybe_unused seq_is_valid(struct lruvec *lruvec)
get_nr_gens(lruvec, 1) <= MAX_NR_GENS;
}

+/******************************************************************************
+ * refault feedback loop
+ ******************************************************************************/
+
+/*
+ * A feedback loop modeled after the PID controller. Currently supports the
+ * proportional (P) and the integral (I) terms; the derivative (D) term can be
+ * added if necessary. The setpoint (SP) is the desired position; the process
+ * variable (PV) is the measured position. The error is the difference between
+ * the SP and the PV. A positive error results in a positive control output
+ * correction, which, in our case, is to allow eviction.
+ *
+ * The P term is the current refault rate refaulted/(evicted+activated), which
+ * has a weight of 1. The I term is the arithmetic mean of the last N refault
+ * rates, weighted by geometric series 1/2, 1/4, ..., 1/(1<<N).
+ *
+ * Our goal is to make sure upper tiers have similar refault rates as the base
+ * tier. That is we try to be fair to all tiers by maintaining similar refault
+ * rates across them.
+ */
+struct controller_pos {
+ unsigned long refaulted;
+ unsigned long total;
+ int gain;
+};
+
+static void read_controller_pos(struct controller_pos *pos, struct lruvec *lruvec,
+ int file, int tier, int gain)
+{
+ struct lrugen *lrugen = &lruvec->evictable;
+ int sid = sid_from_seq_or_gen(lrugen->min_seq[file]);
+
+ pos->refaulted = lrugen->avg_refaulted[file][tier] +
+ atomic_long_read(&lrugen->refaulted[sid][file][tier]);
+ pos->total = lrugen->avg_total[file][tier] +
+ atomic_long_read(&lrugen->evicted[sid][file][tier]);
+ if (tier)
+ pos->total += lrugen->activated[sid][file][tier - 1];
+ pos->gain = gain;
+}
+
+static void reset_controller_pos(struct lruvec *lruvec, int gen, int file)
+{
+ int tier;
+ int sid = sid_from_seq_or_gen(gen);
+ struct lrugen *lrugen = &lruvec->evictable;
+ bool carryover = gen == lru_gen_from_seq(lrugen->min_seq[file]);
+
+ if (!carryover && NR_STAT_GENS == 1)
+ return;
+
+ for (tier = 0; tier < MAX_NR_TIERS; tier++) {
+ if (carryover) {
+ unsigned long sum;
+
+ sum = lrugen->avg_refaulted[file][tier] +
+ atomic_long_read(&lrugen->refaulted[sid][file][tier]);
+ WRITE_ONCE(lrugen->avg_refaulted[file][tier], sum >> 1);
+
+ sum = lrugen->avg_total[file][tier] +
+ atomic_long_read(&lrugen->evicted[sid][file][tier]);
+ if (tier)
+ sum += lrugen->activated[sid][file][tier - 1];
+ WRITE_ONCE(lrugen->avg_total[file][tier], sum >> 1);
+
+ if (NR_STAT_GENS > 1)
+ continue;
+ }
+
+ atomic_long_set(&lrugen->refaulted[sid][file][tier], 0);
+ atomic_long_set(&lrugen->evicted[sid][file][tier], 0);
+ if (tier)
+ WRITE_ONCE(lrugen->activated[sid][file][tier - 1], 0);
+ }
+}
+
+static bool positive_ctrl_err(struct controller_pos *sp, struct controller_pos *pv)
+{
+ /*
+ * Allow eviction if the PV has a limited number of refaulted pages or a
+ * lower refault rate than the SP.
+ */
+ return pv->refaulted < SWAP_CLUSTER_MAX ||
+ pv->refaulted * max(sp->total, 1UL) * sp->gain <=
+ sp->refaulted * max(pv->total, 1UL) * pv->gain;
+}
+
/******************************************************************************
* state change
******************************************************************************/
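
As a worked illustration of the cross-multiplied comparison in
positive_ctrl_err() above (not part of the patch; the helper and the
numbers below are made up, and SWAP_CLUSTER_MAX is taken to be its
default value of 32):

	struct pos { unsigned long refaulted, total, gain; };

	/* Illustration only: allow eviction when the PV refault rate is not
	   higher than the SP refault rate, compared without division. */
	static int allow_eviction(struct pos sp, struct pos pv)
	{
		return pv.refaulted < 32 ||
		       pv.refaulted * (sp.total ? sp.total : 1) * sp.gain <=
		       sp.refaulted * (pv.total ? pv.total : 1) * pv.gain;
	}

	/* sp = {200, 1000, 1}: tier 0 refaults at 20%.
	   pv = {120,  400, 1}: 30% > 20%  -> 0, keep (activate) the tier.
	   pv = { 40,  400, 1}: 10% <= 20% -> 1, allow eviction. */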
diff --git a/mm/workingset.c b/mm/workingset.c
index cd39902c1062..df363f9419fc 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -168,9 +168,9 @@
* refault distance will immediately activate the refaulting page.
*/

-#define EVICTION_SHIFT ((BITS_PER_LONG - BITS_PER_XA_VALUE) + \
- 1 + NODES_SHIFT + MEM_CGROUP_ID_SHIFT)
-#define EVICTION_MASK (~0UL >> EVICTION_SHIFT)
+#define EVICTION_SHIFT (BITS_PER_XA_VALUE - MEM_CGROUP_ID_SHIFT - NODES_SHIFT)
+#define EVICTION_MASK (BIT(EVICTION_SHIFT) - 1)
+#define WORKINGSET_WIDTH 1

/*
* Eviction timestamps need to be able to cover the full range of
@@ -182,38 +182,139 @@
*/
static unsigned int bucket_order __read_mostly;

-static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
- bool workingset)
+static void *pack_shadow(int memcg_id, struct pglist_data *pgdat, unsigned long val)
{
- eviction >>= bucket_order;
- eviction &= EVICTION_MASK;
- eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
- eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
- eviction = (eviction << 1) | workingset;
+ val = (val << MEM_CGROUP_ID_SHIFT) | memcg_id;
+ val = (val << NODES_SHIFT) | pgdat->node_id;

- return xa_mk_value(eviction);
+ return xa_mk_value(val);
}

-static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
- unsigned long *evictionp, bool *workingsetp)
+static unsigned long unpack_shadow(void *shadow, int *memcg_id, struct pglist_data **pgdat)
{
- unsigned long entry = xa_to_value(shadow);
- int memcgid, nid;
- bool workingset;
-
- workingset = entry & 1;
- entry >>= 1;
- nid = entry & ((1UL << NODES_SHIFT) - 1);
- entry >>= NODES_SHIFT;
- memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
- entry >>= MEM_CGROUP_ID_SHIFT;
-
- *memcgidp = memcgid;
- *pgdat = NODE_DATA(nid);
- *evictionp = entry << bucket_order;
- *workingsetp = workingset;
+ unsigned long val = xa_to_value(shadow);
+
+ *pgdat = NODE_DATA(val & (BIT(NODES_SHIFT) - 1));
+ val >>= NODES_SHIFT;
+ *memcg_id = val & (BIT(MEM_CGROUP_ID_SHIFT) - 1);
+
+ return val >> MEM_CGROUP_ID_SHIFT;
+}
+
+#ifdef CONFIG_LRU_GEN
+
+#if LRU_GEN_SHIFT + LRU_USAGE_SHIFT >= EVICTION_SHIFT
+#error "Please try smaller NODES_SHIFT, NR_LRU_GENS and TIERS_PER_GEN configurations"
+#endif
+
+static void page_set_usage(struct page *page, int usage)
+{
+ unsigned long old_flags, new_flags;
+
+ VM_BUG_ON(usage > BIT(LRU_USAGE_WIDTH));
+
+ if (!usage)
+ return;
+
+ do {
+ old_flags = READ_ONCE(page->flags);
+ new_flags = (old_flags & ~LRU_USAGE_MASK) | LRU_TIER_FLAGS |
+ ((usage - 1UL) << LRU_USAGE_PGOFF);
+ if (old_flags == new_flags)
+ break;
+ } while (cmpxchg(&page->flags, old_flags, new_flags) != old_flags);
+}
+
+/* Return a token to be stored in the shadow entry of a page being evicted. */
+static void *lru_gen_eviction(struct page *page)
+{
+ int sid, tier;
+ unsigned long token;
+ unsigned long min_seq;
+ struct lruvec *lruvec;
+ struct lrugen *lrugen;
+ int file = page_is_file_lru(page);
+ int usage = page_tier_usage(page);
+ struct mem_cgroup *memcg = page_memcg(page);
+ struct pglist_data *pgdat = page_pgdat(page);
+
+ if (!lru_gen_enabled())
+ return NULL;
+
+ lruvec = mem_cgroup_lruvec(memcg, pgdat);
+ lrugen = &lruvec->evictable;
+ min_seq = READ_ONCE(lrugen->min_seq[file]);
+ token = (min_seq << LRU_USAGE_SHIFT) | usage;
+
+ sid = sid_from_seq_or_gen(min_seq);
+ tier = lru_tier_from_usage(usage);
+ atomic_long_add(thp_nr_pages(page), &lrugen->evicted[sid][file][tier]);
+
+ return pack_shadow(mem_cgroup_id(memcg), pgdat, token);
+}
+
+/* Account a refaulted page based on the token stored in its shadow entry. */
+static bool lru_gen_refault(struct page *page, void *shadow)
+{
+ int sid, tier, usage;
+ int memcg_id;
+ unsigned long token;
+ unsigned long min_seq;
+ struct lruvec *lruvec;
+ struct lrugen *lrugen;
+ struct pglist_data *pgdat;
+ struct mem_cgroup *memcg;
+ int file = page_is_file_lru(page);
+
+ if (!lru_gen_enabled())
+ return false;
+
+ token = unpack_shadow(shadow, &memcg_id, &pgdat);
+ if (page_pgdat(page) != pgdat)
+ return true;
+
+ rcu_read_lock();
+ memcg = page_memcg_rcu(page);
+ if (mem_cgroup_id(memcg) != memcg_id)
+ goto unlock;
+
+ usage = token & (BIT(LRU_USAGE_SHIFT) - 1);
+ token >>= LRU_USAGE_SHIFT;
+
+ lruvec = mem_cgroup_lruvec(memcg, pgdat);
+ lrugen = &lruvec->evictable;
+ min_seq = READ_ONCE(lrugen->min_seq[file]);
+ if (token != (min_seq & (EVICTION_MASK >> LRU_USAGE_SHIFT)))
+ goto unlock;
+
+ page_set_usage(page, usage);
+
+ sid = sid_from_seq_or_gen(min_seq);
+ tier = lru_tier_from_usage(usage);
+ atomic_long_add(thp_nr_pages(page), &lrugen->refaulted[sid][file][tier]);
+ inc_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file);
+ if (tier)
+ inc_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file);
+unlock:
+ rcu_read_unlock();
+
+ return true;
+}
+
+#else /* CONFIG_LRU_GEN */
+
+static void *lru_gen_eviction(struct page *page)
+{
+ return NULL;
}

+static bool lru_gen_refault(struct page *page, void *shadow)
+{
+ return false;
+}
+
+#endif /* CONFIG_LRU_GEN */
+
/**
* workingset_age_nonresident - age non-resident entries as LRU ages
* @lruvec: the lruvec that was aged
@@ -256,18 +357,25 @@ void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg)
unsigned long eviction;
struct lruvec *lruvec;
int memcgid;
+ void *shadow;

/* Page is fully exclusive and pins page's memory cgroup pointer */
VM_BUG_ON_PAGE(PageLRU(page), page);
VM_BUG_ON_PAGE(page_count(page), page);
VM_BUG_ON_PAGE(!PageLocked(page), page);

+ shadow = lru_gen_eviction(page);
+ if (shadow)
+ return shadow;
+
lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
/* XXX: target_memcg can be NULL, go through lruvec */
memcgid = mem_cgroup_id(lruvec_memcg(lruvec));
eviction = atomic_long_read(&lruvec->nonresident_age);
+ eviction >>= bucket_order;
+ eviction = (eviction << WORKINGSET_WIDTH) | PageWorkingset(page);
workingset_age_nonresident(lruvec, thp_nr_pages(page));
- return pack_shadow(memcgid, pgdat, eviction, PageWorkingset(page));
+ return pack_shadow(memcgid, pgdat, eviction);
}

/**
@@ -294,7 +402,10 @@ void workingset_refault(struct page *page, void *shadow)
bool workingset;
int memcgid;

- unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
+ if (lru_gen_refault(page, shadow))
+ return;
+
+ eviction = unpack_shadow(shadow, &memcgid, &pgdat);

rcu_read_lock();
/*
@@ -318,6 +429,8 @@ void workingset_refault(struct page *page, void *shadow)
goto out;
eviction_lruvec = mem_cgroup_lruvec(eviction_memcg, pgdat);
refault = atomic_long_read(&eviction_lruvec->nonresident_age);
+ workingset = eviction & (BIT(WORKINGSET_WIDTH) - 1);
+ eviction = (eviction >> WORKINGSET_WIDTH) << bucket_order;

/*
* Calculate the refault distance
@@ -335,7 +448,7 @@ void workingset_refault(struct page *page, void *shadow)
* longest time, so the occasional inappropriate activation
* leading to pressure on the active list is not a problem.
*/
- refault_distance = (refault - eviction) & EVICTION_MASK;
+ refault_distance = (refault - eviction) & (EVICTION_MASK >> WORKINGSET_WIDTH);

/*
* The activation decision for this page is made at the level
@@ -594,7 +707,7 @@ static int __init workingset_init(void)
unsigned int max_order;
int ret;

- BUILD_BUG_ON(BITS_PER_LONG < EVICTION_SHIFT);
+ BUILD_BUG_ON(EVICTION_SHIFT < WORKINGSET_WIDTH);
/*
* Calculate the eviction bucket size to cover the longest
* actionable refault distance, which is currently half of
@@ -602,7 +715,7 @@ static int __init workingset_init(void)
* some more pages at runtime, so keep working with up to
* double the initial memory by using totalram_pages as-is.
*/
- timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
+ timestamp_bits = EVICTION_SHIFT - WORKINGSET_WIDTH;
max_order = fls_long(totalram_pages() - 1);
if (max_order > timestamp_bits)
bucket_order = max_order - timestamp_bits;
--
2.31.1.295.g9ea45b61b8-goog