From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: <linux-kernel-owner@kernel.org> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17ED0C43460 for <linux-kernel@archiver.kernel.org>; Thu, 20 May 2021 06:54:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E04C861184 for <linux-kernel@archiver.kernel.org>; Thu, 20 May 2021 06:54:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231223AbhETG4A (ORCPT <rfc822;linux-kernel@archiver.kernel.org>); Thu, 20 May 2021 02:56:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37944 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230473AbhETGzl (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 20 May 2021 02:55:41 -0400 Received: from mail-qv1-xf49.google.com (mail-qv1-xf49.google.com [IPv6:2607:f8b0:4864:20::f49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0A0DC061761 for <linux-kernel@vger.kernel.org>; Wed, 19 May 2021 23:54:20 -0700 (PDT) Received: by mail-qv1-xf49.google.com with SMTP id d9-20020a0ce4490000b02901f0bee07112so6151672qvm.7 for <linux-kernel@vger.kernel.org>; Wed, 19 May 2021 23:54:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=D+kCP8KjWdhzq6b9AfWqzFHrIcC1HBgTAlg7o1thC8s=; b=Ao3JFmOKgU6GUK7wOdKwO7smRq1lLjob3ltec82Ju9mPzN+QmdjLHzBqk1xnUggESF TqhhI3jybr858NfIj3PCXK9+qR3zojc5Pd/Quyp44VSHbor2BjBUQqP/t8M487uM4XwV WngIjYnvrYzwh9qjiSWbyBv7yV1ee386Z4r6QxKE99zk0yauu04cnFkSyQcJzvL7ST9Y gunIrZGlwh/QB3VgMvJBx8LLRtENwU2C6hFb2JqIhNx7ECiYmfTdxZ3hqTeciT6fp1mo VJhTuLMD0zN+BmbL7udJFNaRaLEzDq8aaX3Qgn7+HzfVXcaIkWuHdLfLiqx6NOEuXJPh aFOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=D+kCP8KjWdhzq6b9AfWqzFHrIcC1HBgTAlg7o1thC8s=; b=Gjw4qCU2aoOaAwRGh+lY4+hcMXHU7TPGrsgdc0GeQGBjEbSelYAeLx6lfapzEMs4gS OINghBuL7TEDPHzWY92K4Snh4Pm597qGEmIgplE4cMHoWrN8rxc/C+gB/gsW/UgvllX2 o0zgNR9ve4/y3vOdD7xGYl0wDq608mGKsoRKDgVf/SEkDCldm1xmB/MaJihWPSw4niAH KRDP84OugEgIRgRj3MrqoREu5cyjw4ClxbQ8HeaRnw1wt6isXGqlXjiBVveyjSbrV+Q/ luG3YEEGwMlCYbMovQTSmBB7n0pN8Ihg0qVPmr6GcmbpcwYKQWv1tIves1vbV0kdb4aN u9HQ== X-Gm-Message-State: AOAM530vodqQk47GPTvruvJy4njfXeKD7559Rhxl39MVYv2cMgQ/XPuI uuzSatZDrJOCQGdTQCyuRTP/IMipOlM= X-Google-Smtp-Source: ABdhPJwiri0QbWt8YjsEa+N+Ooz0Ku0LVYpwKy1ZvcZJzOwoHQf1X931BLtxTF10spH4XfRsXE5x6SYzq+w= X-Received: from yuzhao.bld.corp.google.com ([2620:15c:183:200:595d:62ee:f08:8e83]) (user=yuzhao job=sendgmr) by 2002:ad4:5767:: with SMTP id r7mr3879143qvx.1.1621493659852; Wed, 19 May 2021 23:54:19 -0700 (PDT) Date: Thu, 20 May 2021 00:53:53 -0600 In-Reply-To: <20210520065355.2736558-1-yuzhao@google.com> Message-Id: <20210520065355.2736558-13-yuzhao@google.com> Mime-Version: 1.0 References: <20210520065355.2736558-1-yuzhao@google.com> X-Mailer: git-send-email 2.31.1.751.gd2f1c929bd-goog Subject: [PATCH v3 12/14] mm: multigenerational lru: user interface From: Yu Zhao <yuzhao@google.com> To: linux-mm@kvack.org Cc: Alex Shi <alexs@kernel.org>, Andi Kleen <ak@linux.intel.com>, Andrew Morton <akpm@linux-foundation.org>, Dave Chinner <david@fromorbit.com>, Dave Hansen <dave.hansen@linux.intel.com>, Donald Carr <sirspudd@gmail.com>, Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>, Johannes Weiner <hannes@cmpxchg.org>, Jonathan Corbet <corbet@lwn.net>, Joonsoo Kim <iamjoonsoo.kim@lge.com>, Konstantin Kharlamov <hi-angel@yandex.ru>, Marcus Seyfarth <m.seyfarth@gmail.com>, Matthew Wilcox <willy@infradead.org>, Mel Gorman <mgorman@suse.de>, Miaohe Lin <linmiaohe@huawei.com>, Michael Larabel <michael@michaellarabel.com>, Michal Hocko <mhocko@suse.com>, Michel Lespinasse <michel@lespinasse.org>, Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>, Tim Chen <tim.c.chen@linux.intel.com>, Vlastimil Babka <vbabka@suse.cz>, Yang Shi <shy828301@gmail.com>, Ying Huang <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>, linux-kernel@vger.kernel.org, lkp@lists.01.org, page-reclaim@google.com, Yu Zhao <yuzhao@google.com>, Konstantin Kharlamov <Hi-Angel@yandex.ru> Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org List-Archive: <https://lore.kernel.org/lkml/> Add a sysfs file /sys/kernel/mm/lru_gen/enabled to enable and disable the multigenerational lru at runtime. Add a sysfs file /sys/kernel/mm/lru_gen/spread to optionally spread pages out across more than three generations. More generations make the background aging more aggressive. Add a debugfs file /sys/kernel/debug/lru_gen to monitor the multigenerational lru and trigger the aging and the eviction. This file has the following output: memcg memcg_id memcg_path node node_id min_gen birth_time anon_size file_size ... max_gen birth_time anon_size file_size Given a memcg and a node, "min_gen" is the oldest generation (number) and "max_gen" is the youngest. Birth time is in milliseconds. The sizes of anon and file types are in pages. This file takes the following input: + memcg_id node_id gen [swappiness] - memcg_id node_id gen [swappiness] [nr_to_reclaim] The first command line accounts referenced pages to generation "max_gen" and creates the next generation "max_gen"+1. In this case, "gen" should be equal to "max_gen". A swap file and a non-zero "swappiness" are required to scan anon type. If swapping is not desired, set vm.swappiness to 0. The second command line evicts generations less than or equal to "gen". In this case, "gen" should be less than "max_gen"-1 as "max_gen" and "max_gen"-1 are active generations and therefore protected from the eviction. Use "nr_to_reclaim" to limit the number of pages to evict. Multiple command lines are supported, so does concatenation with delimiters "," and ";". Signed-off-by: Yu Zhao <yuzhao@google.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> --- mm/vmscan.c | 403 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 403 insertions(+) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2f86dcc04c56..ff2deec24c64 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -52,6 +52,8 @@ #include <linux/memory.h> #include <linux/pagewalk.h> #include <linux/shmem_fs.h> +#include <linux/ctype.h> +#include <linux/debugfs.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -4678,6 +4680,401 @@ static void lru_gen_stop_kswapd(int nid) kvfree(pgdat->mm_walk_args); } +/****************************************************************************** + * sysfs interface + ******************************************************************************/ + +static ssize_t show_lru_gen_spread(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + return sprintf(buf, "%d\n", READ_ONCE(lru_gen_spread)); +} + +static ssize_t store_lru_gen_spread(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t len) +{ + int spread; + + if (kstrtoint(buf, 10, &spread) || spread >= MAX_NR_GENS) + return -EINVAL; + + WRITE_ONCE(lru_gen_spread, spread); + + return len; +} + +static struct kobj_attribute lru_gen_spread_attr = __ATTR( + spread, 0644, show_lru_gen_spread, store_lru_gen_spread +); + +static ssize_t show_lru_gen_enabled(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%d\n", lru_gen_enabled()); +} + +static ssize_t store_lru_gen_enabled(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t len) +{ + int enable; + + if (kstrtoint(buf, 10, &enable)) + return -EINVAL; + + lru_gen_set_state(enable, true, false); + + return len; +} + +static struct kobj_attribute lru_gen_enabled_attr = __ATTR( + enabled, 0644, show_lru_gen_enabled, store_lru_gen_enabled +); + +static struct attribute *lru_gen_attrs[] = { + &lru_gen_spread_attr.attr, + &lru_gen_enabled_attr.attr, + NULL +}; + +static struct attribute_group lru_gen_attr_group = { + .name = "lru_gen", + .attrs = lru_gen_attrs, +}; + +/****************************************************************************** + * debugfs interface + ******************************************************************************/ + +static void *lru_gen_seq_start(struct seq_file *m, loff_t *pos) +{ + struct mem_cgroup *memcg; + loff_t nr_to_skip = *pos; + + m->private = kzalloc(PATH_MAX, GFP_KERNEL); + if (!m->private) + return ERR_PTR(-ENOMEM); + + memcg = mem_cgroup_iter(NULL, NULL, NULL); + do { + int nid; + + for_each_node_state(nid, N_MEMORY) { + if (!nr_to_skip--) + return mem_cgroup_lruvec(memcg, NODE_DATA(nid)); + } + } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL))); + + return NULL; +} + +static void lru_gen_seq_stop(struct seq_file *m, void *v) +{ + if (!IS_ERR_OR_NULL(v)) + mem_cgroup_iter_break(NULL, lruvec_memcg(v)); + + kfree(m->private); + m->private = NULL; +} + +static void *lru_gen_seq_next(struct seq_file *m, void *v, loff_t *pos) +{ + int nid = lruvec_pgdat(v)->node_id; + struct mem_cgroup *memcg = lruvec_memcg(v); + + ++*pos; + + nid = next_memory_node(nid); + if (nid == MAX_NUMNODES) { + memcg = mem_cgroup_iter(NULL, memcg, NULL); + if (!memcg) + return NULL; + + nid = first_memory_node; + } + + return mem_cgroup_lruvec(memcg, NODE_DATA(nid)); +} + +static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, + unsigned long max_seq, unsigned long *min_seq, + unsigned long seq) +{ + int i; + int type, tier; + int hist = hist_from_seq_or_gen(seq); + struct lrugen *lrugen = &lruvec->evictable; + int nid = lruvec_pgdat(lruvec)->node_id; + struct mem_cgroup *memcg = lruvec_memcg(lruvec); + struct lru_gen_mm_list *mm_list = get_mm_list(memcg); + + for (tier = 0; tier < MAX_NR_TIERS; tier++) { + seq_printf(m, " %10d", tier); + for (type = 0; type < ANON_AND_FILE; type++) { + unsigned long n[3] = {}; + + if (seq == max_seq) { + n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); + n[1] = READ_ONCE(lrugen->avg_total[type][tier]); + + seq_printf(m, " %10luR %10luT %10lu ", n[0], n[1], n[2]); + } else if (seq == min_seq[type] || NR_STAT_GENS > 1) { + n[0] = atomic_long_read(&lrugen->refaulted[hist][type][tier]); + n[1] = atomic_long_read(&lrugen->evicted[hist][type][tier]); + if (tier) + n[2] = READ_ONCE(lrugen->activated[hist][type][tier - 1]); + + seq_printf(m, " %10lur %10lue %10lua", n[0], n[1], n[2]); + } else + seq_puts(m, " 0 0 0 "); + } + seq_putc(m, '\n'); + } + + seq_puts(m, " "); + for (i = 0; i < NR_MM_STATS; i++) { + if (seq == max_seq && NR_STAT_GENS == 1) + seq_printf(m, " %10lu%c", READ_ONCE(mm_list->nodes[nid].stats[hist][i]), + toupper(MM_STAT_CODES[i])); + else if (seq != max_seq && NR_STAT_GENS > 1) + seq_printf(m, " %10lu%c", READ_ONCE(mm_list->nodes[nid].stats[hist][i]), + MM_STAT_CODES[i]); + else + seq_puts(m, " 0 "); + } + seq_putc(m, '\n'); +} + +static int lru_gen_seq_show(struct seq_file *m, void *v) +{ + unsigned long seq; + bool full = !debugfs_real_fops(m->file)->write; + struct lruvec *lruvec = v; + struct lrugen *lrugen = &lruvec->evictable; + int nid = lruvec_pgdat(lruvec)->node_id; + struct mem_cgroup *memcg = lruvec_memcg(lruvec); + DEFINE_MAX_SEQ(); + DEFINE_MIN_SEQ(); + + if (nid == first_memory_node) { +#ifdef CONFIG_MEMCG + if (memcg) + cgroup_path(memcg->css.cgroup, m->private, PATH_MAX); +#endif + seq_printf(m, "memcg %5hu %s\n", mem_cgroup_id(memcg), (char *)m->private); + } + + seq_printf(m, " node %5d\n", nid); + + seq = full ? (max_seq < MAX_NR_GENS ? 0 : max_seq - MAX_NR_GENS + 1) : + min(min_seq[0], min_seq[1]); + + for (; seq <= max_seq; seq++) { + int gen, type, zone; + unsigned int msecs; + + gen = lru_gen_from_seq(seq); + msecs = jiffies_to_msecs(jiffies - READ_ONCE(lrugen->timestamps[gen])); + + seq_printf(m, " %10lu %10u", seq, msecs); + + for (type = 0; type < ANON_AND_FILE; type++) { + long size = 0; + + if (seq < min_seq[type]) { + seq_puts(m, " -0 "); + continue; + } + + for (zone = 0; zone < MAX_NR_ZONES; zone++) + size += READ_ONCE(lrugen->sizes[gen][type][zone]); + + seq_printf(m, " %10lu ", max(size, 0L)); + } + + seq_putc(m, '\n'); + + if (full) + lru_gen_seq_show_full(m, lruvec, max_seq, min_seq, seq); + } + + return 0; +} + +static const struct seq_operations lru_gen_seq_ops = { + .start = lru_gen_seq_start, + .stop = lru_gen_seq_stop, + .next = lru_gen_seq_next, + .show = lru_gen_seq_show, +}; + +static int advance_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness) +{ + struct scan_control sc = { + .target_mem_cgroup = lruvec_memcg(lruvec), + }; + DEFINE_MAX_SEQ(); + + if (seq == max_seq) + walk_mm_list(lruvec, max_seq, &sc, swappiness, NULL); + + return seq > max_seq ? -EINVAL : 0; +} + +static int advance_min_seq(struct lruvec *lruvec, unsigned long seq, int swappiness, + unsigned long nr_to_reclaim) +{ + struct blk_plug plug; + int err = -EINTR; + long nr_to_scan = LONG_MAX; + struct scan_control sc = { + .nr_to_reclaim = nr_to_reclaim, + .target_mem_cgroup = lruvec_memcg(lruvec), + .may_writepage = 1, + .may_unmap = 1, + .may_swap = 1, + .reclaim_idx = MAX_NR_ZONES - 1, + .gfp_mask = GFP_KERNEL, + }; + DEFINE_MAX_SEQ(); + + if (seq >= max_seq - 1) + return -EINVAL; + + blk_start_plug(&plug); + + while (!signal_pending(current)) { + DEFINE_MIN_SEQ(); + + if (seq < min(min_seq[!swappiness], min_seq[swappiness < 200]) || + !evict_pages(lruvec, &sc, swappiness, &nr_to_scan)) { + err = 0; + break; + } + + cond_resched(); + } + + blk_finish_plug(&plug); + + return err; +} + +static int advance_seq(char cmd, int memcg_id, int nid, unsigned long seq, + int swappiness, unsigned long nr_to_reclaim) +{ + struct lruvec *lruvec; + int err = -EINVAL; + struct mem_cgroup *memcg = NULL; + + if (!mem_cgroup_disabled()) { + rcu_read_lock(); + memcg = mem_cgroup_from_id(memcg_id); +#ifdef CONFIG_MEMCG + if (memcg && !css_tryget(&memcg->css)) + memcg = NULL; +#endif + rcu_read_unlock(); + + if (!memcg) + goto done; + } + if (memcg_id != mem_cgroup_id(memcg)) + goto done; + + if (nid < 0 || nid >= MAX_NUMNODES || !node_state(nid, N_MEMORY)) + goto done; + + lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid)); + + if (swappiness == -1) + swappiness = get_swappiness(lruvec); + else if (swappiness > 200U) + goto done; + + switch (cmd) { + case '+': + err = advance_max_seq(lruvec, seq, swappiness); + break; + case '-': + err = advance_min_seq(lruvec, seq, swappiness, nr_to_reclaim); + break; + } +done: + mem_cgroup_put(memcg); + + return err; +} + +static ssize_t lru_gen_seq_write(struct file *file, const char __user *src, + size_t len, loff_t *pos) +{ + void *buf; + char *cur, *next; + int err = 0; + + buf = kvmalloc(len + 1, GFP_USER); + if (!buf) + return -ENOMEM; + + if (copy_from_user(buf, src, len)) { + kvfree(buf); + return -EFAULT; + } + + next = buf; + next[len] = '\0'; + + while ((cur = strsep(&next, ",;\n"))) { + int n; + int end; + char cmd; + unsigned int memcg_id; + unsigned int nid; + unsigned long seq; + unsigned int swappiness = -1; + unsigned long nr_to_reclaim = -1; + + cur = skip_spaces(cur); + if (!*cur) + continue; + + n = sscanf(cur, "%c %u %u %lu %n %u %n %lu %n", &cmd, &memcg_id, &nid, + &seq, &end, &swappiness, &end, &nr_to_reclaim, &end); + if (n < 4 || cur[end]) { + err = -EINVAL; + break; + } + + err = advance_seq(cmd, memcg_id, nid, seq, swappiness, nr_to_reclaim); + if (err) + break; + } + + kvfree(buf); + + return err ? : len; +} + +static int lru_gen_seq_open(struct inode *inode, struct file *file) +{ + return seq_open(file, &lru_gen_seq_ops); +} + +static const struct file_operations lru_gen_rw_fops = { + .open = lru_gen_seq_open, + .read = seq_read, + .write = lru_gen_seq_write, + .llseek = seq_lseek, + .release = seq_release, +}; + +static const struct file_operations lru_gen_ro_fops = { + .open = lru_gen_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + /****************************************************************************** * initialization ******************************************************************************/ @@ -4718,6 +5115,12 @@ static int __init init_lru_gen(void) if (hotplug_memory_notifier(lru_gen_online_mem, 0)) pr_err("lru_gen: failed to subscribe hotplug notifications\n"); + if (sysfs_create_group(mm_kobj, &lru_gen_attr_group)) + pr_err("lru_gen: failed to create sysfs group\n"); + + debugfs_create_file("lru_gen", 0644, NULL, NULL, &lru_gen_rw_fops); + debugfs_create_file("lru_gen_full", 0444, NULL, NULL, &lru_gen_ro_fops); + return 0; }; /* -- 2.31.1.751.gd2f1c929bd-goog