Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "The usual shower of hotfixes.

  Chris's memcg patches aren't actually fixes - they're mature but a few
  niggling review issues were late to arrive.

  The ocfs2 fixes are quite old - those took some time to get reviewer
  attention.

  Subsystems affected by this patch series: ocfs2, hotfixes, mm/memcg,
  mm/slab-generic"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  mm, sl[ou]b: improve memory accounting
  mm, memcg: make scan aggression always exclude protection
  mm, memcg: make memory.emin the baseline for utilisation determination
  mm, memcg: proportional memory.{low,min} reclaim
  mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
  mm/page_alloc.c: fix a crash in free_pages_prepare()
  mm/z3fold.c: claim page in the beginning of free
  kernel/sysctl.c: do not override max_threads provided by userspace
  memcg: only record foreign writebacks with dirty pages when memcg is not disabled
  mm: fix -Wmissing-prototypes warnings
  writeback: fix use-after-free in finish_writeback_work()
  mm/memremap: drop unused SECTION_SIZE and SECTION_MASK
  panic: ensure preemption is disabled during panic()
  fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc()
  fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()
  fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
  ocfs2: clear zero in unaligned direct IO
Linus Torvalds committed Oct 7, 2019
2 parents c512c69 + 59bb479 commit eda57a0
Showing 21 changed files with 281 additions and 89 deletions.
20 changes: 14 additions & 6 deletions Documentation/admin-guide/cgroup-v2.rst
@@ -615,8 +615,8 @@ on an IO device and is an example of this type.
Protections
-----------

A cgroup is protected to be allocated up to the configured amount of
the resource if the usages of all its ancestors are under their
A cgroup is protected up to the configured amount of the resource
as long as the usages of all its ancestors are under their
protected levels. Protections can be hard guarantees or best-effort
soft boundaries. Protections can also be over-committed in which case
only up to the amount available to the parent is protected among
@@ -1096,7 +1096,10 @@ PAGE_SIZE multiple when read back.
is within its effective min boundary, the cgroup's memory
won't be reclaimed under any conditions. If there is no
unprotected reclaimable memory available, OOM killer
is invoked.
is invoked. Above the effective min boundary (or
effective low boundary if it is higher), pages are reclaimed
proportionally to the overage, reducing reclaim pressure for
smaller overages.

Effective min boundary is limited by memory.min values of
all ancestor cgroups. If there is memory.min overcommitment
@@ -1118,7 +1121,10 @@ PAGE_SIZE multiple when read back.
Best-effort memory protection. If the memory usage of a
cgroup is within its effective low boundary, the cgroup's
memory won't be reclaimed unless memory can be reclaimed
from unprotected cgroups.
from unprotected cgroups. Above the effective low boundary (or
effective min boundary if it is higher), pages are reclaimed
proportionally to the overage, reducing reclaim pressure for
smaller overages.

Effective low boundary is limited by memory.low values of
all ancestor cgroups. If there is memory.low overcommitment
@@ -2482,8 +2488,10 @@ system performance due to overreclaim, to the point where the feature
becomes self-defeating.

The memory.low boundary on the other hand is a top-down allocated
reserve. A cgroup enjoys reclaim protection when it's within its low,
which makes delegation of subtrees possible.
reserve. A cgroup enjoys reclaim protection when it's within its
effective low, which makes delegation of subtrees possible. It also
enjoys having reclaim pressure proportional to its overage when
above its effective low.

The original high boundary, the hard limit, is defined as a strict
limit that can not budge, even if the OOM killer has to be called.
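For a concrete sense of the proportional behaviour described above (illustrative numbers, not taken from the patch): consider two siblings, each with memory.low of 100M, one using 110M and the other 300M. Both are over their protection, but their overages are 10M and 200M. Because pages are reclaimed proportionally to the overage, the 300M cgroup takes roughly twenty times the reclaim pressure of the 110M cgroup, instead of the two being treated identically once they exceed their low.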
4 changes: 4 additions & 0 deletions Documentation/core-api/memory-allocation.rst
@@ -98,6 +98,10 @@ limited. The actual limit depends on the hardware and the kernel
configuration, but it is a good practice to use `kmalloc` for objects
smaller than page size.

The address of a chunk allocated with `kmalloc` is aligned to at least
ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
alignment is also guaranteed to be at least the respective size.

For large allocations you can use :c:func:`vmalloc` and
:c:func:`vzalloc`, or directly request pages from the page
allocator. The memory allocated by `vmalloc` and related functions is
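A minimal sketch of what the alignment guarantee documented above means in practice (hypothetical init function, not part of this series): a power-of-two request such as 512 bytes returns a pointer aligned to at least that size, so code may rely on the low bits being zero.

#include <linux/slab.h>
#include <linux/bug.h>

static int __init kmalloc_alignment_demo(void)
{
	void *p = kmalloc(512, GFP_KERNEL);	/* at least 512-byte aligned */

	if (!p)
		return -ENOMEM;

	/* With the guarantee documented above, this should never fire. */
	WARN_ON((unsigned long)p & (512 - 1));

	kfree(p);
	return 0;
}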
9 changes: 7 additions & 2 deletions fs/fs-writeback.c
@@ -164,8 +164,13 @@ static void finish_writeback_work(struct bdi_writeback *wb,

if (work->auto_free)
kfree(work);
if (done && atomic_dec_and_test(&done->cnt))
wake_up_all(done->waitq);
if (done) {
wait_queue_head_t *waitq = done->waitq;

/* @done can't be accessed after the following dec */
if (atomic_dec_and_test(&done->cnt))
wake_up_all(waitq);
}
}

static void wb_queue_work(struct bdi_writeback *wb,
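For context on why the ordering in the hunk above matters: the wb_completion is normally on the waiter's stack, and the waiter may return, invalidating that memory, as soon as the count reaches zero. A rough paraphrase of the waiting side, simplified for illustration (see wb_wait_for_completion() in fs/fs-writeback.c for the real code):

struct wb_completion {
	atomic_t		cnt;
	wait_queue_head_t	*waitq;
};

static void wait_for_writeback_done(struct wb_completion *done)
{
	/* drop the initial count taken when the completion was set up */
	atomic_dec(&done->cnt);
	wait_event(*done->waitq, !atomic_read(&done->cnt));
	/* the on-stack completion may be gone as soon as this returns */
}

This is why finish_writeback_work() now caches done->waitq in a local before the final atomic_dec_and_test().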
25 changes: 23 additions & 2 deletions fs/ocfs2/aops.c
@@ -2049,7 +2049,8 @@ int ocfs2_write_end_nolock(struct address_space *mapping,
inode->i_mtime = inode->i_ctime = current_time(inode);
di->i_mtime = di->i_ctime = cpu_to_le64(inode->i_mtime.tv_sec);
di->i_mtime_nsec = di->i_ctime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec);
ocfs2_update_inode_fsync_trans(handle, inode, 1);
if (handle)
ocfs2_update_inode_fsync_trans(handle, inode, 1);
}
if (handle)
ocfs2_journal_dirty(handle, wc->w_di_bh);
@@ -2146,13 +2147,30 @@ static int ocfs2_dio_wr_get_block(struct inode *inode, sector_t iblock,
struct ocfs2_dio_write_ctxt *dwc = NULL;
struct buffer_head *di_bh = NULL;
u64 p_blkno;
loff_t pos = iblock << inode->i_sb->s_blocksize_bits;
unsigned int i_blkbits = inode->i_sb->s_blocksize_bits;
loff_t pos = iblock << i_blkbits;
sector_t endblk = (i_size_read(inode) - 1) >> i_blkbits;
unsigned len, total_len = bh_result->b_size;
int ret = 0, first_get_block = 0;

len = osb->s_clustersize - (pos & (osb->s_clustersize - 1));
len = min(total_len, len);

/*
* bh_result->b_size is counted in get_more_blocks according to the write
* "pos" and "end"; we need to map twice to return different buffer states:
* 1. area within file size: do not set NEW;
* 2. area beyond file size: set NEW.
*
* iblock endblk
* |--------|---------|---------|---------
* |<-------area in file------->|
*/

if ((iblock <= endblk) &&
((iblock + ((len - 1) >> i_blkbits)) > endblk))
len = (endblk - iblock + 1) << i_blkbits;

mlog(0, "get block of %lu at %llu:%u req %u\n",
inode->i_ino, pos, len, total_len);

@@ -2236,6 +2254,9 @@ static int ocfs2_dio_wr_get_block(struct inode *inode, sector_t iblock,
if (desc->c_needs_zero)
set_buffer_new(bh_result);

if (iblock > endblk)
set_buffer_new(bh_result);

/* May sleep in end_io. It should not happen in an irq context, so defer
* it to the dio work queue. */
set_buffer_defer_completion(bh_result);
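A worked example of the endblk clamp above (assumed numbers): with 4K blocks (i_blkbits = 12) and i_size = 20K, endblk = (20480 - 1) >> 12 = 4. A direct IO write whose bh_result->b_size spans blocks 3..6 enters with iblock = 3 <= endblk while iblock + ((len - 1) >> i_blkbits) = 6 > endblk (assuming the cluster-size clamp leaves len untouched), so len is cut to (4 - 3 + 1) << 12 = 8K. The first mapping then covers only blocks 3..4, which lie inside i_size and are not marked NEW; the follow-up call starts at block 5, where iblock > endblk, so buffer_new is set and the direct-IO path treats the region beyond i_size as freshly allocated, which is what the unaligned direct IO zeroing fix relies on.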
2 changes: 1 addition & 1 deletion fs/ocfs2/ioctl.c
@@ -283,7 +283,7 @@ static int ocfs2_info_scan_inode_alloc(struct ocfs2_super *osb,
if (inode_alloc)
inode_lock(inode_alloc);

if (o2info_coherent(&fi->ifi_req)) {
if (inode_alloc && o2info_coherent(&fi->ifi_req)) {
status = ocfs2_inode_lock(inode_alloc, &bh, 0);
if (status < 0) {
mlog_errno(status);
56 changes: 23 additions & 33 deletions fs/ocfs2/xattr.c
@@ -1490,18 +1490,6 @@ static int ocfs2_xa_check_space(struct ocfs2_xa_loc *loc,
return loc->xl_ops->xlo_check_space(loc, xi);
}

static void ocfs2_xa_add_entry(struct ocfs2_xa_loc *loc, u32 name_hash)
{
loc->xl_ops->xlo_add_entry(loc, name_hash);
loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
/*
* We can't leave the new entry's xe_name_offset at zero or
* add_namevalue() will go nuts. We set it to the size of our
* storage so that it can never be less than any other entry.
*/
loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);
}

static void ocfs2_xa_add_namevalue(struct ocfs2_xa_loc *loc,
struct ocfs2_xattr_info *xi)
{
@@ -2133,29 +2121,31 @@ static int ocfs2_xa_prepare_entry(struct ocfs2_xa_loc *loc,
if (rc)
goto out;

if (loc->xl_entry) {
if (ocfs2_xa_can_reuse_entry(loc, xi)) {
orig_value_size = loc->xl_entry->xe_value_size;
rc = ocfs2_xa_reuse_entry(loc, xi, ctxt);
if (rc)
goto out;
goto alloc_value;
}
if (!loc->xl_entry) {
rc = -EINVAL;
goto out;
}

if (!ocfs2_xattr_is_local(loc->xl_entry)) {
orig_clusters = ocfs2_xa_value_clusters(loc);
rc = ocfs2_xa_value_truncate(loc, 0, ctxt);
if (rc) {
mlog_errno(rc);
ocfs2_xa_cleanup_value_truncate(loc,
"overwriting",
orig_clusters);
goto out;
}
if (ocfs2_xa_can_reuse_entry(loc, xi)) {
orig_value_size = loc->xl_entry->xe_value_size;
rc = ocfs2_xa_reuse_entry(loc, xi, ctxt);
if (rc)
goto out;
goto alloc_value;
}

if (!ocfs2_xattr_is_local(loc->xl_entry)) {
orig_clusters = ocfs2_xa_value_clusters(loc);
rc = ocfs2_xa_value_truncate(loc, 0, ctxt);
if (rc) {
mlog_errno(rc);
ocfs2_xa_cleanup_value_truncate(loc,
"overwriting",
orig_clusters);
goto out;
}
ocfs2_xa_wipe_namevalue(loc);
} else
ocfs2_xa_add_entry(loc, name_hash);
}
ocfs2_xa_wipe_namevalue(loc);

/*
* If we get here, we have a blank entry. Fill it. We grow our
29 changes: 29 additions & 0 deletions include/linux/memcontrol.h
@@ -356,6 +356,19 @@ static inline bool mem_cgroup_disabled(void)
return !cgroup_subsys_enabled(memory_cgrp_subsys);
}

static inline unsigned long mem_cgroup_protection(struct mem_cgroup *memcg,
bool in_low_reclaim)
{
if (mem_cgroup_disabled())
return 0;

if (in_low_reclaim)
return READ_ONCE(memcg->memory.emin);

return max(READ_ONCE(memcg->memory.emin),
READ_ONCE(memcg->memory.elow));
}

enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
struct mem_cgroup *memcg);

@@ -537,6 +550,8 @@ void mem_cgroup_handle_over_high(void);

unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg);

unsigned long mem_cgroup_size(struct mem_cgroup *memcg);

void mem_cgroup_print_oom_context(struct mem_cgroup *memcg,
struct task_struct *p);

@@ -829,6 +844,12 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
{
}

static inline unsigned long mem_cgroup_protection(struct mem_cgroup *memcg,
bool in_low_reclaim)
{
return 0;
}

static inline enum mem_cgroup_protection mem_cgroup_protected(
struct mem_cgroup *root, struct mem_cgroup *memcg)
{
@@ -968,6 +989,11 @@ static inline unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg)
return 0;
}

static inline unsigned long mem_cgroup_size(struct mem_cgroup *memcg)
{
return 0;
}

static inline void
mem_cgroup_print_oom_context(struct mem_cgroup *memcg, struct task_struct *p)
{
@@ -1264,6 +1290,9 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct page *page,
static inline void mem_cgroup_track_foreign_dirty(struct page *page,
struct bdi_writeback *wb)
{
if (mem_cgroup_disabled())
return;

if (unlikely(&page->mem_cgroup->css != wb->memcg_css))
mem_cgroup_track_foreign_dirty_slowpath(page, wb);
}
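A sketch of how a reclaim path could combine the two helpers added above, mem_cgroup_protection() and mem_cgroup_size(), to scale its scan target by a cgroup's overage. This is illustrative only; the real logic lives in get_scan_count() in mm/vmscan.c, which this series changes but which is not shown in this excerpt, and the function name below is hypothetical.

static unsigned long scale_scan_by_overage(struct mem_cgroup *memcg,
					   unsigned long lruvec_size,
					   bool in_low_reclaim)
{
	unsigned long protection = mem_cgroup_protection(memcg, in_low_reclaim);
	unsigned long usage = mem_cgroup_size(memcg);

	if (!protection || !usage)
		return lruvec_size;

	/* Scan only the share of the LRU that corresponds to the overage. */
	return lruvec_size - min(lruvec_size,
				 lruvec_size * protection / usage);
}

With this shape a cgroup barely above its protection sees its scan target shrink towards zero, while one far above it is scanned almost in full, matching the proportional behaviour documented in cgroup-v2.rst earlier in this diff.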
4 changes: 4 additions & 0 deletions include/linux/slab.h
@@ -493,6 +493,10 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
* kmalloc is the normal method of allocating memory
* for objects smaller than page size in the kernel.
*
* The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
* bytes. For a power-of-two @size, the alignment is also guaranteed to be
* at least the size.
*
* The @flags argument may be one of the GFP flags defined at
* include/linux/gfp.h and described at
* :ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>`
4 changes: 2 additions & 2 deletions kernel/fork.c
@@ -2925,7 +2925,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
struct ctl_table t;
int ret;
int threads = max_threads;
int min = MIN_THREADS;
int min = 1;
int max = MAX_THREADS;

t = *table;
@@ -2937,7 +2937,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
if (ret || !write)
return ret;

set_max_threads(threads);
max_threads = threads;

return 0;
}
1 change: 1 addition & 0 deletions kernel/panic.c
@@ -180,6 +180,7 @@ void panic(const char *fmt, ...)
* after setting panic_cpu) from invoking panic() again.
*/
local_irq_disable();
preempt_disable_notrace();

/*
* It's possible to come here directly from a panic-assertion and
5 changes: 5 additions & 0 deletions mm/memcontrol.c
@@ -1567,6 +1567,11 @@ unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg)
return max;
}

unsigned long mem_cgroup_size(struct mem_cgroup *memcg)
{
return page_counter_read(&memcg->memory);
}

static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
int order)
{
2 changes: 0 additions & 2 deletions mm/memremap.c
@@ -13,8 +13,6 @@
#include <linux/xarray.h>

static DEFINE_XARRAY(pgmap_array);
#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
#define SECTION_SIZE (1UL << PA_SECTION_SHIFT)

#ifdef CONFIG_DEV_PAGEMAP_OPS
DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
8 changes: 7 additions & 1 deletion mm/page_alloc.c
@@ -1175,11 +1175,17 @@ static __always_inline bool free_pages_prepare(struct page *page,
debug_check_no_obj_freed(page_address(page),
PAGE_SIZE << order);
}
arch_free_page(page, order);
if (want_init_on_free())
kernel_init_free_pages(page, 1 << order);

kernel_poison_pages(page, 1 << order, 0);
/*
* arch_free_page() can make the page's contents inaccessible. s390
* does this. So nothing which can access the page's contents should
* happen after this.
*/
arch_free_page(page, order);

if (debug_pagealloc_enabled())
kernel_map_pages(page, 1 << order, 0);

2 changes: 1 addition & 1 deletion mm/shuffle.c
@@ -33,7 +33,7 @@ __meminit void page_alloc_shuffle(enum mm_shuffle_ctl ctl)
}

static bool shuffle_param;
extern int shuffle_show(char *buffer, const struct kernel_param *kp)
static int shuffle_show(char *buffer, const struct kernel_param *kp)
{
return sprintf(buffer, "%c\n", test_bit(SHUFFLE_ENABLE, &shuffle_state)
? 'Y' : 'N');
19 changes: 16 additions & 3 deletions mm/slab_common.c
@@ -1030,10 +1030,19 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
unsigned int useroffset, unsigned int usersize)
{
int err;
unsigned int align = ARCH_KMALLOC_MINALIGN;

s->name = name;
s->size = s->object_size = size;
s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);

/*
* For power of two sizes, guarantee natural alignment for kmalloc
* caches, regardless of SL*B debugging options.
*/
if (is_power_of_2(size))
align = max(align, size);
s->align = calculate_alignment(flags, align, size);

s->useroffset = useroffset;
s->usersize = usersize;

@@ -1287,12 +1296,16 @@ void __init create_kmalloc_caches(slab_flags_t flags)
*/
void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
{
void *ret;
void *ret = NULL;
struct page *page;

flags |= __GFP_COMP;
page = alloc_pages(flags, order);
ret = page ? page_address(page) : NULL;
if (likely(page)) {
ret = page_address(page);
mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE,
1 << order);
}
ret = kasan_kmalloc_large(ret, size, flags);
/* As ret might get tagged, call kmemleak hook after KASAN. */
kmemleak_alloc(ret, size, 1, flags);