Merge branch 'akpm' (patches from Andrew)
Merge second set of updates from Andrew Morton:
 "More of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
  mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
  mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
  vmstat: Reduce time interval to stat update on idle cpu
  mm/page_owner.c: remove unnecessary stack_trace field
  Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
  mm: incorporate read-only pages into transparent huge pages
  vmstat: do not use deferrable delayed work for vmstat_update
  mm: more aggressive page stealing for UNMOVABLE allocations
  mm: always steal split buddies in fallback allocations
  mm: when stealing freepages, also take pages created by splitting buddy page
  mincore: apply page table walker on do_mincore()
  mm: /proc/pid/clear_refs: avoid split_huge_page()
  mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
  mempolicy: apply page table walker on queue_pages_range()
  arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
  memcg: cleanup preparation for page table walk
  numa_maps: remove numa_maps->vma
  numa_maps: fix typo in gather_hugetbl_stats
  pagemap: use walk->vma instead of calling find_vma()
  clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
  ...
Linus Torvalds committed Feb 12, 2015
2 parents d3f180e + 8138a67 commit 59d5373
Showing 116 changed files with 2,491 additions and 1,717 deletions.
79 changes: 79 additions & 0 deletions Documentation/cgroups/unified-hierarchy.txt
@@ -327,6 +327,85 @@ supported and the interface files "release_agent" and
- use_hierarchy is on by default and the cgroup file for the flag is
not created.

- The original lower boundary, the soft limit, is defined as a limit
that is per default unset. As a result, the set of cgroups that
global reclaim prefers is opt-in, rather than opt-out. The costs
for optimizing these mostly negative lookups are so high that the
implementation, despite its enormous size, does not even provide the
basic desirable behavior. First off, the soft limit has no
hierarchical meaning. All configured groups are organized in a
global rbtree and treated like equal peers, regardless where they
are located in the hierarchy. This makes subtree delegation
impossible. Second, the soft limit reclaim pass is so aggressive
that it not just introduces high allocation latencies into the
system, but also impacts system performance due to overreclaim, to
the point where the feature becomes self-defeating.

The memory.low boundary on the other hand is a top-down allocated
reserve. A cgroup enjoys reclaim protection when it and all its
ancestors are below their low boundaries, which makes delegation of
subtrees possible. Secondly, new cgroups have no reserve per
default and in the common case most cgroups are eligible for the
preferred reclaim pass. This allows the new low boundary to be
efficiently implemented with just a minor addition to the generic
reclaim code, without the need for out-of-band data structures and
reclaim passes. Because the generic reclaim code considers all
cgroups except for the ones running low in the preferred first
reclaim pass, overreclaim of individual groups is eliminated as
well, resulting in much better overall workload performance.

- The original high boundary, the hard limit, is defined as a strict
limit that can not budge, even if the OOM killer has to be called.
But this generally goes against the goal of making the most out of
the available memory. The memory consumption of workloads varies
during runtime, and that requires users to overcommit. But doing
that with a strict upper limit requires either a fairly accurate
prediction of the working set size or adding slack to the limit.
Since working set size estimation is hard and error prone, and
getting it wrong results in OOM kills, most users tend to err on the
side of a looser limit and end up wasting precious resources.

The memory.high boundary on the other hand can be set much more
conservatively. When hit, it throttles allocations by forcing them
into direct reclaim to work off the excess, but it never invokes the
OOM killer. As a result, a high boundary that is chosen too
aggressively will not terminate the processes, but instead it will
lead to gradual performance degradation. The user can monitor this
and make corrections until the minimal memory footprint that still
gives acceptable performance is found.

In extreme cases, with many concurrent allocations and a complete
breakdown of reclaim progress within the group, the high boundary
can be exceeded. But even then it's mostly better to satisfy the
allocation from the slack available in other groups or the rest of
the system than killing the group. Otherwise, memory.max is there
to limit this type of spillover and ultimately contain buggy or even
malicious applications.

- The original control file names are unwieldy and inconsistent in
many different ways. For example, the upper boundary hit count is
exported in the memory.failcnt file, but an OOM event count has to
be manually counted by listening to memory.oom_control events, and
lower boundary / soft limit events have to be counted by first
setting a threshold for that value and then counting those events.
Also, usage and limit files encode their units in the filename.
That makes the filenames very long, even though this is not
information that a user needs to be reminded of every time they type
out those names.

To address these naming issues, as well as to signal clearly that
the new interface carries a new configuration model, the naming
conventions in it necessarily differ from the old interface.

- The original limit files indicate the state of an unset limit with a
Very High Number, and a configured limit can be unset by echoing -1
into those files. But that very high number is implementation and
architecture dependent and not very descriptive. And while -1 can
be understood as an underflow into the highest possible value, -2 or
-10M etc. do not work, so it's not consistent.

memory.low, memory.high, and memory.max will use the string
"infinity" to indicate and set the highest possible value.
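The memory.low rule described above (a cgroup is protected from reclaim only while it and all of its ancestors are below their low boundaries) can be illustrated with a small toy model. This is a sketch under stated assumptions, not the kernel's implementation: `struct toy_cgroup`, its fields, and `toy_low_protected()` are hypothetical names, and the top of the tree is treated like any other group.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory cgroup. */
struct toy_cgroup {
    const struct toy_cgroup *parent; /* NULL at the top of the tree */
    unsigned long usage;             /* current memory usage */
    unsigned long low;               /* memory.low boundary; 0 means no reserve */
};

/*
 * Returns true when cg and every ancestor are below their low
 * boundaries. A group with the default low of 0 is never protected,
 * so it stays eligible for the preferred reclaim pass.
 */
static bool toy_low_protected(const struct toy_cgroup *cg)
{
    for (; cg != NULL; cg = cg->parent) {
        if (cg->usage >= cg->low)
            return false;
    }
    return true;
}
```

In this model a child can only be protected if its parent has reserved enough for the whole subtree, which is what makes delegating a subtree possible.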

5. Planned Changes

23 changes: 23 additions & 0 deletions Documentation/filesystems/proc.txt
@@ -42,6 +42,7 @@ Table of Contents
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
3.7 /proc/<pid>/task/<tid>/children - Information about task children
3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
3.9 /proc/<pid>/map_files - Information about memory mapped files

4 Configuring procfs
4.1 Mount options
@@ -1763,6 +1764,28 @@ pair provide additional information particular to the objects they represent.
with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value'
still exhibits timer's remaining time.

3.9 /proc/<pid>/map_files - Information about memory mapped files
---------------------------------------------------------------------
This directory contains symbolic links which represent memory mapped files
the process is maintaining. Example output:

| lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so
| lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so
| lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so
| ...
| lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1
| lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls

The name of a link represents the virtual memory bounds of a mapping, i.e.
vm_area_struct::vm_start-vm_area_struct::vm_end.

The main purpose of the map_files is to retrieve a set of memory mapped
files in a fast way instead of parsing /proc/<pid>/maps or
/proc/<pid>/smaps, both of which contain many more records. At the same
time one can open(2) mappings from the listings of two processes and
comparing their inode numbers to figure out which anonymous memory areas
are actually shared.
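Since each link name encodes the mapping bounds as `<start>-<end>` in hexadecimal, a consumer can recover the address range without reading the maps file. The following is a minimal sketch of that parsing step; `parse_map_files_name()` is a hypothetical helper, not an existing API.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse a map_files link name such as "400000-41a000" into its start
 * and end addresses. Returns false if the name is not exactly two hex
 * numbers joined by '-' or if the range is empty or reversed.
 */
static bool parse_map_files_name(const char *name,
                                 unsigned long long *start,
                                 unsigned long long *end)
{
    int consumed = 0;

    if (sscanf(name, "%llx-%llx%n", start, end, &consumed) != 2)
        return false;
    /* Reject trailing characters after the second address. */
    return name[consumed] == '\0' && *start < *end;
}
```

A caller would typically feed it the `d_name` of each entry returned by readdir(3) on the map_files directory. Depending on kernel version and privileges, listing the directory or following the links may be restricted.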

------------------------------------------------------------------------------
Configuring procfs
------------------------------------------------------------------------------
12 changes: 6 additions & 6 deletions Documentation/sysctl/vm.txt
@@ -555,12 +555,12 @@ this is causing problems for your system/application.

oom_dump_tasks

-Enables a system-wide task dump (excluding kernel threads) to be
-produced when the kernel performs an OOM-killing and includes such
-information as pid, uid, tgid, vm size, rss, nr_ptes, swapents,
-oom_score_adj score, and name. This is helpful to determine why the
-OOM killer was invoked, to identify the rogue task that caused it,
-and to determine why the OOM killer chose the task it did to kill.
+Enables a system-wide task dump (excluding kernel threads) to be produced
+when the kernel performs an OOM-killing and includes such information as
+pid, uid, tgid, vm size, rss, nr_ptes, nr_pmds, swapents, oom_score_adj
+score, and name. This is helpful to determine why the OOM killer was
+invoked, to identify the rogue task that caused it, and to determine why
+the OOM killer chose the task it did to kill.

If this is set to zero, this information is suppressed. On very
large systems with thousands of tasks it may not be feasible to dump
8 changes: 8 additions & 0 deletions Documentation/vm/pagemap.txt
@@ -62,6 +62,8 @@ There are three components to pagemap:
20. NOPAGE
21. KSM
22. THP
23. BALLOON
24. ZERO_PAGE

Short descriptions to the page flags:

@@ -102,6 +104,12 @@ Short descriptions to the page flags:
22. THP
contiguous pages which construct transparent hugepages

23. BALLOON
balloon compaction page

24. ZERO_PAGE
zero page for pfn_zero or huge_zero page
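As a rough illustration of how the two new flag numbers would be consumed, the sketch below tests individual bits in a 64-bit flags word, assuming the word was read from /proc/kpageflags and that bit positions match the numbered list above. The `KPF_*` names are defined locally for this example, and `kpf_test()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers taken from the numbered flag list in this document. */
#define KPF_THP        22
#define KPF_BALLOON    23
#define KPF_ZERO_PAGE  24

/* Returns true if the given flag bit is set in a per-page flags word. */
static bool kpf_test(uint64_t flags, unsigned int bit)
{
    return (flags >> bit) & 1;
}
```

For example, `kpf_test(flags, KPF_ZERO_PAGE)` would report whether a page is a zero page under these assumptions.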

[IO related page flags]
1. ERROR IO error occurred
3. UPTODATE page has up-to-date data
2 changes: 1 addition & 1 deletion arch/alpha/include/asm/pgtable.h
@@ -45,7 +45,7 @@ struct vm_area_struct;
#define PTRS_PER_PMD (1UL << (PAGE_SHIFT-3))
#define PTRS_PER_PGD (1UL << (PAGE_SHIFT-3))
#define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL

/* Number of pointers that fit on a page: this will go away. */
#define PTRS_PER_PAGE (1UL << (PAGE_SHIFT-3))
2 changes: 1 addition & 1 deletion arch/arc/include/asm/pgtable.h
@@ -211,7 +211,7 @@
* No special requirements for lowest virtual address we permit any user space
* mapping to be mapped at.
*/
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL


/****************************************************************
2 changes: 2 additions & 0 deletions arch/arm/include/asm/pgtable-2level.h
@@ -10,6 +10,8 @@
#ifndef _ASM_PGTABLE_2LEVEL_H
#define _ASM_PGTABLE_2LEVEL_H

#define __PAGETABLE_PMD_FOLDED

/*
* Hardware-wise, we have a two level page table structure, where the first
* level has 4096 entries, and the second level has 256 entries. Each entry
2 changes: 1 addition & 1 deletion arch/arm/include/asm/pgtable-nommu.h
@@ -85,7 +85,7 @@ extern unsigned int kobjsize(const void *objp);
#define VMALLOC_START 0UL
#define VMALLOC_END 0xffffffffUL

-#define FIRST_USER_ADDRESS (0)
+#define FIRST_USER_ADDRESS 0UL

#include <asm-generic/pgtable.h>

6 changes: 0 additions & 6 deletions arch/arm/mm/hugetlbpage.c
@@ -36,12 +36,6 @@
* of type casting from pmd_t * to pte_t *.
*/

-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
-int write)
-{
-return ERR_PTR(-EINVAL);
-}
-
int pud_huge(pud_t pud)
{
return 0;
4 changes: 4 additions & 0 deletions arch/arm/mm/pgd.c
@@ -97,6 +97,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)

no_pte:
pmd_free(mm, new_pmd);
mm_dec_nr_pmds(mm);
no_pmd:
pud_free(mm, new_pud);
no_pud:
@@ -130,9 +131,11 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
pte = pmd_pgtable(*pmd);
pmd_clear(pmd);
pte_free(mm, pte);
atomic_long_dec(&mm->nr_ptes);
no_pmd:
pud_clear(pud);
pmd_free(mm, pmd);
mm_dec_nr_pmds(mm);
no_pud:
pgd_clear(pgd);
pud_free(mm, pud);
@@ -152,6 +155,7 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
pmd = pmd_offset(pud, 0);
pud_clear(pud);
pmd_free(mm, pmd);
mm_dec_nr_pmds(mm);
pgd_clear(pgd);
pud_free(mm, pud);
}
2 changes: 1 addition & 1 deletion arch/arm64/include/asm/pgtable.h
@@ -45,7 +45,7 @@

#define vmemmap ((struct page *)(VMALLOC_END + SZ_64K))

-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL

#ifndef __ASSEMBLY__
extern void __pte_error(const char *file, int line, unsigned long val);
6 changes: 0 additions & 6 deletions arch/arm64/mm/hugetlbpage.c
@@ -38,12 +38,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
}
#endif

-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
-int write)
-{
-return ERR_PTR(-EINVAL);
-}
-
int pmd_huge(pmd_t pmd)
{
return !(pmd_val(pmd) & PMD_TABLE_BIT);
2 changes: 1 addition & 1 deletion arch/avr32/include/asm/pgtable.h
@@ -30,7 +30,7 @@
#define PGDIR_MASK (~(PGDIR_SIZE-1))

#define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL

#ifndef __ASSEMBLY__
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
2 changes: 1 addition & 1 deletion arch/cris/include/asm/pgtable.h
@@ -67,7 +67,7 @@ extern void paging_init(void);
*/

#define USER_PTRS_PER_PGD (TASK_SIZE/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL

/* zero page used for uninitialized stuff */
#ifndef __ASSEMBLY__
2 changes: 1 addition & 1 deletion arch/frv/include/asm/pgtable.h
@@ -140,7 +140,7 @@ extern unsigned long empty_zero_page;
#define PTRS_PER_PTE 4096

#define USER_PGDS_IN_LAST_PML4 (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL

#define USER_PGD_PTRS (PAGE_OFFSET >> PGDIR_SHIFT)
#define KERNEL_PGD_PTRS (PTRS_PER_PGD - USER_PGD_PTRS)
2 changes: 1 addition & 1 deletion arch/hexagon/include/asm/pgtable.h
@@ -171,7 +171,7 @@ extern unsigned long _dflt_cache_att;
extern pgd_t swapper_pg_dir[PTRS_PER_PGD]; /* located in head.S */

/* Seems to be zero even in architectures where the zero page is firewalled? */
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
#define pte_special(pte) 0
#define pte_mkspecial(pte) (pte)

2 changes: 1 addition & 1 deletion arch/ia64/include/asm/pgtable.h
@@ -127,7 +127,7 @@
#define PTRS_PER_PGD_SHIFT PTRS_PER_PTD_SHIFT
#define PTRS_PER_PGD (1UL << PTRS_PER_PGD_SHIFT)
#define USER_PTRS_PER_PGD (5*PTRS_PER_PGD/8) /* regions 0-4 are user regions */
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL

/*
* All the normal masks have the "page accessed" bits on, as any time
6 changes: 0 additions & 6 deletions arch/ia64/mm/hugetlbpage.c
@@ -114,12 +114,6 @@ int pud_huge(pud_t pud)
return 0;
}

-struct page *
-follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
-{
-return NULL;
-}
-
void hugetlb_free_pgd_range(struct mmu_gather *tlb,
unsigned long addr, unsigned long end,
unsigned long floor, unsigned long ceiling)
2 changes: 1 addition & 1 deletion arch/m32r/include/asm/pgtable.h
@@ -53,7 +53,7 @@ extern unsigned long empty_zero_page[1024];
#define PGDIR_MASK (~(PGDIR_SIZE - 1))

#define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL

#ifndef __ASSEMBLY__
/* Just any arbitrary offset to the start of the vmalloc VM area: the
2 changes: 1 addition & 1 deletion arch/m68k/include/asm/pgtable_mm.h
@@ -66,7 +66,7 @@
#define PTRS_PER_PGD 128
#endif
#define USER_PTRS_PER_PGD (TASK_SIZE/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL

/* Virtual address region for use by kernel_map() */
#ifdef CONFIG_SUN3
6 changes: 0 additions & 6 deletions arch/metag/mm/hugetlbpage.c
@@ -94,12 +94,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
return 0;
}

-struct page *follow_huge_addr(struct mm_struct *mm,
-unsigned long address, int write)
-{
-return ERR_PTR(-EINVAL);
-}
-
int pmd_huge(pmd_t pmd)
{
return pmd_page_shift(pmd) > PAGE_SHIFT;
4 changes: 3 additions & 1 deletion arch/microblaze/include/asm/pgtable.h
@@ -61,6 +61,8 @@ extern int mem_init_done;

#include <asm-generic/4level-fixup.h>

#define __PAGETABLE_PMD_FOLDED

#ifdef __KERNEL__
#ifndef __ASSEMBLY__

@@ -70,7 +72,7 @@ extern int mem_init_done;
#include <asm/mmu.h>
#include <asm/page.h>

-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL

extern unsigned long va_to_phys(unsigned long address);
extern pte_t *va_to_pte(unsigned long address);
2 changes: 1 addition & 1 deletion arch/mips/include/asm/pgtable-32.h
@@ -57,7 +57,7 @@ extern int add_temporary_entry(unsigned long entrylo0, unsigned long entrylo1,
#define PTRS_PER_PTE ((PAGE_SIZE << PTE_ORDER) / sizeof(pte_t))

#define USER_PTRS_PER_PGD (0x80000000UL/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL

#define VMALLOC_START MAP_BASE

8 changes: 3 additions & 5 deletions arch/mips/mm/gup.c
@@ -301,11 +301,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
start += nr << PAGE_SHIFT;
pages += nr;

-down_read(&mm->mmap_sem);
-ret = get_user_pages(current, mm, start,
-(end - start) >> PAGE_SHIFT,
-write, 0, pages, NULL);
-up_read(&mm->mmap_sem);
+ret = get_user_pages_unlocked(current, mm, start,
+(end - start) >> PAGE_SHIFT,
+write, 0, pages);

/* Have to be a bit careful with return values */
if (nr > 0) {