From b985194c8c0a130ed155b71662e39f7eaea4876f Mon Sep 17 00:00:00 2001 From: Chen Yucong Date: Thu, 22 May 2014 11:54:15 -0700 Subject: [PATCH 1/9] hwpoison, hugetlb: lock_page/unlock_page does not match for handling a free hugepage For handling a free hugepage in memory failure, the race will happen if another thread hwpoisoned this hugepage concurrently. So we need to check PageHWPoison instead of !PageHWPoison. If hwpoison_filter(p) returns true or a race happens, then we need to unlock_page(hpage). Signed-off-by: Chen Yucong Reviewed-by: Naoya Horiguchi Tested-by: Naoya Horiguchi Reviewed-by: Andi Kleen Cc: [2.6.36+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 35ef28acf137c..dbf8922216ade 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1081,15 +1081,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags) return 0; } else if (PageHuge(hpage)) { /* - * Check "just unpoisoned", "filter hit", and - * "race with other subpage." + * Check "filter hit" and "race with other subpage." */ lock_page(hpage); - if (!PageHWPoison(hpage) - || (hwpoison_filter(p) && TestClearPageHWPoison(p)) - || (p != hpage && TestSetPageHWPoison(hpage))) { - atomic_long_sub(nr_pages, &num_poisoned_pages); - return 0; + if (PageHWPoison(hpage)) { + if ((hwpoison_filter(p) && TestClearPageHWPoison(p)) + || (p != hpage && TestSetPageHWPoison(hpage))) { + atomic_long_sub(nr_pages, &num_poisoned_pages); + unlock_page(hpage); + return 0; + } } set_page_hwpoison_huge_page(hpage); res = dequeue_hwpoisoned_huge_page(hpage); From 7fcbbaf18392f0b17c95e2f033c8ccf87eecde1d Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 22 May 2014 11:54:16 -0700 Subject: [PATCH 2/9] mm/filemap.c: avoid always dirtying mapping->flags on O_DIRECT In some testing I ran today (some fio jobs that spread over two nodes), we end up spending 40% of the time in filemap_check_errors(). That smells fishy. Looking further, this is basically what happens: blkdev_aio_read() generic_file_aio_read() filemap_write_and_wait_range() if (!mapping->nr_pages) filemap_check_errors() and filemap_check_errors() always attempts two test_and_clear_bit() on the mapping flags, thus dirtying it for every single invocation. The patch below tests each of these bits before clearing them, avoiding this issue. In my test case (4-socket box), performance went from 1.7M IOPS to 4.0M IOPS. Signed-off-by: Jens Axboe Acked-by: Jeff Moyer Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/filemap.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 000a220e2a414..088358c8006bb 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -257,9 +257,11 @@ static int filemap_check_errors(struct address_space *mapping) { int ret = 0; /* Check for outstanding write errors */ - if (test_and_clear_bit(AS_ENOSPC, &mapping->flags)) + if (test_bit(AS_ENOSPC, &mapping->flags) && + test_and_clear_bit(AS_ENOSPC, &mapping->flags)) ret = -ENOSPC; - if (test_and_clear_bit(AS_EIO, &mapping->flags)) + if (test_bit(AS_EIO, &mapping->flags) && + test_and_clear_bit(AS_EIO, &mapping->flags)) ret = -EIO; return ret; } From 55231e5c898c5c03c14194001e349f40f59bd300 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Thu, 22 May 2014 11:54:17 -0700 Subject: [PATCH 3/9] mm: madvise: fix MADV_WILLNEED on shmem swapouts MADV_WILLNEED currently does not read swapped out shmem pages back in. Commit 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache radix trees") made find_get_page() filter exceptional radix tree entries but failed to convert all find_get_page() callers that WANT exceptional entries over to find_get_entry(). One of them is shmem swap readahead in madvise, which now skips over any swap-out records. Convert it to find_get_entry(). Fixes: 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache radix trees") Signed-off-by: Johannes Weiner Reported-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/madvise.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/madvise.c b/mm/madvise.c index 539eeb96b323b..a402f8fdc68e9 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -195,7 +195,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma, for (; start < end; start += PAGE_SIZE) { index = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; - page = find_get_page(mapping, index); + page = find_get_entry(mapping, index); if (!radix_tree_exceptional_entry(page)) { if (page) page_cache_release(page); From 6f6acb00514c10be35529402f36ad7a288f08c2e Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 22 May 2014 11:54:19 -0700 Subject: [PATCH 4/9] memcg: fix swapcache charge from kernel thread context Commit 284f39afeaa4 ("mm: memcg: push !mm handling out to page cache charge function") explicitly checks for page cache charges without any mm context (from kernel thread context[1]). This seemed to be the only possible case where memory could be charged without mm context so commit 03583f1a631c ("memcg: remove unnecessary !mm check from try_get_mem_cgroup_from_mm()") removed the mm check from get_mem_cgroup_from_mm(). This however caused another NULL ptr dereference during early boot when loopback kernel thread splices to tmpfs as reported by Stephan Kulow: BUG: unable to handle kernel NULL pointer dereference at 0000000000000360 IP: get_mem_cgroup_from_mm.isra.42+0x2b/0x60 Oops: 0000 [#1] SMP Modules linked in: btrfs dm_multipath dm_mod scsi_dh multipath raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod parport_pc parport nls_utf8 isofs usb_storage iscsi_ibft iscsi_boot_sysfs arc4 ecb fan thermal nfs lockd fscache nls_iso8859_1 nls_cp437 sg st hid_generic usbhid af_packet sunrpc sr_mod cdrom ata_generic uhci_hcd virtio_net virtio_blk ehci_hcd usbcore ata_piix floppy processor button usb_common virtio_pci virtio_ring virtio edd squashfs loop ppa] CPU: 0 PID: 97 Comm: loop1 Not tainted 3.15.0-rc5-5-default #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: __mem_cgroup_try_charge_swapin+0x40/0xe0 mem_cgroup_charge_file+0x8b/0xd0 shmem_getpage_gfp+0x66b/0x7b0 shmem_file_splice_read+0x18f/0x430 splice_direct_to_actor+0xa2/0x1c0 do_lo_receive+0x5a/0x60 [loop] loop_thread+0x298/0x720 [loop] kthread+0xc6/0xe0 ret_from_fork+0x7c/0xb0 Also Branimir Maksimovic reported the following oops which is tiggered for the swapcache charge path from the accounting code for kernel threads: CPU: 1 PID: 160 Comm: kworker/u8:5 Tainted: P OE 3.15.0-rc5-core2-custom #159 Hardware name: System manufacturer System Product Name/MAXIMUSV GENE, BIOS 1903 08/19/2013 task: ffff880404e349b0 ti: ffff88040486a000 task.ti: ffff88040486a000 RIP: get_mem_cgroup_from_mm.isra.42+0x2b/0x60 Call Trace: __mem_cgroup_try_charge_swapin+0x45/0xf0 mem_cgroup_charge_file+0x9c/0xe0 shmem_getpage_gfp+0x62c/0x770 shmem_write_begin+0x38/0x40 generic_perform_write+0xc5/0x1c0 __generic_file_aio_write+0x1d1/0x3f0 generic_file_aio_write+0x4f/0xc0 do_sync_write+0x5a/0x90 do_acct_process+0x4b1/0x550 acct_process+0x6d/0xa0 do_exit+0x827/0xa70 kthread+0xc3/0xf0 This patch fixes the issue by reintroducing mm check into get_mem_cgroup_from_mm. We could do the same trick in __mem_cgroup_try_charge_swapin as we do for the regular page cache path but it is not worth troubles. The check is not that expensive and it is better to have get_mem_cgroup_from_mm more robust. [1] - http://marc.info/?l=linux-mm&m=139463617808941&w=2 Fixes: 03583f1a631c ("memcg: remove unnecessary !mm check from try_get_mem_cgroup_from_mm()") Reported-and-tested-by: Stephan Kulow Reported-by: Branimir Maksimovic Signed-off-by: Michal Hocko Acked-by: Johannes Weiner Cc: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c47dffdcb246b..5177c6d4a2ddb 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1077,9 +1077,18 @@ static struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm) rcu_read_lock(); do { - memcg = mem_cgroup_from_task(rcu_dereference(mm->owner)); - if (unlikely(!memcg)) + /* + * Page cache insertions can happen withou an + * actual mm context, e.g. during disk probing + * on boot, loopback IO, acct() writes etc. + */ + if (unlikely(!mm)) memcg = root_mem_cgroup; + else { + memcg = mem_cgroup_from_task(rcu_dereference(mm->owner)); + if (unlikely(!memcg)) + memcg = root_mem_cgroup; + } } while (!css_tryget(&memcg->css)); rcu_read_unlock(); return memcg; @@ -3958,17 +3967,9 @@ int mem_cgroup_charge_file(struct page *page, struct mm_struct *mm, return 0; } - /* - * Page cache insertions can happen without an actual mm - * context, e.g. during disk probing on boot. - */ - if (unlikely(!mm)) - memcg = root_mem_cgroup; - else { - memcg = mem_cgroup_try_charge_mm(mm, gfp_mask, 1, true); - if (!memcg) - return -ENOMEM; - } + memcg = mem_cgroup_try_charge_mm(mm, gfp_mask, 1, true); + if (!memcg) + return -ENOMEM; __mem_cgroup_commit_charge(memcg, page, 1, type, false); return 0; } From ad0f614e4723db8cead439cf414108cbf975b224 Mon Sep 17 00:00:00 2001 From: Masatake YAMATO Date: Thu, 22 May 2014 11:54:20 -0700 Subject: [PATCH 5/9] wait: swap EXIT_ZOMBIE(Z) and EXIT_DEAD(X) chars in TASK_STATE_TO_CHAR_STR In commit ad86622b478e ("wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space") the order of task state definitions were changed: EXIT_DEAD and EXIT_ZOMBIE were swapped. Though the charterers for the states in TASK_STATE_TO_CHAR_STR string were not updated. This patch synchronizes the string to the order of definitions. Signed-off-by: Masatake YAMATO Cc: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/sched.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 25f54c79f7577..21fbdae61b9e6 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -220,7 +220,7 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq); #define TASK_PARKED 512 #define TASK_STATE_MAX 1024 -#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP" +#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWP" extern char ___assert_task_state[1 - 2*!!( sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)]; From 3e030ecc0fc7de10fd0da10c1c19939872a31717 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 22 May 2014 11:54:21 -0700 Subject: [PATCH 6/9] mm/memory-failure.c: fix memory leak by race between poison and unpoison When a memory error happens on an in-use page or (free and in-use) hugepage, the victim page is isolated with its refcount set to one. When you try to unpoison it later, unpoison_memory() calls put_page() for it twice in order to bring the page back to free page pool (buddy or free hugepage list). However, if another memory error occurs on the page which we are unpoisoning, memory_failure() returns without releasing the refcount which was incremented in the same call at first, which results in memory leak and unconsistent num_poisoned_pages statistics. This patch fixes it. Signed-off-by: Naoya Horiguchi Cc: Andi Kleen Cc: [2.6.32+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memory-failure.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index dbf8922216ade..9ccef39a9de26 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1153,6 +1153,8 @@ int memory_failure(unsigned long pfn, int trapno, int flags) */ if (!PageHWPoison(p)) { printk(KERN_ERR "MCE %#lx: just unpoisoned\n", pfn); + atomic_long_sub(nr_pages, &num_poisoned_pages); + put_page(hpage); res = 0; goto out; } From 66db6cfd49825f4e3462ceee3bcbb292d57da6fb Mon Sep 17 00:00:00 2001 From: Joseph Qi Date: Thu, 22 May 2014 11:54:22 -0700 Subject: [PATCH 7/9] ocfs2: fix double kmem_cache_destroy in dlm_init In dlm_init, if create dlm_lockname_cache failed in dlm_init_master_caches, it will destroy dlm_lockres_cache which created before twice. And this will cause system die when loading modules. Signed-off-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ocfs2/dlm/dlmmaster.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c index af3f7aa73e13a..ee1f88419cb06 100644 --- a/fs/ocfs2/dlm/dlmmaster.c +++ b/fs/ocfs2/dlm/dlmmaster.c @@ -472,11 +472,15 @@ int dlm_init_master_caches(void) void dlm_destroy_master_caches(void) { - if (dlm_lockname_cache) + if (dlm_lockname_cache) { kmem_cache_destroy(dlm_lockname_cache); + dlm_lockname_cache = NULL; + } - if (dlm_lockres_cache) + if (dlm_lockres_cache) { kmem_cache_destroy(dlm_lockres_cache); + dlm_lockres_cache = NULL; + } } static void dlm_lockres_release(struct kref *kref) From e60cbeedc48d80689c249ab5dcc3c31ad0452dea Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 22 May 2014 11:54:23 -0700 Subject: [PATCH 8/9] Documentation: fix DOCBOOKS=... building Prior to commit 4266129964b8 ("[media] DocBook: Move all media docbook stuff into its own directory") it was possible to build only a single (or more) book(s) by calling, for example make htmldocs DOCBOOKS=80211.xml This now fails: cp: target `.../Documentation/DocBook//media_api' is not a directory Ignore errors from that copy to make this possible again. Fixes: 4266129964b8 ("[media] DocBook: Move all media docbook stuff into its own directory") Signed-off-by: Johannes Berg Acked-by: Randy Dunlap Cc: Mauro Carvalho Chehab Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/DocBook/media/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/DocBook/media/Makefile b/Documentation/DocBook/media/Makefile index f9fd615427fbd..1d27f0a1abd1e 100644 --- a/Documentation/DocBook/media/Makefile +++ b/Documentation/DocBook/media/Makefile @@ -195,7 +195,7 @@ DVB_DOCUMENTED = \ # install_media_images = \ - $(Q)cp $(OBJIMGFILES) $(MEDIA_SRC_DIR)/v4l/*.svg $(MEDIA_OBJ_DIR)/media_api + $(Q)-cp $(OBJIMGFILES) $(MEDIA_SRC_DIR)/v4l/*.svg $(MEDIA_OBJ_DIR)/media_api $(MEDIA_OBJ_DIR)/%: $(MEDIA_SRC_DIR)/%.b64 $(Q)base64 -d $< >$@ From 0d9327ab70038ac8c7af6e20456578ab80158f2d Mon Sep 17 00:00:00 2001 From: Tobias Klauser Date: Thu, 22 May 2014 11:54:24 -0700 Subject: [PATCH 9/9] MAINTAINERS: add closing angle bracket to Vince Bridgers' email address Signed-off-by: Tobias Klauser Cc: Vince Bridgers Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 6846c7c622e33..c47d2683bd6bd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -537,7 +537,7 @@ L: linux-alpha@vger.kernel.org F: arch/alpha/ ALTERA TRIPLE SPEED ETHERNET DRIVER -M: Vince Bridgers L: netdev@vger.kernel.org L: nios2-dev@lists.rocketboards.org (moderated for non-subscribers) S: Maintained