asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2012-12-27	libceph: fix protocol feature mismatch failure path	Sage Weil
	We should not set con->state to CLOSED here; that happens in ceph_fault() in the caller, where it first asserts that the state is not yet CLOSED. Avoids a BUG when the features don't match. Since the fail_protocol() has become a trivial wrapper, replace calls to it with direct calls to reset_connection(). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-12-27	libceph: WARN, don't BUG on unexpected connection states	Alex Elder
	A number of assertions in the ceph messenger are implemented with BUG_ON(), killing the system if connection's state doesn't match what's expected. At this point our state model is (evidently) not well understood enough for these assertions to trigger a BUG(). Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...) so we learn about these issues without killing the machine. We now recognize that a connection fault can occur due to a socket closure at any time, regardless of the state of the connection. So there is really nothing we can assert about the state of the connection at that point so eliminate that assertion. Reported-by: Ugis <ugis22@gmail.com> Tested-by: Ugis <ugis22@gmail.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-27	libceph: always reset osds when kicking	Alex Elder
	When ceph_osdc_handle_map() is called to process a new osd map, kick_requests() is called to ensure all affected requests are updated if necessary to reflect changes in the osd map. This happens in two cases: whenever an incremental map update is processed; and when a full map update (or the last one if there is more than one) gets processed. In the former case, the kick_requests() call is followed immediately by a call to reset_changed_osds() to ensure any connections to osds affected by the map change are reset. But for full map updates this isn't done. Both cases should be doing this osd reset. Rather than duplicating the reset_changed_osds() call, move it into the end of kick_requests(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-27	libceph: move linger requests sooner in kick_requests()	Alex Elder
	The kick_requests() function is called by ceph_osdc_handle_map() when an osd map change has been indicated. Its purpose is to re-queue any request whose target osd is different from what it was when it was originally sent. It is structured as two loops, one for incomplete but registered requests, and a second for handling completed linger requests. As a special case, in the first loop if a request marked to linger has not yet completed, it is moved from the request list to the linger list. This is as a quick and dirty way to have the second loop handle sending the request along with all the other linger requests. Because of the way it's done now, however, this quick and dirty solution can result in these incomplete linger requests never getting re-sent as desired. The problem lies in the fact that the second loop only arranges for a linger request to be sent if it appears its target osd has changed. This is the proper handling for completed linger requests (it avoids issuing the same linger request twice to the same osd). But although the linger requests added to the list in the first loop may have been sent, they have not yet completed, so they need to be re-sent regardless of whether their target osd has changed. The first required fix is we need to avoid calling __map_request() on any incomplete linger request. Otherwise the subsequent __map_request() call in the second loop will find the target osd has not changed and will therefore not re-send the request. Second, we need to be sure that a sent but incomplete linger request gets re-sent. If the target osd is the same with the new osd map as it was when the request was originally sent, this won't happen. This can be fixed through careful handling when we move these requests from the request list to the linger list, by unregistering the request before it is registered as a linger request. This works because a side-effect of unregistering the request is to make the request's r_osd pointer be NULL, and that will ensure the second loop actually re-sends the linger request. Processing of such a request is done at that point, so continue with the next one once it's been moved. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-28	f2fs: handle error from f2fs_iget_nowait	Namjae Jeon
	In case f2fs_iget_nowait returns error, it results in truncate_hole being called with 'error' value as inode pointer. There is no check in truncate_hole for valid inode, so it could result in crash due "invalid access to memory". Avoid this by handling error condition properly. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-28	f2fs: fix equation of has_not_enough_free_secs()	Jaegeuk Kim
	Practically, has_not_enough_free_secs() should calculate with the numbers of current node and directory data blocks together. Actually the equation was implemented in need_to_flush(). So, this patch removes need_flush() and moves the equation into has_not_enough_free_secs(). Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-28	f2fs: add MAINTAINERS entry	Jaegeuk Kim
	This patch adds myself to MAINTAINERS entry for the f2fs file system. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-28	f2fs: return a default value for non-void function	Jaegeuk Kim
	This patch resolves a build warning reported by kbuild test robot. " fs/f2fs/segment.c: In function '__get_segment_type': fs/f2fs/segment.c:806:1: warning: control reaches end of non-void function [-Wreturn-type] " Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-28	f2fs: invalidate the node page if allocation is failed	Jaegeuk Kim
	The new_node_page() is processed as the following procedure. 1. A new node page is allocated. 2. Set PageUptodate with proper footer information. 3. Check if there is a free space for allocation 4.a. If there is no space, f2fs returns with -ENOSPC. 4.b. Otherwise, go next. In the case of step #4.a, f2fs remains a wrong node page in the page cache with the uptodate flag. Also, even though a new node page is allocated successfully, an error can be occurred afterwards due to allocation failure of the other data structures. In such a case, remove_inode_page() would be triggered, so that we have to clear uptodate flag in truncate_node() too. So, we should remove the uptodate flag, if allocation is failed. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-28	f2fs: add missing #include <linux/prefetch.h>	Geert Uytterhoeven
	m68k allmodconfig: fs/f2fs/data.c: In function ‘read_end_io’: fs/f2fs/data.c:311: error: implicit declaration of function ‘prefetchw’ fs/f2fs/segment.c: In function ‘f2fs_end_io_write’: fs/f2fs/segment.c:628: error: implicit declaration of function ‘prefetchw’ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-27	Merge tag 'hwmon-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Report i2c errors to userspace in lm73 driver - Fix problem with DIV_ROUND_CLOSEST and unsigned divisors in emc6w201 driver * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (emc6w201) Fix DIV_ROUND_CLOSEST problem with unsigned divisors hwmon: (lm73} Detect and report i2c bus errors
2012-12-27	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fixes from Eric Biederman: "This tree includes two bug fixes for problems Oleg spotted on his review of the recent pid namespace work. A small fix to not enable bottom halves with irqs disabled, and a trivial build fix for f2fs with user namespaces enabled." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: f2fs: Don't assign e_id in f2fs_acl_from_disk proc: Allow proc_free_inum to be called from any context pidns: Stop pid allocation when init dies pidns: Outlaw thread creation after unshare(CLONE_NEWPID)
2012-12-27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds
	Pull networking fixes from David Miller: 1) GRE tunnel drivers don't set the transport header properly, they also blindly deref the inner protocol ipv4 and needs some checks. Fixes from Isaku Yamahata. 2) Fix sleeps while atomic in netdevice rename code, from Eric Dumazet. 3) Fix double-spinlock in solos-pci driver, from Dan Carpenter. 4) More ARP bug fixes. Fix lockdep splat in arp_solicit() and then the bug accidentally added by that fix. From Eric Dumazet and Cong Wang. 5) Remove some __dev* annotations that slipped back in, as well as all HOTPLUG references. From Greg KH 6) RDS protocol uses wrong interfaces to access scatter-gather elements, causing a regression. From Mike Marciniszyn. 7) Fix build error in cpts driver, from Richard Cochran. 8) Fix arithmetic in packet scheduler, from Stefan Hasko. 9) Similarly, fix association during calculation of random backoff in batman-adv. From Akinobu Mita. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits) ipv6/ip6_gre: set transport header correctly ipv4/ip_gre: set transport header correctly to gre header IB/rds: suppress incompatible protocol when version is known IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len net/vxlan: Use the underlying device index when joining/leaving multicast groups tcp: should drop incoming frames without ACK flag set netprio_cgroup: define sk_cgrp_prioidx only if NETPRIO_CGROUP is enabled cpts: fix a run time warn_on. cpts: fix build error by removing useless code. batman-adv: fix random jitter calculation arp: fix a regression in arp_solicit() net: sched: integer overflow fix CONFIG_HOTPLUG removal from networking core Drivers: network: more __dev* removal bridge: call br_netpoll_disable in br_add_if ipv4: arp: fix a lockdep splat in arp_solicit() tuntap: dont use a private kmem_cache net: devnet_rename_seq should be a seqcount ip_gre: fix possible use after free ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally ...
2012-12-27	ext4: avoid hang when mounting non-journal filesystems with orphan list	Theodore Ts'o
	When trying to mount a file system which does not contain a journal, but which does have a orphan list containing an inode which needs to be truncated, the mount call with hang forever in ext4_orphan_cleanup() because ext4_orphan_del() will return immediately without removing the inode from the orphan list, leading to an uninterruptible loop in kernel code which will busy out one of the CPU's on the system. This can be trivially reproduced by trying to mount the file system found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs source tree. If a malicious user were to put this on a USB stick, and mount it on a Linux desktop which has automatic mounts enabled, this could be considered a potential denial of service attack. (Not a big deal in practice, but professional paranoids worry about such things, and have even been known to allocate CVE numbers for such problems.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org
2012-12-27	ext4: lock i_mutex when truncating orphan inodes	Theodore Ts'o
	Commit c278531d39 added a warning when ext4_flush_unwritten_io() is called without i_mutex being taken. It had previously not been taken during orphan cleanup since races weren't possible at that point in the mount process, but as a result of this c278531d39, we will now see a kernel WARN_ON in this case. Take the i_mutex in ext4_orphan_cleanup() to suppress this warning. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org
2012-12-26	ipv6/ip6_gre: set transport header correctly	Isaku Yamahata
	ip6gre_xmit2() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. Set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26	ipv4/ip_gre: set transport header correctly to gre header	Isaku Yamahata
	ipgre_tunnel_xmit() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. So set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26	IB/rds: suppress incompatible protocol when version is known	Marciniszyn, Mike
	Add an else to only print the incompatible protocol message when version hasn't been established. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26	IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len	Marciniszyn, Mike
	0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs") added uses of sg_dma_len() and sg_dma_address(). This makes RDS DOA with the qib driver. IB ulps should use ib_sg_dma_len() and ib_sg_dma_address respectively since some HCAs overload ib_sg_dma* operations. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26	net/vxlan: Use the underlying device index when joining/leaving multicast groups	Yan Burman
	The socket calls from vxlan to join/leave multicast group aren't using the index of the underlying device, as a result the stack uses the first interface that is up. This results in vxlan being non functional over a device which isn't the 1st to be up. Fix this by providing the iflink field to the vxlan instance to the multicast calls. Signed-off-by: Yan Burman <yanb@mellanox.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26	tcp: should drop incoming frames without ACK flag set	Eric Dumazet
	In commit 96e0bf4b5193d (tcp: Discard segments that ack data not yet sent) John Dykstra enforced a check against ack sequences. In commit 354e4aa391ed5 (tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation) I added more safety tests. But we missed fact that these tests are not performed if ACK bit is not set. RFC 793 3.9 mandates TCP should drop a frame without ACK flag set. " fifth check the ACK field, if the ACK bit is off drop the segment and return" Not doing so permits an attacker to only guess an acceptable sequence number, evading stronger checks. Many thanks to Zhiyun Qian for bringing this issue to our attention. See : http://web.eecs.umich.edu/~zhiyunq/pub/ccs12_TCP_sequence_number_inference.pdf Reported-by: Zhiyun Qian <zhiyunq@umich.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26	mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED	Christoffer Dall
	Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and (PageHead) is true, for tail pages. If this is indeed the intended behavior, which I doubt because it breaks cache cleaning on some ARM systems, then the nomenclature is highly problematic. This patch makes sure PageHead is only true for head pages and PageTail is only true for tail pages, and neither is true for non-compound pages. [ This buglet seems ancient - seems to have been introduced back in Apr 2008 in commit 6a1e7f777f61: "pageflags: convert to the use of new macros". And the reason nobody noticed is because the PageHead() tests are almost all about just sanity-checking, and only used on pages that are actual page heads. The fact that the old code returned true for tail pages too was thus not really noticeable. - Linus ] Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <Will.Deacon@arm.com> Cc: Steve Capper <Steve.Capper@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: stable@kernel.org # 2.6.26+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-26	netprio_cgroup: define sk_cgrp_prioidx only if NETPRIO_CGROUP is enabled	Li Zefan
	sock->sk_cgrp_prioidx won't be used at all if CONFIG_NETPRIO_CGROUP=n. Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26	cpts: fix a run time warn_on.	Richard Cochran
	This patch fixes a warning in clk_enable by calling clk_prepare_enable instead. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26	cpts: fix build error by removing useless code.	Richard Cochran
	The cpts driver tries to obtain the input clock frequency by calling the clock's internal 'recalc' method. Since <plat/clock.h> has been removed, this code can no longer compile. However, the driver never makes use of the frequency value, so this patch fixes the issue by removing the offending code altogether. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26	batman-adv: fix random jitter calculation	Akinobu Mita
	batadv_iv_ogm_emit_send_time() attempts to calculates a random integer in the range of 'orig_interval +- BATADV_JITTER' by the below lines. msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER; msecs += (random32() % 2 * BATADV_JITTER); But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER' because '%' and '*' have same precedence and associativity is left-to-right. This adds the parentheses at the appropriate position so that it matches original intension. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Antonio Quartulli <ordex@autistici.org> Cc: Marek Lindner <lindner_marek@yahoo.de> Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Cc: Antonio Quartulli <ordex@autistici.org> Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26	PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz	Andy Lutomirski
	Otherwise it fails like this on cards like the Transcend 16GB SDHC card: mmc0: new SDHC card at address b368 mmcblk0: mmc0:b368 SDC 15.0 GiB mmcblk0: error -110 sending status command, retrying mmcblk0: error -84 transferring data, sector 0, nr 8, cmd response 0x900, card status 0xb0 Tested on my Lenovo x200 laptop. [bhelgaas: changelog] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Chris Ball <cjb@laptop.org> CC: Manoj Iyer <manoj.iyer@canonical.com> CC: stable@vger.kernel.org
2012-12-26	PCI/PM: Do not suspend port if any subordinate device needs PME polling	Huang Ying
	Ulrich reported that his USB3 cardreader does not work reliably when connected to the USB3 port. It turns out that USB3 controller failed to awaken when plugging in the USB3 cardreader. Further experiments found that the USB3 host controller can only be awakened via polling, not via PME interrupt. But if the PCIe port to which the USB3 host controller is connected is suspended, we cannot poll the controller because its config space is not accessible when the PCIe port is in a low power state. To solve the issue, the PCIe port will not be suspended if any subordinate device needs PME polling. [bhelgaas: use bool consistently rather than mixing int/bool] Reference: http://lkml.kernel.org/r/50841CCC.9030809@uli-eckhardt.de Reported-by: Ulrich Eckhardt <usb@uli-eckhardt.de> Tested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> CC: stable@vger.kernel.org # v3.6+
2012-12-26	PCI: Add PCIe Link Capability link speed and width names	Bjorn Helgaas
	Add standard #defines for the Supported Link Speeds field in the PCIe Link Capabilities register. Note that prior to PCIe spec r3.0, these encodings were defined: 0001b 2.5GT/s Link speed supported 0010b 5.0GT/s and 2.5GT/s Link speed supported Starting with spec r3.0, these encodings refer to bits 0 and 1 in the Supported Link Speeds Vector in the Link Capabilities 2 register, and bits 0 and 1 there mean 2.5 GT/s and 5.0 GT/s, respectively. Therefore, code that followed r2.0 and interpreted 0x1 as 2.5GT/s and 0x2 as 5.0GT/s will continue to work, and we can identify a device using the new encodings because it will have a non-zero Link Capabilities 2 register. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-12-26	PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)	Myron Stowe
	Commit 284f5f9 was intended to disable the "only_one_child()" optimization on Stratus ftServer systems, but its DMI check is wrong. It looks for DMI_SYS_VENDOR that contains "ftServer", when it should look for DMI_SYS_VENDOR containing "Stratus" and DMI_PRODUCT_NAME containing "ftServer". Tested on Stratus ftServer 6400. Reported-by: Fadeeva Marina <astarta@rat.ru> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=51331 Signed-off-by: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.5+
2012-12-26	PCI: Remove spurious error for sriov_numvfs store and simplify flow	Bjorn Helgaas
	If we request "num_vfs" and the driver's sriov_configure() method enables exactly that number ("num_vfs_enabled"), we complain "Invalid value for number of VFs to enable" and return an error. We should silently return success instead. Also, use kstrtou16() since numVFs is defined to be a 16-bit field and rework to simplify control flow. Reported-by: Greg Rose <gregory.v.rose@intel.com> Reference: http://lkml.kernel.org/r/20121214101911.00002f59@unknown Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Donald Dutile <ddutile@redhat.com>
2012-12-25	f2fs: Don't assign e_id in f2fs_acl_from_disk	Eric W. Biederman
	With user namespaces enabled building f2fs fails with: CC fs/f2fs/acl.o fs/f2fs/acl.c: In function ‘f2fs_acl_from_disk’: fs/f2fs/acl.c:85:21: error: ‘struct posix_acl_entry’ has no member named ‘e_id’ make[2]: *** [fs/f2fs/acl.o] Error 1 make[2]: Target `__build' not remade because of errors. e_id is a backwards compatibility field only used for file systems that haven't been converted to use kuids and kgids. When the posix acl tag field is neither ACL_USER nor ACL_GROUP assigning e_id is unnecessary. Remove the assignment so f2fs will build with user namespaces enabled. Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-26	f2fs: do f2fs_balance_fs in front of dir operations	Jaegeuk Kim
	In order to conserve free sections to deal with the worst-case scenarios, f2fs should be able to freeze all the directory operations especially when there are not enough free sections. The f2fs_balance_fs() is for this use. When FS utilization becomes almost 100%, directory operations can be failed due to -ENOSPC frequently, which produces some dirty node pages occasionally. Previously, in such a case, f2fs_balance_fs() is not able to be triggered since it is triggered only if the directory operation ends up with success. So, this patch triggers f2fs_balance_fs() at first before handling directory operations. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-26	f2fs: should recover orphan and fsync data	Jaegeuk Kim
	The recovery routine should do all the time regardless of normal umount action. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-26	f2fs: fix handling errors got by f2fs_write_inode	Jaegeuk Kim
	Ruslan reported that f2fs hangs with an infinite loop in f2fs_sync_file(): while (sync_node_pages(sbi, inode->i_ino, &wbc) == 0) f2fs_write_inode(inode, NULL); The reason was revealed that the cold flag is not set even thought this inode is a normal file. Therefore, sync_node_pages() skips to write node blocks since it only writes cold node blocks. The cold flag is stored to the node_footer in node block, and whenever a new node page is allocated, it is set according to its file type, file or directory. But, after sudden-power-off, when recovering the inode page, f2fs doesn't recover its cold flag. So, let's assign the cold flag in more right places. One more thing: If f2fs_write_inode() returns an error due to whatever situations, there would be no dirty node pages so that sync_node_pages() returns zero. (i.e., zero means nothing was written.) Reported-by: Ruslan N. Marchenko <me@ruff.mobi> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-26	f2fs: fix up f2fs_get_parent issue to retrieve correct parent inode number	Namjae Jeon
	Test Case: [NFS Client] ls -lR . [NFS Server] while [ 1 ] do echo 3 > /proc/sys/vm/drop_caches done Error on NFS Client: "No such file or directory" When cache is dropped at the server, it results in lookup failure at the NFS client due to non-connection with the parent. The default path is it initiates a lookup by calculating the hash value for the name, even though the hash values stored on the disk for "." and ".." is maintained as zero, which results in failure from find_in_block due to not matching HASH values. Fix up, by using the correct hashing values for these entries. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-26	f2fs: fix wrong calculation on f_files in statfs	Jaegeuk Kim
	In f2fs_statfs(), f_files should be the total number of available inodes instead of the currently allocated inodes. So, this patch should resolve the reported bug below. Note that, showing 10% usage is not a bug, since f2fs reveals whole volume size as much as possible and shows the space overhead as used. This policy is fair enough with respect to other file systems. <Reported Bug> (loop0 is backed by 1GiB file) $ mkfs.f2fs /dev/loop0 F2FS-tools: Ver: 1.1.0 (2012-12-11) Info: sector size = 512 Info: total sectors = 2097152 (in 512bytes) Info: zone aligned segment0 blkaddr: 512 Info: format successful $ mount /dev/loop0 mnt/ $ df mnt/ Filesystem 1K-blocks Used Available Use% Mounted on /dev/loop0 1046528 98312 929784 10% /home/zeta/linux-devel/mtd-bench/mnt $ df mnt/ -i Filesystem Inodes IUsed IFree IUse% Mounted on /dev/loop0 1 -465918 465919 - /home/zeta/linux-devel/mtd-bench/mnt Notice IUsed is negative. Also, 10% usage on a fresh f2fs seems too much to be correct. Reported-and-Tested-by: Ezequiel Garcia <elezegarcia@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-26	f2fs: remove set_page_dirty for atomic f2fs_end_io_write	Jaegeuk Kim
	We should guarantee not to do scheduling while atomic. I found, in atomic f2fs_end_io_write(), there is a set_page_dirty() call to deal with IO errors. But, set_page_dirty() calls: -> f2fs_set_data_page_dirty() -> set_dirty_dir_page() -> cond_resched() which results in scheduling. In order to avoid this, I'd like to remove simply set_page_dirty(), since the page is already marked as ERROR and f2fs will be operated as the read-only mode as well. So, there is no recovery issue with this. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-26	ARM: ux500: add pinctrl address resources	Fabio Baltieri
	Current nmk_pinctrl driver is not PRCMU dependent anymore, so it needs its own DT address resources to work properly, as done for platform_device in: f482833 ARM: ux500: add PRCM register base for pinctrl Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2012-12-26	pinctrl: nomadik: return if prcm_base is NULL	Fabio Baltieri
	This patch adds a check for npct->prcm_base to make sure that the address is not NULL before using it, as the driver was made capable of loading even without a proper memory resource in: f1671bf pinctrl/nomadik: make independent of prcmu driver Also, refuses to probe without prcm_base on anything else than nomadik. This solves the following crash, introduced during the merge window when booting on U8500 with device tree: pinctrl-nomadik pinctrl-db8500: No PRCM base, assume no ALT-Cx control is available Unable to handle kernel NULL pointer dereference at virtual address 00000138 pgd = c0004000 [00000138] *pgd=00000000 Internal error: Oops: 5 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 Not tainted (3.7.0-02892-g1ebaf4f #631) PC is at nmk_pmx_enable+0x1bc/0x4d0 LR is at clk_disable+0x40/0x44 [snip] [<c01d5e50>] (nmk_pmx_enable+0x1bc/0x4d0) from [<c01d3ba8>] (pinmux_enable_setting+0x12c/0x1ec) [<c01d3ba8>] (pinmux_enable_setting+0x12c/0x1ec) from [<c01d1dc8>] (pinctrl_select_state_locked+0xfc/0x134) [<c01d1dc8>] (pinctrl_select_state_locked+0xfc/0x134) from [<c01d2814>] (pinctrl_register+0x26c/0x43c) [<c01d2814>] (pinctrl_register+0x26c/0x43c) from [<c01d668c>] (nmk_pinctrl_probe+0x114/0x238) [<c01d668c>] (nmk_pinctrl_probe+0x114/0x238) from [<c0211cc4>] (platform_drv_probe+0x28/0x2c) [<c0211cc4>] (platform_drv_probe+0x28/0x2c) from [<c0210738>] (driver_probe_device+0x84/0x21c) [<c0210738>] (driver_probe_device+0x84/0x21c) from [<c02109c0>] (__device_attach+0x50/0x54) [<c02109c0>] (__device_attach+0x50/0x54) from [<c020eb1c>] (bus_for_each_drv+0x54/0x9c) [<c020eb1c>] (bus_for_each_drv+0x54/0x9c) from [<c0210668>] (device_attach+0x84/0x9c) [<c0210668>] (device_attach+0x84/0x9c) from [<c020fbac>] (bus_probe_device+0x94/0xb8) [<c020fbac>] (bus_probe_device+0x94/0xb8) from [<c020e084>] (device_add+0x4f0/0x5bc) [<c020e084>] (device_add+0x4f0/0x5bc) from [<c0276400>] (of_device_add+0x40/0x48) [<c0276400>] (of_device_add+0x40/0x48) from [<c0276a98>] (of_platform_device_create_pdata+0x68/0x98) [<c0276a98>] (of_platform_device_create_pdata+0x68/0x98) from [<c0276bac>] (of_platform_bus_create+0xe4/0x260) [<c0276bac>] (of_platform_bus_create+0xe4/0x260) from [<c0276bf8>] (of_platform_bus_create+0x130/0x260) [<c0276bf8>] (of_platform_bus_create+0x130/0x260) from [<c0276d94>] (of_platform_populate+0x6c/0xac) [<c0276d94>] (of_platform_populate+0x6c/0xac) from [<c04a8224>] (u8500_init_machine+0x78/0x140) [<c04a8224>] (u8500_init_machine+0x78/0x140) from [<c04a3560>] (customize_machine+0x24/0x30) [<c04a3560>] (customize_machine+0x24/0x30) from [<c00087b0>] (do_one_initcall+0x130/0x1b0) [<c00087b0>] (do_one_initcall+0x130/0x1b0) from [<c033ff9c>] (kernel_init+0x138/0x2e8) [<c033ff9c>] (kernel_init+0x138/0x2e8) from [<c000eb18>] (ret_from_fork+0x14/0x20) Code: 0a00001b e19400b2 e59a200c e0822000 (e592c000) ---[ end trace 1b75b31a2719ed1c ]--- note: swapper/0[1] exited with preempt_count 1 Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2012-12-26	pinctrl: sirf: enable GPIO pullup/down configuration from dts	Barry Song
	commit 7bec207427c2efb794 remove sirfsoc_gpio_set_pull function, this patches takes the feature back by adding sirf,pullups and sirf,pulldowns prop in dts, and the driver will set the GPIO pull according to the dts. Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2012-12-26	pinctrl: sirf: add missing DT-binding document	Barry Song
	While sending email to Linus for reviewing: "pinctrl: sirf: add DT-binding pinmux mapping support" https://patchwork.kernel.org/patch/1364361/ i have included the devicetree/bindings/pinctrl/pinctrl-sirf.txt But while sending pull request with commit 056876f6c73406c, i missed the document. this patch takes the document back. Signed-off-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2012-12-26	pinctrl: fix comment mistake	Linus Walleij
	This variable pertains to pinctrl handles not muxes specifically. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2012-12-26	drivers/pinctrl/pinctrl-at91.c: convert kfree to devm_kfree	Julia Lawall
	The function at91_dt_node_to_map is ultimately called by the function pinctrl_get, which is an exported function. Since it is possible that this function is not called from within a probe function, for safety, the kfree is converted to a devm_kfree, to both free the data and remove it from the device in a failure situation. Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2012-12-26	pinctrl: imx5: fix GPIO_8 pad CAN1_RXCAN configuration	Philipp Zabel
	3 is an invalid value for the CAN1_IPP_IND_CANRX_SELECT_INPUT register. Set it to 2, which correctly selects the GPIO_8 pad. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2012-12-25	proc: Allow proc_free_inum to be called from any context	Eric W. Biederman
	While testing the pid namespace code I hit this nasty warning. [ 176.262617] ------------[ cut here ]------------ [ 176.263388] WARNING: at /home/eric/projects/linux/linux-userns-devel/kernel/softirq.c:160 local_bh_enable_ip+0x7a/0xa0() [ 176.265145] Hardware name: Bochs [ 176.265677] Modules linked in: [ 176.266341] Pid: 742, comm: bash Not tainted 3.7.0userns+ #18 [ 176.266564] Call Trace: [ 176.266564] [<ffffffff810a539f>] warn_slowpath_common+0x7f/0xc0 [ 176.266564] [<ffffffff810a53fa>] warn_slowpath_null+0x1a/0x20 [ 176.266564] [<ffffffff810ad9ea>] local_bh_enable_ip+0x7a/0xa0 [ 176.266564] [<ffffffff819308c9>] _raw_spin_unlock_bh+0x19/0x20 [ 176.266564] [<ffffffff8123dbda>] proc_free_inum+0x3a/0x50 [ 176.266564] [<ffffffff8111d0dc>] free_pid_ns+0x1c/0x80 [ 176.266564] [<ffffffff8111d195>] put_pid_ns+0x35/0x50 [ 176.266564] [<ffffffff810c608a>] put_pid+0x4a/0x60 [ 176.266564] [<ffffffff8146b177>] tty_ioctl+0x717/0xc10 [ 176.266564] [<ffffffff810aa4d5>] ? wait_consider_task+0x855/0xb90 [ 176.266564] [<ffffffff81086bf9>] ? default_spin_lock_flags+0x9/0x10 [ 176.266564] [<ffffffff810cab0a>] ? remove_wait_queue+0x5a/0x70 [ 176.266564] [<ffffffff811e37e8>] do_vfs_ioctl+0x98/0x550 [ 176.266564] [<ffffffff810b8a0f>] ? recalc_sigpending+0x1f/0x60 [ 176.266564] [<ffffffff810b9127>] ? __set_task_blocked+0x37/0x80 [ 176.266564] [<ffffffff810ab95b>] ? sys_wait4+0xab/0xf0 [ 176.266564] [<ffffffff811e3d31>] sys_ioctl+0x91/0xb0 [ 176.266564] [<ffffffff810a95f0>] ? task_stopped_code+0x50/0x50 [ 176.266564] [<ffffffff81939199>] system_call_fastpath+0x16/0x1b [ 176.266564] ---[ end trace 387af88219ad6143 ]--- It turns out that spin_unlock_bh(proc_inum_lock) is not safe when put_pid is called with another spinlock held and irqs disabled. For now take the easy path and use spin_lock_irqsave(proc_inum_lock) in proc_free_inum and spin_loc_irq in proc_alloc_inum(proc_inum_lock). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-25	pidns: Stop pid allocation when init dies	Eric W. Biederman
	Oleg pointed out that in a pid namespace the sequence. - pid 1 becomes a zombie - setns(thepidns), fork,... - reaping pid 1. - The injected processes exiting. Can lead to processes attempting access their child reaper and instead following a stale pointer. That waitpid for init can return before all of the processes in the pid namespace have exited is also unfortunate. Avoid these problems by disabling the allocation of new pids in a pid namespace when init dies, instead of when the last process in a pid namespace is reaped. Pointed-out-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-25	ext4: do not try to write superblock on ro remount w/o journal	Michael Tokarev
	When a journal-less ext4 filesystem is mounted on a read-only block device (blockdev --setro will do), each remount (for other, unrelated, flags, like suid=>nosuid etc) results in a series of scary messages from kernel telling about I/O errors on the device. This is becauese of the following code ext4_remount(): if (sbi->s_journal == NULL) ext4_commit_super(sb, 1); at the end of remount procedure, which forces writing (flushing) of a superblock regardless whenever it is dirty or not, if the filesystem is readonly or not, and whenever the device itself is readonly or not. We only need call ext4_commit_super when the file system had been previously mounted read/write. Thanks to Eric Sandeen for help in diagnosing this issue. Signed-off-By: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-12-25	ext4: include journal blocks in df overhead calcs	Eric Sandeen
	To more accurately calculate overhead for "bsd" style df reporting, we should count the journal blocks as overhead as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Eric Whitney <enwlinux@gmail.com>
2012-12-25	ext4: remove unaligned AIO warning printk	Eric Sandeen
	Although I put this in, I now think it was a bad decision. For most users, there is very little to be done in this case. They get the message, once per day, with no real context or proposed action. TBH, it generates support calls when it probably does not need to; the message sounds more dire than the situation really is. Just nuke it. Normal investigation via blktrace or whatnot can reveal poor IO patterns if bad performance is encountered. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>