|
Pull networking fixes from David Miller:
1) Fix memory leaks and other issues in mwifiex driver, from Amitkumar
Karwar.
2) skb_segment() can choke on packets using frag lists, fix from
Herbert Xu with help from Eric Dumazet and others.
3) IPv4 output cached route instantiation properly handles races
involving two threads trying to install the same route, but we
forgot to propagate this logic to input routes as well. Fix from
Alexei Starovoitov.
4) Put protections in place to make sure that recvmsg() paths never
accidentally copy uninitialized memory back into userspace and also
make sure that we never try to use more than sockaddr_storage for
building the on-kernel-stack copy of a sockaddr. Fixes from Hannes
Frederic Sowa.
5) R8152 driver transmit flow bug fixes from Hayes Wang.
6) Fix some minor fallouts from genetlink changes, from Johannes Berg
and Michael Opdenacker.
7) AF_PACKET sendmsg path can race with netdevice unregister notifier,
fix by using RCU to make sure the network device doesn't go away
from under us. Fix from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
gso: handle new frag_list of frags GRO packets
genetlink: fix genl_set_err() group ID
genetlink: fix genlmsg_multicast() bug
packet: fix use after free race in send path when dev is released
xen-netback: stop the VIF thread before unbinding IRQs
wimax: remove dead code
net/phy: Add the autocross feature for forced links on VSC82x4
net/phy: Add VSC8662 support
net/phy: Add VSC8574 support
net/phy: Add VSC8234 support
net: add BUG_ON if kernel advertises msg_namelen > sizeof(struct sockaddr_storage)
net: rework recvmsg handler msg_name and msg_namelen logic
bridge: flush br's address entry in fdb when remove the
net: core: Always propagate flag changes to interfaces
ipv4: fix race in concurrent ip_route_input_slow()
r8152: fix incorrect type in assignment
r8152: support stopping/waking tx queue
r8152: modify the tx flow
r8152: fix tx/rx memory overflow
netfilter: ebt_ip6: fix source and destination matching
...
|
|
Merge patches from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: place page->pmd_huge_pte to right union
MAINTAINERS: add keyboard driver to Hyper-V file list
x86, mm: do not leak page->ptl for pmd page tables
ipc,shm: correct error return value in shmctl (SHM_UNLOCK)
mm, mempolicy: silence gcc warning
block/partitions/efi.c: fix bound check
ARM: drivers/rtc/rtc-at91rm9200.c: disable interrupts at shutdown
mm: hugetlbfs: fix hugetlbfs optimization
kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanly
ipc,shm: fix shm_file deletion races
mm: thp: give transparent hugepage code a separate copy_page
checkpatch: fix "Use of uninitialized value" warnings
configfs: fix race between dentry put and lookup
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
"In this patchset, we finally get an SELinux update, with Paul Moore
taking over as maintainer of that code.
Also a significant update for the Keys subsystem, as well as
maintenance updates to Smack, IMA, TPM, and Apparmor"
and since I wanted to know more about the updates to key handling,
here's the explanation from David Howells on that:
"Okay. There are a number of separate bits. I'll go over the big bits
and the odd important other bit, most of the smaller bits are just
fixes and cleanups. If you want the small bits accounting for, I can
do that too.
(1) Keyring capacity expansion.
KEYS: Consolidate the concept of an 'index key' for key access
KEYS: Introduce a search context structure
KEYS: Search for auth-key by name rather than target key ID
Add a generic associative array implementation.
KEYS: Expand the capacity of a keyring
Several of the patches are providing an expansion of the capacity of a
keyring. Currently, the maximum size of a keyring payload is one page.
Subtract a small header and then divide up into pointers, that only gives
you ~500 pointers on an x86_64 box. However, since the NFS idmapper uses
a keyring to store ID mapping data, that has proven to be insufficient to
the cause.
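The ~500 figure can be checked with a small calculation. This is only a sketch: the 4096-byte page and 8-byte pointer match x86_64, but the 16-byte header is an assumed value chosen for illustration, not the real keyring header size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_SIZE   4096u  /* x86_64 page size */
#define SKETCH_HEADER_SIZE 16u    /* hypothetical "small header" */

/* Number of key pointers that fit in one page after the header. */
static size_t keyring_slots(size_t page_size, size_t header_size,
                            size_t ptr_size)
{
    assert(ptr_size > 0 && header_size <= page_size);
    return (page_size - header_size) / ptr_size;
}
```

With these assumptions, keyring_slots(SKETCH_PAGE_SIZE, SKETCH_HEADER_SIZE, sizeof(void *)) gives 510 on a 64-bit build, in line with the "~500 pointers" quoted above.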
Whatever data structure I use to handle the keyring payload, it can only
store pointers to keys, not the keys themselves because several keyrings
may point to a single key. This precludes inserting, say, an rb_node
struct into the key struct for this purpose.
I could make an rbtree of records such that each record has an rb_node
and a key pointer, but that would use four words of space per key stored
in the keyring. It would, however, be able to use much existing code.
I selected instead a non-rebalancing radix-tree type approach as that
could have a better space-used/key-pointer ratio. I could have used the
radix tree implementation that we already have and insert keys into it by
their serial numbers, but that means any sort of search must iterate over
the whole radix tree. Further, its nodes are a bit on the capacious side
for what I want - especially given that key serial numbers are randomly
allocated, thus leaving a lot of empty space in the tree.
So what I have is an associative array that internally is a radix-tree
with 16 pointers per node where the index key is constructed from the key
type pointer and the key description. This means that an exact lookup by
type+description is very fast as this tells us how to navigate directly to
the target key.
I made the data structure general in lib/assoc_array.c: as far as it is
concerned, its index key is just a sequence of bits that leads to a
pointer. It's possible that someone else will be able to make use of it
also. FS-Cache might, for example.
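The idea of walking a 16-way tree by successive chunks of an index key can be sketched in userspace C. This is not the lib/assoc_array.c implementation: it assumes a fixed 32-bit index key consumed four bits per level, never compresses or frees nodes, and all names (trie_node, trie_insert, trie_lookup) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FAN_OUT     16  /* 16 pointers per node, as in the text */
#define KEY_NIBBLES 8   /* assumption: 32-bit index key, 4 bits per level */

struct trie_node {
    void *slot[FAN_OUT]; /* child nodes, or leaf pointers at the last level */
};

/* Extract the 4-bit chunk of the key used at the given level. */
static unsigned nibble(uint32_t key, int level)
{
    return (key >> (28 - 4 * level)) & 0xf;
}

/* Insert a leaf pointer; returns 0 on success, -1 on allocation failure. */
static int trie_insert(struct trie_node *root, uint32_t key, void *leaf)
{
    struct trie_node *n = root;

    assert(root && leaf);
    for (int level = 0; level < KEY_NIBBLES - 1; level++) {
        unsigned i = nibble(key, level);

        if (!n->slot[i]) {
            n->slot[i] = calloc(1, sizeof(struct trie_node));
            if (!n->slot[i])
                return -1;
        }
        n = n->slot[i];
    }
    n->slot[nibble(key, KEY_NIBBLES - 1)] = leaf;
    return 0;
}

/* Exact lookup: the key bits say which slot to follow at every level. */
static void *trie_lookup(const struct trie_node *root, uint32_t key)
{
    const struct trie_node *n = root;

    for (int level = 0; level < KEY_NIBBLES - 1 && n; level++)
        n = n->slot[nibble(key, level)];
    return n ? n->slot[nibble(key, KEY_NIBBLES - 1)] : NULL;
}
```

An exact lookup touches at most one node per level and never scans siblings, which is the property the text relies on for fast type+description lookups.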
(2) Mark keys as 'trusted' and keyrings as 'trusted only'.
KEYS: verify a certificate is signed by a 'trusted' key
KEYS: Make the system 'trusted' keyring viewable by userspace
KEYS: Add a 'trusted' flag and a 'trusted only' flag
KEYS: Separate the kernel signature checking keyring from module signing
These patches allow keys carrying asymmetric public keys to be marked as
being 'trusted' and allow keyrings to be marked as only permitting the
addition or linkage of trusted keys.
Keys loaded from hardware during kernel boot or compiled into the kernel
during build are marked as being trusted automatically. New keys can be
loaded at runtime with add_key(). They are checked against the system
keyring contents and if their signatures can be validated with keys that
are already marked trusted, then they are marked trusted also and can
thus be added into the master keyring.
Patches from Mimi Zohar make this usable with the IMA keyrings also.
(3) Remove the date checks on the key used to validate a module signature.
X.509: Remove certificate date checks
It's not reasonable to reject a signature just because the key that it was
generated with is no longer valid datewise - especially if the kernel
hasn't yet managed to set the system clock when the first module is
loaded - so just remove those checks.
(4) Make it simpler to deal with additional X.509 being loaded into the kernel.
KEYS: Load *.x509 files into kernel keyring
KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate
The builder of the kernel now just places files with the extension ".x509"
into the kernel source or build trees and they're concatenated by the
kernel build and stuffed into the appropriate section.
(5) Add support for userspace kerberos to use keyrings.
KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
KEYS: Implement a big key type that can save to tmpfs
Fedora went to, by default, storing kerberos tickets and tokens in tmpfs.
We looked at storing it in keyrings instead as that confers certain
advantages such as tickets being automatically deleted after a certain
amount of time and the ability for the kernel to get at these tokens more
easily.
To make this work, two things were needed:
(a) A way for the tickets to persist beyond the lifetime of all a user's
sessions so that cron-driven processes can still use them.
The problem is that a user's session keyrings are deleted when the
session that spawned them logs out and the user's user keyring is
deleted when the UID is deleted (typically when the last log out
happens), so neither of these places is suitable.
I've added a system keyring into which a 'persistent' keyring is
created for each UID on request. Each time a user requests their
persistent keyring, the expiry time on it is set anew. If the user
doesn't ask for it for, say, three days, the keyring is automatically
expired and garbage collected using the existing gc. All the kerberos
tokens it held are then also gc'd.
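The expiry behaviour described in (a) can be modelled in a few lines. This is a userspace sketch with hypothetical names, not the keyctl code; the three-day lifetime is the "say, three days" example from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define PERSISTENT_TTL (3 * 24 * 60 * 60) /* "say, three days", in seconds */

struct persistent_keyring {
    time_t expiry; /* absolute time after which the gc may collect it */
};

/* Every request for the keyring sets the expiry time anew. */
static void keyring_request(struct persistent_keyring *k, time_t now)
{
    assert(k);
    k->expiry = now + PERSISTENT_TTL;
}

/* True once the keyring has gone unrequested for the whole lifetime. */
static bool keyring_expired(const struct persistent_keyring *k, time_t now)
{
    return now >= k->expiry;
}
```

A cron job that requests the keyring once a day keeps it alive indefinitely, while an idle UID loses it, and the tokens in it, after the lifetime elapses.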
(b) A key type that can hold really big tickets (up to 1MB in size).
The problem is that Active Directory can return huge tickets with lots
of auxiliary data attached. We don't, however, want to eat up huge
tracts of unswappable kernel space for this, so if the ticket is
greater than a certain size, we create a swappable shmem file and dump
the contents in there and just live with the fact we then have an
inode and a dentry overhead. If the ticket is smaller than that, we
slap it in a kmalloc()'d buffer"
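The storage decision described in (b) amounts to a size threshold. The sketch below is a userspace model with hypothetical names; the 1MB cap comes from the text, but the 4096-byte cut-off is an assumed value, since the text only says "a certain size".

```c
#include <assert.h>
#include <stddef.h>

#define BIG_KEY_MAX      (1024 * 1024) /* "up to 1MB in size" */
#define INLINE_THRESHOLD 4096          /* hypothetical cut-off */

static_assert(INLINE_THRESHOLD < BIG_KEY_MAX,
              "threshold must be below the maximum");

enum backing {
    BACKING_INVALID, /* empty or larger than the cap */
    BACKING_BUFFER,  /* small: plain in-memory buffer */
    BACKING_SHMEM    /* large: swappable shmem file */
};

/* Pick where a ticket of the given length would be stored. */
static enum backing choose_backing(size_t len)
{
    if (len == 0 || len > BIG_KEY_MAX)
        return BACKING_INVALID;
    return len > INLINE_THRESHOLD ? BACKING_SHMEM : BACKING_BUFFER;
}
```

The trade-off is the one stated in the message: large tickets pay an inode and dentry overhead in exchange for being swappable.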
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits)
KEYS: Fix keyring content gc scanner
KEYS: Fix error handling in big_key instantiation
KEYS: Fix UID check in keyctl_get_persistent()
KEYS: The RSA public key algorithm needs to select MPILIB
ima: define '_ima' as a builtin 'trusted' keyring
ima: extend the measurement list to include the file signature
kernel/system_certificate.S: use real contents instead of macro GLOBAL()
KEYS: fix error return code in big_key_instantiate()
KEYS: Fix keyring quota misaccounting on key replacement and unlink
KEYS: Fix a race between negating a key and reading the error set
KEYS: Make BIG_KEYS boolean
apparmor: remove the "task" arg from may_change_ptraced_domain()
apparmor: remove parent task info from audit logging
apparmor: remove tsk field from the apparmor_audit_struct
apparmor: fix capability to not use the current task, during reporting
Smack: Ptrace access check mode
ima: provide hash algo info in the xattr
ima: enable support for larger default filedata hash algorithms
ima: define kernel parameter 'ima_template=' to change configured default
ima: add Kconfig default measurement list template
...
|
|
Make sure RTC-interrupts are disabled at shutdown.
As the RTC is generally powered by backup power (VDDBU), its interrupts
are not disabled on wake-up, user, watchdog or software reset. This
could cause troubles on other systems (e.g. older kernels) if an
interrupt occurs before a handler has been installed at next boot.
Let us be well-behaved and disable them on clean shutdowns at least (as
does the RTT-based rtc-at91sam9 driver).
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Andrew Victor <linux@maxim.org.za>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove CONFIG_USE_GENERIC_SMP_HELPERS left by commit 0a06ff068f12
("kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS").
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If the VIF thread is still running after unbinding the Tx and Rx IRQs
in xenvif_disconnect(), the thread may attempt to raise an event which
will BUG (as the irq is unbound).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:
====================
pull request: wireless 2013-11-21
Please pull this batch of fixes intended for the 3.13 stream!
For the Bluetooth bits, Gustavo says:
"A few fixes for 3.13. There are 3 fixes to the RFCOMM protocol, one
crash fix to L2CAP, and a simple fix to a bad behaviour in the SMP
protocol."
On top of that...
Amitkumar Karwar sends a quintet of mwifiex fixes -- two fixes related
to failure handling, two memory leak fixes, and a NULL pointer fix.
Felix Fietkau corrects an earlier rt2x00 HT descriptor handling fix
to address a crash.
Geyslan G. Bem fixes a memory leak in brcmfmac.
Larry Finger addresses more pointer arithmetic errors in rtlwifi.
Luis R. Rodriguez provides a regulatory fix in the shared ath code.
Sujith Manoharan brings a couple ath9k initialization fixes.
Ujjal Roy offers one more mwifiex fix to avoid invalid memory accesses
when unloading the USB driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
|
|
Add auto-MDI/MDI-X capability for forced (autonegotiation disabled)
10/100 Mbps speeds on Vitesse VSC82x4 PHYs. Export the previously static
function genphy_setup_forced(), which is required by the new config_aneg
handler in the Vitesse PHY module.
Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: Shruti Kanetkar <Shruti@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Vitesse VSC8662 is a dual-port 10/100/1000BASE-T PHY.
Its register set and features are similar to those of other Vitesse PHYs.
Signed-off-by: Sandeep Singh <Sandeep@freescale.com>
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The VSC8574 is a quad-port Gigabit Ethernet transceiver with four SerDes
interfaces for quad-port dual media capability.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shruti Kanetkar <Shruti@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Vitesse VSC8234 is a quad-port 10/100/1000BASE-T PHY
with SGMII and SERDES MAC interfaces.
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Shruti Kanetkar <Shruti@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.
This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.
Optimize for the case where recvfrom is called with NULL as the address.
We don't need to copy the address at all, so set it to NULL before invoking
the recvmsg handler. We can do so because all the recvmsg handlers must
cope with the case where a plain read() is called on them. read() also sets
msg_name to NULL.
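The contract described above can be modelled in userspace C. This is a sketch, not the net/socket.c code: struct sketch_msg, sketch_handler() and sketch_recv() are hypothetical stand-ins, and STORAGE_SIZE merely plays the role of sizeof(struct sockaddr_storage).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STORAGE_SIZE 128 /* stands in for sizeof(struct sockaddr_storage) */

struct sketch_msg {
    void  *name;    /* kernel-side address buffer, or NULL */
    size_t namelen; /* bytes of name the handler filled in */
};

/* Hypothetical handler: fills in the address only when one is wanted. */
static void sketch_handler(struct sketch_msg *msg, const void *peer,
                           size_t peer_len)
{
    if (msg->name) {
        assert(peer_len <= STORAGE_SIZE);
        memcpy(msg->name, peer, peer_len);
        msg->namelen = peer_len;
    }
}

/* Caller side: start from namelen == 0 and copy out only what was set. */
static size_t sketch_recv(void *user_addr, const void *peer, size_t peer_len)
{
    unsigned char storage[STORAGE_SIZE];
    struct sketch_msg msg;

    msg.name = user_addr ? storage : NULL; /* NULL address: skip the copy */
    msg.namelen = 0;                       /* handlers must opt in */
    sketch_handler(&msg, peer, peer_len);
    if (user_addr && msg.namelen)
        memcpy(user_addr, storage, msg.namelen);
    return msg.namelen;
}
```

Because namelen starts at zero, a handler that forgets to fill in the address returns nothing to the caller instead of whatever happened to be in the on-stack buffer.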
Also document these changes in include/linux/net.h as suggested by David
Miller.
Changes since RFC:
Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.
With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
msg->msg_name = NULL
".
This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.
Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.
Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha
Pull alpha updates from Matt Turner:
"It contains a few fixes and some work from Richard to make alpha
emulation under QEMU much more usable"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
alpha: Prevent a NULL ptr dereference in csum_partial_copy.
alpha: perf: fix out-of-bounds array access triggered from raw event
alpha: Use qemu+cserve provided high-res clock and alarm.
alpha: Switch to GENERIC_CLOCKEVENTS
alpha: Enable the rpcc clocksource for single processor
alpha: Reorganize rtc handling
alpha: Primitive support for CPU power down.
alpha: Allow HZ to be configured
alpha: Notice if we're being run under QEMU
alpha: Eliminate compiler warning from memset macro
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI and power management updates from Rafael Wysocki:
- ACPI-based device hotplug fixes for issues introduced recently and a
fix for an older error code path bug in the ACPI PCI host bridge
driver
- Fix for recently broken OMAP cpufreq build from Viresh Kumar
- Fix for a recent hibernation regression related to s2disk
- Fix for a locking-related regression in the ACPI EC driver from
Puneet Kumar
- System suspend error code path fix related to runtime PM and runtime
PM documentation update from Ulf Hansson
- cpufreq's conservative governor fix from Xiaoguang Chen
- New processor IDs for intel_idle and turbostat and removal of an
obsolete Kconfig option from Len Brown
- New device IDs for the ACPI LPSS (Low-Power Subsystem) driver and
ACPI-based PCI hotplug (ACPIPHP) cleanup from Mika Westerberg
- Removal of several ACPI video DMI blacklist entries that are not
necessary any more from Aaron Lu
- Rework of the ACPI companion representation in struct device and code
cleanup related to that change from Rafael J Wysocki, Lan Tianyu and
Jarkko Nikula
- Fixes for assigning names to ACPI-enumerated I2C and SPI devices from
Jarkko Nikula
* tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits)
PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration
ACPI / scan: Set flags.match_driver in acpi_bus_scan_fixed()
ACPI / PCI root: Clear driver_data before failing enumeration
ACPI / hotplug: Fix PCI host bridge hot removal
ACPI / hotplug: Fix acpi_bus_get_device() return value check
cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs()
ACPI / video: clean up DMI table for initial black screen problem
ACPI / EC: Ensure lock is acquired before accessing ec struct members
PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps()
ACPI / AC: Remove struct acpi_device pointer from struct acpi_ac
spi: Use stable dev_name for ACPI enumerated SPI slaves
i2c: Use stable dev_name for ACPI enumerated I2C slaves
ACPI: Provide acpi_dev_name accessor for struct acpi_device device name
ACPI / bind: Use (put|get)_device() on ACPI device objects too
ACPI: Eliminate the DEVICE_ACPI_HANDLE() macro
ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node
cpufreq: OMAP: Fix compilation error 'r & ret undeclared'
PM / Runtime: Fix error path for prepare
PM / Runtime: Update documentation around probe|remove|suspend
cpufreq: conservative: set requested_freq to policy max when it is over policy max
...
|
|
Pull slave-dmaengine changes from Vinod Koul:
"This brings for slave dmaengine:
- Change dma notification flag to DMA_COMPLETE from DMA_SUCCESS as
dmaengine can only transfer and not verify validity of dma
transfers
- Bunch of fixes across drivers:
- cppi41 driver fixes from Daniel
- 8 channel freescale dma engine support and updated bindings from
Hongbo
- mxs-dma fixes and cleanup by Markus
- DMAengine updates from Dan:
- Bartlomiej and Dan finalized a rework of the dma address unmap
implementation.
- In the course of testing 1/ a collection of enhancements to
dmatest fell out. Notably basic performance statistics, and
fixed / enhanced test control through new module parameters
'run', 'wait', 'noverify', and 'verbose'. Thanks to Andriy and
Linus [Walleij] for their review.
- Testing the raid related corner cases of 1/ triggered bugs in
the recently added 16-source operation support in the ioatdma
driver.
- Some minor fixes / cleanups to mv_xor and ioatdma"
* 'next' of git://git.infradead.org/users/vkoul/slave-dma: (99 commits)
dma: mv_xor: Fix mis-usage of mmio 'base' and 'high_base' registers
dma: mv_xor: Remove unneeded NULL address check
ioat: fix ioat3_irq_reinit
ioat: kill msix_single_vector support
raid6test: add new corner case for ioatdma driver
ioatdma: clean up sed pool kmem_cache
ioatdma: fix selection of 16 vs 8 source path
ioatdma: fix sed pool selection
ioatdma: Fix bug in selftest after removal of DMA_MEMSET.
dmatest: verbose mode
dmatest: convert to dmaengine_unmap_data
dmatest: add a 'wait' parameter
dmatest: add basic performance metrics
dmatest: add support for skipping verification and random data setup
dmatest: use pseudo random numbers
dmatest: support xor-only, or pq-only channels in tests
dmatest: restore ability to start test at module load and init
dmatest: cleanup redundant "dmatest: " prefixes
dmatest: replace stored results mechanism, with uniform messages
Revert "dmatest: append verify result to results"
...
|
|
Pull block IO fixes from Jens Axboe:
"Normally I'd defer my initial for-linus pull request until after the
merge window, but a race was uncovered in the virtio-blk conversion to
blk-mq that could cause hangs. So here's a small collection of fixes
for you to pull:
- The fix for the virtio-blk IO hang reported by Dave Chinner, from
Shaohua and myself.
- Add the Insert blktrace event for blk-mq. This makes 'btt' happy
when it is doing its state transition analysis.
- Ensure that blk-mq has disk/partition stats enabled by default,
instead of making it opt-in.
- A fix for __bio_add_page() and large sector counts"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: add blktrace insert event trace
virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations
blk-mq: ensure that we set REQ_IO_STAT so diskstats work
bio: fix argument of __bio_add_page() for max_sectors > 0xffff
|
|
Pull md update from Neil Brown:
"Mostly optimisations and obscure bug fixes.
- raid5 gets less lock contention
- raid1 gets less contention between normal-io and resync-io during
resync"
* tag 'md/3.13' of git://neil.brown.name/md:
md/raid5: Use conf->device_lock protect changing of multi-thread resources.
md/raid5: Before freeing old multi-thread worker, it should flush them.
md/raid5: For stripe with R5_ReadNoMerge, we replace REQ_FLUSH with REQ_NOMERGE.
UAPI: include <asm/byteorder.h> in linux/raid/md_p.h
raid1: Rewrite the implementation of iobarrier.
raid1: Add some macros to make code clearly.
raid1: Replace raise_barrier/lower_barrier with freeze_array/unfreeze_array when reconfiguring the array.
raid1: Add a field array_frozen to indicate whether raid in freeze state.
md: Convert use of typedef ctl_table to struct ctl_table
md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes.
md: use MD_RECOVERY_INTR instead of kthread_should_stop in resync thread.
md: fix some places where mddev_lock return value is not checked.
raid5: Retry R5_ReadNoMerge flag when hit a read error.
raid5: relieve lock contention in get_active_stripe()
raid5: relieve lock contention in get_active_stripe()
wait: add wait_event_cmd()
md/raid5.c: add proper locking to error path of raid5_start_reshape.
md: fix calculation of stacking limits on level change.
raid5: Use slow_path to release stripe when mddev->thread is null
|
|
The data from the hardware should be little endian. Correct the
declaration.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tx_qlen is the maximum number of packets that a tx aggregation buffer
can contain:
tx_qlen = buffer size / (packet size + descriptor size).
If the tx buffer is empty and more packets are queued than tx_qlen,
stop the tx queue. Wake the tx queue if it is stopped and fewer than
tx_qlen packets are queued.
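The formula and the stop/wake rule can be written out as a small userspace sketch. The function and enum names are hypothetical, and the tx_buf_empty flag simply mirrors the condition the message calls "the tx buffer is empty"; this is not the r8152 driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* tx_qlen = buffer size / (packet size + descriptor size) */
static unsigned tx_qlen(unsigned buf_size, unsigned pkt_size,
                        unsigned desc_size)
{
    assert(pkt_size + desc_size > 0);
    return buf_size / (pkt_size + desc_size);
}

enum queue_action { QUEUE_KEEP, QUEUE_STOP, QUEUE_WAKE };

/* Decide whether the tx queue should be stopped, woken, or left alone. */
static enum queue_action tx_queue_action(bool stopped, bool tx_buf_empty,
                                         unsigned queued, unsigned qlen)
{
    if (!stopped && tx_buf_empty && queued > qlen)
        return QUEUE_STOP;
    if (stopped && queued < qlen)
        return QUEUE_WAKE;
    return QUEUE_KEEP;
}
```

For example, with an assumed 16384-byte buffer, 1514-byte packets and 8-byte descriptors, tx_qlen() evaluates to 10.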
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the code for sending the packet from rtl8152_start_xmit().
Let rtl8152_start_xmit() only queue the packet and schedule a
tasklet to send the queued packets. This simplifies the code and makes
sure that all packets are sent in their original order.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The tx/rx paths could access memory outside the desired range.
Modify the method of checking the end of the memory to avoid this.
For r8152_tx_agg_fill(), the variable remain may become negative.
However, it is declared unsigned, so the while loop wouldn't
break when reaching the end of the desired memory. Although changing
the declaration from unsigned to signed is enough to fix it, I also
modify the checking method for safety. Replace
remain = rx_buf_sz - sizeof(*tx_desc) -
(u32)((void *)tx_data - agg->head);
with
remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
to make sure the variable remain is always positive. Then, the
overflow wouldn't happen.
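The unsigned wrap-around described above is easy to reproduce in isolation. The helpers below are a hypothetical userspace illustration, not driver code; they assume a 32-bit int and model only the "is there room for another packet" comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned remainder: wraps to a huge value when used > buf_sz. */
static uint32_t remain_unsigned(uint32_t buf_sz, uint32_t used)
{
    return buf_sz - used;
}

/* Signed remainder: goes negative when used > buf_sz. */
static int remain_signed(int buf_sz, int used)
{
    return buf_sz - used;
}

/* Loop condition as it behaves with an unsigned remainder. */
static int has_room_unsigned(uint32_t buf_sz, uint32_t used, uint32_t need)
{
    return remain_unsigned(buf_sz, used) >= need;
}

/* Loop condition as it behaves with a signed remainder. */
static int has_room_signed(int buf_sz, int used, int need)
{
    return remain_signed(buf_sz, used) >= need;
}
```

With a 100-byte buffer and 104 bytes already accounted for, the unsigned remainder becomes 0xfffffffc, so the unsigned check still reports room while the signed check correctly reports none.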
For rx_bottom(), the rx_desc should not be used to calculate the
packet length before making sure the rx_desc is in the desired range.
Split the check into two parts. First, check that the descriptor is in
the memory. Second, use the descriptor to find out the packet
length and check that the packet is in the memory.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It isn't safe to call it without holding the vblk->vq_lock.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Fixed another condition of virtqueue_kick() not holding the lock.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It appears that the driver runs into a problem here if fibsize is too small
because we allocate user_srbcmd with fibsize size only but later we
access it until user_srbcmd->sg.count to copy it over to srbcmd.
It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this
structure already includes one sg element and this is not needed for
commands without data. So, we would recommend adding the following
(instead of the test for fibsize == 0).
Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com>
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull networking fixes from David Miller:
"Mostly these are fixes for fallout due to merge window changes, as
well as cures for problems that have been with us for a much longer
period of time"
1) Johannes Berg noticed two major deficiencies in our genetlink
registration. Some genetlink protocols were passing in constant
counts for their ops array rather than something like
ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
using fixed IDs for their multicast groups.
We have to retain these fixed IDs to keep existing userland tools
working, but reserve them so that other multicast groups used by
other protocols can not possibly conflict.
In dealing with these two problems, we actually now use less state
management for genetlink operations and multicast groups.
2) When configuring interface hardware timestamping, fix several
drivers that simply do not validate that the hwtstamp_config value
is one the driver actually supports. From Ben Hutchings.
3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.
4) In dev_forward_skb(), set the skb->protocol in the right order
relative to skb_scrub_packet(). From Alexei Starovoitov.
5) Bridge erroneously fails to use the proper wrapper functions to make
calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
Makita.
6) When detaching a bridge port, make sure to flush all VLAN IDs to
prevent them from leaking, also from Toshiaki Makita.
7) Put in a compromise for TCP Small Queues so that deep queued devices
that delay TX reclaim non-trivially don't have such a performance
decrease. One particularly problematic area is 802.11 AMPDU in
wireless. From Eric Dumazet.
8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
here. Fix from Eric Dumazet, reported by Dave Jones.
9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.
10) When computing mergeable buffer sizes, virtio-net fails to take the
virtio-net header into account. From Michael Dalton.
11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
bumping, this one has been with us for a while. From Eric Dumazet.
12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
Hugne.
13) 6lowpan bit used for traffic classification was wrong, from Jukka
Rissanen.
14) macvlan has the same issue as normal vlans did wrt. propagating LRO
disabling down to the real device, fix it the same way. From Michal
Kubecek.
15) CPSW driver needs to soft reset all slaves during suspend, from
Daniel Mack.
16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.
17) The xen-netfront RX buffer refill timer isn't properly scheduled on
partial RX allocation success, from Ma JieYue.
18) When ipv6 ping protocol support was added, the AF_INET6 protocol
initialization cleanup path on failure was borked a little. Fix
from Vlad Yasevich.
19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
blocks we can do the wrong thing with the msg_name we write back to
userspace. From Hannes Frederic Sowa. There is another fix in the
works from Hannes which will prevent future problems of this nature.
20) Fix route leak in VTI tunnel transmit, from Fan Du.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
genetlink: make multicast groups const, prevent abuse
genetlink: pass family to functions using groups
genetlink: add and use genl_set_err()
genetlink: remove family pointer from genl_multicast_group
genetlink: remove genl_unregister_mc_group()
hsr: don't call genl_unregister_mc_group()
quota/genetlink: use proper genetlink multicast APIs
drop_monitor/genetlink: use proper genetlink multicast APIs
genetlink: only pass array to genl_register_family_with_ops()
tcp: don't update snd_nxt, when a socket is switched from repair mode
atm: idt77252: fix dev refcnt leak
xfrm: Release dst if this dst is improper for vti tunnel
netlink: fix documentation typo in netlink_set_err()
be2net: Delete secondary unicast MAC addresses during be_close
be2net: Fix unconditional enabling of Rx interface options
net, virtio_net: replace the magic value
ping: prevent NULL pointer dereference on write to msg_name
bnx2x: Prevent "timeout waiting for state X"
bnx2x: prevent CFC attention
bnx2x: Prevent panic during DMAE timeout
...
|
|
Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because they only have one group.
This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.
At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.
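The family-relative numbering can be sketched as follows. This is a userspace model with hypothetical names (sketch_family, family_register, family_global_group), not the genetlink API; it only shows how a per-family offset replaces per-group IDs and enables the ownership check.

```c
#include <assert.h>

struct sketch_family {
    const char *name;
    unsigned    mcgrp_offset; /* first global group ID owned by the family */
    unsigned    n_mcgrps;     /* number of groups in the family's array */
};

/* Give the family a contiguous ID range; returns the next free global ID. */
static unsigned family_register(struct sketch_family *f, unsigned next_free_id)
{
    assert(f);
    f->mcgrp_offset = next_free_id;
    return next_free_id + f->n_mcgrps;
}

/*
 * Translate a family-relative group index to a global ID.
 * Returns -1 if the family does not own that group.
 */
static int family_global_group(const struct sketch_family *f, unsigned group)
{
    if (group >= f->n_mcgrps)
        return -1;
    return (int)(f->mcgrp_offset + group);
}
```

A sender passes only the relative index (0 for a single-group family), and the range check is what prevents one family from sending on a group it does not own.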
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro; this is a little safer.
The openvswitch code has some indirection, so assign ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
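The macro approach above can be sketched roughly as follows. All toy_* names are hypothetical and the types are heavily simplified; the sketch only illustrates how a wrapper macro can derive the element count from the array itself.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified stand-ins; not the kernel's real types. */
struct toy_ops { int cmd; };
struct toy_ops_family { const struct toy_ops *ops; size_t n_ops; };

/* The underlying function still takes an explicit count. */
static int toy_register_with_ops(struct toy_ops_family *fam,
                                 const struct toy_ops *ops, size_t n_ops)
{
    fam->ops = ops;
    fam->n_ops = n_ops;
    return 0;
}

/*
 * The wrapper macro takes only the array and derives the count, so a
 * caller cannot pass a length that disagrees with the array. It must
 * be given a real array, not a pointer.
 */
#define toy_register_family_with_ops(fam, ops) \
    toy_register_with_ops((fam), (ops), ARRAY_SIZE(ops))
```

Code that only holds a pointer to the ops cannot use such a macro, which is presumably why callers with indirection would set ops and n_ops directly.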
|
|
init_card() calls dev_get_by_name() to get a network device, but it
doesn't drop the device's reference count after the device is
used.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
* acpi-hotplug:
PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration
|
|
Commit bd950799d951 (PCI: acpiphp: Convert to dynamic debug) removed users
of acpiphp_debug variable and the variable itself but the declaration was
left in the header file. Drop this unused declaration.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull second set of s390 patches from Martin Schwidefsky:
"The handling of the PCI hotplug notifications has been improved, the
zfcp dumper can now detect the HSA size dynamically and the default
install kernel has been changed to the compressed bzImage. And two
bug-fixes for scm and 3270"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: implement hotplug notifications
s390/scm_block: do not hide eadm subchannel dependency
s390/sclp: Consolidate early sclp init calls to sclp_early_detect()
s390/sclp: Move early code from sclp_cmd.c to sclp_early.c
s390/sclp: Determine HSA size dynamically for zfcpdump
s390/sclp: Move declarations for sclp_sdias into separate header file
s390/pci: implement pcibios_remove_bus
s390/pci: improve handling of bus resources
s390/3270: fix missing device_destroy() call
s390/boot: Install bzImage as default kernel image
|
|
When we change group_thread_cnt from sysfs entry, it can OOPS.
The kernel messages are:
[ 135.299021] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 135.299073] IP: [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[ 135.299107] PGD 0
[ 135.299122] Oops: 0000 [#1] SMP
[ 135.299144] Modules linked in: netconsole e1000e ptp pps_core
[ 135.299188] CPU: 3 PID: 2225 Comm: md0_raid5 Not tainted 3.12.0+ #24
[ 135.299214] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011
[ 135.299255] task: ffff8800b9638f80 ti: ffff8800b77a4000 task.ti: ffff8800b77a4000
[ 135.299283] RIP: 0010:[<ffffffff815188ab>] [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[ 135.299323] RSP: 0018:ffff8800b77a5c48 EFLAGS: 00010002
[ 135.299344] RAX: ffff880037bb5c70 RBX: 0000000000000000 RCX: 0000000000000008
[ 135.299371] RDX: ffff880037bb5cb8 RSI: 0000000000000001 RDI: ffff880037bb5c00
[ 135.299398] RBP: ffff8800b77a5d08 R08: 0000000000000001 R09: 0000000000000000
[ 135.299425] R10: ffff8800b77a5c98 R11: 00000000ffffffff R12: ffff880037bb5c00
[ 135.299452] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880037bb5c70
[ 135.299479] FS: 0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
[ 135.299510] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 135.299532] CR2: 0000000000000000 CR3: 0000000001c0b000 CR4: 00000000000407e0
[ 135.299559] Stack:
[ 135.299570] ffff8800b77a5c88 ffffffff8107383e ffff8800b77a5c88 ffff880037a64300
[ 135.299611] 000000000000ec08 ffff880037bb5cb8 ffff8800b77a5c98 ffffffffffffffd8
[ 135.299654] 000000000000ec08 ffff880037bb5c60 ffff8800b77a5c98 ffff8800b77a5c98
[ 135.299696] Call Trace:
[ 135.299711] [<ffffffff8107383e>] ? __wake_up+0x4e/0x70
[ 135.299733] [<ffffffff81518f88>] raid5d+0x4c8/0x680
[ 135.299756] [<ffffffff817174ed>] ? schedule_timeout+0x15d/0x1f0
[ 135.299781] [<ffffffff81524c9f>] md_thread+0x11f/0x170
[ 135.299804] [<ffffffff81069cd0>] ? wake_up_bit+0x40/0x40
[ 135.299826] [<ffffffff81524b80>] ? md_rdev_init+0x110/0x110
[ 135.299850] [<ffffffff81069656>] kthread+0xc6/0xd0
[ 135.299871] [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[ 135.299899] [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
[ 135.299923] [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[ 135.299951] Code: ff ff ff 0f 84 d7 fe ff ff e9 5c fe ff ff 66 90 41 8b b4 24 d8 01 00 00 45 31 ed 85 f6 0f 8e 7b fd ff ff 49 8b 9c 24 d0 01 00 00 <48> 3b 1b 49 89 dd 0f 85 67 fd ff ff 48 8d 43 28 31 d2 eb 17 90
[ 135.300005] RIP [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[ 135.300005] RSP <ffff8800b77a5c48>
[ 135.300005] CR2: 0000000000000000
[ 135.300005] ---[ end trace 504854e5bb7562ed ]---
[ 135.300005] Kernel panic - not syncing: Fatal exception
This is because raid5d() can be running when the multi-thread
resources are changed via sysfs. We need to provide locking.
mddev->device_lock is suitable, but we cannot simply call
alloc_thread_groups under this lock as we cannot allocate memory
while holding a spinlock.
So change alloc_thread_groups() to allocate and return the data
structures, then raid5_store_group_thread_cnt() can take the lock
while updating the pointers to the data structures.
This fixes a bug introduced in 3.12 and so is suitable for the 3.12.x
stable series.
Fixes: b721420e8719131896b009b11edbbd27
Cc: stable@vger.kernel.org (3.12)
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>
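The pattern described above (allocate first, then take the lock only to swap pointers) can be sketched in plain userspace C. Everything named toy_* is hypothetical; the toy lock merely records whether it is held so the sketch can check that no allocation happens under it. Freeing the old array outside the lock is a choice made in this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy lock: records whether it is held so the sketch can check itself. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = 0;
}

struct toy_conf {
    struct toy_lock device_lock;
    int group_cnt;
    int *groups;            /* stands in for the worker group array */
};

/* Allocate and return the new data; never called with the lock held. */
static int *toy_alloc_groups(const struct toy_conf *conf, int cnt)
{
    assert(!conf->device_lock.held);    /* allocation may sleep */
    return calloc((size_t)cnt, sizeof(int));
}

/* Returns 0 on success, -1 if the allocation failed. */
static int toy_store_group_cnt(struct toy_conf *conf, int cnt)
{
    int *new_groups = toy_alloc_groups(conf, cnt);
    int *old_groups;

    if (!new_groups)
        return -1;

    toy_lock_acquire(&conf->device_lock);
    old_groups = conf->groups;          /* only pointer updates under the lock */
    conf->groups = new_groups;
    conf->group_cnt = cnt;
    toy_lock_release(&conf->device_lock);

    free(old_groups);                   /* done unlocked in this sketch */
    return 0;
}
```

A reader that takes the same lock sees either the old pointer and count or the new ones, never a half-updated pair.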
|
|
When changing group_thread_cnt from sysfs entry, the kernel can oops.
The kernel messages are:
[ 740.961389] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 740.961444] IP: [<ffffffff81062570>] process_one_work+0x30/0x500
[ 740.961476] PGD b9013067 PUD b651e067 PMD 0
[ 740.961503] Oops: 0000 [#1] SMP
[ 740.961525] Modules linked in: netconsole e1000e ptp pps_core
[ 740.961577] CPU: 0 PID: 3683 Comm: kworker/u8:5 Not tainted 3.12.0+ #23
[ 740.961602] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011
[ 740.961646] task: ffff88013abe0000 ti: ffff88013a246000 task.ti: ffff88013a246000
[ 740.961673] RIP: 0010:[<ffffffff81062570>] [<ffffffff81062570>] process_one_work+0x30/0x500
[ 740.961708] RSP: 0018:ffff88013a247e08 EFLAGS: 00010086
[ 740.961730] RAX: ffff8800b912b400 RBX: ffff88013a61e680 RCX: ffff8800b912b400
[ 740.961757] RDX: ffff8800b912b600 RSI: ffff8800b912b600 RDI: ffff88013a61e680
[ 740.961782] RBP: ffff88013a247e48 R08: ffff88013a246000 R09: 000000000002c09d
[ 740.961808] R10: 000000000000010f R11: 0000000000000000 R12: ffff88013b00cc00
[ 740.961833] R13: 0000000000000000 R14: ffff88013b00cf80 R15: ffff88013a61e6b0
[ 740.961861] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[ 740.961893] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 740.962001] CR2: 00000000000000b8 CR3: 00000000b24fe000 CR4: 00000000000407f0
[ 740.962001] Stack:
[ 740.962001] 0000000000000008 ffff8800b912b600 ffff88013b00cc00 ffff88013a61e680
[ 740.962001] ffff88013b00cc00 ffff88013b00cc18 ffff88013b00cf80 ffff88013a61e6b0
[ 740.962001] ffff88013a247eb8 ffffffff810639c6 0000000000012a80 ffff88013a247fd8
[ 740.962001] Call Trace:
[ 740.962001] [<ffffffff810639c6>] worker_thread+0x206/0x3f0
[ 740.962001] [<ffffffff810637c0>] ? manage_workers+0x2c0/0x2c0
[ 740.962001] [<ffffffff81069656>] kthread+0xc6/0xd0
[ 740.962001] [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[ 740.962001] [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
[ 740.962001] [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[ 740.962001] Code: 89 e5 41 57 41 56 41 55 45 31 ed 41 54 53 48 89 fb 48 83 ec 18 48 8b 06 4c 8b 67 48 48 89 c1 30 c9 a8 04 4c 0f 45 e9 80 7f 58 00 <49> 8b 45 08 44 8b b0 00 01 00 00 78 0c 41 f6 44 24 10 04 0f 84
[ 740.962001] RIP [<ffffffff81062570>] process_one_work+0x30/0x500
[ 740.962001] RSP <ffff88013a247e08>
[ 740.962001] CR2: 0000000000000008
[ 740.962001] ---[ end trace 39181460000748de ]---
[ 740.962001] Kernel panic - not syncing: Fatal exception
This can happen if there are some stripes left, fewer than MAX_STRIPE_BATCH.
A worker is queued to handle them.
But before calling raid5_do_work, raid5d handles those
stripes making conf->active_stripe = 0.
So mddev_suspend() can return.
We might then free old worker resources before the queued
raid5_do_work() handled them. When it runs, it crashes.
raid5d()                   raid5_store_group_thread_cnt()
queue_work                 mddev_suspend()
handle_stripes
active_stripe=0
                           free(old worker resources)
process_one_work
raid5_do_work
To avoid this, we should flush the worker resources before freeing them.
This fixes a bug introduced in 3.12 so is suitable for the 3.12.x
stable series.
Cc: stable@vger.kernel.org (3.12)
Fixes: b721420e8719131896b009b11edbbd27
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>
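The ordering requirement above (flush queued work before freeing what it refers to) can be illustrated with a toy one-slot work queue. The toy_* names are hypothetical and this is not the kernel workqueue API; it only shows why the flush has to come first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a worker and a one-slot work queue. */
struct toy_worker { int handled; };

struct toy_queue {
    struct toy_worker *pending;   /* queued work refers to this worker */
};

static void toy_queue_work(struct toy_queue *q, struct toy_worker *w)
{
    q->pending = w;
}

/* Run any queued work to completion; afterwards nothing is pending. */
static void toy_flush(struct toy_queue *q)
{
    if (q->pending) {
        q->pending->handled++;
        q->pending = NULL;
    }
}

/*
 * Safe teardown: flush first, so no queued work can touch the worker
 * after it has been freed. Returns how many times the worker ran.
 */
static int toy_flush_and_free(struct toy_queue *q, struct toy_worker *w)
{
    int handled;

    toy_flush(q);
    assert(q->pending == NULL);
    handled = w->handled;
    free(w);
    return handled;
}
```

Freeing the worker without the flush would leave the pending pointer dangling, which is the userspace analogue of the crash in the trace above.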
|
|
R5_ReadNoMerge means that this bio can't be merged with other bios or
requests. It used REQ_FLUSH to achieve this, but REQ_NOMERGE can do the
same job.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
There is an iobarrier in raid1 because of contention between normal IO and
resync IO. It suspends all normal IO when resync/recovery happens.
However if normal IO is outside the resync window, there is no contention.
So this patch changes the barrier mechanism to only block IO that
could contend with the resync that is currently happening.
We partition the whole space into five parts.
|---------|---------------|-------------------|----------------|-------|
        start        next_resync      start_next_window    end_window
start + RESYNC_WINDOW = next_resync
next_resync + NEXT_NORMALIO_DISTANCE = start_next_window
start_next_window + NEXT_NORMALIO_DISTANCE = end_window
Firstly we introduce some concepts:
1 - RESYNC_WINDOW: For resync, there are 32 resync requests at most at the
same time. A sync request is RESYNC_BLOCK_SIZE(64*1024).
So the RESYNC_WINDOW is 32 * RESYNC_BLOCK_SIZE, that is 2MB.
2 - NEXT_NORMALIO_DISTANCE: the distance between next_resync
and start_next_window. It also indicates the distance between
start_next_window and end_window.
It is currently 3 * RESYNC_WINDOW but could be tuned if
this turns out not to be optimal.
3 - next_resync: the next sector at which we will do sync IO.
4 - start: a position which is at most RESYNC_WINDOW before
next_resync.
5 - start_next_window: a position which is NEXT_NORMALIO_DISTANCE
beyond next_resync. Normal-io after this position doesn't need to
wait for resync-io to complete.
6 - end_window: a position which is 2 * NEXT_NORMALIO_DISTANCE beyond
next_resync. This also doesn't need to wait, but is counted
differently.
7 - current_window_requests: the count of normalIO between
start_next_window and end_window.
8 - next_window_requests: the count of normalIO after end_window.
NormalIO will be partitioned into four types:
NormIO1: the end sector of the bio is smaller than or equal to start
NormIO2: the start sector of the bio is larger than or equal to end_window
NormIO3: the start sector of the bio is larger than or equal to
start_next_window (and smaller than end_window)
NormIO4: anything else, that is a bio located between start and
start_next_window
|---------|---------------|-------------------|----------------|-------|
        start        next_resync      start_next_window    end_window
  NormIO1      NormIO4           NormIO4           NormIO3      NormIO2
For NormIO1, we don't need any io barrier.
For NormIO4, we used a similar approach to the original iobarrier
mechanism. The normalIO and resyncIO must be kept separate.
For NormIO2/3, we add two fields to struct r1conf: "current_window_requests"
and "next_window_requests". They indicate the count of active
requests in the two windows.
For these, we don't wait for resync io to complete.
For resync action, if there are NormIO4s, we must wait for them.
If not, we can proceed.
But if resync action reaches start_next_window and
current_window_requests > 0 (that is there are NormIO3s), we must
wait until the current_window_requests becomes zero.
When current_window_requests becomes zero, start_next_window also
moves forward. Then current_window_requests will be replaced by
next_window_requests.
One problem is when and how to change from NormIO2 to
NormIO3. Only then can sync action progress.
We add a field in struct r1conf "start_next_window".
A: if start_next_window == MaxSector, it means there are no NormIO2/3.
So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
B: if current_window_requests == 0 && next_window_requests != 0, it
means start_next_window moves to end_window
Another problem is how to differentiate between
old NormIO2 (now NormIO3) and NormIO2.
For example, there are many bios which are NormIO2 and a bio which is
NormIO3. If the NormIO3 bio completes first, the NormIO2 bios become NormIO3.
We add a field in struct r1bio "start_next_window".
This is used to record the position conf->start_next_window when the call
to wait_barrier() is made in make_request().
In allow_barrier(), we check the conf->start_next_window.
If r1bio->start_next_window == conf->start_next_window, it means
there is no transition between NormIO2 and NormIO3.
If r1bio->start_next_window != conf->start_next_window, it means
there was a transition between NormIO2 and NormIO3. There can only
have been one transition. So it only means the bio is old NormIO2.
For one bio, there may be many r1bio's. So we make sure
all the r1bio->start_next_window are the same value.
If we meet a blocked_dev in make_request(), it must call allow_barrier
and wait_barrier. So the value of conf->start_next_window can differ
before and after those calls.
If there are many r1bio's with different start_next_window values,
the relevant bio depends on the value in the last r1bio,
which will cause errors. To avoid this, we must wait for previous r1bios
to complete.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
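A minimal sketch of the window arithmetic and the four-way classification described above, following the layout in the diagram (NormIO4 covers the region from start up to start_next_window). It assumes 512-byte sectors; RESYNC_DEPTH and classify() are hypothetical names, and start is simplified to exactly RESYNC_WINDOW before next_resync (clamped at 0), whereas the text only says "at most".

```c
#include <assert.h>

typedef unsigned long long sector_t;

/* Assumes 512-byte sectors; the byte sizes come from the text above. */
#define RESYNC_BLOCK_SIZE       (64 * 1024)
#define RESYNC_DEPTH            32   /* hypothetical name for the 32-request limit */
#define RESYNC_WINDOW           ((sector_t)RESYNC_DEPTH * RESYNC_BLOCK_SIZE / 512)
#define NEXT_NORMALIO_DISTANCE  (3 * RESYNC_WINDOW)

enum norm_io { NORM_IO1 = 1, NORM_IO2, NORM_IO3, NORM_IO4 };

/* Classify a normal bio [bio_start, bio_end) relative to next_resync. */
static enum norm_io classify(sector_t next_resync,
                             sector_t bio_start, sector_t bio_end)
{
    sector_t start = next_resync > RESYNC_WINDOW ?
                     next_resync - RESYNC_WINDOW : 0;
    sector_t start_next_window = next_resync + NEXT_NORMALIO_DISTANCE;
    sector_t end_window = start_next_window + NEXT_NORMALIO_DISTANCE;

    if (bio_end <= start)
        return NORM_IO1;            /* behind the resync: no barrier needed */
    if (bio_start >= end_window)
        return NORM_IO2;            /* counted in next_window_requests */
    if (bio_start >= start_next_window)
        return NORM_IO3;            /* counted in current_window_requests */
    return NORM_IO4;                /* may contend with resync: must wait */
}
```

Under these assumptions RESYNC_WINDOW is 4096 sectors (2MB) and NEXT_NORMALIO_DISTANCE is 12288 sectors, so with next_resync at sector 100000 the boundaries fall at 95904, 112288 and 124576.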
|
|
In a subsequent patch, we'll use some const parameters.
Using macros will make the code clearer.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
when reconfiguring the array.
We used to use raise_barrier to suspend normal IO while we reconfigure
the array. However raise_barrier will soon only suspend some normal
IO, not all. So we need something else.
Change it to use freeze_array.
But freeze_array not only suspends normal io, it also suspends
resync io.
For the places that call raise_barrier in order to reconfigure, this isn't a
problem.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
The following patch will rewrite the contention handling between normal IO
and resync IO, so we use a parameter to indicate whether the array is
frozen.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
md_stop_writes.
When raid5 recovery hits a fresh badblock, this badblock will be flagged as an unack
badblock until md_update_sb() is called.
But md_stop will take reconfig lock which means raid5d can't call
md_update_sb() in md_check_recovery(), the badblock will always
be unack, so raid5d thread enters an infinite loop and md_stop_writes()
can never stop sync_thread. This causes deadlock.
To solve this, when STOP_ARRAY ioctl is issued and sync_thread is
running, we need to set md->recovery FROZEN and INTR flags and wait for
sync_thread to stop before we (re)take reconfig lock.
This requires that raid5 reshape_request notices MD_RECOVERY_INTR
(which it probably should have noticed anyway) and stops waiting for a
metadata update in that case.
Reported-by: Jianpeng Ma <majianpeng@gmail.com>
Reported-by: Bian Yu <bianyu@kedacom.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
We currently use kthread_should_stop() in various places in the
sync/reshape code to abort early.
However some places set MD_RECOVERY_INTR but don't immediately call
md_reap_sync_thread() (and we will shortly get another one).
When this happens we are relying on md_check_recovery() to reap the
thread and that only happens when it finishes normally.
So MD_RECOVERY_INTR must lead to a normal finish without the
kthread_should_stop() test.
So replace all relevant tests, and be more careful when the thread is
interrupted not to acknowledge that latest step in a reshape as it may
not be fully committed yet.
Also add a test on MD_RECOVERY_INTR in the 'is_mddev_idle' loop
so we don't have to wait for the speed to drop before we can abort.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Sometimes we need to lock an mddev and cannot cope with
failure due to interrupt.
In these cases we should use mutex_lock, not mutex_lock_interruptible.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Because of block layer merging, one failing bio causes the other bios
that belong to the same request to fail too, so raid5_end_read_request
will record all these bios as badblocks.
If we retry the request with the R5_ReadNoMerge flag to avoid merging bios,
badblocks records only the sectors that are actually bad.
test:
hdparm --yes-i-know-what-i-am-doing --make-bad-sector 300000 /dev/sdb
mdadm -C /dev/md0 -l5 -n3 /dev/sd[bcd] --assume-clean
mdadm /dev/md0 -f /dev/sdd
mdadm /dev/md0 -r /dev/sdd
mdadm --zero-superblock /dev/sdd
mdadm /dev/md0 -a /dev/sdd
1. Without this patch:
cat /sys/block/md0/md/rd*/bad_blocks
299776 256
299776 256
2. With this patch:
cat /sys/block/md0/md/rd*/bad_blocks
300000 8
300000 8
Signed-off-by: Bian Yu <bianyu@kedacom.com>
Signed-off-by: NeilBrown <neilb@suse.de>
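The two bad_blocks outputs above both cover sector 300000; they differ only in how much surrounding space is marked. The small sketch below checks that arithmetic. Reading the figures as sector 300000 rounded down to a 256-sector and an 8-sector boundary is an observation about the numbers, not something the commit message states, and the helper names are hypothetical.

```c
#include <assert.h>

typedef unsigned long long sector_t;

/* Round a sector down to a multiple of the given granularity. */
static sector_t round_down_sectors(sector_t sector, sector_t granularity)
{
    return sector - (sector % granularity);
}

/* Does the range [start, start + len) contain the sector? */
static int range_contains(sector_t start, sector_t len, sector_t sector)
{
    return sector >= start && sector < start + len;
}
```

For example, round_down_sectors(300000, 256) gives 299776 and round_down_sectors(300000, 8) gives 300000, matching the start sectors reported without and with the patch.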
|
|
Track the empty inactive list count, so md_raid5_congested() can use it to make
a decision.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
* pm-cpufreq:
cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs()
cpufreq: OMAP: Fix compilation error 'r & ret undeclared'
cpufreq: conservative: set requested_freq to policy max when it is over policy max
|
|
* pm-runtime:
PM / Runtime: Fix error path for prepare
PM / Runtime: Update documentation around probe|remove|suspend
|
|
* pm-cpuidle:
intel_idle: Support Intel Atom Processor C2000 Product Family
|
|
* acpi-video:
ACPI / video: clean up DMI table for initial black screen problem
|
|
* acpi-ec:
ACPI / EC: Ensure lock is acquired before accessing ec struct members
|