Age | Commit message (Collapse) | Author |
|
The scsi netlink code confuses the netlink port id with a process id,
going so far as to read NETLINK_CREDS(skb)->pid instead of the correct
NETLINK_CB(skb).pid. Fortunately it does not matter because nothing
registers to respond to scsi netlink requests.
The only interesting use of the scsi_netlink interface is
fc_host_post_vendor_event which sends a netlink multicast message.
Since nothing registers to handle scsi netlink messages kill all of the
registration logic, while retaining the same error handling behavior
preserving the userspace visible behavior and removing all of the
confused code that thought a netlink port id was a process id.
This was tested with a kernel allyesconfig build which had no problems.
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: James Smart <James.Smart@Emulex.Com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
a lot of code has either the memset or an inefficient copy
from a static array that contains the all-zeros Ethernet address.
Introduce help function eth_zero_addr() to fill an address with
all zeros, making the code clearer and allowing us to get rid of
some constant arrays.
Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new ALU opcode, to compute a modulus.
Commit ffe06c17afbbb used an ancillary to implement XOR_X,
but here we reserve one of the available ALU opcode to implement both
MOD_X and MOD_K
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: George Bakos <gbakos@alpinista.org>
Cc: Jay Schulist <jschlst@samba.org>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.
I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.
I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).
Suggested by David S. Miller.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace netlink_set_nonroot by one new field `flags' in
struct netlink_kernel_cfg that is passed to netlink_kernel_create.
This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
now the flags field in nl_table is generic (so we can add more
flags if needed in the future).
Also adjust all callers in the net-next tree to use these flags
instead of netlink_set_nonroot.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since route cache deletion (89aef8921bfbac22f), delay is no
more used. Remove it.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Passing uids and gids on NETLINK_CB from a process in one user
namespace to a process in another user namespace can result in the
wrong uid or gid being presented to userspace. Avoid that problem by
passing kuids and kgids instead.
- define struct scm_creds for use in scm_cookie and netlink_skb_parms
that holds uid and gid information in kuid_t and kgid_t.
- Modify scm_set_cred to fill out scm_creds by heand instead of using
cred_to_ucred to fill out struct ucred. This conversion ensures
userspace does not get incorrect uid or gid values to look at.
- Modify scm_recv to convert from struct scm_creds to struct ucred
before copying credential values to userspace.
- Modify __scm_send to populate struct scm_creds on in the scm_cookie,
instead of just copying struct ucred from userspace.
- Modify netlink_sendmsg to copy scm_creds instead of struct ucred
into the NETLINK_CB.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When adding a blackhole or a prohibit route, they were handling like classic
routes. Moreover, it was only possible to add this kind of routes by specifying
an interface.
Bug already reported here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498
Before the patch:
$ ip route add blackhole 2001::1/128
RTNETLINK answers: No such device
$ ip route add blackhole 2001::1/128 dev eth0
$ ip -6 route | grep 2001
2001::1 dev eth0 metric 1024
After:
$ ip route add blackhole 2001::1/128
$ ip -6 route | grep 2001
blackhole 2001::1 dev lo metric 1024 error -22
v2: wrong patch
v3: add a field fc_type in struct fib6_config to store RTN_* type
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It seems we need to provide ability for stacked devices
to use specific lock_class_key for sch->busylock
We could instead default l2tpeth tx_queue_len to 0 (no qdisc), but
a user might use a qdisc anyway.
(So same fixes are probably needed on non LLTX stacked drivers)
Noticed while stressing L2TPV3 setup :
======================================================
[ INFO: possible circular locking dependency detected ]
3.6.0-rc3+ #788 Not tainted
-------------------------------------------------------
netperf/4660 is trying to acquire lock:
(l2tpsock){+.-...}, at: [<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
but task is already holding lock:
(&(&sch->busylock)->rlock){+.-...}, at: [<ffffffff81596595>] dev_queue_xmit+0xd75/0xe00
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&sch->busylock)->rlock){+.-...}:
[<ffffffff810a5df0>] lock_acquire+0x90/0x200
[<ffffffff817499fc>] _raw_spin_lock_irqsave+0x4c/0x60
[<ffffffff81074872>] __wake_up+0x32/0x70
[<ffffffff8136d39e>] tty_wakeup+0x3e/0x80
[<ffffffff81378fb3>] pty_write+0x73/0x80
[<ffffffff8136cb4c>] tty_put_char+0x3c/0x40
[<ffffffff813722b2>] process_echoes+0x142/0x330
[<ffffffff813742ab>] n_tty_receive_buf+0x8fb/0x1230
[<ffffffff813777b2>] flush_to_ldisc+0x142/0x1c0
[<ffffffff81062818>] process_one_work+0x198/0x760
[<ffffffff81063236>] worker_thread+0x186/0x4b0
[<ffffffff810694d3>] kthread+0x93/0xa0
[<ffffffff81753e24>] kernel_thread_helper+0x4/0x10
-> #0 (l2tpsock){+.-...}:
[<ffffffff810a5288>] __lock_acquire+0x1628/0x1b10
[<ffffffff810a5df0>] lock_acquire+0x90/0x200
[<ffffffff817498c1>] _raw_spin_lock+0x41/0x50
[<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[<ffffffffa021a802>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
[<ffffffff815952b2>] dev_hard_start_xmit+0x502/0xa70
[<ffffffff815b63ce>] sch_direct_xmit+0xfe/0x290
[<ffffffff81595a05>] dev_queue_xmit+0x1e5/0xe00
[<ffffffff815d9d60>] ip_finish_output+0x3d0/0x890
[<ffffffff815db019>] ip_output+0x59/0xf0
[<ffffffff815da36d>] ip_local_out+0x2d/0xa0
[<ffffffff815da5a3>] ip_queue_xmit+0x1c3/0x680
[<ffffffff815f4192>] tcp_transmit_skb+0x402/0xa60
[<ffffffff815f4a94>] tcp_write_xmit+0x1f4/0xa30
[<ffffffff815f5300>] tcp_push_one+0x30/0x40
[<ffffffff815e6672>] tcp_sendmsg+0xe82/0x1040
[<ffffffff81614495>] inet_sendmsg+0x125/0x230
[<ffffffff81576cdc>] sock_sendmsg+0xdc/0xf0
[<ffffffff81579ece>] sys_sendto+0xfe/0x130
[<ffffffff81752c92>] system_call_fastpath+0x16/0x1b
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(&sch->busylock)->rlock);
lock(l2tpsock);
lock(&(&sch->busylock)->rlock);
lock(l2tpsock);
*** DEADLOCK ***
5 locks held by netperf/4660:
#0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e581c>] tcp_sendmsg+0x2c/0x1040
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff815da3e0>] ip_queue_xmit+0x0/0x680
#2: (rcu_read_lock_bh){.+....}, at: [<ffffffff815d9ac5>] ip_finish_output+0x135/0x890
#3: (rcu_read_lock_bh){.+....}, at: [<ffffffff81595820>] dev_queue_xmit+0x0/0xe00
#4: (&(&sch->busylock)->rlock){+.-...}, at: [<ffffffff81596595>] dev_queue_xmit+0xd75/0xe00
stack backtrace:
Pid: 4660, comm: netperf Not tainted 3.6.0-rc3+ #788
Call Trace:
[<ffffffff8173dbf8>] print_circular_bug+0x1fb/0x20c
[<ffffffff810a5288>] __lock_acquire+0x1628/0x1b10
[<ffffffff810a334b>] ? check_usage+0x9b/0x4d0
[<ffffffff810a3f44>] ? __lock_acquire+0x2e4/0x1b10
[<ffffffff810a5df0>] lock_acquire+0x90/0x200
[<ffffffffa0208db2>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[<ffffffff817498c1>] _raw_spin_lock+0x41/0x50
[<ffffffffa0208db2>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[<ffffffffa0208db2>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[<ffffffffa021a802>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
[<ffffffff815952b2>] dev_hard_start_xmit+0x502/0xa70
[<ffffffff81594e0e>] ? dev_hard_start_xmit+0x5e/0xa70
[<ffffffff81595961>] ? dev_queue_xmit+0x141/0xe00
[<ffffffff815b63ce>] sch_direct_xmit+0xfe/0x290
[<ffffffff81595a05>] dev_queue_xmit+0x1e5/0xe00
[<ffffffff81595820>] ? dev_hard_start_xmit+0xa70/0xa70
[<ffffffff815d9d60>] ip_finish_output+0x3d0/0x890
[<ffffffff815d9ac5>] ? ip_finish_output+0x135/0x890
[<ffffffff815db019>] ip_output+0x59/0xf0
[<ffffffff815da36d>] ip_local_out+0x2d/0xa0
[<ffffffff815da5a3>] ip_queue_xmit+0x1c3/0x680
[<ffffffff815da3e0>] ? ip_local_out+0xa0/0xa0
[<ffffffff815f4192>] tcp_transmit_skb+0x402/0xa60
[<ffffffff815fa25e>] ? tcp_md5_do_lookup+0x18e/0x1a0
[<ffffffff815f4a94>] tcp_write_xmit+0x1f4/0xa30
[<ffffffff815f5300>] tcp_push_one+0x30/0x40
[<ffffffff815e6672>] tcp_sendmsg+0xe82/0x1040
[<ffffffff81614495>] inet_sendmsg+0x125/0x230
[<ffffffff81614370>] ? inet_create+0x6b0/0x6b0
[<ffffffff8157e6e2>] ? sock_update_classid+0xc2/0x3b0
[<ffffffff8157e750>] ? sock_update_classid+0x130/0x3b0
[<ffffffff81576cdc>] sock_sendmsg+0xdc/0xf0
[<ffffffff81162579>] ? fget_light+0x3f9/0x4f0
[<ffffffff81579ece>] sys_sendto+0xfe/0x130
[<ffffffff810a69ad>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff8174a0b0>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff810757e3>] ? finish_task_switch+0x83/0xf0
[<ffffffff810757a6>] ? finish_task_switch+0x46/0xf0
[<ffffffff81752cb7>] ? sysret_check+0x1b/0x56
[<ffffffff81752c92>] system_call_fastpath+0x16/0x1b
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for genl "tcp_metrics". No locking
is changed, only that now we can unlink and delete
entries after grace period. We implement get/del for
single entry and dump to support show/flush filtering
in user space. Del without address attribute causes
flush for all addresses, sadly under genl_mutex.
v2:
- remove rcu_assign_pointer as suggested by Eric Dumazet,
it is not needed because there are no other writes under lock
- move the flushing code in tcp_metrics_flush_all
v3:
- remove synchronize_rcu on flush as suggested by Eric Dumazet
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Use proportional rate reduction (PRR) algorithm to reduce cwnd in CWR state,
in addition to Recovery state. Retire the current rate-halving in CWR.
When losses are detected via ACKs in CWR state, the sender enters Recovery
state but the cwnd reduction continues and does not restart.
Rename and refactor cwnd reduction functions since both CWR and Recovery
use the same algorithm:
tcp_init_cwnd_reduction() is new and initiates reduction state variables.
tcp_cwnd_reduction() is previously tcp_update_cwnd_in_recovery().
tcp_ends_cwnd_reduction() is previously tcp_complete_cwr().
The rate halving functions and logic such as tcp_cwnd_down(), tcp_min_cwnd(),
and the cwnd moderation inside tcp_enter_cwr() are removed. The unused
parameter, flag, in tcp_cwnd_reduction() is also removed.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This merges (3f509c6 netfilter: nf_nat_sip: fix incorrect handling
of EBUSY for RTCP expectation) to Patrick McHardy's IPv6 NAT changes.
|
|
This patch adds the new nf_ct_timeout_lookup function to encapsulate
the timeout policy attachment that is called in the nf_conntrack_in
path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -
1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled
2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes
3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket
4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes
5. supporting TCP_FASTOPEN socket option
6. modifying tcp_check_req() to use to check a TFO socket as well
as request_sock
7. supporting TCP's TFO cookie option
8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.
The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.
Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds all the necessary data structure and support
functions to implement TFO server side. It also documents a number
of flags for the sysctl_tcp_fastopen knob, and adds a few Linux
extension MIBs.
In addition, it includes the following:
1. a new TCP_FASTOPEN socket option an application must call to
supply a max backlog allowed in order to enable TFO on its listener.
2. A number of key data structures:
"fastopen_rsk" in tcp_sock - for a big socket to access its
request_sock for retransmission and ack processing purpose. It is
non-NULL iff 3WHS not completed.
"fastopenq" in request_sock_queue - points to a per Fast Open
listener data structure "fastopen_queue" to keep track of qlen (# of
outstanding Fast Open requests) and max_qlen, among other things.
"listener" in tcp_request_sock - to point to the original listener
for book-keeping purpose, i.e., to maintain qlen against max_qlen
as part of defense against IP spoofing attack.
3. various data structure and functions, many in tcp_fastopen.c, to
support server side Fast Open cookie operations, including
/proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying.
Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes bus_id from mdio platform data, The reason to remove
bus_id is, stmmac mdio bus_id is always same as stmmac bus-id, so there
is no point in passing this in different variable.
Also stmmac ethernet driver connects to phy with bus_id passed its
platform data.
So, having single bus-id is much simpler.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 9ad7c049 ("tcp: RFC2988bis + taking RTT sample from 3WHS for
the passive open side") changed the initRTO from 3secs to 1sec in
accordance to RFC6298 (former RFC2988bis). This reduced the time till
the last SYN retransmission packet gets sent from 93secs to 31secs.
RFC1122 is stating that the retransmission should be done for at least 3
minutes, but this seems to be quite high.
"However, the values of R1 and R2 may be different for SYN
and data segments. In particular, R2 for a SYN segment MUST
be set large enough to provide retransmission of the segment
for at least 3 minutes. The application can close the
connection (i.e., give up on the open attempt) sooner, of
course."
This patch increases the value of TCP_SYN_RETRIES to the value of 6,
providing a retransmission window of 63secs.
The comments for SYN and SYNACK retries have also been updated to
describe the current settings. The same goes for the documentation file
"Documentation/networking/ip-sysctl.txt".
Signed-off-by: Alexander Bergmann <alex@linlab.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.
This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.
Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Commit c3def943c7117d42caaed3478731ea7c3c87190e have added support for
new pci ids of the 57840 board, while failing to change the obsolete value
in 'pci_ids.h'.
This patch does so, allowing the probe of such devices.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds dummy functions in of_mdio.h, so that driver need not
ifdef there code with CONFIG_OF.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
target
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Add IPv6 support to the SIP NAT helper. There are no functional differences
to IPv4 NAT, just different formats for addresses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
checksums for IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
|
|
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.
Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.
This patch improves the situation in the following way:
- If a reassembled packet belongs to a connection that has a helper
assigned, the reassembled packet is passed through the stack instead
of the original fragments.
- During defragmentation, the largest received fragment size is stored.
On output, the packet is refragmented if required. If the largest
received fragment size exceeds the outgoing MTU, a "packet too big"
message is generated, thus behaving as if the original fragments
were passed through the stack from an outside point of view.
- The ipv6_helper() hook function can't receive fragments anymore for
connections using a helper, so it is switched to use ipv6_skip_exthdr()
instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
reassembled packets are passed to connection tracking helpers.
The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.
This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.
IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and
(in case of forwarded packets) refragments them at POST_ROUTING
independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.
This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
This is a batch of updates intended for 3.7. The bulk of it is
mac80211 changes, including some mesh work from Thomas Pederson and
some multi-channel work from Johannes. A variety of driver updates
and other bits are scattered in there as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
also, remove unused vlan_info definition from header
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The operstate of a device is initially IF_OPER_UNKNOWN and is updated
asynchronously by linkwatch after each change of carrier state
reported by the driver. The default carrier state of a net device is
on, and this will never be changed on drivers that do not support
carrier detection, thus the operstate remains IF_OPER_UNKNOWN.
For devices that do support carrier detection, the driver must set the
carrier state to off initially, then poll the hardware state when the
device is opened. However, we must not activate linkwatch for a
unregistered device, and commit b473001 ('net: Do not fire linkwatch
events until the device is registered.') ensured that we don't. But
this means that the operstate for many devices that support carrier
detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.
The same issue exists with the dormant state.
The proper initialisation sequence, avoiding a race with opening of
the device, is:
rtnl_lock();
rc = register_netdevice(dev);
if (rc)
goto out_unlock;
netif_carrier_off(dev); /* or netif_dormant_on(dev) */
rtnl_unlock();
but it seems silly that this should have to be repeated in so many
drivers. Further, the operstate seen immediately after opening the
device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
linkwatch.
Commit 22604c8 ('net: Fix for initial link state in 2.6.28') attempted
to fix this by setting the operstate synchronously, but it was
reverted as it could lead to deadlock.
This initialises the operstate synchronously at registration time
only.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
|
|
This patch fixes a broken build due to a missing header:
...
CC net/ipv4/proc.o
In file included from include/net/net_namespace.h:15,
from net/ipv4/proc.c:35:
include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
...
The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock,
but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well.
See commit 0fa7fa98dbcc2789409ed24e885485e645803d7f:
packet: Protect packet sk list with mutex (v2) patch,
Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
|
|
Change since v1:
* Fixed inuse counters access spotted by Eric
In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.
[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581] #0: (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613] #1: (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644] #2: (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693] #3: (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af
Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(
Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:
* sklist
* prot inuse counters
The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The helper functions which translate IEEE MDIO Manageable Device (MMD)
Energy-Efficient Ethernet (EEE) registers 3.20, 7.60 and 7.61 to and from
the comparable ethtool supported/advertised settings will be needed by
drivers other than those in PHYLIB (e.g. e1000e in a follow-on patch).
In the same fashion as similar translation functions in linux/mii.h, move
these functions from the PHYLIB core to the linux/mdio.h header file so the
code will not have to be duplicated in each driver needing MMD-to-ethtool
(and vice-versa) translations. The function and some variable names have
been renamed to be more descriptive.
Not tested on the only hardware that currently calls the related functions,
stmmac, because I don't have access to any. Has been compile tested and
the translations have been tested on a locally modified version of e1000e.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I noticed extra one second delay in device dismantle, tracked down to
a call to dst_dev_event() while some call_rcu() are still in RCU queues.
These call_rcu() were posted by rt_free(struct rtable *rt) calls.
We then wait a little (but one second) in netdev_wait_allrefs() before
kicking again NETDEV_UNREGISTER.
As the call_rcu() are now completed, dst_dev_event() can do the needed
device swap on busy dst.
To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
after a rcu_barrier(), but outside of RTNL lock.
Use NETDEV_UNREGISTER_FINAL with care !
Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL
Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
IP cache removal.
With help from Gao feng
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
This is the first batch of Netfilter and IPVS updates for your
net-next tree. Mostly cleanups for the Netfilter side. They are:
* Remove unnecessary RTNL locking now that we have support
for namespace in nf_conntrack, from Patrick McHardy.
* Cleanup to eliminate unnecessary goto in the initialization
path of several Netfilter tables, from Jean Sacren.
* Another cleanup from Wu Fengguang, this time to PTR_RET instead
of if IS_ERR then return PTR_ERR.
* Use list_for_each_entry_continue_rcu in nf_iterate, from
Michael Wang.
* Add pmtu_disc sysctl option to disable PMTU in their tunneling
transmitter, from Julian Anastasov.
* Generalize application protocol registration in IPVS and modify
IPVS FTP helper to use it, from Julian Anastasov.
* update Kconfig. The IPVS FTP helper depends on the Netfilter FTP
helper for NAT support, from Julian Anastasov.
* Add logic to update PMTU for IPIP packets in IPVS, again
from Julian Anastasov.
* A couple of sparse warning fixes for IPVS and Netfilter from
Claudiu Ghioc and Patrick McHardy respectively.
Patrick's IPv6 NAT changes will follow after this batch, I need
to flush this batch first before refreshing my tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
This series contains updates to ethtool.h, e1000, e1000e, and igb to
implement MDI/MDIx control.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Pull drm fixes from Dave Airlie:
"Intel: edid fixes, power consumption fix, s/r fix, haswell fix
Radeon: BIOS loading fixes for UEFI and Thunderbolt machines, better
MSAA validation, lockup timeout fixes, modesetting fixes
One udl dpms fix, one vmwgfx fix, a couple of trivial core changes.
There is an export added to ACPI as part of the radeon bios fixes.
I've also included the fbcon flashing cursor vs deinit race fix, that
seems the simplest place to start"
Trivial conflict in drivers/video/console/fbcon.c due to me having
already applied the fbcon flashing cursor vs deinit race fix, and Dave
had added a comment in there too.
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits)
fbcon: fix race condition between console lock and cursor timer (v1.1)
drm: Add missing static storage class specifiers in drm_proc.c file
drm/udl: dpms off the crtc when disabled.
drm: Remove two unused fields from struct drm_display_mode
drm: stop vmgfx driver explosion
drm/radeon/ss: use num_crtc rather than hardcoded 6
Revert "drm/radeon: fix bo creation retry path"
drm/i915: use hsw rps tuning values everywhere on gen6+
drm/radeon: split ATRM support out from the ATPX handler (v3)
drm/radeon: convert radeon vfct code to use acpi_get_table_with_size
ACPI: export symbol acpi_get_table_with_size
drm/radeon: implement ACPI VFCT vbios fetch (v3)
drm/radeon/kms: extend the Fujitsu D3003-S2 board connector quirk to cover later silicon stepping
drm/radeon: fix checking of MSAA renderbuffers on r600-r700
drm/radeon: allow CMASK and FMASK in the CS checker on r600-r700
drm/radeon: init lockup timeout on ring init
drm/radeon: avoid turning off spread spectrum for used pll
drm/i915: fall back to bit-banging if GMBUS fails in CRT EDID reads
drm/i915: extract connector update from intel_ddc_get_modes() for reuse
drm/i915: fix hsw uncached pte
...
|
|
Pull SCSI target fixes from Nicholas Bellinger:
"The executive summary includes:
- Post-merge review comments for tcm_vhost (MST + nab)
- Avoid debugging overhead when not debugging for tcm-fc(FCoE) (MDR)
- Fix NULL pointer dereference bug on alloc_page failulre (Yi Zou)
- Fix REPORT_LUNs regression bug with pSCSI export (AlexE + nab)
- Fix regression bug with handling of zero-length data CDBs (nab)
- Fix vhost_scsi_target structure alignment (MST)
Thanks again to everyone who contributed a bugfix patch, gave review
feedback on tcm_vhost code, and/or reported a bug during their own
testing over the last weeks.
There is one other outstanding bug reported by Roland recently related
to SCSI transfer length overflow handling, for which the current
proposed bugfix has been left in queue pending further testing with
other non iscsi-target based fabric drivers.
As the patch is verified with loopback (local SGL memory from SCSI
LLD) + tcm_qla2xxx (TCM allocated SGL memory mapped to PCI HW) fabric
ports, it will be included into the next 3.6-rc-fixes PULL request."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Remove unused se_cmd.cmd_spdtl
tcm_fc: rcu_deref outside rcu lock/unlock section
tcm_vhost: Fix vhost_scsi_target structure alignment
target: Fix regression bug with handling of zero-length data CDBs
target/pscsi: Fix bug with REPORT_LUNs handling for SCSI passthrough
tcm_vhost: Change vhost_scsi_target->vhost_wwpn to char *
target: fix NULL pointer dereference bug alloc_page() fails to get memory
tcm_fc: Avoid debug overhead when not debugging
tcm_vhost: Post-merge review changes requested by MST
tcm_vhost: Fix incorrect IS_ERR() usage in vhost_scsi_map_iov_to_sgl
|
|
Pull NFS client bugfixes from Trond Myklebust:
- NFSv3 mounts need to fail if the FSINFO rpc call fails
- Ensure that the NFS commit cache gets torn down when we unload the
NFS module.
- Fix memory scribble issues when interrupting a LAYOUTGET rpc call
- Fix NFSv4 legacy idmapper regressions
- Fix issues with the NFSv4 getacl command
- Fix a regression when using the legacy "mount -t nfs4"
* tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv3: Ensure that do_proc_get_root() reports errors correctly
NFSv4: Ensure that nfs4_alloc_client cleans up on error.
NFS: return -ENOKEY when the upcall fails to map the name
NFS: Clear key construction data if the idmap upcall fails
NFSv4: Don't use private xdr_stream fields in decode_getacl
NFSv4: Fix the acl cache size calculation
NFSv4: Fix pointer arithmetic in decode_getacl
NFS: Alias the nfs module to nfs4
NFS: Fix a regression when loading the NFS v4 module
NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done
pnfs-obj: Better IO pattern in case of unaligned offset
NFS41: add pg_layout_private to nfs_pageio_descriptor
pnfs: nfs4_proc_layoutget returns void
pnfs: defer release of pages in layoutget
nfs: tear down caches in nfs_init_writepagecache when allocation fails
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull assorted fixes - mostly vfs - from Al Viro:
"Assorted fixes, with an unexpected detour into vfio refcounting logics
(fell out when digging in an analog of eventpoll race in there)."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
task_work: add a scheduling point in task_work_run()
fs: fix fs/namei.c kernel-doc warnings
eventpoll: use-after-possible-free in epoll_create1()
vfio: grab vfio_device reference *before* exposing the sucker via fd_install()
vfio: get rid of vfio_device_put()/vfio_group_get_device* races
vfio: get rid of open-coding kref_put_mutex
introduce kref_put_mutex()
vfio: don't dereference after kfree...
mqueue: lift mnt_want_write() outside ->i_mutex, clean up a bit
|