Age | Commit message (Collapse) | Author |
|
F11h has almost the same MCE signatures as K8 except DRAM ECC and MC5
bank errors. Reuse functionality from the other families.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Now that all decoders have been taught about F14h, models < 0x10
MCEs, enable decoding on this family of CPUs. Also, issue a short
informational message upon boot that MCE decoding gets enabled.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Those are N/A on K8, so don't decode them there.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Add support for decoding F14h BU MCEs and improve decoding of the
remaining families.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
F14h CPUs do not generate LS MCEs so exit early and warn the user in
case this path is ever hit that something else might be going haywire.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Add support for IC MCEs for F14h CPUs. K8 and F10h are almost identical
so use one function for both.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Add a per-family data cache decoders. Since there is a certain overlap
between the different DC MCE signatures, reuse functionality between the
families as far as possible.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Drop "edac_" string from the filenames since they're prefixed with edac/
in their pathname anyway.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Add sysfs injection facilities for testing of the MCE decoding code.
Remove large parts of amd64_edac_dbg.c, as a result, which did only
NB MCE injection anyway and the new injection code supports that
functionality already.
Add an injection module so that MCE decoding code in production kernels
like those in RHEL and SLES can be tested.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Move toplevel sysfs class to the stub and make it available to
non-modularized code too. Add proper refcounting of its users and move
the registration functionality into the reference counting routines.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
... instead of the MCi_STATUS info only for improved handling of certain
types of errors later.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Clean up error codes names, shorten to mnemonics, add RRRR boundary
checking.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Remove remains from previous functionality.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
.. so that the user knows what she's looking at there in dmesg. Also,
fix a minor cosmetic output inconsistency.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
We should return a negative value when we cannot get the toplevel edac
sysfs class.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
The patch below updates broken web addresses in the kernel
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Ben Pfaff <blp@cs.stanford.edu>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Array of udimm sysfs attributes was not ended with NULL marker, leading to
dereference of random memory.
EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm0
EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm1
EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm2
BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4
IP: [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
Pid: 1, comm: swapper Not tainted 2.6.36-rc3-nv+ #483 P6T SE/System Product Name
RIP: 0010:[<ffffffff81330b36>] [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
(...)
Call Trace:
[<ffffffff81330b86>] edac_create_mci_instance_attributes+0x198/0x1f1
[<ffffffff81330c9a>] edac_create_sysfs_mci_device+0xbb/0x2b2
[<ffffffff8132f533>] edac_mc_add_mc+0x46b/0x557
[<ffffffff81428901>] i7core_probe+0xccf/0xec0
RIP [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
---[ end trace 20de320855b81d78 ]---
Kernel panic - not syncing: Attempted to kill init!
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
f4347553b30ec66530bfe63c84530afea3803396 removed the edac polling
mechanism in favor of using a notifier chain for conveying MCE
information to edac. However, the module removal path didn't test
whether the driver had setup the polling function workqueue at all and
the rmmod process was hanging in the kernel at try_to_del_timer_sync()
in the cancel_delayed_work() path, trying to cancel an uninitialized
work struct.
Fix that by adding a balancing check to the workqueue removal path.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Due to the current edac-core limits, we cannot represent a per-channel
memory size, for FB-DIMM drivers. So, we need to sum-up all values
for each slot, in order to properly represent the total amount of
memory found by the i7300 driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
It is still somewhat fake, as the pages may not be on this exact order,
and may even be used in mirror mode, but this is a best guess than the
other random fake values.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The file names are somehow misleading as the code is not specific to
AMD K8 CPUs anymore. The files accomodate code for other AMD CPU
northbridges as well.
Same is true for the config option which is valid for AMD CPU
northbridges in general and not specific to K8.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160343.GD4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
So far we only provide num_k8_northbridges. This is required in
different areas (e.g. L3 cache index disable, GART). But not all AMD
CPUs provide a GART. Thus it is useful to split off the GART handling
from the generic caching of AMD northbridge misc devices.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160254.GC4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
This is basically a cleanup patch, improving the comments for each
function.
While here, do a few cleanups.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
This change should do no functional change. It just rearranges the
contents of the c file, in order to make easier to understand and
maintain it.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Instead of dynamically allocating a buffer for it where needed,
just allocate it once. As we'll use the same buffer also during
fatal and non-fatal errors, is is very risky to dynamically allocate
it during an error.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
As the error logic in this driver came from i5400 driver, it
were using one function to get errors, and another to display.
Let's make it simpler and avoid doing it into two steps.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
While here, do some cleanup by adding some macros to check
for device features.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
There's no mention at the datasheet about how to enable global error
reporting. So, I'm assuming that those errors are always enabled.
Maybe I'm plain wrong about that ;)
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
When the Overflow MCi_STATUS bit is set, EDAC reports the lost error
with a "no information available" message which often puzzles users
parsing the dmesg. This doesn't make much sense since this error has
been lost anyway so no need for reporting it separately. Thus, report
the overflow bit setting in the MCE dump instead. While at it, remove
reporting of MiscV and ErrorEnable (en) which are superfluous.
Now it looks like this:
[ 1501.650024] MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
[ 1501.666887] Northbridge Error, node 2
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Limit MCE error decoding to current and older families only (K8-F11h).
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6:
mmc_spi: Fix unterminated of_match_table
of/sparc: fix build regression from of_device changes
of/device: Replace struct of_device with struct platform_device
|
|
Simply add proper IDs into the device table.
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Dave Jiang <djiang@mvista.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
-EIO is not the only error code that pci_enable_device() may return, also
the set of errors can be enhanced in future. We should compare return
code with zero, not with concrete error value.
Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Jeff Roberson <jroberson@jroberson.net>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
-EIO is not the only error code that pci_enable_device() may return, also
the set of errors can be enhanced in future. We should compare return
code with zero, not with concrete error value.
Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Jeff Roberson <jroberson@jroberson.net>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|