Differences

This page shows the differences between two versions of the page.

--- wiki:руководство_по_ubuntu_server:установка:отчет_о_падении_ядра [2012/05/12 23:10]
+++ wiki:руководство_по_ubuntu_server:установка:отчет_о_падении_ядра [2013/02/19 10:16]
[Testing the Crash Dump Mechanism]
@@ Line 6: / Line 6: @@
 ====Introduction====
-A Kernel Crash Dump refers to a portion of the contents of volatile memory (RAM) that is copied to disk whenever the execution of the kernel is disrupted. The following events can cause a kernel disruption :
+A kernel crash dump is a portion of the contents of volatile memory (RAM) that is copied to disk whenever the execution of the kernel is disrupted. The following events can cause a kernel disruption:
-    Kernel Panic
+  -- Kernel panic
+  -- Non-maskable interrupt (NMI)
+  -- Machine check exception (MCE)
+  -- Hardware failure
+  -- Manual intervention
-    Non Maskable Interrupts (NMI)
+For some of these events (panic, NMI) the kernel reacts automatically and triggers the crash dump mechanism through kexec. In other situations, manual intervention is required to capture the memory. Whenever one of these events occurs, it is important to find the root cause so that it does not happen again. The cause can be determined by inspecting the copied memory contents.
-    Machine Check Exceptions (MCE)
+====Kernel Crash Dump Mechanism====
-    Hardware failure
+When a kernel panic occurs, the kernel relies on the kexec mechanism to quickly boot a new instance of the kernel in a pre-reserved section of memory that was allocated when the system booted. This leaves the existing memory untouched so that its contents can be safely copied to a file.
-    Manual intervention
+====Installation====
-For some of those events (panic, NMI) the kernel will react automatically and trigger the crash dump mechanism through kexec. In other situations a manual intervention is required in order to capture the memory. Whenever one of the above events occurs, it is important to find out the root cause in order to prevent it from happening again. The cause can be determined by inspecting the copied memory contents.
+The kernel crash dump utility is installed with the following command:
-====Kernel Crash Dump Mechanism====
+<code>sudo apt-get install linux-crashdump</code>
-When a kernel panic occurs, the kernel relies on the kexec mechanism to quickly reboot a new instance of the kernel in a pre-reserved section of memory that had been allocated when the system booted (see below). This permits the existing memory area to remain untouched in order to safely copy its contents to storage.
+A reboot is then required.
-====Installation====
+====Configuration====
-The kernel crash dump utility is installed with the following command:
+No further configuration is required to enable the kernel crash dump mechanism.
-sudo apt-get install linux-crashdump
+====Verification====
-A reboot is then needed.
+To confirm that the kernel crash dump mechanism is enabled, there are a few things to verify. First, confirm that the crashkernel boot parameter is present (note that the line below has been split in two to fit the format of this document):
-====Configuration====
+<code>cat /proc/cmdline
-No further configuration is required in order to have the kernel dump mechanism enabled.
-====Verification====
-To confirm that the kernel dump mechanism is enabled, there are a few things to verify. First, confirm that the crashkernel boot parameter is present (note: The following line has been split into two to fit the format of this document:
-cat /proc/cmdline
 BOOT_IMAGE=/vmlinuz-3.2.0-17-server root=/dev/mapper/PreciseS-root ro
- crashkernel=384M-2G:64M,2G-:128M
+ crashkernel=384M-2G:64M,2G-:128M</code>
-The crashkernel parameter has the following syntax:
+The crashkernel parameter has the following syntax:
-crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+<code>crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
-    range=start-[end] 'start' is inclusive and 'end' is exclusive.
+    range=start-[end] 'start' is inclusive and 'end' is exclusive.</code>
-So for the crashkernel parameter found in /proc/cmdline we would have :
+So for the crashkernel parameter found in /proc/cmdline we have:
-crashkernel=384M-2G:64M,2G-:128M
+<code>crashkernel=384M-2G:64M,2G-:128M</code>
-The above value means:
+The above value means:
-    if the RAM is smaller than 384M, then don't reserve anything (this is the "rescue" case)
+  -- if the RAM is smaller than 384M, then don't reserve anything (this is the "rescue" case)
+  -- if the RAM size is between 384M and 2G (exclusive), then reserve 64M
+  -- if the RAM size is 2G or larger, then reserve 128M
-    if the RAM size is between 386M and 2G (exclusive), then reserve 64M
+Second, verify that the kernel has reserved the requested memory area for the kdump kernel by running:
-    if the RAM size is larger than 2G, then reserve 128M
+<code>dmesg | grep -i crash
-Second, verify that the kernel has reserved the requested memory area for the kdump kernel by doing:
-dmesg | grep -i crash
 ...
-[    0.000000] Reserving 64MB of memory at 800MB for crashkernel (System RAM: 1023MB)
+[    0.000000] Reserving 64MB of memory at 800MB for crashkernel (System RAM: 1023MB)</code>
-====Testing the Crash Dump Mechanism====
+====Testing the Crash Dump Mechanism====
-Testing the Crash Dump Mechanism will cause a system reboot. In certain situations, this can cause data loss if the system is under heavy load. If you want to test the mechanism, make sure that the system is idle or under very light load.
+<note important>Testing the crash dump mechanism will cause a system reboot. In certain situations, this can cause data loss if the system is under heavy load. If you want to test the mechanism, make sure that the system is idle or under very light load.</note>
+Verify that the SysRQ mechanism is enabled by looking at the value of the /proc/sys/kernel/sysrq kernel parameter:
+<code>cat /proc/sys/kernel/sysrq</code>
-Verify that the SysRQ mechanism is enabled by looking at the value of the /proc/sys/kernel/sysrq kernel parameter :
+If a value of 0 is returned, the feature is disabled. Enable it with the following command:
+<code>sudo sysctl -w kernel.sysrq=1</code>
-cat /proc/sys/kernel/sysrq
+Once this is done, you must become root, as using ''sudo'' alone will not be sufficient.
+As the root user, issue the command ''echo c > /proc/sysrq-trigger''. If you are connected over the network, you will lose contact with the system, so it is better to run the test from the system console. This also makes the kernel crash dump process visible.
-If a value of 0 is returned the feature is disabled. Enable it with the following command :
+A typical test output looks like the following:
+<code>sudo -s
-sudo sysctl -w kernel.sysrq=1
-Once this is done, you must become root, as just using sudo will not be sufficient. As the root user, you will have to issue the command echo c > /proc/sysrq-trigger. If you are using a network connection, you will lose contact with the system. This is why it is better to do the test while being connected to the system console. This has the advantage of making the kernel dump process visible.
-A typical test output should look like the following :
-sudo -s
 [sudo] password for ubuntu:
 # echo c > /proc/sysrq-trigger
@@ Line 95: / Line 85: @@
 [   31.662668] Oops: 0002 [#1] SMP
 [   31.662668] CPU 1
-....
+....</code>
-The rest of the output is truncated, but you should see the system rebooting and somewhere in the log, you will see the following line :
-Begin: Saving vmcore from kernel crash ...
-Once completed, the system will reboot to its normal operational mode. You will then find Kernel Crash Dump file in the /var/crash directory :
-ls /var/crash
+The rest of the output is truncated, but you should see the system rebooting, and somewhere in the log you will see the following line:
-linux-image-3.0.0-12-server.0.crash
-====Resources====
+<code>Begin: Saving vmcore from kernel crash ...</code>
-Kernel Crash Dump is a vast topic that requires good knowledge of the linux kernel. You can find more information on the topic here :
+Once completed, the system will reboot into its normal operational mode. You will then find the kernel crash dump file in the /var/crash directory:
-    Kdump kernel documentation.
+<code>ls /var/crash
+linux-image-3.0.0-12-server.0.crash</code>
+====Resources====
-    The crash tool
+Kernel crash dump is a vast topic that requires good knowledge of the Linux kernel. You can find more information at the following links:
-    Analyzing Linux Kernel Crash (Based on Fedora, it still gives a good walkthrough of kernel dump analysis)
+  -- [[http://www.kernel.org/doc/Documentation/kdump/kdump.txt|Kdump kernel documentation]].
+  -- [[http://people.redhat.com/~anderson/|The crash tool]]
+  -- [[http://www.dedoimedo.com/computers/crash-analyze.html|Analyzing Linux Kernel Crash]] (Based on Fedora, it still gives a good walkthrough of kernel dump analysis)
 ----
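
The range rules for the crashkernel parameter described in the page can be checked with a small script. The following is a minimal sketch, not part of the page or of any Ubuntu tool: the function names (to_kib, crashkernel_from_cmdline, crashkernel_reserve) are hypothetical, a size without a suffix is assumed to be in bytes, and only the range form of the parameter is handled (the optional @offset is ignored).

```shell
#!/usr/bin/env bash
# Illustrative sketch: evaluate a crashkernel=<range>:<size>,... value
# for a given amount of RAM. All function names here are hypothetical.

# Convert a size such as 384M or 2G to KiB.
# Assumption: a bare number (no suffix) is a byte count.
to_kib() {
    local v=$1
    case $v in
        *[Kk]) echo $(( ${v%?} )) ;;
        *[Mm]) echo $(( ${v%?} * 1024 )) ;;
        *[Gg]) echo $(( ${v%?} * 1024 * 1024 )) ;;
        *)     echo $(( v / 1024 )) ;;
    esac
}

# Print the crashkernel= value found in a kernel command line string,
# or return 1 if the parameter is absent.
crashkernel_from_cmdline() {
    local word
    for word in $1; do
        case $word in
            crashkernel=*) echo "${word#crashkernel=}"; return 0 ;;
        esac
    done
    return 1
}

# Print the size that would be reserved for the given RAM size,
# or 0 when no range matches.
crashkernel_reserve() {
    local spec=${1%%@*}          # drop the optional @offset
    local ram_kib entry range size start end
    ram_kib=$(to_kib "$2")
    local IFS=,                  # ranges are comma-separated
    for entry in $spec; do
        range=${entry%%:*}
        size=${entry#*:}
        start=${range%%-*}
        end=${range#*-}
        # 'start' is inclusive
        if (( ram_kib < $(to_kib "$start") )); then continue; fi
        # 'end' is exclusive; an empty 'end' means no upper bound
        if [[ -n $end ]] && (( ram_kib >= $(to_kib "$end") )); then continue; fi
        echo "$size"
        return 0
    done
    echo 0
}
```

For example, crashkernel_reserve '384M-2G:64M,2G-:128M' 1023M prints 64M, which agrees with the dmesg line quoted in the page (64MB reserved on a system with 1023MB of RAM). On a live system the first argument could be taken from crashkernel_from_cmdline "$(cat /proc/cmdline)".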
