Igor Olemskoy: practical notes on Linux CentOS system administration

Tag archive: 'raid'

XenServer 6: Storage repository on software RAID (reprint)

Although Citrix recommends against using software RAID with XenServer due to performance issues, I've had some pretty awful experiences with hardware RAID cards over the last few years. In addition, the price of software RAID makes it a very desirable solution.

Before you get started, go through the steps to disable GPT. That post also explains an optional adjustment to get a larger root partition (which I would recommend). You cannot complete the steps in this post if your XenServer installation uses GPT.

You should have three partitions on your first disk after the installation:

# fdisk -l /dev/sda
-- SNIP --
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2611    20971520   83  Linux
/dev/sda2            2611        5222    20971520   83  Linux
/dev/sda3            5222       19457   114345281   8e  Linux LVM

Here's a quick explanation of your partitions:

  • /dev/sda1: the XenServer root partition
  • /dev/sda2: XenServer uses this partition for temporary space during upgrades
  • /dev/sda3: the LVM physical volume that holds your local storage repository

We need to replicate the same partition structure across each of your drives, and the software RAID volume will span the third partition on each disk. Copying the partition structure from disk to disk is done easily with sfdisk:

WHOA THERE! NO TURNING BACK! This step is destructive! If your other disks have any data on them, this step will make it (relatively) impossible to retrieve data on those disks again. Back up any data on the other disks in your XenServer machine before running these next commands.

sfdisk -d /dev/sda | sfdisk --force /dev/sdb
sfdisk -d /dev/sda | sfdisk --force /dev/sdc
sfdisk -d /dev/sda | sfdisk --force /dev/sdd

If you have only two disks, stop with /dev/sdb and you'll be making a RAID 1 array. My machine has four disks and I'll be making a RAID 10 array.
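
With several disks it is easy to mistype a device name, so one option is to generate the commands first and read them before running anything. The sketch below only prints the sfdisk lines shown above; the function name clone_partition_cmds is made up for this example and is not part of XenServer or sfdisk.

```shell
# Print (do not run) the sfdisk copy command for each target disk.
# Usage: clone_partition_cmds SOURCE TARGET...
clone_partition_cmds() {
    src=$1
    shift
    for dst in "$@"; do
        printf 'sfdisk -d %s | sfdisk --force %s\n' "$src" "$dst"
    done
}

# Review the output; nothing is written to any disk at this point.
clone_partition_cmds /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

Once the list looks right, the same call can be piped to sh to actually execute it.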

We need to destroy the main storage repository, but we need to unplug the physical block device first. Get the storage repository uuid first, then use it to find the corresponding physical block device. Once the physical block device is unplugged, the storage repository can be destroyed:

# xe sr-list name-label=Local\ storage | head -1
uuid ( RO)                : 75264965-f981-749e-0f9a-e32856c46361
# xe pbd-list sr-uuid=75264965-f981-749e-0f9a-e32856c46361 | head -1
uuid ( RO)                  : ff7e9656-c27c-1889-7a6d-687a561f0ad0
# xe pbd-unplug uuid=ff7e9656-c27c-1889-7a6d-687a561f0ad0
# xe sr-destroy uuid=75264965-f981-749e-0f9a-e32856c46361

All of the LVM data from /dev/sda3 should now be gone:

# lvdisplay && vgdisplay && pvdisplay
#

Change the third partition on each physical disk to be a software RAID partition type:

echo -e "t\n3\nfd\nw\n" | fdisk /dev/sda
echo -e "t\n3\nfd\nw\n" | fdisk /dev/sdb
echo -e "t\n3\nfd\nw\n" | fdisk /dev/sdc
echo -e "t\n3\nfd\nw\n" | fdisk /dev/sdd

Stop here and reboot your XenServer box to pick up the new partition changes. Once the server comes back from the reboot, start up a software RAID volume with mdadm:

# RAID 1 for two drives
mdadm --create /dev/md0 -l 1 -n 2 /dev/sda3 /dev/sdb3
# RAID 10 for four drives
mdadm --create /dev/md0 -l 10 -n 4 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3

Check to see that your RAID array is building:

# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      228690432 blocks 64K chunks 2 near-copies [4/4] [UUUU]
      [>....................]  resync =  0.3% (694272/228690432) finish=16.4min speed=231424K/sec
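
If you'd rather not eyeball the whole file every time, the progress figure can be pulled out of the mdstat text with a little awk. This is only a sketch: resync_progress is a name invented here, and it reads mdstat-formatted text on stdin so it can be tried on saved output as well as on the live file.

```shell
# Print "<array> <percent>" for every array that is resyncing or recovering.
# Reads /proc/mdstat-formatted text on stdin.
resync_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }            # remember the current array name
        /(resync|recovery) *=/ {                # progress line for that array
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print array, $i  # the field ending in % is the progress
        }
    '
}

# On a live system:
#   resync_progress < /proc/mdstat
```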

Although you don't have to wait for the resync to complete, just be aware that XenServer doesn't do well with a lot of disk I/O within dom0. You may notice unusually slow performance in dom0 until it finishes. Save the array's configuration for reboots:

mdadm --detail --scan > /etc/mdadm.conf

Edit the /etc/mdadm.conf file and append auto=yes to the end of the line (but leave everything on one line):

ARRAY /dev/md0 level=raid10 num-devices=4 metadata=0.90 \
  UUID=2876748c:5117eed5:ce4d62d3:9592bd84 auto=yes
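
If you prefer not to edit the file by hand, the same change can be made with sed. A minimal sketch, assuming every array is described by a single line that starts with ARRAY (which is how mdadm --detail --scan prints it); add_auto_yes is a made-up helper name.

```shell
# Append " auto=yes" to every ARRAY line that does not already have it.
# Reads mdadm.conf-style text on stdin and writes the result to stdout.
add_auto_yes() {
    sed -e '/^ARRAY /{' \
        -e '/auto=yes/!s/[[:space:]]*$/ auto=yes/' \
        -e '}'
}

# Possible use when generating the file in one step:
#   mdadm --detail --scan | add_auto_yes > /etc/mdadm.conf
```

Lines that already carry auto=yes are left alone, so running it twice is harmless.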

Create a new storage repository on the RAID volume with thin provisioning (thanks to Spherical Chicken for the command):

xe sr-create content-type=user type=ext device-config:device=/dev/md0 shared=false name-label="Local storage"

This command takes some time to complete since it makes logical volumes and then makes an ext3 filesystem for the new storage repository. Bigger RAID arrays will take more time and it's guaranteed to take longer than you'd expect if your RAID array is still building. As soon as it completes, you'll be given the uuid of your new storage repository and it should appear within the XenCenter interface.

TIP: If you run into any problems during reboots, open /boot/extlinux.conf and remove splash and quiet from the label xe boot section. This removes the framebuffer during boot-up and it causes a lot more output to be printed to the console. It won't affect the display once your XenServer box has fully booted.

XenServer 6: Storage repository on software RAID is a post from: Major Hayden's Racker Hacker blog.

Thanks for following the blog via the RSS feed. Please don't copy my posts or quote portions of them without attribution.

Live upgrade Fedora 15 to Fedora 16 using yum (reprint)

Before we get started, I really ought to drop this here:

Upgrading Fedora via yum is not the recommended method. Your first choice for upgrading Fedora should be to use preupgrade. Seriously.

This raises the question: when should you use another method to upgrade Fedora? What other methods are there?

You have a few other methods to get the upgrade done:

  • Toss in a CD or DVD: You can upgrade via the anaconda installer provided on the CD, DVD or netinstall media. My experiences with this method for Fedora (as well as CentOS, Scientific Linux, and Red Hat) haven't been too positive, but your results may vary.
  • Download the newer release's fedora-release RPM, install it with rpm, and yum upgrade: This is the really old way of doing things. Don't try this (read the next bullet).
  • Use yum's distro-sync functionality: If you can't go the preupgrade route, I'd recommend giving this a try. However, leave plenty of time to fix small glitches after it's done (and after your first reboot).

Personal anecdote time (Keep scrolling for the meat and potatoes)
I have a dedicated server at Joe's Datacenter (love those folks) with IPMI and KVM-over-LAN access. The preupgrade method won't work for me because my /boot partition is on a software RAID volume. There's a rat's nest of a Bugzilla ticket over on Red Hat's site about this problem. I'm really only left with a live upgrade using yum.

Live yum upgrade process
Before even beginning the upgrade, I double-checked that I'd applied all of the available updates for my server. Once that was done, I realized I was one kernel revision behind and I rebooted to ensure I was in the latest Fedora 15 kernel.

A good practice here is to run package-cleanup --orphans (it's in the yum-utils package) to find any packages which don't exist on any Fedora mirrors. In my case, I had two old kernels and a JungleDisk package. I removed the two old kernels (probably wasn't necessary) and left JungleDisk alone (it worked fine after the upgrade). If you have any external repositories, such as Livna or RPMForge, you may want to disable those until the upgrade is done. Should the initial upgrade checks bomb out, try adding as few repositories back in as possible to see if it clears up the problem.

Once you make it this far, just follow the instructions available in Fedora's documentation: Upgrading Fedora using yum. I set SELinux to permissive mode during the upgrade just in case it caused problems.

I'd recommend skipping the grub2-install portion since your original grub installation will still be present after the upgrade. If your server has EFI (not BIOS), don't use grub2 yet. Keep an eye on the previously mentioned documentation page to see if the problems get ironed out between grub2 and EFI.

Before you reboot, be sure to get a list of your active processes and daemons. After your reboot, some old SysVinit scripts will be converted into Systemd service scripts. They might not start automatically and you might need to enable and/or start some services.
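
One simple way to do that is to save the list of running process names before the reboot and compare it with a fresh list afterwards. The sketch below is an assumption about how you might do it, not part of the Fedora instructions; the helper name not_running_after and the file paths are invented for the example.

```shell
# Print entries that are in the "before" list but missing from the "after" list.
# Usage: not_running_after BEFORE_FILE AFTER_FILE
not_running_after() {
    before=$1
    after=$2
    # -F fixed strings, -x whole-line match, -v invert: lines of BEFORE not in AFTER
    grep -Fxv -f "$after" "$before"
}

# Before the reboot:
#   ps -eo comm= | sort -u > /root/procs.before
# After the reboot:
#   ps -eo comm= | sort -u > /root/procs.after
#   not_running_after /root/procs.before /root/procs.after
```

Anything it prints is a candidate for a service that needs to be enabled or started by hand under systemd.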

New to Systemd? This will be an extremely handy resource: SysVinit to Systemd Cheatsheet.

I haven't seen too many issues after cleaning up some daemons that didn't start properly. There is a problem between asterisk and SELinux that I haven't nailed down yet but it's not a showstopper.

Good luck during your upgrades. Keep in mind that Fedora 15 could be EOL'd as early as May or June 2012 when Fedora 17 is released.

Live upgrade Fedora 15 to Fedora 16 using yum is a post from: Major Hayden's Racker Hacker blog.

Adaptec 5405 and others on CentOS 5 (reprint)

The Adaptec 5405 is a popular controller. We install it in servers in large numbers, so we need to monitor the state of the arrays on those servers.

Nagios and Zabbix handle the monitoring, and to let them read the array status we use arcconf, the utility supplied by Adaptec.

Here is how to install arcconf, using CentOS 5.6 x86_64 as the example:

wget http://download.adaptec.com/raid/storage_manager/asm_linux_x64_v7_00_18781.tgz
tar zxf asm_linux_x64_v7_00_18781.tgz
mv cmdline/arcconf /bin/

If you need the Adaptec Storage Manager agent, you can install everything Adaptec ships as a single package:

 rpm -Uvh manager/StorMan-7.00.x86_64.rpm

And don't forget to check that it works:

arcconf GETCONFIG 1 AD
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status                        : Optimal
   Channel description                      : SAS/SATA
   Controller Model                         : Adaptec 5405
   Controller Serial Number                 : 0D29116C7A3
   Physical Slot                            : 1
   Temperature                              : 67 C/ 152 F (Normal)
   Installed memory                         : 256 MB
   Copyback                                 : Disabled
   Background consistency check             : Disabled
   Automatic Failover                       : Enabled
   Global task priority                     : High
   Performance Mode                         : Default/Dynamic
   Stayawake period                         : Disabled
   Spinup limit internal drives             : 0
   Spinup limit external drives             : 0
   Defunct disk drive count                 : 0
   Logical devices/Failed/Degraded          : 1/0/0
   SSDs assigned to MaxIQ Cache pool        : 0
   Maximum SSDs allowed in MaxIQ Cache pool : 8
   MaxIQ Read Cache Pool Size               : 0.000 GB
   MaxIQ cache fetch rate                   : 0
   MaxIQ Cache Read, Write Balance Factor   : 3,1
   NCQ status                               : Enabled
   Statistics data collection mode          : Enabled
   --------------------------------------------------------
   Controller Version Information
   --------------------------------------------------------
   BIOS                                     : 5.2-0 (17899)
   Firmware                                 : 5.2-0 (17899)
   Driver                                   : 1.1-5 (24702)
   Boot Flash                               : 5.2-0 (17899)
   --------------------------------------------------------
   Controller Battery Information
   --------------------------------------------------------
   Status                                   : Not Installed
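
For Nagios or Zabbix only a couple of these lines matter. Here is a rough sketch of a check that reads the output above and reduces it to OK or FAILURE. The function name check_adaptec and the choice of fields are assumptions made for illustration; the exit code 2 follows the usual Nagios convention for CRITICAL.

```shell
# Reduce "arcconf GETCONFIG 1 AD" output (read on stdin) to a one-line status.
# Looks only at "Controller Status" and "Logical devices/Failed/Degraded".
check_adaptec() {
    awk -F' *: *' '
        /Controller Status/ { status = $2 }
        /Logical devices\/Failed\/Degraded/ {
            split($2, n, "/")           # total/failed/degraded
            failed = n[2] + 0
            degraded = n[3] + 0
            seen = 1
        }
        END {
            if (seen && status == "Optimal" && failed == 0 && degraded == 0) {
                print "OK"
                exit 0
            }
            printf "FAILURE: controller=%s failed=%s degraded=%s\n", status, failed, degraded
            exit 2
        }
    '
}

# On a server with the controller installed:
#   arcconf GETCONFIG 1 AD | check_adaptec
```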

Versions for other platforms and operating systems are available on the official site.

These instructions also apply to:
Adaptec RAID 2045
Adaptec RAID 2405
Adaptec RAID 2405Q
Adaptec RAID 2805
Adaptec RAID 5085
Adaptec RAID 51245
Adaptec RAID 51645
Adaptec RAID 52445
Adaptec RAID 5405
Adaptec RAID 5405Z
Adaptec RAID 5445
Adaptec RAID 5445Z
Adaptec RAID 5805
Adaptec RAID 5805Q
Adaptec RAID 5805Z
Adaptec RAID 5805ZQ

Replacing a disk in a Linux md RAID (reprint)

Suppose your RAID1 has broken and cat /proc/mdstat reports:

md2 : active raid1 sda3[2] sdb3[1]
      726266432 blocks [2/1] [_U]

That is, one disk in the mirror is up (U) and the other has dropped out (_).
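
The same check can be scripted: any array whose status brackets contain an underscore is missing a member. A minimal sketch; degraded_arrays is a name invented for this example, and it reads mdstat-formatted text on stdin.

```shell
# List md arrays that have at least one missing member ("_" in the [UU] field).
# Reads /proc/mdstat-formatted text on stdin; prints nothing if all arrays are healthy.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/     { array = $1 }      # remember the current array name
        /\[[U_]*_[U_]*\]/ { print array }     # status field such as [_U] or [U_]
    '
}

# On a live system:
#   degraded_arrays < /proc/mdstat
```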

Get more detailed information about the array:

mdadm -D /dev/md2

In that case the drive has to be replaced. Power off the server, swap the drive, power it back on, and then check with dmesg | grep sda that the new drive is in place (in my case it was sda that failed).
Sometimes the data center's support staff ask for the serial number of the drive to be replaced. This shows detailed information about a hard disk:

hdparm -I /dev/sda

Then copy the partition table from the old, working drive to the new one (don't mix up the source and the destination!):

sfdisk -d /dev/sdb | sfdisk /dev/sda

In my case each drive was split into three partitions, and each partition was mirrored by md with the matching partition on the other drive. Now add the sda partitions back into the existing md arrays (shown here for md2; repeat for the other arrays with their own partitions):

mdadm --manage /dev/md2 --add /dev/sda3

Don't mix up the partitions!
Then watch the synchronization progress with cat /proc/mdstat.

How to create a partition larger than 2 TB

A good article on this topic: http://www.cyberciti.biz/tips/fdisk-unable-to-create-partition-greater-2tb.html

And another: http://blog.gtuhl.com/2007/09/18/big-ext3-partitions-in-opensuse-102/

17.12.2009

Written by Igor Olemskoy

Categories: My posts

Which RAID to choose, or why not to use RAID-5?

An excellent article on this topic: http://habrahabr.ru/blogs/hardware/78311/

15.12.2009

Written by Igor Olemskoy

Categories: My posts

Remote monitoring of the DELL Perc 6/i (LSI) hardware RAID with Nagios and SNMPd (reprint)

This post shows how to remotely monitor the DELL Perc 6/i hardware RAID (LSI chip) on Linux (CentOS) using the MegaCLI utility, a couple of home-made scripts, SNMPd, and Nagios.

How to prepare Nagios and SNMPd is covered in the article on monitoring software RAID. I have nothing to add to it, so that part can simply be copied from there.

The raid_status.pl script, however, is somewhat different. For it to work you need to download the LSI MegaCLI for Linux utility. I also created a symlink /usr/sbin/MegaCli that points to /opt/MegaRAID/MegaCli/MegaCli64:

# ln -s /opt/MegaRAID/MegaCli/MegaCli64 /usr/sbin/MegaCli

Now two scripts need to be prepared.

1. An AWK script that turns the MegaCLI output into a readable format.

Borrowed from a source I no longer remember. Put it in /root/scripts/analysis.awk:

# This is a little AWK program that interprets MegaCLI output
/Device Id/ { counter += 1; device[counter] = $3 }
/Firmware state/ { state_drive[counter] = $3 }
/Inquiry/ { name_drive[counter] = $3 " " $4 " " $5 " " $6 }
END {
for (i=1; i<=counter; i+=1)
   printf ( "Device %02d (%s) status is: %s\n", device[i], name_drive[i], state_drive[i]);
}

2. A script that parses the output of the previous script :)

This one I wrote myself. Put it in /root/scripts/raid_status.pl

#!/usr/bin/perl
open $f,'/usr/sbin/MegaCli -PDList -aALL | /bin/awk -f /root/scripts/analysis.awk |';
$numraid=0;
$ok=1;
%dev=();
while (<$f>) {
   if (/^Device\s+(\d+)\s+\(([^)]+)\)\s+status is: (\w+)/) {
($d,$m,$s)=($1,$2,$3);
      if ($s !~ /Online/i) {
         $ok=0;
         $dev{$d}=$s;
      }
      $numraid++;
   }
}
 
close $f;
 
# 6 is the number of physical drives expected on this server; adjust it to match yours
if ($numraid != 6) {
   print "FAILURE: numraid !=6";
   exit;
}
 
if ($ok!=1) {
   $s='';
   foreach (keys(%dev)) {
      $s.="$_ : ".$dev{$_}.'; ';
   }
   print "FAILURE: $s";
   exit;
}
print "OK";

Make sure the script prints OK, then configure Nagios as described at the start of this article.

 

Monitoring Software RAID in Linux with Nagios (reprint)

A short primer on remotely monitoring software RAID in Linux with Nagios and SNMPd.

1. The monitoring script

[THIS SCRIPT IS INCORRECT, BUT THERE IS NO TIME TO FIX IT]

First, prepare a Perl script on the machine to be monitored. The script checks the RAID and prints "OK" or "FAILURE: description" depending on the RAID status.

#!/usr/bin/perl
open $f,'/proc/mdstat';
$numraid=0;
$raids=5;
$ok=1;

%dev=();

while (<$f>) {
   if (/^(md[0-9])\s*:\s*([a-zA-Z]+)/) {
      ($d,$s)=($1,$2);
      if ($s !~ /active/i) {
         $ok=0;
         $dev{$d}=$s;
      }
      $numraid++;
   }
}

close $f;

if ($numraid != $raids) {
   print "FAILURE: numraid !=$raids";
   exit;
}

if ($ok!=1) {
   $s='';
   foreach (keys (%dev)) {
      $s.="$_ : ".$dev{$_}.'; ';
   }
   print "FAILURE: $s";
   exit;
}

print "OK";

In this script, set the $raids variable to the total number of software RAID arrays on your system. You can find out how many arrays you have with the command "cat /proc/mdstat".

Put the script in /root/scripts/raid_status.pl
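
To get the number for $raids without counting by eye, the array lines in the mdstat text can be counted directly. A tiny sketch; count_md_arrays is a name made up for this example, and it reads mdstat-formatted text on stdin.

```shell
# Count md arrays in /proc/mdstat-formatted text read from stdin.
# Each array is introduced by a line that starts with its name, e.g. "md0 : ...".
count_md_arrays() {
    grep -c '^md[0-9]'
}

# On a live system:
#   count_md_arrays < /proc/mdstat
```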

2. Configuring SNMPd

/etc/snmp/snmpd.conf:

syslocation  "Cool datacenter"
syscontact  cool@admin.su
sysservices 72

# List here the IP addresses that are allowed to query SNMPd
# secret_pass is the community string

com2sec nagios 1.2.3.4/32 secret_pass
com2sec nagios 127.0.0.1 secret_pass

group pgroup v2c nagios
view all included  .1
access pgroup ""      any       noauth    exact  all all  none

# To monitor a process, add "proc process_name"
# proc httpd
# proc vsftpd
# To monitor a partition, add disk /partition
# disk /
# disk /var
# To monitor load average, add load 15 10 10
# load 15 10 10
# To monitor swap, add swap minimum_free_swap_kB
# swap 1500000

# And this is our RAID monitoring script
exec raid_status /root/scripts/raid_status.pl

master agentx
AgentXSocket tcp:localhost:705

/etc/sysconfig/snmpd.options:

OPTIONS="-Lf /var/log/snmpd.log"

Next, start snmpd and add it to startup:

# /etc/init.d/snmpd start
# chkconfig --level 3 snmpd on

3. Configuring the remote Nagios

Define the RAID check command in commands.cfg:

define command{
command_name check_snmp_raid
command_line $USER1$/check_snmp -H $HOSTADDRESS$ -P 2c -o .1.3.6.1.4.1.2021.8.1.101.$ARG1$ -C $ARG2$ -r "OK"
}
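
A note on the OID, since it looks like magic: .1.3.6.1.4.1.2021.8.1.101 is extOutput in UCD-SNMP-MIB, the first line of output of an exec entry, and the trailing number ($ARG1$ here) is the index of that entry, which as far as I know follows the order of the exec lines in snmpd.conf starting from 1. The helper below is only an illustration with an invented name; it builds the OID so you can query it by hand.

```shell
# Base OID of UCD-SNMP-MIB extOutput (first output line of an "exec" script).
EXT_OUTPUT_OID=.1.3.6.1.4.1.2021.8.1.101

# Build the OID for the Nth "exec" entry in snmpd.conf (counted from 1).
ext_output_oid() {
    printf '%s.%s\n' "$EXT_OUTPUT_OID" "$1"
}

ext_output_oid 1

# Manual query from the Nagios host (assumes net-snmp's snmpget is installed;
# 1.1.1.1 and secret_pass are the placeholders used in this article):
#   snmpget -v 2c -c secret_pass 1.1.1.1 "$(ext_output_oid 1)"
```

If the manual query returns the script's OK string, the Nagios command above should see the same value.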

In the host configuration ourhost.cfg:

# Define the remote host; 1.1.1.1 is the IP address of the remote host
define host{
use             server
host_name       our_host
alias           Remote Host with software RAID
address         1.1.1.1
check_command   check-host-alive
}

# ... other checks go here: load average, swap, free space, processes, and so on

# The RAID monitoring itself; secret_pass is the community string of the remote server

define service{
use             remote-service
host_name       our_host
service_description RAID status
check_command   check_snmp_raid!1!secret_pass
normal_check_interval       20
}