Igor Olemskoy: practical notes on Linux CentOS system administration

Tag archive: 'linux'

Kerberos for haters (reprint)

I'll be the first one to admit that Kerberos drives me a little insane. It's a requirement for two of the exams in Red Hat's RHCA certification track and I've been forced to learn it. It provides some pretty nice security features for large server environments. You get central single sign-on, encrypted authentication, and bidirectional validation. However, getting it configured can be a real pain due to some rather archaic commands and shells.

Here's Kerberos in a nutshell within a two-server environment: One server is a Kerberos key distribution center (KDC) and the other is a Kerberos client. The KDC has the list of users and their passwords. Consider a situation where a user tries to ssh into the Kerberos client:

  • sshd calls to pam to authenticate the user
  • pam calls to the KDC for a ticket granting ticket (TGT) to see if the user can authenticate
  • the KDC replies to the client with a TGT encrypted with the user's password
  • pam (on the client) tries to decrypt the TGT with the password that the user provided via ssh
  • if pam can decrypt the TGT, it knows the user is providing the right password

Now that the client has a TGT for that user, it can ask for tickets to access other network services. What if the user who just logged in wants to access another Kerberized service in the environment?

  • client calls the KDC and asks for a ticket to grant access to the other service
  • KDC replies with two copies of the ticket:
    • one copy is encrypted with the user's current TGT
    • a second copy is encrypted with the password of the network service the user wants to access
  • the client can decrypt the ticket which was encrypted with the current TGT since it has the TGT already
  • client makes an authenticator by encrypting a timestamp with the key from the decrypted ticket
  • client passes the authenticator and the second copy of the ticket it received from the KDC
  • the other network service decrypts the second copy of the ticket and verifies the password
  • the other network service uses the decrypted ticket to decrypt the authenticator it received from the client
  • if the timestamp looks good, the other network service allows the user access
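
The last check in that list ("if the timestamp looks good") boils down to a clock-skew comparison. Here is a minimal shell sketch of the idea, assuming epoch-second timestamps and a 300-second tolerance (the usual default clock skew in MIT Kerberos). The function name timestamp_ok is made up for illustration, and a real service does far more than this when it validates an authenticator:

```shell
# timestamp_ok NOW AUTH_TS [MAX_SKEW]
# Succeeds when the authenticator timestamp is within MAX_SKEW seconds of NOW.
# Hypothetical helper: it only illustrates the skew check, nothing else.
timestamp_ok() {
    ts_now=$1
    ts_auth=$2
    ts_max=${3:-300}
    ts_diff=$((ts_now - ts_auth))
    # Absolute value: the client clock may be ahead of or behind the service.
    if [ "$ts_diff" -lt 0 ]; then
        ts_diff=$((0 - ts_diff))
    fi
    [ "$ts_diff" -le "$ts_max" ]
}

# Example: compare the current time with a timestamp taken 10 seconds ago.
current=$(date +%s)
if timestamp_ok "$current" "$((current - 10))"; then
    echo "timestamp accepted"
else
    echo "timestamp rejected"
fi
```

This is also why Kerberos environments are so picky about time synchronization: if the clocks drift past the allowed skew, perfectly valid tickets get rejected.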

Okay, that's confusing. Let's take it one step further. Enabling pre-authentication requires that clients send a request containing a timestamp encrypted with the user's password prior to asking for a TGT. Without this requirement, an attacker can ask for a TGT one time and then brute force the TGT offline. With pre-authentication, the KDC won't hand out a TGT until it has received a timestamped request encrypted with the user's password. This means the attacker is forced to try different passwords when encrypting the timestamp in the hopes that they'll get a TGT to work with eventually. One would hope that you have something configured on the KDC to set off an alarm for multiple failed pre-authentication attempts.

Oh, but we can totally kick it up another notch. What if an attacker is able to give a bad password to a client but they're also able to impersonate the KDC? They could reply to the TGT request (as the KDC) with a TGT encrypted with whichever password they choose and get access to the client system. Enabling mutual authentication stops this attack since it forces the client to ask the KDC for the client's own host principal password (this password is set when the client is configured to talk to the KDC). The attacker shouldn't have any clue what that password is and the attack will be thwarted.

By this point, you're either saying "Oh man, I don't ever want to do this." or "How do I set up Kerberos?". Stay tuned if you're in the second group. I'll have a dead simple (or as close to dead simple as one can get with Kerberos) how-to on the blog shortly.

In the meantime, here are a few links for extra Kerberos bedtime reading:

Kerberos for haters is a post from: Major Hayden's Racker Hacker blog.

Thanks for following the blog via the RSS feed. Please don't copy my posts or quote portions of them without attribution.

XenServer 6: Storage repository on software RAID (reprint)

Although Citrix recommends against using software RAID with XenServer due to performance issues, I've had some pretty awful experiences with hardware RAID cards over the last few years. In addition, the price of software RAID makes it a very desirable solution.

Before you get started, go through the steps to disable GPT (see the post "XenServer 6: Disable GPT and get a larger root partition"). That post also explains an optional adjustment to get a larger root partition (which I would recommend). You cannot complete the steps in this post if your XenServer installation uses GPT.

You should have three partitions on your first disk after the installation:

# fdisk -l /dev/sda
-- SNIP --
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2611    20971520   83  Linux
/dev/sda2            2611        5222    20971520   83  Linux
/dev/sda3            5222       19457   114345281   8e  Linux LVM

Here's a quick explanation of your partitions:

  • /dev/sda1: the XenServer root partition
  • /dev/sda2: XenServer uses this partition for temporary space during upgrades
  • /dev/sda3: your storage repository should be in this logical volume

We need to replicate the same partition structure across each of your drives, and the software RAID volume will span the third partition on each disk. Copying the partition structure from disk to disk is done easily with sfdisk:

WHOA THERE! NO TURNING BACK! This step is destructive! If your other disks have any data on them, this step will make it (relatively) impossible to retrieve data on those disks again. Back up any data on the other disks in your XenServer machine before running these next commands.

sfdisk -d /dev/sda | sfdisk --force /dev/sdb
sfdisk -d /dev/sda | sfdisk --force /dev/sdc
sfdisk -d /dev/sda | sfdisk --force /dev/sdd

If you have only two disks, stop with /dev/sdb and you'll be making a RAID 1 array. My machine has four disks and I'll be making a RAID 10 array.
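
Since these commands are destructive, it can help to print them before running them. The following is a minimal sketch, assuming the same device names as above. The clone_partition_table function and the RUN variable are made up for illustration; by default it only prints the sfdisk pipelines and changes nothing:

```shell
# clone_partition_table SOURCE TARGET...
# Prints the sfdisk pipeline for each target disk; set RUN=1 to execute it.
# Hypothetical helper, not part of sfdisk or XenServer.
clone_partition_table() {
    src=$1
    shift
    for dst in "$@"; do
        # Never copy a disk's partition table onto itself.
        if [ "$src" = "$dst" ]; then
            echo "skipping $dst: same as source" >&2
            continue
        fi
        if [ "${RUN:-0}" = "1" ]; then
            sfdisk -d "$src" | sfdisk --force "$dst"
        else
            echo "sfdisk -d $src | sfdisk --force $dst"
        fi
    done
}

# Dry run for a four-disk machine: prints three commands.
clone_partition_table /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

Once the printed commands look right, running the same call with RUN=1 set in the environment executes them.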

We need to destroy the main storage repository, but we need to unplug the physical block device first. Get the storage repository uuid first, then use it to find the corresponding physical block device. Once the physical block device is unplugged, the storage repository can be destroyed:

# xe sr-list name-label=Local\ storage | head -1
uuid ( RO)                : 75264965-f981-749e-0f9a-e32856c46361
# xe pbd-list sr-uuid=75264965-f981-749e-0f9a-e32856c46361 | head -1
uuid ( RO)                  : ff7e9656-c27c-1889-7a6d-687a561f0ad0
# xe pbd-unplug uuid=ff7e9656-c27c-1889-7a6d-687a561f0ad0
# xe sr-destroy uuid=75264965-f981-749e-0f9a-e32856c46361
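
If you'd rather not copy uuids around by hand, xe's --minimal flag prints just the uuid, which makes the lookup scriptable. A minimal sketch, assuming exactly one storage repository matches the name and it has a single physical block device (with several matches, --minimal returns a comma-separated list and this would fail). The function name destroy_local_sr is made up for illustration:

```shell
# destroy_local_sr [NAME_LABEL]
# Looks up the SR and its PBD by name, unplugs the PBD, then destroys the SR.
# Hypothetical helper; assumes one matching SR with one PBD.
destroy_local_sr() {
    sr_label=${1:-Local storage}
    sr_uuid=$(xe sr-list name-label="$sr_label" --minimal)
    if [ -z "$sr_uuid" ]; then
        echo "no storage repository named '$sr_label'" >&2
        return 1
    fi
    pbd_uuid=$(xe pbd-list sr-uuid="$sr_uuid" --minimal)
    # Only destroy the SR if the unplug succeeded.
    xe pbd-unplug uuid="$pbd_uuid" &&
        xe sr-destroy uuid="$sr_uuid"
}
```

The block only defines the function; nothing is destroyed until you call destroy_local_sr yourself.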

All of the LVM data from /dev/sda3 should now be gone:

# lvdisplay && vgdisplay && pvdisplay
#

Change the third partition on each physical disk to be a software RAID partition type:

echo -e "t\n3\nfd\nw\n" | fdisk /dev/sda
echo -e "t\n3\nfd\nw\n" | fdisk /dev/sdb
echo -e "t\n3\nfd\nw\n" | fdisk /dev/sdc
echo -e "t\n3\nfd\nw\n" | fdisk /dev/sdd

Stop here and reboot your XenServer box to pick up the new partition changes. Once the server comes back from the reboot, start up a software RAID volume with mdadm:

# RAID 1 for two drives
mdadm --create /dev/md0 -l 1 -n 2 /dev/sda3 /dev/sdb3
# RAID 10 for four drives
mdadm --create /dev/md0 -l 10 -n 4 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3

Check to see that your RAID array is building:

# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      228690432 blocks 64K chunks 2 near-copies [4/4] [UUUU]
      [>....................]  resync =  0.3% (694272/228690432) finish=16.4min speed=231424K/sec
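
To watch the rebuild from a script, the percentage can be pulled out of that output. A minimal sketch, assuming the /proc/mdstat format shown above and a single array; resync_percent and wait_for_resync are made-up helper names:

```shell
# resync_percent: reads /proc/mdstat-style text on stdin and prints the
# resync or recovery percentage, or nothing when no rebuild is in progress.
resync_percent() {
    sed -n \
        -e 's/.*resync *= *\([0-9.]*\)%.*/\1/p' \
        -e 's/.*recovery *= *\([0-9.]*\)%.*/\1/p'
}

# wait_for_resync: polls once a minute until no rebuild is reported.
# Assumes a single array; defined here but not called.
wait_for_resync() {
    while pct=$(resync_percent < /proc/mdstat) && [ -n "$pct" ]; do
        echo "resync at ${pct}%"
        sleep 60
    done
}
```

mdadm also has a --wait option (mdadm --wait /dev/md0) that blocks until the resync finishes, which is simpler if you don't need progress output.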

Although you don't have to wait for the resync to complete, just be aware that XenServer doesn't do well with a lot of disk I/O within dom0. You may notice unusually slow performance in dom0 until it finishes. Save the array's configuration for reboots:

mdadm --detail --scan > /etc/mdadm.conf

Edit the /etc/mdadm.conf file and append auto=yes to the end of the line (but leave everything on one line):

ARRAY /dev/md0 level=raid10 num-devices=4 metadata=0.90 UUID=2876748c:5117eed5:ce4d62d3:9592bd84 auto=yes
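
The same edit can be scripted instead of done in an editor. A minimal sketch, assuming the scan output has one ARRAY line per array; add_auto_yes is a made-up helper name:

```shell
# add_auto_yes: appends " auto=yes" to every ARRAY line read from stdin and
# leaves lines that already carry the option (or are not ARRAY lines) alone.
add_auto_yes() {
    awk '/^ARRAY / && !/ auto=yes/ { $0 = $0 " auto=yes" } { print }'
}

# Example with a shortened, illustrative ARRAY line:
printf 'ARRAY /dev/md0 level=raid10 num-devices=4\n' | add_auto_yes

# On the XenServer host this could replace the save-then-edit steps:
#   mdadm --detail --scan | add_auto_yes > /etc/mdadm.conf
```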

Create a new storage repository on the RAID volume with thin provisioning (thanks to Spherical Chicken for the command):

xe sr-create content-type=user type=ext device-config:device=/dev/md0 shared=false name-label="Local storage"

This command takes some time to complete since it makes logical volumes and then makes an ext3 filesystem for the new storage repository. Bigger RAID arrays will take more time and it's guaranteed to take longer than you'd expect if your RAID array is still building. As soon as it completes, you'll be given the uuid of your new storage repository and it should appear within the XenCenter interface.

TIP: If you run into any problems during reboots, open /boot/extlinux.conf and remove splash and quiet from the label xe boot section. This removes the framebuffer during boot-up and it causes a lot more output to be printed to the console. It won't affect the display once your XenServer box has fully booted.

XenServer 6: Storage repository on software RAID is a post from: Major Hayden's Racker Hacker blog.

XenServer 6: Disable GPT and get a larger root partition (reprint)

XenServer 6 is a solid virtualization platform, but the installer doesn't give you many options for customized configurations. By default, it installs with a 4GB root partition and uses GUID Partition Tables (GPT). GPT is new in XenServer 6.

I'd rather use MBR partition tables and get a larger root partition. If you want to make these adjustments in your XenServer 6 installation, follow these steps after booting into the XenServer 6 install disc:

When the installer initially boots, press F2 to access the advanced installation options.

Type shell and press enter. The installer should begin booting into a pre-installation shell where you can make your adjustments.


Once you've booted into the pre-installation shell, type vi /opt/xensource/installer/constants.py and press enter.

Change GPT_SUPPORT = True to GPT_SUPPORT = False to disable GPT and use MBR partition tables. Adjust the value of root_size from 4096 (the default) to a larger number to get a bigger root partition. The size is specified in MB, so 4096 is 4GB. Save the file and exit vi.
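
If you do this on more than one machine, the two changes can be made with sed instead of vi. This is only a sketch: it assumes both settings sit at the start of a line in the form "GPT_SUPPORT = True" and "root_size = 4096" (I haven't confirmed the exact layout of the root_size line), and that the sed in the installer shell supports -i. The function name is made up for illustration, and the example runs against a temporary file rather than the real constants.py:

```shell
# patch_installer_constants FILE ROOT_SIZE_MB
# Switches GPT_SUPPORT to False and sets root_size (in MB) in FILE.
# Hypothetical helper; the line formats it matches are assumptions.
patch_installer_constants() {
    const_file=$1
    size_mb=$2
    sed -i \
        -e 's/^GPT_SUPPORT *= *True/GPT_SUPPORT = False/' \
        -e "s/^root_size *= *[0-9][0-9]*/root_size = $size_mb/" \
        "$const_file"
}

# Demonstration on a throwaway file with the two assumed lines.
demo=$(mktemp)
printf 'GPT_SUPPORT = True\nroot_size = 4096\n' > "$demo"
patch_installer_constants "$demo" 20480
cat "$demo"
rm -f "$demo"
```

A root_size of 20480 MB works out to 20GB (20480 / 1024 = 20).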


Type exit and the installer should start.

Once the installation is complete, you should have a bigger root partition on an MBR partition table:

# df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              20G  1.8G   17G  10% /
# fdisk -l /dev/sda
 
Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2611    20971520   83  Linux
/dev/sda2            2611        5222    20971520   83  Linux
/dev/sda3            5222       19457   114345281   8e  Linux LVM

XenServer 6: Disable GPT and get a larger root partition is a post from: Major Hayden's Racker Hacker blog.

Native IPv6 connectivity in Mikrotik's RouterOS (reprint)

It's no secret that I'm a big fan of the Routerboard devices and the RouterOS software from Mikrotik that runs on them. The hardware is solid, the software is stable and feature-rich, and I found a great vendor that ships quickly.

I recently added a RB493G (~ $230 USD) to sit in front of a pair of colocated servers. The majority of the setup routine was the same as with my previous devices except for the IPv6 configuration.

In the past, I've set up IPv6 tunnels with Hurricane Electric and it's been mostly a cut-and-paste operation from the sample configuration in their IPv6 tunnel portal. Setting up native IPv6 involved a little more legwork.

If your provider will give you two /64s or an entire /48, getting IPv6 connectivity for your WAN/LAN interfaces is simple. However, if you can only get one /64, you'll have to see if your provider can route it to you via your Mikrotik's link-local address (I wouldn't recommend this for many reasons).

I split my Mikrotik into two interfaces: wan and lanbridge. The lanbridge bridge joins all of the LAN ethernet ports (ether2-9 on the RB493G) and the wan interface connects to the upstream switch.

My configuration:

/ipv6 address
add address=2001:DB8:0:1::2/64 advertise=yes disabled=no eui-64=no interface=wan
add address=2001:DB8:0:2::1/64 advertise=yes disabled=no eui-64=no interface=lanbridge
/ipv6 route
add disabled=no distance=1 dst-address=::/0 gateway=2001:DB8:0:1::1 scope=30 \
  target-scope=10
/ipv6 nd
add advertise-dns=no advertise-mac-address=yes disabled=no hop-limit=64 \
  interface=all managed-address-configuration=no mtu=unspecified \
  other-configuration=no ra-delay=3s ra-interval=3m20s-10m ra-lifetime=30m \
  reachable-time=unspecified retransmit-interval=unspecified
/ipv6 nd prefix default
set autonomous=yes preferred-lifetime=1w valid-lifetime=4w2d

Explanation:

/ipv6 address
add address=2001:DB8:0:1::2/64 advertise=yes disabled=no eui-64=no interface=wan
add address=2001:DB8:0:2::1/64 advertise=yes disabled=no eui-64=no interface=lanbridge

These two lines configure the IPv6 addresses for the firewall's interfaces. My provider's router holds the 2001:DB8:0:1::1/64 address and routes the remainder of that /64 to me via 2001:DB8:0:1::2/64. The second /64 is on the lanbridge interface and my LAN devices take their IP addresses from that block. My provider routes that second /64 to me via the 2001:DB8:0:1::2/64 IP on my wan interface.

/ipv6 route
add disabled=no distance=1 dst-address=::/0 gateway=2001:DB8:0:1::1 scope=30 \
  target-scope=10

I've set a gateway for IPv6 traffic so that the Mikrotik knows where to send internet-bound IPv6 traffic (in this case, to my ISP's core router).

/ipv6 nd
add advertise-dns=no advertise-mac-address=yes disabled=no hop-limit=64 \
  interface=lanbridge managed-address-configuration=no mtu=unspecified \
  other-configuration=no ra-delay=3s ra-interval=3m20s-10m ra-lifetime=30m \
  reachable-time=unspecified retransmit-interval=unspecified
/ipv6 nd prefix default
set autonomous=yes preferred-lifetime=1w valid-lifetime=4w2d

These last two lines configure the neighbor discovery on my lanbridge interface. This allows my LAN devices to do stateless autoconfiguration (which gives them an IPv6 address as well as the gateway).

Want to read up on IPv6?

Native IPv6 connectivity in Mikrotik's RouterOS is a post from: Major Hayden's Racker Hacker blog.

Getting online with a CradlePoint PHS-300 and an AT&T USBConnect Mercury (reprint)

Anyone who has used a 3G ExpressCard or USB stick knows how handy they can be when you need internet access away from home (and away from Wi-Fi). I've run into some situations recently where I needed to share my 3G connection with more than one device without using internet sharing on my MacBook Pro.

That led me to pick up a CradlePoint PHS-300 (discontinued by the manufacturer, but available from Amazon for about $35). It's compatible with my AT&T USBConnect Mercury (a.k.a. Sierra Wireless Compass 885/885U) USB stick.

Configuring the PHS-300 was extremely easy since I could just associate with the wireless network and enter the password printed on the bottom of the unit. However, getting the 3G stick to work was an immense pain. If you're trying to pair up these products, these steps should help:

  • Access the PHS-300's web interface
  • Click the Modem tab
  • Click Settings on the left
  • Click Always on under Reconnect Mode
  • Uncheck Aggressive Modem Reset
  • Put the following into the AT Dial Script text box:
    ATE0V1&F&D2&C1S0=0
    ATDT*99***1#
  • Add ISP.CINGULAR to the Access Point Name (APN) box
  • Flip the Connect Mode under Dual WiMAX/3G Settings to 3G Only
  • Scroll up and push Save Settings and then Reboot Now

Once the PHS-300 reboots, the USB stick may light up, then turn off, and the display on the PHS-300 might show a red light for the 3G card. Wait about 10-15 seconds for the light to turn green. The lights on the 3G stick should be glowing and blinking as well.

So how did I figure this out?

After scouring Google search results, Sierra Wireless FAQs, CradlePoint's support pages, and trolling through minicom (yes, minicom), I thought I'd try connecting with my MacBook Pro using the 3G Watcher application provided by Sierra Wireless. Before connecting, I opened up Console.app and watched the ppp.log file. Sure enough, two lines popped up that were quite relevant to my interests:

Fri Dec 16 00:37:51 2011 : Initializing phone: ATE0V1&F&D2&C1S0=0
Fri Dec 16 00:37:51 2011 : Dialing: ATDT*99***1#

I didn't have the exact initialization string in the PHS-300 and that was the cause of the failure the entire time.

If you'd like to talk to your USBConnect Mercury stick with minicom, just install minicom from macports (sudo port -v install minicom) and start it up like so:

sudo minicom -D /dev/cu.sierra04

For other Sierra Wireless cards and adapters, there's a helpful page on Sierra Wireless' site for Eee PC users.

Getting online with a CradlePoint PHS-300 and an AT&T USBConnect Mercury is a post from: Major Hayden's Racker Hacker blog.

Live upgrade Fedora 15 to Fedora 16 using yum (reprint)

Before we get started, I really ought to drop this here:

Upgrading Fedora via yum is not the recommended method. Your first choice for upgrading Fedora should be to use preupgrade. Seriously.

This begs the question: When should you use another method to upgrade Fedora? What other methods are there?

You have a few other methods to get the upgrade done:

  • Toss in a CD or DVD: You can upgrade via the anaconda installer provided on the CD, DVD or netinstall media. My experiences with this method for Fedora (as well as CentOS, Scientific Linux, and Red Hat) haven't been too positive, but your results may vary.
  • Download the newer release's fedora-release RPM, install it with rpm, and yum upgrade: This is the really old way of doing things. Don't try this (read the next bullet).
  • Use yum's distro-sync functionality: If you can't go the preupgrade route, I'd recommend giving this a try. However, leave plenty of time to fix small glitches after it's done (and after your first reboot).

Personal anecdote time (Keep scrolling for the meat and potatoes)
I have a dedicated server at Joe's Datacenter (love those folks) with IPMI and KVM-over-LAN access. The preupgrade method won't work for me because my /boot partition is on a software RAID volume. There's a rat's nest of a Bugzilla ticket over on Red Hat's site about this problem. I'm really only left with a live upgrade using yum.

Live yum upgrade process
Before even beginning the upgrade, I double-checked that I'd applied all of the available updates for my server. Once that was done, I realized I was one kernel revision behind and I rebooted to ensure I was in the latest Fedora 15 kernel.

A good practice here is to run package-cleanup --orphans (it's in the yum-utils package) to find any packages which don't exist on any Fedora mirrors. In my case, I had two old kernels and a JungleDisk package. I removed the two old kernels (probably wasn't necessary) and left JungleDisk alone (it worked fine after the upgrade). If you have any external repositories, such as Livna or RPMForge, you may want to disable those until the upgrade is done. Should the initial upgrade checks bomb out, try adding as few repositories back in as possible to see if it clears up the problem.

Once you make it this far, just follow the instructions available in Fedora's documentation: Upgrading Fedora using yum. I set SELinux to permissive mode during the upgrade just in case it caused problems.

I'd recommend skipping the grub2-install portion since your original grub installation will still be present after the upgrade. If your server has EFI (not BIOS), don't use grub2 yet. Keep an eye on the previously mentioned documentation page to see if the problems get ironed out between grub2 and EFI.

Before you reboot, be sure to get a list of your active processes and daemons. After your reboot, some old SysVinit scripts will be converted into Systemd service scripts. They might not start automatically and you might need to enable and/or start some services.
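
One way to take that snapshot is to save the list of services that SysVinit starts in your runlevel, so you have something to compare against after the reboot. A minimal sketch, assuming the usual chkconfig --list column layout (service name followed by 0:off through 6:off). The helper name and the output paths in the comments are made up for illustration:

```shell
# enabled_in_runlevel LEVEL: reads `chkconfig --list` output on stdin and
# prints the names of services switched on in that runlevel.
enabled_in_runlevel() {
    awk -v want="$1:on" '{
        for (i = 2; i <= NF; i++) {
            if ($i == want) { print $1; break }
        }
    }'
}

# Possible usage before the reboot (paths are only examples):
#   chkconfig --list 2>/dev/null | enabled_in_runlevel 3 > /root/services-before.txt
#   ps -eo comm= | sort -u > /root/processes-before.txt
```

After the upgrade, anything in those files that isn't running is a candidate for systemctl enable and systemctl start.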

New to Systemd? This will be an extremely handy resource: SysVinit to Systemd Cheatsheet.

I haven't seen too many issues after cleaning up some daemons that didn't start properly. There is a problem between asterisk and SELinux that I haven't nailed down yet but it's not a showstopper.

Good luck during your upgrades. Keep in mind that Fedora 15 could be EOL'd as early as May or June 2012 when Fedora 17 is released.

Live upgrade Fedora 15 to Fedora 16 using yum is a post from: Major Hayden's Racker Hacker blog.

Getting apache, PHP, and memcached working with SELinux (reprint)

I'm using SELinux more often now on my Fedora 15 installations and I came up against a peculiar issue today on a new server. My PHP installation is configured to store its sessions in memcached and I brought over some working configurations from another server. However, each time I accessed a page which tried to initiate a session, the page load would hang for about a minute and I'd find this in my apache error logs:

[Thu Sep 08 03:23:40 2011] [error] [client 11.22.33.44] PHP Warning:
Unknown: Failed to write session data (memcached). Please verify that
the current setting of session.save_path is correct (127.0.0.1:11211)
in Unknown on line 0

I ran through my usual list of checks:

  • netstat showed memcached bound to the correct ports/interfaces
  • memcached was running and I could reach it via telnet
  • memcached-tool could connect and pull stats from memcached
  • double-checked my php.ini
  • tested memcached connectivity via a PHP and ruby script — they worked

Even after all that, I still couldn't figure out what was wrong. I ran strace on memcached while I ran a curl against the page which creates a session and I found something significant — memcached wasn't seeing any connections whatsoever at that time. A quick check of the lo interface with tcpdump showed the same result. Just before I threw a chair, I remembered one thing:

SELinux.

A quick check for AVC denials showed the problem:

# aureport --avc | tail -n 1
4021. 09/08/2011 03:23:38 httpd system_u:system_r:httpd_t:s0 42 tcp_socket name_connect system_u:object_r:memcache_port_t:s0 denied 31536

I'm far from being a guru on SELinux, so I leaned on audit2allow for help:

# grep memcache /var/log/audit/audit.log | audit2allow
 
#============= httpd_t ==============
#!!!! This avc can be allowed using one of the these booleans:
#     httpd_can_network_relay, httpd_can_network_memcache, httpd_can_network_connect
 
allow httpd_t memcache_port_t:tcp_socket name_connect;

The boolean we're looking for is httpd_can_network_memcache. Flipping the boolean can be done in a snap:

# setsebool -P httpd_can_network_memcache 1
# getsebool httpd_can_network_memcache
httpd_can_network_memcache --> on
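
For provisioning scripts, the check and the flip can be combined so the boolean is only written when it isn't already on. A minimal sketch, assuming getsebool prints "name --> on" or "name --> off" as shown above; ensure_sebool_on is a made-up helper name:

```shell
# ensure_sebool_on NAME
# Persistently enables an SELinux boolean unless it is already on.
# Hypothetical helper around getsebool and setsebool.
ensure_sebool_on() {
    bool_name=$1
    # The state is the last field of "name --> on|off".
    bool_state=$(getsebool "$bool_name" | awk '{ print $NF }')
    if [ "$bool_state" != "on" ]; then
        setsebool -P "$bool_name" 1
    fi
}
```

Calling ensure_sebool_on httpd_can_network_memcache would then be safe to repeat on every run.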

After adjusting the boolean, apache was able to make connections to memcached without a hitch. My page which created sessions loaded quickly and I could see data being stored in memcached. If you want to check the status of all of the apache-related SELinux booleans, just use getsebool:

# getsebool -a | grep httpd | grep off$
allow_httpd_anon_write --> off
allow_httpd_mod_auth_ntlm_winbind --> off
allow_httpd_mod_auth_pam --> off
allow_httpd_sys_script_anon_write --> off
httpd_can_check_spam --> off
httpd_can_network_connect_cobbler --> off
httpd_can_network_connect_db --> off
httpd_can_network_relay --> off
httpd_can_sendmail --> off
httpd_dbus_avahi --> off
httpd_enable_ftp_server --> off
httpd_enable_homedirs --> off
httpd_execmem --> off
httpd_read_user_content --> off
httpd_setrlimit --> off
httpd_ssi_exec --> off
httpd_tmp_exec --> off
httpd_unified --> off
httpd_use_cifs --> off
httpd_use_gpg --> off
httpd_use_nfs --> off

If you're interested in SELinux, a good way to get your feet wet is to head over to the CentOS Wiki and review their SELinux Howtos.

Getting apache, PHP, and memcached working with SELinux is a post from: Major Hayden's Racker Hacker blog.

Highlight IP addresses with a double click in Firefox (reprint)

My daily work involves working with a large number of servers and one of my frustrations with Firefox is that it's not possible to select an entire IP address with a double click with the default settings. Although it works right out of the box with Safari, you have to make a configuration adjustment in Firefox to get the same behavior.

To change the setting in Firefox, open up a new Firefox tab and go to about:config in the browser. Paste word_select.stop in the search bar that appears below your tab bar and double click the layout.word_select.stop_at_punctuation line. It should become bold and the value on the end will flip from true to false.

Go back to another tab and open a web page which displays an IP address. Double click on any portion of the IP address and Firefox should highlight the entire address.
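
If you manage several Firefox profiles, the same preference can be set from a user.js file in the profile directory, which Firefox reads at startup. A minimal sketch, assuming you pass in the path to your own profile directory; set_word_select_pref is a made-up helper name:

```shell
# set_word_select_pref PROFILE_DIR
# Appends the preference to PROFILE_DIR/user.js unless it is already there.
# Hypothetical helper; PROFILE_DIR is whatever your Firefox profile path is.
set_word_select_pref() {
    prefs_file="$1/user.js"
    pref_line='user_pref("layout.word_select.stop_at_punctuation", false);'
    if ! grep -qF -- "$pref_line" "$prefs_file" 2>/dev/null; then
        printf '%s\n' "$pref_line" >> "$prefs_file"
    fi
}
```

Restart Firefox after running it so the profile picks up the change.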

Highlight IP addresses with a double click in Firefox is a post from: Major Hayden's Racker Hacker blog.

Xen Summit 2011: My Takeways (reprint)

Quite a few people who couldn't make it to Xen Summit 2011 this year asked me to write a post summarizing my takeaways from the event. I'm not generally one to back down from peer pressure, so read on if you're interested in the discussions at this year's Summit.

The feeling I had at last year's summit is that Xen was on the verge of losing traction in the market. Very few distributions still had Xen support going forward and much of the discussion was around the lack of dom0 support in upstream Linux kernels. Distribution vendors were hesitant to drag patches forward into modern kernels and this made it much more difficult to get Xen working for many people.

This year was quite different. The number of attendees was up, the venue was much better, and there was an obvious buzz of energy in the room. As many of the presenters noted, this excitement stemmed from the upstream dom0 support in Linux 3.0. This inclusion is a huge win and it helps to drive Xen forward since the developers don't have to worry about dragging patches forward. They can focus on improving performance, adding features, and tightening security.

Many of the discussions this year focused on security and performance. Ian Pratt discussed Xen's ability to view memory pages of virtual machines via an API to detect malware running inside the instance. Memory pages could be identified and marked as not executable or applications could be triggered when a VM attempts to touch a particular memory page. Also, the whole VM could be frozen if needed.

There's also a big push to bring code out of the dom0 and push it into utility VM's. Driver domains could manage the network or I/O infrastructure and this would further reduce the amount of privileged code actively running in dom0. There is already very little code required for the Xen hypervisor itself (much much less than the Linux kernel — I'm looking at you, KVM) and this reduces the attack surface for potential compromises of the hypervisor. Some projects even aim to restart driver domains multiple times per minute to ensure that any malicious code injected into those virtual machines can't exist for long periods.

Pradeep Vincent from Amazon talked about how Amazon uses Xen and the pain points they have with its current architecture. Much of his discussion was around scaling problems (and we see many of the same issues at Rackspace). Higher performance could easily be gained by multi-threaded operations in dom0 when attaching block devices and creating virtual network interfaces. He also saw some areas for performance gains in the pvops I/O code.

Quite a few of the talks centered on the ARM architecture and what Xen is able to do on those systems after Samsung published their port in 2008. HVM is on the way for ARM and it might even show up in Xen 4.2. Some demos of Xen on mobile phones from Samsung were amazing. They showed how an attacker could compromise the web browser on the phone with a keylogger, but that application was running in a VM. Once the user switched back to the phone's main menu, the keylogger couldn't access the keystrokes any longer. After that, a simple close of the browser killed the VM and destroyed the malicious code.

Xen 4.2 should be available in early 2012 and the feature list is staggering. Improvements to libxenlight, pvops performance (even in HVM), and guest memory sharing should be available with the new release. Nested virtualization (run a hypervisor inside a hypervisor) is also coming in Xen 4.2 and I'm sure Xzibit will be a huge fan. This should streamline hypervisor testing, allow for embedded hypervisor options and extend the capabilities of client hypervisors. Remus should be available in 4.2 as well, but it might be marked as experimental. OVMF will be added as a BIOS option for UEFI (along with the standard SeaBIOS) and this should allow for Mac OS X guests. UEFI allows Windows to boot faster since it switches to PV mode sooner and it allows for simpler platform certification for software vendors.

Mike McClurg's presentation on XCP was pretty important to me since Rackspace is a big consumer of XenServer. If you're not familiar with XCP, it's basically open-source XenServer which runs on bleeding edge (and sometimes unstable) components. XCP 1.5 and XenServer 6 should be available in November with Xen 4.1 and Linux 2.6.32. GPU passthrough, up to 1TB RAM, and disaster recovery will be available. Another goal for the XCP team is to work closely with OpenStack via Project Olympus. Mike's vision is to have XCP become the configuration of choice for open source clouds. Project Kronos was also extremely interesting. It's essentially XCP's XenAPI stack running on Debian and Ubuntu. You'd be able to install either OS on a physical server and run XCP's services on it for a fully OSS hypervisor.

Konrad Wilk gave an update on Linux pvops and it appears there is a shift to get Xen working well on a desktop. This includes 3D graphics support, S3/hibernate capabilities and various bug fixes. There's also a push to get PV functionality into HVM and get HVM functionality into PV. Driver/device domains were discussed again in Patrick Colp's talk and he had plenty of graphs showing performance changes when regularly restarting device domains. The performance dips were almost negligible with 10 second restarts and the security gains were significant.

There were several other great presentations on topics like GlusterFS, OpenStack Nova, and Linpicker (from the NSA!). If these types of things interest you, keep your eyes peeled for Xen Summit 2012. The weather in the Bay Area is well worth the trip. ;)

Xen Summit 2011: My Takeaways is a post from: Major Hayden's Racker Hacker blog.

Thanks for following the blog via the RSS feed. Please don't copy my posts or quote portions of them without attribution.

Xen 4.1 on Fedora 15 with Linux 3.0 (reprint)

If you haven't noticed already, full Xen dom0 support was added in the Linux 3.0 kernel. This means there's no longer a need to drag patches forward from old kernels and work from special branches and git repositories when building a kernel for dom0.

Something else you might not have noticed is that the Fedora kernel team has quietly slipped Linux 3.0 into Fedora 15's update channels in disguise. The update notes say: "Rebase to 3.0. Version reports as 2.6.40 for compatibility with older userspace." Although I'm not a fan of calling something what it isn't (2.6.40 doesn't exist on kernel.org), I can understand some of the reasoning behind the choice.
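
If the disguised version number trips you up, here's a rough sketch of a translation helper. The function name fedora15_upstream_version is my own invention, and the rule it applies (2.6.40 is 3.0, so 2.6.41 would be 3.1 and so on) is an assumption extrapolated from the rebase note, not something Fedora ships:

```shell
#!/bin/sh
# Hypothetical helper: map a Fedora 15 "2.6.4x" kernel version string
# to the upstream 3.x release it is assumed to be based on.
fedora15_upstream_version() {
    kver=$1
    case $kver in
        2.6.4[0-9]*)
            minor=${kver#2.6.}        # strip the "2.6." prefix
            minor=${minor%%[!0-9]*}   # keep only the leading digits, e.g. "40"
            echo "3.$((minor - 40))"  # assumed rule: 2.6.(40+n) -> 3.n
            ;;
        *)
            # Not a disguised version: just print major.minor
            echo "$kver" | cut -d. -f1,2
            ;;
    esac
}

# Typical use:
#   fedora15_upstream_version "$(uname -r)"
```

Anything that doesn't look like a 2.6.4x kernel is passed through as plain major.minor, so the helper is harmless on other systems.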

This change makes the Xen installation on Fedora 15 pretty trivial. To get started, update your kernel to the latest if you're not already on Fedora's 2.6.40 kernels:

yum -y upgrade kernel

We need three more packages (quite a few dependencies will roll in with them):

yum -y install xen libvirt python-virtinst

The xen package reels in the hypervisor itself along with libraries and command line tools (like xl and xm). Libvirt gives us easy access to VM management with the virsh command and python-virtinst gives us the handy virt-install command to make OS installations easy.

Once those packages are installed, we need to make some adjustments to the grub configuration. Open /boot/grub/menu.lst in your text editor of choice and add something like this at the bottom:

title Fedora + Xen (2.6.40-4.fc15.x86_64)
        root (hd0,1)
        kernel /boot/xen.gz
        module /boot/vmlinuz-2.6.40-4.fc15.x86_64 ro root=/dev/sda1
        module /boot/initramfs-2.6.40-4.fc15.x86_64.img

Ensure that the root (hd0,1) is applicable to your system (adjust it if it isn't). Also, check the kernel version to ensure it matches your installed kernel and adjust the root= portion to match your root volume. Flip the default line to a value which will boot your new grub entry and ensure the timeout is set to a reasonable number if you need to temporarily switch back to your original grub entry at boot time. (Hey, we all make mistakes.)
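
Before rebooting, it doesn't hurt to confirm that the three files named in the grub entry really exist. Here's a minimal sketch, assuming /boot lives on the root volume as in the entry above. The check_xen_boot_files function is a hypothetical helper of mine, not part of any package:

```shell
#!/bin/sh
# Hypothetical helper: verify the files referenced by the Xen grub entry.
# $1 = boot directory (usually /boot), $2 = kernel version string.
check_xen_boot_files() {
    boot_dir=$1
    kver=$2
    missing=0
    for f in "$boot_dir/xen.gz" \
             "$boot_dir/vmlinuz-$kver" \
             "$boot_dir/initramfs-$kver.img"; do
        if [ -e "$f" ]; then
            echo "ok: $f"
        else
            echo "MISSING: $f"
            missing=1
        fi
    done
    # Non-zero exit status if anything was missing
    return "$missing"
}

# Typical use, with the version you typed into menu.lst:
#   check_xen_boot_files /boot 2.6.40-4.fc15.x86_64
```

It only checks that the files are there. It can't tell you whether root (hd0,1) or root=/dev/sda1 are right for your disk layout.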

I take one extra precaution and change the UPDATEDEFAULT=yes line to no in /etc/sysconfig/kernel. This ensures that future kernel updates don't trample the entry you've just made. Keep in mind that you'll need to manually update your grub configuration when you do kernel upgrades later.
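
If you'd rather script that change than open an editor, something like the following should do it. This is only a sketch: set_updatedefault_no is a made-up helper name, and it assumes the file already contains an UPDATEDEFAULT= line:

```shell
#!/bin/sh
# Hypothetical helper: set UPDATEDEFAULT=no, keeping a .bak copy of the file.
# $1 = config file (defaults to /etc/sysconfig/kernel).
set_updatedefault_no() {
    conf=${1:-/etc/sysconfig/kernel}
    # Keep the original next to the edited file in case of a slip-up
    cp -p "$conf" "$conf.bak" &&
    # Rewrite only the UPDATEDEFAULT line; every other line passes through
    sed 's/^UPDATEDEFAULT=.*/UPDATEDEFAULT=no/' "$conf.bak" > "$conf" &&
    # Show the resulting setting
    grep '^UPDATEDEFAULT=' "$conf"
}

# Typical use (as root):
#   set_updatedefault_no
```

The untouched original stays behind as /etc/sysconfig/kernel.bak, so undoing the change is a single cp away.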

Cross your fingers and reboot. If your system doesn't boot properly, reboot it again and choose your old kernel from the grub menu. Double-check your configuration for fat-fingering and give it another try. If your system boots and pings but you have no output on the monitor, don't fret. There's a patch for the problem which should appear soon in Linux 3.0. The impatient can snag a kernel source RPM, add the patch file, and build a local kernel (or you can download my local build from when I did it).

Log in and verify that you booted into the dom0:

[root@xenbox ~]# xm dmesg | head -n 5
 __  __            _  _    _   _   ____     __      _ ____
 \ \/ /___ _ __   | || |  / | / | |___ \   / _| ___/ | ___|
  \  // _ \ '_ \  | || |_ | | | |__ __) | | |_ / __| |___ \
  /  \  __/ | | | |__   _|| |_| |__/ __/ _|  _| (__| |___) |
 /_/\_\___|_| |_|    |_|(_)_(_)_| |_____(_)_|  \___|_|____/
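
The banner is nice for humans, but a script needs something simpler to test. One approach, assuming xenfs is mounted at /proc/xen (the Xen init scripts normally take care of that), is to look for control_d in /proc/xen/capabilities. The is_dom0 function below is a hypothetical helper, just a sketch:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the capabilities file marks this
# system as the control domain (dom0).
# $1 = capabilities file (defaults to /proc/xen/capabilities).
is_dom0() {
    caps=${1:-/proc/xen/capabilities}
    # A missing or empty file means no dom0 privileges were reported
    [ -r "$caps" ] && grep -q control_d "$caps"
}

# Typical use:
#   if is_dom0; then echo "running as dom0"; else echo "not dom0"; fi
```

If the check fails on a system that did print the Xen banner, the first thing I'd look at is whether xenfs is actually mounted.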

Once you're done with that, make sure libvirtd is running:

/etc/init.d/libvirtd start; chkconfig libvirtd on

Try installing a VM:

virt-install \
  --paravirt \
  --name=testvm \
  --ram=512 \
  --vcpus=4 \
  --file /dev/vmstorage/testvm \
  --graphics vnc,port=5905 --noautoconsole \
  --autostart --noreboot \
  --location=http://mirrors.kernel.org/debian/dists/squeeze/main/installer-amd64/

You should have a VM installation underway pretty quickly and it will be visible via port 5905 on the local host. Enjoy the power and freedom of your brand new type 1 hypervisor.

Xen 4.1 on Fedora 15 with Linux 3.0 is a post from: Major Hayden's Racker Hacker blog.
