A videographer from CNN Money stopped by the office today to ask about what makes Rackspace a unique place to work. As soon as we got started, everyone tried to create as many distractions as they could to crack me up. Very few succeeded.
Thanks to for snapping the photo.
is a post from: Major Hayden's blog.
Thanks for following the blog via the RSS feed. Please don't copy my posts or quote portions of them without attribution.
Quite a few people who couldn't make it to this year asked me to write a post summarizing my takeaways from the event. I'm not generally one to back down from peer pressure, so read on if you're interested in the discussions at this year's Summit.
The feeling I had at last year's summit is that Xen was on the verge of losing traction in the market. Very few distributions still had Xen support going forward and much of the discussion was around the lack of dom0 support in upstream Linux kernels. Distribution vendors were hesitant to drag patches forward into modern kernels and this made it much more difficult to get Xen working for many people.
This year was quite different. The number of attendees was up, the , and there was an obvious buzz of energy in the room. As many of the presenters noted, this excitement stemmed from the . This inclusion is a huge win and it helps to drive Xen forward since the developers don't have to worry about dragging patches forward. They can focus on improving performance, adding features, and tightening security.
Many of the discussions this year focused on security and performance. Ian Pratt discussed Xen's ability to view memory pages of virtual machines via an API to detect malware running inside the instance. Memory pages could be identified and marked as not executable or applications could be triggered when a VM attempts to touch a particular memory page. Also, the whole VM could be frozen if needed.
There's also a big push to bring code out of the dom0 and push it into utility VM's. Driver domains could manage the network or I/O infrastructure and this would further reduce the amount of privileged code actively running in dom0. There is already very little code required for the Xen hypervisor itself (much much less than the Linux kernel — I'm looking at you, ) and this reduces the attack surface for potential compromises of the hypervisor. Some projects even aim to restart driver domains multiple times per minute to ensure that any malicious code injected into those virtual machines can't exist for long periods.
Pradeep Vincent from talked about how Amazon uses Xen and the pain points they have with its current architecture. Much of his discussion was around scaling problems (and we see many of the same issues at ). Higher performance could easily be gained by multi-threaded operations in dom0 when attaching block devices and creating virtual network interfaces. He also saw some areas for performance gains in the pvops I/O code.
Quite a few of the talks centered on the ARM architecture and what Xen is able to do on those systems after . HVM is on the way for ARM and it might even show up in Xen 4.2. Some demos of Xen on mobile phones from Samsung were amazing. They showed how an attacker could compromise the web browser on the phone with a keylogger, but that application was running in a VM. Once the user switched back to the phone's main menu, the keylogger couldn't access the keystrokes any longer. After that, a simple close of the browser killed the VM and destroyed the malicious code.
Xen 4.2 should be available in early 2012 and the feature list is staggering. Improvements to libxenlight, pvops performance (even in HVM), and guest memory sharing should be available with the new release. Nested virtualization (run a hypervisor inside a hypervisor) is also coming in Xen 4.2 and I'm sure Xzibit will be a huge fan. This should streamline hypervisor testing, allow for embedded hypervisor options and extend the capabilities of client hypervisors. Remus should be available in 4.2 as well, but it might be marked as experimental. OVMF will be added as a BIOS option for UEFI (along with the standard SeaBIOS) and this should allow for Mac OS X guests. UEFI allows Windows to boot faster since it switches to PV mode sooner and it allows for simpler platform certification for software vendors.
Mike McClurg's presentation on was pretty important to me since Rackspace is a big consumer of . If you're not familiar with XCP, it's basically open-source XenServer which runs on bleeding edge (and sometimes unstable) components. XCP 1.5 and XenServer 6 should be available in November with Xen 4.1 and Linux 2.6.32. GPU passthrough, up to 1TB RAM, and disaster recovery will be available. Another goal for the XCP team is to work closely with OpenStack via Project Olympus. Mike's vision is to have XCP become the configuration of choice for open source clouds. was also extremely interesting. It's essentially XCP's XenAPI stack running on Debian and Ubuntu. You'd be able to install either OS on a physical server and run XCP's services on it for a fully OSS hypervisor.
Konrad Wilk gave an update on Linux pvops and it appears there is a shift to get Xen working well on a desktop. This includes 3D graphics support, S3/hibernate capabilities and various bug fixes. There's also a push to get PV functionality into HVM and get HVM functionality into PV. Driver/device domains were discussed again in Patrick Colp's talk and he had plenty of graphs showing performance changes when regularly restarting device domains. The performance dips were almost negligible with 10-second restarts and the security gains were significant.
There were several other great presentations on other topics like , , and (from the NSA!). If these types of things interest you, keep your eyes peeled for Xen Summit 2012 next year. The is well worth the trip.
If you work for a growing company like I do, it's inevitable that you'll have to do your fair share of interviewing. I love it when I leave an interview with a good feeling about the candidate. That «wow, they really nailed it» feeling is always great to have when you need to fill a critical role. Most often, the successful candidates are the ones who do their homework before they ever walk in our office doors.
What do I mean by «do your homework?» Here are some bullet points to get you on your way:
Know what the company does.
This one is critical and it should be easy. However, make sure to do thorough research first. For example, if you interviewed at a company like Apple, becoming familiar with their hardware lineup should be a no-brainer. That's their bread and butter. On the other hand, remember that Apple isn't solely a hardware company; they write lots of software, provide online productivity services, and they distribute music, movies, and other entertainment media.
While you're doing this research, try to discover what makes the company unique. Sure, sells laptops and desktops (just like a lot of other companies), but what makes their particular products unique? Is there something unique about the way they provide their services? Have they cornered a certain market segment by providing a combination of products and services to that group of consumers? Answering these simple questions may help you tip the scales in the interview process.
Try one or more of the company's products.
The feasibility of trying a company's product before an interview could be debatable. For example, if you wanted to interview at , you probably don't need to drop $2M USD on your own before walking in the door. For companies where the barrier to entry for purchasing a product is much lower, such as cloud computing companies, there's no excuse to not try things out first. Amazon has a and a Rackspace Cloud Server could cost you .
It's concerning when I talk to an applicant about a job working with Rackspace's Cloud Servers and they haven't tried out any cloud products from any provider. How can I take a candidate's interest seriously when they haven't shown interest in any portion of my group's market segment?
Know what the company's competitors do.
It's often more impressive to an interviewer when you know what a company's competitors are doing and how that compares to what the company itself is doing in the market. For example, you could walk into an interview and say: «I like the way your company makes these widgets, but Company X is able to make them more lightweight, and I value that more than the added customer service your company offers.» This shows the interviewer that you're familiar with various products in the segment and you've used them enough to understand what makes them different.
Some of you might be thinking: «Why would I say something like that to the interviewer? They'll think I'm being too negative about their product.» That's always possible, but you can guard against it by wording everything carefully. Make sure you have a solid reason for the way you feel that is based on something substantial (usability, price, features, etc). I've had candidates talk for five to ten minutes about why one of our products is inferior to one of our competitors' products and I was very impressed.
One quick gotcha: your interviewer might turn your comments back on you and ask you how you would improve one of the inferior products (I do this regularly). Make sure that you're prepared for that question and consider offering up a suggestion before the question is presented to you.
Can't get the information you need? Ask!
When you reach the end of the interview and the interviewer asks if you have questions, be sure to ask any questions about topics you had trouble researching. Going back to the Cray example, compare what you know about an XE6 to servers you've used before. You could mention a problem you had with the density of your previous configurations and ask how they overcame that hurdle at Cray. If it's a proprietary trade secret, you might not get an answer, but they'll know that you did some serious research ahead of time. If they can share the answer, they might still be impressed, and you might end up learning something you didn't know prior to the interview.
Conclusion
In summary, doing your homework and learning about the company shows the interviewers that you not only have what it takes to do the work, but that the work interests you as well. I've interviewed folks in the past who lacked technical ability but had plenty of desire and drive. More often than not, those people are now Rackers.
Anyone who says management is easy obviously hasn't done it for very long or they're not doing their job very well. Coordinating the activities and personal development of each person on the team is always a challenge and it introduces an unbelievable number of variables into an already difficult job. However, watching members of the team grow and succeed in their work is tremendously rewarding.
Taking on the job of a technical manager presents its own unique challenges. It's easy for a technical manager to lose focus and get down in the weeds of daily work. It's also very difficult to let go of the reins and resign yourself to the fact that your direct involvement in technical work will have to be reduced.
These problems resonate with me as I've recently taken on another technical management role at Rackspace. My previous experience involved managing a team of technicians at various skill levels who were working on customer environments made up of dedicated servers and network equipment. The current position has quite a few differences. I'm now managing a small group of highly technical and extremely dedicated Linux engineers and we're responsible for maintaining the systems and networks which run the Cloud Servers product.
One of my goals for this blog is to help others learn things much more easily than I have. Here are some things I've had to learn the hard way while working as a technical manager:
Get out of the mindset of an individual contributor
When you're a system administrator on a team (or by yourself), you're often judged on your personal job performance. Team interaction is important for some companies (especially at Rackspace), but not for others. Breaking the mindset of being an individual contributor was extremely difficult for me to do.
Managers are judged on the success of the team as a whole. Encouraging your team members to succeed and playing an active role in their personal and professional development is key. Each time you find yourself buried in the weeds of a problem rather than facilitating your team's work on it, your performance as a manager drops. If it happens often, you may find that your team members aren't getting the support they need.
Don't be afraid of your team becoming smarter than you
One of the biggest things I've heard from my team is: «Aren't you worried about losing your technical skills when you're a manager?» My answer: «Of course.» Anyone who has technical abilities will always be afraid of watching those abilities wane over time. However, as your team becomes stronger, you should be able to continue learning not through your own work, but through theirs. When your team members see that you're still interested in learning and you're now able to learn from them, they'll become more energized about their own work.
If you find yourself thinking negatively about a potential job candidate because they're smarter than you, step back and think for a moment. Put your own ego aside and consider what's best for you, your team, and your company. Your goal is to build a strong and successful team, not to pad your own ego. If your managers are judging you (as a technical manager) on your technical ability, then you need to solve that problem first.
Inspire instead of direct
Every manager faces the challenge of working with team members who disagree with a particular company policy or with the direction of their particular infrastructure. Keep in mind that your team members are probably not intending to be insubordinate and they might have something useful to contribute.
When you find yourself locking horns with your team members, inspire a discussion about the problem. Break out the disagreement onto a whiteboard and let the team make suggestions for improvements. Even if the entire discussion leads back to the fact that the original problem is inevitable, fostering that feedback loop is critical. You'll learn more about your team while they find ways to express their opinions and feel empowered.
The really tough part is when your team comes up with an alternative plan and you find yourself presenting to your leadership team. Always remember to take it seriously and know that you may need to refine the plan many times over before you find something acceptable for your team and the business.
De-stress by staying on task
If you're anything like me, you need some way to keep tabs on action items coming from meetings, e-mails, phone calls, and walk-ups. I've heard great things about applications like and , but I settled on . I really enjoy a strong to-do list which allows me to set priorities, due dates, and write extended notes about a particular task.
The best way to tackle a wall of tasks is to keep them organized into a concise list. Even if it's a small task, get it into your list so it's on your radar and you won't forget it. Work through the simple tasks and the high priority ones first but watch out for tasks with due dates.
Conclusion
All of these processes get easier over time and although your job will surely have challenges and pitfalls, the enjoyment will greatly increase. I feel privileged to lead a team of talented people who work on a complex and ever-expanding product.
Also, I'd like to mention that I'm not an expert on management! There are probably much better ways to do much of this than I've outlined in this post. Be sure to share your ideas in the comments section below.
The first day of FUDCon 2011 in Tempe is coming to a close tonight and I'm completely exhausted. , I'll try to summarize the day and cover the talks which I attended.
The day started out with «State of Fedora» address. The audio has already been , but the speech was very positive overall. He talked about some of the struggles that have happened in the past and how they'll probably happen again in some form or another. It was pretty inspirational and you could obviously tell that people in the room were energized by it.
After the address, all of the talks were pitched in . It was a very efficient and entertaining way to create a schedule for the conference. Everyone had 15-20 seconds to present their talk and then they had to rush outside to post their topic on the wall. We all had the opportunity to go outside and vote for the talks that sounded interesting. Once the votes were tallied, the schedule was set and the conference was fully underway.
The first talk for me was about . (Note: If you Google for BoxGrinder, make sure that you enter it as a single word. You'll get some wild unrelated results if you use two words.) In short, BoxGrinder gives you the ability to have a -ish method for automatically building images for virtual machine environments. It's completely , so you can have different platform and delivery plugins depending on where your VM needs to be deployed. For example, you could deploy a VM with BoxGrinder that is in a format for VMWare (platform) and is delivered to the target server via SFTP (delivery). The public cloud plugins are only compatible with Amazon's products, but I'm eager to change that during one of the upcoming hackfests.
The talk started up right after lunch and although it was interesting, I think it left most people with quite a few questions when it was over. However, I think people are generally apprehensive when anyone tries to do anything innovative with storage. Losing data due to a bug is a big concern and many of the questions went deeper into data safety than performance and functionality.
Next up was talk about the different implementations of Python. This was definitely an eye-opening talk for my coworker and me. Dave covered CPython, Jython, PyPy and various other implementations and compared their advantages and disadvantages. I'm still pretty new to Python (I'm still clutching on to Ruby, PHP and Perl), but this talk really had me thinking about which implementations are best for a particular environment or task. It was quite a bit of fun to learn about some of the deep underpinnings of Python and how they differ depending on the specific implementation.
talk about was very intriguing. I've been a fan of recently, but I eventually moved away due to a lack of enterprise features and degrading performance. Jeff is working to add in encryption and authentication without rewriting the filesystem itself. There are quite a few tricky problems involved in the encryption portion due to partial writes and general security during the handshake process. CloudFS could potentially be a network filesystem which could be shared by multiple tenants with their own individual namespaces and segregated UID's. This could be a big win for providers as they could offer up large amounts of storage in an organized fashion without too many management headaches.
We wrapped up the day of talks with presentation about . In short, it's a bag of daemons that allow you to manage multiple public or private clouds. Everything from image management to provisioning is included in the project. Questions were raised about whether another application was needed since vendor-specific libraries are abundant and libcloud offers many of the same features in a simpler package.
Tonight's social event was FUDPub at ASU's Memorial Union building. The food and drinks were excellent (thanks to !) and it was a great opportunity to relax and talk with other Fedora users and developers. We had the opportunity to meet people from around the world while playing round after round of bowling and billiards. The discussions were extremely valuable, but as I said before, it was quite tiring.
I've compiled the FUDCon photos I've taken into a .
That's the end of today's summary. I'll try to keep this going tomorrow as well. Thanks for reading!
Diagram: OpenVPN to Rackspace Cloud Servers and Slicehost
A recent inspired me to write a quick how-to for Fedora users on using OpenVPN to talk to instances privately in the Rackspace Cloud.
The diagram at the right gives an idea of what this guide will allow you to accomplish. Consider a situation where you want to talk to the MySQL installation on db1 directly without requiring extra ssh tunnels or MySQL over SSL via the public network. If you tunnel into one of your instances, you can utilize the private network to talk between your instances very easily.
There's one important thing to keep in mind here: even though you'll be using the private network between your tunnel endpoint and your other instances, the traffic between your client and the tunnel endpoint still traverses the public network. That means the instance hosting your tunnel endpoint will still get billed for the traffic flowing through your tunnel.
You'll only need the openvpn package on the server side:
yum -y install openvpn
Throw down this simple configuration file into /etc/openvpn/server.conf:
port 1194
proto tcp
dev tun
persist-key
persist-tun
server 10.66.66.0 255.255.255.0
ifconfig-pool-persist ipp.txt
#push "route 10.0.0.0 255.0.0.0"
push "route 10.176.0.0 255.248.0.0"
keepalive 10 120
ca /etc/openvpn/my_certificate_authority.pem
cert /home/major/vpn_server_cert.pem
key /home/major/vpn_server_key.pem
dh /etc/openvpn/easy-rsa/2.0/keys/dh1024.pem
status log/openvpn-status.log
verb 3
Here's a bit of explanation for some things you may want to configure:
push — These are the routes that will be sent over the VPN that are pushed to the clients. If you don't use any IP addresses in the 10.0.0.0/8 network block in your office, you can probably use the commented out line above. However, you may want to be more specific with the routes if you happen to use any 10.0.0.0/8 space in your office.
server — These are the IP addresses that the VPN server will assign and NAT out through the private interface. I've used a /24 above, but you may want to adjust the netmask if you have a lot of users making tunnels to your VPN endpoint.
ca, cert, key — You will need to create a certificate authority as well as a certificate/key pair for your VPN endpoint. I already use on my Mac to manage some other CA's and certificates, but you can use scripts if you wish. They are already included with the openvpn installation.
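To see how much narrower the active push route is than the commented-out one, it can help to convert the netmasks to prefix lengths. The helper below is a minimal sketch: the name mask_to_prefix is made up for this illustration, and it assumes the netmask is well formed (it doesn't check that the one bits are contiguous).

```shell
# Convert a dotted-quad netmask into a CIDR prefix length.
mask_to_prefix() {
    local mask="$1" bits=0 octet
    local IFS=.
    # With IFS set to a dot, the unquoted expansion splits into four octets.
    for octet in $mask; do
        case $octet in
            255) bits=$((bits + 8)) ;;
            254) bits=$((bits + 7)) ;;
            252) bits=$((bits + 6)) ;;
            248) bits=$((bits + 5)) ;;
            240) bits=$((bits + 4)) ;;
            224) bits=$((bits + 3)) ;;
            192) bits=$((bits + 2)) ;;
            128) bits=$((bits + 1)) ;;
            0) ;;
            *) echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$bits"
}

mask_to_prefix 255.248.0.0     # the pushed route: prints 13
mask_to_prefix 255.255.255.0   # the VPN address pool: prints 24
```

So the active route covers 10.176.0.0/13, a much smaller slice than the 10.0.0.0/8 in the commented-out line, which is why it's less likely to collide with 10.x.x.x addresses in your office.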
Build your Diffie-Hellman parameters file:
cd /etc/openvpn/easy-rsa/2.0/ && ./build-dh
Tell iptables that you want to NAT your VPN endpoint traffic out to all 10.x.x.x IP addresses on the private network:
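A rule along these lines should do it. Treat it as a sketch rather than a drop-in command: it assumes the private interface is eth1 and reuses the 10.66.66.0/24 pool from server.conf above. The nat_rule helper is just an illustrative wrapper that prints the command so you can review it before running it as root.

```shell
VPN_POOL="10.66.66.0/24"   # matches the "server" line in server.conf
PRIVATE_NET="10.0.0.0/8"   # all 10.x.x.x addresses on the private network
PRIVATE_IF="eth1"          # assumption: eth1 is the private interface

# Print the NAT rule instead of applying it, so it can be reviewed first.
nat_rule() {
    echo "iptables -t nat -A POSTROUTING -s $VPN_POOL -d $PRIVATE_NET -o $PRIVATE_IF -j MASQUERADE"
}

nat_rule
# Once it looks right, run it as root and save the ruleset:
#   eval "$(nat_rule)" && /etc/init.d/iptables save
```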
The last step on the server side is to ensure that the kernel will forward packets from the VPN endpoint out through the private interface. Ensure that your /etc/sysctl.conf looks like this:
# Controls IP packet forwarding
net.ipv4.ip_forward = 1
Adjusting your sysctl.conf ensures that forwarding is enabled at boot time, but you'll need to enable it on your VPN endpoint right now:
echo 1 > /proc/sys/net/ipv4/ip_forward
Start the openvpn server:
/etc/init.d/openvpn start
If all is well, you should see openvpn listening on port 1194:
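One way to check is to look for a TCP listener in the output of netstat -ntl. The has_listener helper below is a hypothetical convenience function, not part of openvpn, and it assumes the usual net-tools column layout (local address in the fourth column, state in the sixth).

```shell
# Succeed if netstat-style output on stdin shows a TCP listener on port $1.
has_listener() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

if netstat -ntl 2>/dev/null | has_listener 1194; then
    echo "something is listening on tcp/1194"
else
    echo "nothing is listening on tcp/1194"
fi
```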
You'll need to configure a client to talk to your VPN now. This involves three steps: creating a new certificate/key pair for the client (same procedure as making your server certificates), signing the client's certificate with your CA certificate (same one that you used above to sign your server certificates), and then configuring your client application to access the VPN.
There are many openvpn clients out there to choose from.
If you're using a Linux desktop, you may want to consider using the . For Mac users, I'd highly recommend using ($9), but there's also (free).
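Whichever client you choose, the configuration it needs mirrors the server settings above. Here is a minimal sketch that writes one out. The hostname vpn.example.com and the three file names are placeholders for your own endpoint address and client certificate files, and some clients will want these options entered through their own interface instead.

```shell
# Write a bare-bones client configuration to the current directory.
# "proto tcp" and port 1194 match the server.conf shown earlier.
cat > client.conf <<'EOF'
client
dev tun
proto tcp
remote vpn.example.com 1194
resolv-retry infinite
nobind
persist-key
persist-tun
ca my_certificate_authority.pem
cert vpn_client_cert.pem
key vpn_client_key.pem
verb 3
EOF
```

With the command-line client, you would then connect with openvpn --config client.conf.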
Last night's about my charity drive for the was the 400th post on my blog! I started posting on rackerhacker.com way back in the spring of 2007 shortly after I was hired by in December of 2006.
My main purpose for the blog at the beginning was to create a place where I could write quick articles about problems I found and how to fix them. Most of the people around me were using their own handy systems to store notes (Stickies on the Mac, Tomboy notes on Linux, or just simple text files), but they weren't able to share them easily. I wanted a way to write up a solution and instantly share it with someone. I also wanted that person to be able to pass along the fix to someone else if they wanted.
Needless to say, it took off from there.
It's important to note that I couldn't have done this by myself. I've learned some efficient strategies for managing large systems and troubleshooting complex issues from my peers, my managers, and colleagues outside of Rackspace. There have been many triumphs and there have been quite a few failures.
The failures have taught me the most. I've made some pretty large mistakes and here are a few:
inserted data into a MySQL slave in an active replication pair
run a fsck on an online ext3 partition
marked a failed drive online in a hardware RAID array
mangled Plesk installations in ways that you can't comprehend
typed 'reboot' into a terminal and pressed enter, only to realize I was in the wrong terminal
run UPDATE statements without a WHERE clause in MySQL (well, I only did this one twice)
Even after all that, people occasionally tell me that I'm very good at what I do. I don't know if that's true or not, but I'm glad some people think so! Many of those folks end up asking me this question:
How do I learn how to be a successful Linux systems administrator?
My answer is this: Be humble. Always be thirsty for knowledge. Don't be afraid to make mistakes. Love what you do and the people you serve.
On most systems, using Fedora's preupgrade package (http://fedoraproject.org/wiki/PreUpgrade) is the most reliable way to update to the next Fedora release. However, this isn't the case with Slicehost and Rackspace Cloud Servers.
Here are the steps for an upgrade from Fedora 13 to Fedora 14 via yum:
If you happen to be upgrading a 32-bit instance on Slicehost, simply replace x86_64 with i386 in the URL shown above.
Upgrading Fedora 13 to Fedora 14 on Slicehost and Rackspace Cloud Servers (http://rackerhacker.com/2010/11/03/upgrading-fedora-13-to-fedora-14-on-slicehost-and-rackspace-cloud-servers/) is a post from: Major Hayden's Racker Hacker blog (http://rackerhacker.com).
Today, on my 28th birthday, I'm finally delivering on a promise to my readers which I made about two months ago. I've on how to host a web application redundantly in a cloud environment. While it's still a bit of a rough draft, it should be a good starting point for those who haven't worked in virtualized environments before. Also, it may show some of the more experienced systems administrators a new way to do things.
The guide:
As always, if you find anything in the guide that needs improvement, I'm all ears.
High availability is certainly not a new concept, but if there's one thing that frustrates me with high availability VM setups, it's storage. If you don't mind going active-passive, you can set up , toss your favorite filesystem on it, and you're all set.
If you want to go active-active, or if you want multiple nodes active at the same time, you need to use a clustered filesystem like , or . These are certainly good options to consider but they're not trivial to implement. They usually rely on additional systems and scripts to provide reliable and capabilities.
What about the rest of us who want multiple active VM's with simple replicated storage that doesn't require any additional elaborate systems? This is where really shines. GlusterFS can ride on top of whichever filesystem you prefer, and that's a huge win for those who want a simple solution. However, that means that it has to use , and that will limit your performance.
Let's get this thing started!
Consider a situation where you want to run a WordPress blog on two VM's with load balancers out front. You'll probably want to use GlusterFS's replicated volume mode (RAID 1-ish) so that the same files are on both nodes all of the time. To get started, build two small Slicehost slices or Rackspace Cloud Servers. I'll be using Fedora 13 in this example, but the instructions for other distributions should be very similar.
First things first — be sure to set a new root password and update all of the packages on the system. This should go without saying, but it's important to remember. We can clear out the default iptables ruleset since we will make a customized set later:
# iptables -F
# /etc/init.d/iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables: [ OK ]
GlusterFS communicates over the network, so we will want to ensure that traffic only moves over the private network between the instances. We will need to add the private IP's and a special hostname for each instance to /etc/hosts on both instances. I'll call mine gluster1 and gluster2:
10.xx.xx.xx gluster1
10.xx.xx.xx gluster2
You're now ready to install the required packages on both instances:
Make the directories for the GlusterFS volumes on each instance:
mkdir -p /export/store1
We're ready to make the configuration files for our storage volumes. Since we want the same files on each instance, we will use the --raid 1 option. This only needs to be run on the first node:
# glusterfs-volgen --name store1 --raid 1 gluster1:/export/store1 gluster2:/export/store1
Generating server volfiles.. for server 'gluster2'
Generating server volfiles.. for server 'gluster1'
Generating client volfiles.. for transport 'tcp'
Once that's done, you'll have four new files:
booster.fstab — you won't need this file
gluster1-store1-export.vol — server-side configuration file for the first instance
gluster2-store1-export.vol — server-side configuration file for the second instance
store1-tcp.vol — client side configuration file for GlusterFS clients
Copy the gluster1-store1-export.vol file to /etc/glusterfs/glusterfsd.vol on your first instance. Then, copy gluster2-store1-export.vol to /etc/glusterfs/glusterfsd.vol on your second instance. The store1-tcp.vol should be copied to /etc/glusterfs/glusterfs.vol on both instances.
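The copy step could be scripted roughly like this. The install_volfiles name is invented for this sketch, and it assumes you run it from the directory that holds the generated files (on the second instance, that means copying the files over first, for example with scp).

```shell
# Copy the volfiles for one node into its GlusterFS config directory.
#   $1 = node name (gluster1 or gluster2)
#   $2 = target directory (defaults to /etc/glusterfs)
install_volfiles() {
    local node="$1"
    local dest="${2:-/etc/glusterfs}"
    # Server-side volfile for this node, then the shared client volfile.
    cp "${node}-store1-export.vol" "$dest/glusterfsd.vol" &&
    cp store1-tcp.vol "$dest/glusterfs.vol"
}

# On the first instance:   install_volfiles gluster1
# On the second instance:  install_volfiles gluster2
```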
At this point, you're ready to start the GlusterFS servers on each instance:
/etc/init.d/glusterfsd start
You can now mount the GlusterFS volume on both instances:
mkdir -p /mnt/glusterfs
glusterfs /mnt/glusterfs/
You should now be able to see the new GlusterFS volume in both instances:
# df -h /mnt/glusterfs
Filesystem Size Used Avail Use% Mounted on
/etc/glusterfs/glusterfs.vol
9.4G 831M 8.1G 10% /mnt/glusterfs
As a test, you can create a file on your first instance and verify that your second instance can read the data:
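Something like the following would work. The two helper names and the gluster-test.txt file name are arbitrary choices for this sketch; the mount point is the /mnt/glusterfs directory created earlier.

```shell
# Write a marker file that records which host created it.
write_marker() {
    echo "written by $(uname -n)" > "$1/gluster-test.txt"
}

# Print the marker file; fails if the file isn't visible on this node.
read_marker() {
    cat "$1/gluster-test.txt"
}

# On the first instance:   write_marker /mnt/glusterfs
# On the second instance:  read_marker /mnt/glusterfs
```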
If you remove that file on your second instance, it should disappear from your first instance as well.
Obviously, this is a very simple and basic implementation of GlusterFS. You can increase performance by making dedicated VM's just for serving data and you can adjust the default performance options when you mount a GlusterFS volume. Limiting access to the GlusterFS servers is also a good idea.
If you want to read more, I'd recommend reading the and the .
Thank you for your e-mails! I'll be expanding on this post later with some sample benchmarks and additional tips/tricks, so please stay tuned.