Having successfully installed OpenStack all-in-one with PackStack, I started to try out the multi-node deployment. It took much longer than I expected because of various issues, mainly with networking. The following summarizes what I did to make it work, along with some tips and tricks I found during the process.
While waiting for my physical machines to be ready, I created 3 virtual machines for the first deployment: one for the controller and the other two for nova compute. As the underlying ESXi is still version 5.0, a new line is needed in the /etc/vmware/config file to expose the virtualization instruction set to virtual machines so they can run KVM: vhv.allow = "TRUE". Because the cluster was shared with others and I didn’t have the root password, it was not easy to change and restart the hypervisor. More importantly, OpenStack works just fine without this setting because it falls back to QEMU emulation, which is much slower than KVM. Since I was only trying out functionality for testing purposes, that was totally fine.
After installing CentOS 6.4 on the 3 virtual machines, I took a snapshot of each one just in case I needed to re-install. As it turned out, that was quite helpful.
Still, my end goal was to run OpenStack on physical machines, not virtual ones. So when 3 blades were ready, I moved on to the physical environment. The following reflects what I tried and learned in both environments.
As a starting point, I followed this well-written article. It documented the steps to use PackStack to install OpenStack on CentOS 6.4 – exactly what I wanted to know.
The installation using Packstack is pretty straightforward with the following commands.
# vim /etc/selinux/config
SELINUX=disabled
# yum install -y http://rdo.fedorapeople.org/openstack/openstack-grizzly/rdo-release-grizzly-3.noarch.rpm
# yum install -y openstack-packstack
# packstack --gen-answer-file=/root/grizzly_openstack.cfg
# vim /root/grizzly_openstack.cfg
# packstack --answer-file=/root/grizzly_openstack.cfg
The important part is to modify the configuration file for your environment, for example, the IP addresses of the hosts on which to install components like the controller, nova compute, quantum server, etc. Each parameter in the configuration file is preceded by a comment explaining what it is about, so it’s not that difficult to figure out. Most of the parameters come with default values.
One problem I ran into was an extra space in the comma-separated list of host IP addresses for nova compute. Packstack picked up the space as part of the IP address and used it as such in some URLs, which caused a small problem. There must be no space between an IP address and a comma.
Also, the CONFIG_CINDER_HOST must not include more than one host.
# The IP address of the server on which to install Cinder
CONFIG_CINDER_HOST=192.168.45.21
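Both pitfalls above can be caught before running Packstack. The following is only a sketch: it assumes that every host parameter in the answer file has a name ending in HOST or HOSTS (as CONFIG_CINDER_HOST does), and the function name check_answer_file is made up for illustration.

```shell
#!/bin/sh
# Sketch: flag suspicious host settings in a Packstack answer file.
# Assumption: host parameters are named CONFIG_..._HOST or CONFIG_..._HOSTS.
check_answer_file() {
    file=$1
    status=0
    # Host values must not contain spaces or tabs anywhere, including
    # after a comma or at the end of the line. Offending lines are printed.
    if grep -n -E '^CONFIG_[A-Z0-9_]*HOSTS?=.*[[:space:]]' "$file"; then
        status=1
    fi
    # CONFIG_CINDER_HOST takes a single address, so a comma is suspicious.
    if grep -n -E '^CONFIG_CINDER_HOST=.*,' "$file"; then
        status=1
    fi
    return "$status"
}

# Usage (path taken from the commands above):
#   check_answer_file /root/grizzly_openstack.cfg || echo "fix the lines printed above"
```

The function prints each offending line with its line number and returns a non-zero status if it finds any, so it can gate the packstack --answer-file step.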
There is no password in the .cfg file, so you will be asked for passwords as the first thing when running the Packstack command.
Tip: Always check for a newer version of Packstack, as it keeps moving with new bug fixes. I used version 2, but version 3 is already available. With newer versions, you may not run into the problems I had.
If all goes well, you should see something similar to the following:
Installing:
Clean Up... [ DONE ]
Adding pre install manifest entries... [ DONE ]
Installing time synchronization via NTP... [ DONE ]
Setting up ssh keys...firstname.lastname@example.org's password:
email@example.com's password:
firstname.lastname@example.org's password:
[ DONE ]
Adding MySQL manifest entries... [ DONE ]
… [ DONE ]
 **** Installation completed successfully ******
Additional information:
 * To use the command line tools you need to source the file /root/keystonerc_admin created on 192.168.45.21
 * To use the console, browse to http://192.168.45.21/dashboard
 * To use Nagios, browse to http://192.168.45.21/nagios username : nagiosadmin, password : vijava
 * Kernel package with netns support has been installed on host 192.168.45.23. Please note that with this action you are loosing Red Hat support for this host. Because of the kernel update host mentioned above requires reboot.
 * Kernel package with netns support has been installed on host 192.168.45.21. Please note that with this action you are loosing Red Hat support for this host. Because of the kernel update host mentioned above requires reboot.
 * Kernel package with netns support has been installed on host 192.168.45.22. Please note that with this action you are loosing Red Hat support for this host. Because of the kernel update host mentioned above requires reboot.
 * The installation log file is available at: /var/tmp/packstack/20130613-120309-mJMZg2/openstack-setup.log
Tip: You will want to change each machine’s hostname because OpenStack uses it when showing which node a VM instance runs on. By default, CentOS uses localhost.localdomain as the hostname, so you will find all VM instances running on “localhost.localdomain” even though you know they are not.
To change it, just edit the file as follows. My suggestion is to use meaningful names like controller, node1, node2. If you have a convention that maps the node number to the last part of its IP address, it’s even better. For example, node18 has the IP address 192.168.23.18.
# vi /etc/sysconfig/network
HOSTNAME=controller.openstack
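The naming convention can be scripted so that every node gets a consistent name. This is a small sketch; the function name hostname_line is made up, and the .openstack suffix is simply carried over from the example above.

```shell
#!/bin/sh
# Sketch: derive the HOSTNAME line for /etc/sysconfig/network from a node's IP,
# following the "node<last octet>" convention described above.
hostname_line() {
    # ${1##*.} strips everything up to and including the last dot of the IP.
    printf 'HOSTNAME=node%s.openstack\n' "${1##*.}"
}

hostname_line 192.168.23.18
```

The printed line can then be put into /etc/sysconfig/network on the corresponding node.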
Proxy Problem, Again
The installation finally went through, but the system was not working. Again, it turned out to be proxy related, though in a different way. As I mentioned in my previous article, I had to set up a proxy to install OpenStack. I didn’t forget the trick and applied it on every node before running Packstack.
After installing OpenStack, however, the proxy became a problem: the OpenStack controller and other nodes depend on HTTP-based REST API calls, which were redirected to the proxy I had set up for external access and were therefore blocked. The solution was quite easy, just unset the proxy configuration after installation, but it was a bit painful to find the root cause.
# unset http_proxy
# unset https_proxy
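If a node still needs the proxy for external access, a possible alternative to unsetting it is to exclude the cluster addresses through the no_proxy variable. This was not tested in the setup described here, and whether every OpenStack component honors no_proxy is an assumption you would need to verify. The node list below reuses the three addresses from the Packstack output; build_no_proxy is a made-up helper name.

```shell
#!/bin/sh
# Sketch: keep the proxy for external access but bypass it for cluster traffic.
# Assumption: the tools making the REST calls honor the no_proxy variable.
NODES="192.168.45.21 192.168.45.22 192.168.45.23"

build_no_proxy() {
    list="localhost,127.0.0.1"
    for ip in "$@"; do
        # Comma-separated, with no spaces between entries.
        list="$list,$ip"
    done
    printf '%s\n' "$list"
}

# Word splitting on $NODES is intentional here.
no_proxy=$(build_no_proxy $NODES)
export no_proxy
echo "no_proxy=$no_proxy"
```

Plain addresses are listed one by one because not every client understands network ranges in no_proxy.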
By default, CentOS has a pretty restrictive firewall configuration that blocks most ports. OpenStack uses ports like 35357, and if these ports are blocked, API calls fail. To keep the installation simple, I just turned off the firewall on every host. You can do it with the GUI, but the command line is better: with the GUI, the firewall came back on after rebooting, which may block some initial communication among the different components.
# service iptables stop
# chkconfig iptables off
If you run OpenStack in a production environment, you will want to find out all the required ports and open them in the firewall on all the nodes accordingly.
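As a rough sketch of that approach, the helper below only prints the iptables commands for a list of ports so they can be reviewed before being run as root. The function name print_open_port_rules is made up. Port 35357 comes from the text above; 5000 is included on the assumption that Keystone's public API uses its usual default. This is not a complete list of the ports OpenStack needs.

```shell
#!/bin/sh
# Sketch: print (not execute) iptables commands that open the given TCP ports.
print_open_port_rules() {
    for port in "$@"; do
        # -I inserts at the top of INPUT, ahead of the default REJECT rule.
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    done
    # On CentOS 6 this persists the running rules across reboots.
    echo "service iptables save"
}

# 35357 is mentioned above; 5000 is an assumed default, extend as needed.
print_open_port_rules 35357 5000
```

Printing first makes it easy to compare the rules across nodes before applying them.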
There are two RPMs you want to install for Openstack to work properly. If you run version 3 of Packstack, they may have been taken care of, but I am not quite sure as I didn’t try version 3.
The first one is related to network namespace support.
# rpm -ihv http://repos.fedorapeople.org/repos/openstack/openstack-grizzly/epel-6/kernel-2.6.32-358.6.2.openstack.el6.x86_64.rpm
The second is related to the DHCP server. By default, CentOS 6.4 installs dnsmasq 2.48, but OpenStack requires 2.59. Running yum to upgrade does not install a newer version because the repository does not carry one. I found an RPM at repoforge.org and installed the latest version, 2.65.
# rpm -ihv http://pkgs.repoforge.org/dnsmasq/dnsmasq-2.65-1.el6.rfx.x86_64.rpm
You can also install other versions if dependency is an issue.
After OpenStack started to work, I could deploy VM instances (I got this cirros image from the Internet). But they could not be assigned IPv4 addresses even though I had selected DHCP while creating the network. Somehow they all got IPv6 addresses. That does not mean DHCP was working: an IPv6 host can configure an address for itself automatically, without a DHCP server. This is called Stateless Address Autoconfiguration (SLAAC).
This problem was pretty hard to debug and took quite some time. This article documented a few great tips to debug the problem, but still didn’t solve my problem right away.
To isolate the problem, it’s good to have a basic understanding of how DHCP works. A new host, while booting, broadcasts a DHCP request and waits for a response from a DHCP server. Either end, or the link between them, could fail. In my case, the problem was in both the server and the link.
The first step is to get into the VM instance and make it send a DHCP request. In cirros, the command is
$ sudo udhcpc
It then sends a request every second, which is frequent enough for checking each hop as needed.
Then, find the DHCP server. OpenStack uses dnsmasq as its DHCP server. For each network, there should be two dnsmasq processes running with exactly the same long list of parameters: one owned by root and the other by nobody.
[root@controller ~]# ps -ef | grep dnsmasq
nobody 5150 1 0 Jun14 ? 00:00:01 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=ns-9b91f42d-17 --except-interface=lo --pid-file=/var/lib/quantum/dhcp/f611820b-c562-4fb8-8f5b-34ec25679252/pid --dhcp-hostsfile=/var/lib/quantum/dhcp/f611820b-c562-4fb8-8f5b-34ec25679252/host --dhcp-optsfile=/var/lib/quantum/dhcp/f611820b-c562-4fb8-8f5b-34ec25679252/opts --dhcp-script=/usr/bin/quantum-dhcp-agent-dnsmasq-lease-update --leasefile-ro --dhcp-range=set:tag0,192.168.20.0,static,120s --conf-file= --domain=openstacklocal
root 5151 5150 0 Jun14 ? 00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=ns-9b91f42d-17 --except-interface=lo --pid-file=/var/lib/quantum/dhcp/f611820b-c562-4fb8-8f5b-34ec25679252/pid --dhcp-hostsfile=/var/lib/quantum/dhcp/f611820b-c562-4fb8-8f5b-34ec25679252/host --dhcp-optsfile=/var/lib/quantum/dhcp/f611820b-c562-4fb8-8f5b-34ec25679252/opts --dhcp-script=/usr/bin/quantum-dhcp-agent-dnsmasq-lease-update --leasefile-ro --dhcp-range=set:tag0,192.168.20.0,static,120s --conf-file= --domain=openstacklocal
Within the host file, you should see something like the following. As more VM instances come up, you should see more lines here. If that is not the case, the OpenStack framework itself has an issue.
# vim /var/lib/quantum/dhcp/f611820b-c562-4fb8-8f5b-34ec25679252/host
fa:16:3e:45:5b:67,192-168-20-2.openstacklocal,192.168.20.2
fa:16:3e:f0:20:63,192-168-20-4.openstacklocal,192.168.20.4
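To check quickly whether a particular instance made it into the host file, the MAC address can be looked up directly. The sketch below assumes the three-field layout shown above (MAC, hostname, IP, separated by commas); ip_for_mac is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the IP that the dnsmasq host file assigns to a MAC address.
# Prints nothing if the MAC has no entry.
ip_for_mac() {
    # $1: path to the host file, $2: MAC address (case-insensitive)
    awk -F, -v mac="$2" 'tolower($1) == tolower(mac) { print $3 }' "$1"
}

# Usage (path and MAC taken from the example above):
#   ip_for_mac /var/lib/quantum/dhcp/f611820b-c562-4fb8-8f5b-34ec25679252/host fa:16:3e:f0:20:63
```

An empty result for an instance that is already running points at the OpenStack side rather than at the link.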
To troubleshoot the link, I found tcpdump extremely helpful. The typical path from a virtual machine instance consists of: vm -> tap* -> qbr* -> qvb* -> qvo* -> br-int -> eth1 (nova) -> eth1 (quantum server) -> dnsmasq, where * is a hex string like 65ed97b0-be. You can find the string from the ports of a network in the Horizon GUI. I haven’t found a way to get the string from within a VM instance.
The route may differ depending on topology and tunneling. You can monitor all the steps on either the nova node or the quantum server with tcpdump.
# tcpdump -n -i eth1
# tcpdump -n -i br-int
# tcpdump -i tap65ed97b0-be
# tcpdump -i qbr65ed97b0-be
# tcpdump -i qvb65ed97b0-be
# tcpdump -i qvo65ed97b0-be
With each of these commands, you should see the DHCP request and reply continuously, like the following, while the udhcpc command is running.
23:45:59.878543 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:f0:e3:3f (oui Unknown), length 280
23:45:59.879159 IP 192.168.21.3.bootps > 192.168.21.2.bootpc: BOOTP/DHCP, Reply, length 322
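The per-port device names and capture commands follow a pattern that is easy to script. The sketch below assumes the hex string is the first 11 characters of the port ID shown in Horizon, and adds a "port 67 or port 68" filter so that only DHCP traffic is shown. The port ID in the example is hypothetical, chosen so that its prefix matches the string used above, and dhcp_capture_cmds is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print tcpdump commands for every per-port device of one port.
# Assumption: device names are <prefix> + first 11 characters of the port ID.
dhcp_capture_cmds() {
    short=$(printf '%s' "$1" | cut -c1-11)
    for prefix in tap qbr qvb qvo; do
        # The filter limits the capture to DHCP (bootps/bootpc) packets.
        echo "tcpdump -n -i ${prefix}${short} port 67 or port 68"
    done
}

# Hypothetical port ID; only its first 11 characters matter here.
dhcp_capture_cmds 65ed97b0-be00-4000-8000-000000000000
```

Running the printed commands one hop at a time shows where the request or the reply disappears.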
If you don’t see the request on any node, you should look into Open vSwitch connectivity with commands like:
# ovs-vsctl list-br
# ovs-vsctl list-ports br-int
When needed, you can add ports to bridges as follows. In general, though, OpenStack should take care of everything and you should not need to run these commands to add ports.
# ovs-vsctl add-port br-int eth1
To assist DHCP troubleshooting, you may need the related log file. You can find it as follows.
# vim /var/log/quantum/dhcp-agent.log
# tail -f /var/log/quantum/dhcp-agent.log
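Because the log can be long, it may help to pull out only the problem lines. The sketch below assumes that failures are logged with the words ERROR or TRACE, which may not cover every case; show_agent_errors is a made-up helper name.

```shell
#!/bin/sh
# Sketch: show the most recent error lines from an agent log.
# Assumption: failures are marked with ERROR or TRACE in the log.
show_agent_errors() {
    # $1: log file, $2: how many matching lines to show (default 20)
    grep -E 'ERROR|TRACE' "$1" | tail -n "${2:-20}"
}

# Usage (path taken from the commands above):
#   show_agent_errors /var/log/quantum/dhcp-agent.log 50
```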
At one point, I found errors there relating to the sudo command used to run quantum-rootwrap. That led me to look into /etc/sudoers with the visudo command, where I found a line with an interesting comment: “the # here does not mean a comment”
Anyway, I hacked the file to grant ALL permissions to the quantum user and rebooted the hosts, and the problem seemed to go away. It’s not a good idea to grant all permissions to quantum in a production system, but for troubleshooting it helps to isolate the problem.
I know this has been pretty long. Before we wrap up, here is one last tip:
Tip: reboot all your hosts as a last resort while troubleshooting. It’s certainly not the most efficient way to solve a problem, but it does the magic sometimes.