Get Presentation Tarball

If you haven't got the presentation tarball yet, now is the time to download
it. It will come in very handy because it contains all the commands you'll need
for this part in an easily pastable format.

Pre-checks for deploying a cluster using magnum

Log in to the controller node and source the OpenStack credentials. Check that
the Magnum services are up and running: at least one magnum-conductor service
should be up. The API does not count as a service under Magnum's definition of
the term, so magnum service-list will not show it.
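
A minimal version of this pre-check might look like the following (assuming the
credentials file is /root/.openrc, as is typical for an mkcloud deployment):

source /root/.openrc
magnum service-list    # expect at least one magnum-conductor with state 'up'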

As mentioned earlier, we will be using the SLE12SP2 image for our demo. This
image is always created for us when we deploy with the magnum barclamp, since
it is part of the magnum cookbook. When it is not, we can install the
openstack-magnum-k8s package, which contains the image, and upload it to Glance
ourselves. The important bit of the image creation is the os-distro argument,
which needs to be present when creating the image: when Magnum builds the
cluster template it searches for the os_distro field, and if it is not found it
throws an error: ERROR: Image doesn't contain os-distro field. (HTTP 404)
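
If you do need to upload the image to Glance yourself, the upload might look
roughly like this (a sketch: the qcow2 file path is a placeholder, and the
opensuse value for os_distro is an assumption based on the openSUSE Kubernetes
driver; the image name matches the one we use in the cluster template later):

openstack image create openstack-magnum-k8s-image \
  --disk-format qcow2 \
  --container-format bare \
  --property os_distro=opensuse \
  --file /path/to/openstack-magnum-k8s-image.qcow2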

Pre-checks for deploying a cluster using magnum (cont.)

The keypair should already be present on our mkcloud-created cloud. If not, we
can upload the SSH public key using the openstack keypair create command.

Then we check whether the flavor we need exists. It should be there since it is
part of the magnum recipe; otherwise you can create it using the openstack
flavor create command.

We also need to check that the external network exists. In our deployment it is
named floating. This network provides the cloud's access to the outside world.
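
Sketched out, these pre-checks could look like this (the keypair name default,
the flavor m1.magnum and the network floating match what we use later; the
public key path is an assumption):

openstack keypair create --public-key ~/.ssh/id_rsa.pub default
openstack flavor list | grep m1.magnum
openstack network list | grep floating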

Commands we will use frequently

These are some of the commands that we will be using very often from here on.
They list the cluster templates, show details of a cluster template, list the
clusters and show details about a cluster. Please remember this slide for
future reference.
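
Concretely, with the template and cluster names used throughout this workshop,
these are:

magnum cluster-template-list
magnum cluster-template-show k8s_template
magnum cluster-list
magnum cluster-show k8s_cluster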

Steps to deploy a Kubernetes cluster using magnum CLI

First of all, we create the cluster template. We need to specify the name for
the template, the Magnum image name, the keypair, the external network to be
used, the DNS nameserver, flavors for the master and minion nodes, the Docker
volume size, the network driver and the container orchestration engine, which
is Kubernetes in our case but could also be Swarm or Mesos. To disable TLS
support pass --tls-disabled. You can also pass --master-lb-enabled to enable
load balancing for etcd and the Kubernetes API. We can then simply run magnum
cluster-template-list to see if the template was created and magnum
cluster-template-show to see the details of the cluster template.

It is important to specify --external-network since this assigns floating IPs
to the nodes, which allows access to the internet. If this is not set up
properly the etcd discovery will fail (if using public discovery) and the
container images cannot be downloaded, among other failures.
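
Put together, the template creation looks roughly like this (the same options
reappear near the end of the troubleshooting part, where we add
--registry-enabled to a second template):

magnum cluster-template-create \
  --name k8s_template \
  --image-id openstack-magnum-k8s-image \
  --keypair-id default \
  --external-network-id floating \
  --dns-nameserver 8.8.8.8 \
  --flavor-id m1.magnum \
  --master-flavor-id m1.magnum \
  --docker-volume-size 5 \
  --network-driver flannel \
  --coe kubernetes \
  --tls-disabled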

Steps to deploy a Kubernetes cluster using magnum CLI (cont.)

Then we create the cluster by specifying the number of nodes and the template
to be used (the one we just created). This will take some time, as the command
only starts the process: in the background Heat will now begin provisioning the
Nova instances and trigger cloud-init to configure everything inside them. We
can run the magnum cluster-list command here to see the current status of the
cluster.
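
The create call we will use looks like this (the same command reappears in the
troubleshooting part):

magnum cluster-create \
  --name k8s_cluster \
  --cluster-template k8s_template \
  --node-count 1

magnum cluster-list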

Monitoring and updating the kubernetes cluster

We can now run the openstack stack list command to see the heat stack
building.

In order to monitor our cluster's stack, we can watch the stack's resource list
to see when the stack creation is complete:

stack=$(magnum cluster-show k8s_cluster \
        | grep -w stack_id \
        | awk '{print $4}')
export stack

watch openstack stack resource list -n 5 $stack \| grep -v CREATE_COMPLETE

Once it is complete we can run magnum cluster-show k8s_cluster to see details
of the cluster such as the master and minion addresses and the node count.

At any later stage we can update the cluster using the magnum cluster-update
command, which lets us add more nodes to the cluster at any point. Currently
the update command rebuilds all the nodes in the cluster, so you will
ultimately end up creating a new cluster. You need to be especially careful
when using this in customer deployments: if the cluster has container workloads
deployed you will lose them all when you update the cluster, and this can cause
serious problems.
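
A scale-out update would look something like this (a sketch; replace, add and
remove are the usual magnum cluster-update operations, and the target node
count is just an example):

magnum cluster-update k8s_cluster replace node_count=2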

A cluster enlarged this way might be too big for our standard mkcloud Magnum
setup, which has this configuration:

export nodenumber=3
export nodenumbercompute=2

But it can work on a bigger setup with a configuration like:

export nodenumber=4
export nodenumbercompute=3

Accessing the Kubernetes cluster using magnum CLI

Next we need to access our cluster. There are two ways to do that: either by
running magnum cluster-show, getting the IP of the master node and SSHing to
it, or by accessing it from any machine using the magnum CLI. If you SSH to the
master node you can run the Kubernetes commands directly, as kubectl (the
command line interface for Kubernetes) will already be installed and configured
by cloud-init.

For the second method, we will need python-magnumclient installed. Then you
can get the config of your specific cluster using the magnum cluster-config
command. This will create a config file, which it does not overwrite by
default, so you need to be careful where you run it and delete or move any old
config file that might still exist.
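
Retrieving the config might look roughly like this (a sketch: the working
directory is arbitrary, and evaluating the command's output is one convenient
way to pick up the environment settings it prints):

mkdir -p ~/k8s_cluster && cd ~/k8s_cluster   # empty directory, so no old config is in the way
eval $(magnum cluster-config k8s_cluster)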

You also need to be careful about keeping the config file up to date, as the IP
of the master node can change over time and then needs to be updated in the
config file. If you want to be on the safe side, just run magnum cluster-config
once again. Next you need to install kubectl. Unfortunately we do not have a
package for that in SLES, yet. For now you can either run the following:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl

chmod +x ./kubectl

sudo mv ./kubectl /usr/local/bin/kubectl

or visit the official Kubernetes page to download it.

Kubernetes commands we will frequently use

These are some commands we will use often from here on.
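
For reference, the kubectl subcommands we will lean on most are along these
lines (standard kubectl, nothing SUSE specific):

kubectl get nodes               # list the cluster's nodes
kubectl get pods                # list pods in the default namespace
kubectl get services            # list services
kubectl describe pod <name>     # detailed state and events of a pod
kubectl logs <name>             # container logs of a pod
kubectl create -f <manifest>    # create resources from a manifest file
kubectl delete -f <manifest>    # delete them again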

Running our first pod

Pods are the most basic unit in a Kubernetes cluster. A pod is a collection
of one or more containers which are always co-located, co-scheduled and run in
a shared context.
The YAML files describing them are called manifests and can also be written in
JSON format.

Here's what we will use:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80

Without delving into much detail, what you see in this file is the API version,
the kind of resource (a Pod), its metadata (a name and an app label), and a
spec with a single container running the nginx image and exposing port 80.

Now we create the pod using the create command:

kubectl create -f nginx.yaml
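
Once created, the pod can be checked on with the usual commands:

kubectl get pods
kubectl describe pod nginx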

Running our first service

Next we create a service. One of the most powerful abstractions in Kubernetes
is the service resource. While pods can come and go, services are an
abstraction that makes guarantees to the consumer about a specific service. The
idea is that the consumer should only care about the service being provided,
rather than the underlying implementation.

Services can be made available to external clients or they can be used for
internal purposes only.

Most of the information in the file is the same as for the pod, but there are
some differences:
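
The service manifest is not reproduced on the slide; a minimal sketch of what
it might look like for our nginx pod (the NodePort type and the name
nginx-service are assumptions) is:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort        # expose the service on a port of every node
  selector:
    app: nginx          # matches the label on our pod
  ports:
  - port: 80
    targetPort: 80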

Now we create the service using the create command.
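
Assuming the manifest above was saved as nginx-service.yaml, that would be:

kubectl create -f nginx-service.yaml
kubectl get services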

Demo: Deploying kubernetes cluster from horizon

First we will deploy Magnum by applying the magnum barclamp via the Crowbar UI
on the node you want, most likely the dashboard node. Then we switch to the
OpenStack dashboard. We will use the SLES12SP2-based Magnum image to deploy the
Kubernetes cluster.

First we need to create a cluster template, which is a set of parameters
telling Magnum how to deploy our Kubernetes (or any other container
orchestration engine) cluster. Most of the metadata describing a cluster is set
in the ClusterTemplate it is associated with.

We need to choose the image, master flavor, minion flavor, backend and volume
for container data. For Kubernetes networking we use flannel as the network
driver. We only need to specify the external network name for the floating IPs.
We will use the local Docker registry.

After the cluster template we create the cluster, where we only need to specify
the number of master and minion nodes. This starts the provisioning process to
deploy the Kubernetes cluster. Magnum uses Heat templates, so our deployment
can be seen in the stack topology tab. Heat spawns the Nova instances for the
master and minion nodes. We can check on the console to see how the master node
is booting and which services are configured and started. The cloud-init
service will take the metadata and configure the rest of the Kubernetes
services needed. It deploys the services needed on the Kubernetes master first,
like etcd, kube-master, kube-scheduler and apicontroller, which will generate
SSH keys for the node.

Once the master is finished cloud-init will run on the minion (or minions)
where the containers are started. It will create the minions using the flavor
we specified and customize everything inside. On the network topology we can
see all the instances are in the same network, connected together and routed to
the outside.

Troubleshooting

So much for how Magnum looks in good working condition. Now we are coming to
the part where we will deliberately break Magnum in various common ways and try
to fix it again.

Preparation: Some Sabotage

Before we begin we'll need to create two packet filter rules to ensure things
go south later. You will find these rules as a ready-to-run shell script in
cmd/iptables-break.sh:

iptables -A INPUT \
         -s $(ip addr sh br-public | grep -w inet \
              | awk '{print $2}' | sed 's#/.*##') \
         -p tcp --dport 8004 -j ACCEPT

iptables -A INPUT \
         -s $(neutron net-list | grep floating \
              | awk '{print $7}') \
         -p tcp --dport 8004 \
         -j REJECT --reject-with tcp-reset

They will prevent access to the Heat API from the floating IP network while
still allowing access from the controller node itself. Please just create them
for now; we'll provide a detailed explanation of how and why later. We need to
move on because the Kubernetes cluster has to be recreated in order for it to
suffer from this act of sabotage, and that will take time. So please create the
rules now, and we'll take care of rebuilding the cluster on the next slide.

Preparation: Create a Fresh Cluster

Ok, now that everyone's got the iptables rules in place, let's rebuild the
cluster so it is finished failing by the time we are ready to analyze the
error:

Delete Existing Cluster

magnum cluster-delete k8s_cluster

Create the New Cluster

magnum cluster-create \
  --name k8s_cluster \
  --cluster-template k8s_template \
  --node-count 1

When is a Cluster Complete?

Before we get started on troubleshooting clusters, let's take a look at what
constitutes a completed cluster. In order to indicate completion, Magnum uses
Heat's WaitCondition mechanism. A WaitCondition gives you a magic Heat API URL
which you can insert into a cloud-init user data script; curling that URL
signals to Heat that the script has reached that point. Heat will then set the
WaitCondition associated with that URL to CREATE_COMPLETE.

Magnum creates such a WaitCondition along with every instance. Its curl command
is triggered at the very end of the user data payload, so if cloud-init manages
to get to that point, this indicates a successfully deployed instance. If all
of the stack's WaitConditions transition to state CREATE_COMPLETE that way, the
cluster itself is marked complete.

There are a number of ways this can go wrong, and they are the main failure
modes for cluster creation (barring problems with resource creation such as the
ever popular "no valid host was found" from Nova). First of all, the instance
may be unable to reach the Heat API for some reason. We are going to use the
blunt instrument of the iptables rule we created to bring that situation about.

Second, the instance may simply take too long, causing the wait condition to
time out and thus transition to CREATE_FAILED.

Third, some part of the deployment process may fail, causing cloud-init never
to reach the point where it curls the Heat API.

1: Heat API Unreachable / Timeout

By now we should all have a cluster in state CREATE_FAILED. So let's triage it
and find out which of the failure modes we are dealing with. The first thing we
check for is a wait condition timeout or unreachable Heat API. You can tell
these two apart fairly easily.

First of all, we need to figure out the cluster's main Heat stack UUID. We can
get this UUID from a magnum cluster-show:

stack=$(magnum cluster-show k8s_cluster \
        | grep -w stack_id \
        | awk '{print $4}')
export stack

That command is a bit opaque. We did test it and it works, but we'd like to
show you what it works on:

root@d52-54-77-77-01-01:~ # magnum cluster-show k8s_cluster
+---------------------+----------------------------------------------------------------------------------------------------------------------+
| Property            | Value                                                                                                                |
+---------------------+----------------------------------------------------------------------------------------------------------------------+
| status              | CREATE_FAILED                                                                                                        |
| cluster_template_id | ee9d182c-6987-46f6-8e6a-c873e2b10f6c                                                                                 |
| uuid                | bab88e18-a0dc-4db0-9be2-c1a9bf23104a                                                                                 |
| stack_id            | f59404cd-14c9-4b62-b62b-cb0beb057514                                                                                 |
| status_reason       | Resource CREATE failed: WaitConditionTimeout: resources.kube_masters.resources[0].resources.master_wait_condition:   |
|                     | 0 of 1 received                                                                                                      |
| created_at          | 2017-05-16T08:54:39+00:00                                                                                            |
| name                | k8s_cluster                                                                                                          |
| updated_at          | 2017-05-16T09:06:55+00:00                                                                                            |
| discovery_url       | https://discovery.etcd.io/e025ff3b532c3f5845824b36671faca0                                                           |
| faults              | {'0': 'WaitConditionTimeout: resources[0].resources.master_wait_condition: 0 of 1 received', 'kube_masters':         |
|                     | 'WaitConditionTimeout: resources.kube_masters.resources[0].resources.master_wait_condition: 0 of 1 received',        |
|                     | 'master_wait_condition': 'WaitConditionTimeout: resources.master_wait_condition: 0 of 1 received'}                   |
| api_address         | http://:8080                                                                                                         |
| coe_version         | v1.3.7                                                                                                               |
| master_addresses    | ['192.168.232.129']                                                                                                  |
| create_timeout      | 60                                                                                                                   |
| node_addresses      | []                                                                                                                   |
| master_count        | 1                                                                                                                    |
| container_version   | 1.12.3                                                                                                               |
| node_count          | 1                                                                                                                    |
+---------------------+----------------------------------------------------------------------------------------------------------------------+

It looks for the line with the stack_id property and extracts the fourth
whitespace-separated field (the second table column), which contains the UUID
of the cluster's main Heat stack.

Next we'll list this stack's resources and look for the failing WaitCondition:

nested=$(openstack stack resource list -n 5 $stack \
                | grep CREATE_FAILED \
                | grep OS::Heat::WaitCondition \
                | awk '{print $11}')
export nested

The failing WaitCondition will be in a nested stack; "nested" in this context
means a Heat stack created from inside another Heat stack. Magnum creates a
main Heat stack with one or more nested stacks for the Kubernetes masters and
Kubernetes minions. How many there are depends on the --node-count and
--master-count options.

Again, this command is a bit opaque, so let's look at what it operates on again:

root@d52-54-77-77-01-01:~ # openstack stack resource list -n 5 -c resource_name -c resource_type -c resource_status -c stack_name f59404cd-14c9-4b62-b62b-cb0beb057514
+-----------------------------+--------------------------------------------------------------------------------------------------+-----------------+------------------------------------------------------------------------+
| resource_name               | resource_type                                                                                    | resource_status | stack_name                                                             |
+-----------------------------+--------------------------------------------------------------------------------------------------+-----------------+------------------------------------------------------------------------+
| api_monitor                 | Magnum::Optional::Neutron::LBaaS::HealthMonitor                                                  | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| extrouter_inside            | OS::Neutron::RouterInterface                                                                     | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| secgroup_kube_master        | OS::Neutron::SecurityGroup                                                                       | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| kube_master_ports           | OS::Heat::ResourceGroup                                                                          | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| secgroup_kube_minion        | OS::Neutron::SecurityGroup                                                                       | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| etcd_pool                   | Magnum::Optional::Neutron::LBaaS::Pool                                                           | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| api_pool                    | Magnum::Optional::Neutron::LBaaS::Pool                                                           | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| kube_minions                | OS::Heat::ResourceGroup                                                                          | INIT_COMPLETE   | k8s_cluster-jf3atei2jlpu                                               |
| api_address_floating_switch | Magnum::FloatingIPAddressSwitcher                                                                | INIT_COMPLETE   | k8s_cluster-jf3atei2jlpu                                               |
| etcd_loadbalancer           | Magnum::Optional::Neutron::LBaaS::LoadBalancer                                                   | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| api_address_lb_switch       | Magnum::ApiGatewaySwitcher                                                                       | INIT_COMPLETE   | k8s_cluster-jf3atei2jlpu                                               |
| etcd_address_lb_switch      | Magnum::ApiGatewaySwitcher                                                                       | INIT_COMPLETE   | k8s_cluster-jf3atei2jlpu                                               |
| fixed_subnet                | OS::Neutron::Subnet                                                                              | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| kube_masters                | OS::Heat::ResourceGroup                                                                          | CREATE_FAILED   | k8s_cluster-jf3atei2jlpu                                               |
| etcd_monitor                | Magnum::Optional::Neutron::LBaaS::HealthMonitor                                                  | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| api_pool_floating           | Magnum::Optional::Neutron::FloatingIP                                                            | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| extrouter                   | OS::Neutron::Router                                                                              | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| fixed_network               | OS::Neutron::Net                                                                                 | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| secgroup_base               | OS::Neutron::SecurityGroup                                                                       | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| api_loadbalancer            | Magnum::Optional::Neutron::LBaaS::LoadBalancer                                                   | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| kube_minion_ports           | OS::Heat::ResourceGroup                                                                          | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| etcd_listener               | Magnum::Optional::Neutron::LBaaS::Listener                                                       | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| api_listener                | Magnum::Optional::Neutron::LBaaS::Listener                                                       | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu                                               |
| 0                           | file:///usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/templates/kubeport.yaml   | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_master_ports-5p52lkj663n3                |
| kube_port                   | OS::Neutron::Port                                                                                | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_master_ports-5p52lkj663n3-0-lajcxcfmpgth |
| 0                           | file:///usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/templates/kubemaster.yaml | CREATE_FAILED   | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m                     |
| kube_master_floating        | Magnum::Optional::KubeMaster::Neutron::FloatingIP                                                | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| api_pool_member             | Magnum::Optional::Neutron::LBaaS::PoolMember                                                     | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| write_heat_params           | OS::Heat::SoftwareConfig                                                                         | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| create_kubernetes_user      | OS::Heat::SoftwareConfig                                                                         | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| kube_master                 | OS::Nova::Server                                                                                 | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| kube_master_init            | OS::Heat::MultipartMime                                                                          | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| master_wait_handle          | OS::Heat::WaitConditionHandle                                                                    | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| etcd_pool_member            | Magnum::Optional::Neutron::LBaaS::PoolMember                                                     | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| master_wait_condition       | OS::Heat::WaitCondition                                                                          | CREATE_FAILED   | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| configure_kubernetes        | OS::Heat::SoftwareConfig                                                                         | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| master_wc_notify            | OS::Heat::SoftwareConfig                                                                         | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| configure_etcd              | OS::Heat::SoftwareConfig                                                                         | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| make_cert                   | OS::Heat::SoftwareConfig                                                                         | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| configure_flanneld          | OS::Heat::SoftwareConfig                                                                         | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_masters-gypg2jphpe3m-0-gqt6az75rjgs      |
| 0                           | file:///usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/templates/kubeport.yaml   | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_minion_ports-u2ny44qx4lko                |
| kube_port                   | OS::Neutron::Port                                                                                | CREATE_COMPLETE | k8s_cluster-jf3atei2jlpu-kube_minion_ports-u2ny44qx4lko-0-znm2n4vv5bxq |
+-----------------------------+--------------------------------------------------------------------------------------------------+-----------------+------------------------------------------------------------------------+

In that list we look for an OS::Heat::WaitCondition resource in state
CREATE_FAILED (we will only have one in the case at hand). Once we've found it
we look at the fourth column, which contains the name of the nested Heat stack
the WaitCondition is defined in. Now that we've got it, we will use that stack
name in the next step to figure out whether anything has tried to signal the
wait condition.

The magic signalling URL for a wait condition always ends in /signal and
contains the stack name of the Heat stack it was defined in. Hence we grep for
this combination in the Heat API's log file:

  grep $nested /var/log/heat/heat-api.log \
    | grep /signal

Typically you will be dealing with an HA enabled cloud and thus multiple
heat-api instances, so you'll have to perform this step for all heat-api log
files to be sure.

If this search yields a result with a 200 response but the WaitCondition is
in CREATE_FAILED state nonetheless, that indicates a genuine timeout. The
200 status is a bit misleading since wait condition signalling will report
success back to whoever issues a request against that magic URL, but won't
change the WaitCondition's state once it has timed out. We will later bring
this situation about as well and show you how to fix the problem.

In the case at hand we will not find anything, and can thus rule out late
signalling as the problem. The remaining possibilities are connectivity issues
and something going wrong in one of the user-data scripts. Since we lovingly
handcrafted the network problem leading to this situation we already know what
went wrong. Before we fix it, though, let's take a quick look at how it
manifests inside the affected instance, because this particular network problem
shows up as a user-data script failure inside the instance, and we'd like to
show you how to spot and debug that.

2: Failing user-data script

To debug a failing user data script we will have to log in to the instance
where the user-data scripts failed, causing the WaitCondition not to trigger.
To that end we need to figure out a few things again.

First of all we need to obtain the failing cluster's main Heat stack UUID
again:

stack=$(magnum cluster-show k8s_cluster \
        | grep -w stack_id \
        | awk '{print $4}')
export stack

Once we've got it, we will again look at that stack's resource list and find
the stack name of the nested stack the failing WaitCondition is part of:

nested=$(openstack stack resource list -n 5 $stack \
                | grep CREATE_FAILED \
                | grep OS::Heat::WaitCondition \
                | awk '{print $11}')
export nested

With that stack name we can now obtain the problematic node's floating IP
address:

openstack stack output show $nested kube_master_external_ip  # Only run this now
openstack stack output show $nested kube_minion_external_ip  # Run if minion fails

Depending on whether the WaitCondition failed for a Kubernetes master or a
Kubernetes minion, the output name will differ. This is why two commands are
listed here. In our case we'll use the first one, because the minion stacks are
only created once all WaitConditions in the master stacks have transitioned to
CREATE_COMPLETE. Since our little act of sabotage threw a spanner into the
works at that first step already, we will be dealing with a failed Kubernetes
master now, so that's the machine we log in to.

Now that you have the floating IP address, please log in to the machine where
cloud-init hiccupped and look at /var/log/cloud-init-output.log (do not
bother with /var/log/cloud-init.log - it's always empty). That will tell us
which script failed and give us its output. In this case it's quite
straightforward: the spanner we threw into the works earlier caused a
"Connection Refused" on this side.

Sometimes it's not quite so straightforward, though. In that case you might
look at the script on the instance, or the source script that comes with the
magnum Python module. You might even add debugging output to it, if the
script's regular output is not enough.

This is a bit beyond the scope of this workshop, though: we'll be quite happy
if you debug the problem up to this point and then provide a bug report with
the contents of /var/log/cloud-init-output.log and steps to reproduce the
problem. If there is interest we may take a look at how to add debugging output
and manipulate these scripts once we're through with the slides, for we've got
two more failure modes for you first.

Interlude: Fix iptables, Break Timeout...

Before we go on I'll need you to quickly remove the iptables rule we used to
sabotage WaitCondition signalling...

iptables -D INPUT \
         -s $(neutron net-list | grep floating \
              | awk '{print $7}') \
         -p tcp --dport 8004 \
         -j REJECT --reject-with tcp-reset

...and break the WaitCondition timeout instead. More on this in a bit.

sed -i s/600/60/ \
  /usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/templates/kubecluster.yaml

We only need to remove the REJECT rule here because the ACCEPT one doesn't do
any harm.

Interlude: ...and Rebuild Cluster

And now we'll rebuild the cluster without these rules in place, so delete the
old cluster first...

magnum cluster-delete k8s_cluster

...and recreate it:

magnum cluster-create \
  --name k8s_cluster \
  --cluster-template k8s_template \
  --node-count 1

Once everybody has created the new cluster, let's wait a minute and check the
cluster's state again using magnum cluster-show.

3: Genuine Timeout

Everybody should have a cluster in CREATE_FAILED state now. So let's do the
same thing we did in the beginning to determine whether we are dealing with a
genuine timeout or a user-data script/network failure.

First, we obtain the cluster's Heat stack UUID again:

stack=$(magnum cluster-show k8s_cluster \
        | grep -w stack_id \
        | awk '{print $4}')
export stack

Next, we go through the main stack's resource list to find the name of the
nested stack where the failing WaitCondition resides:

nested=$(openstack stack resource list -n 5 $stack \
                | grep CREATE_FAILED \
                | grep OS::Heat::WaitCondition \
                | awk '{print $11}')
export nested

Finally we grep for the wait condition's URL components in
/var/log/heat/heat-api.log again. If we gave it enough time for the Kubernetes
master's cloud-init scripts to finish, we will find an attempt to signal the
WaitCondition, even with a 200 OK. That is misleading, though: since the
WaitCondition was already in state CREATE_FAILED at this point in time, its
state did not change anymore.

This sort of problem can easily be fixed by adjusting the cluster's timeout
upwards. There's no hard and fast rule for how much to adjust: that depends on
the cloud's load and general slowness in deploying resources. Just deploy a
couple of clusters and use the longest deployment time with a few minutes'
safety margin added on top. Our current default timeout is fairly tight,
unfortunately, so you may encounter this problem a lot, but we've got a patch
on the way to raise it.
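
The timeout can be set per cluster at creation time, assuming the client's
--timeout option (which takes minutes; 120 here is just an example):

magnum cluster-create \
  --name k8s_cluster \
  --cluster-template k8s_template \
  --node-count 1 \
  --timeout 120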

Before we go on, please fix the timeout once more so we can properly provoke
the next error. While we would hit it anyway, even with an undersized timeout,
it's cleaner to fix this first.
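
One way to do that is to revert the earlier sed in the driver template,
assuming the 60 we put there is the only place that value occurs (a sketch; do
double-check the value actually went back to 600):

sed -i s/60/600/ \
  /usr/lib/python2.7/site-packages/magnum/drivers/k8s_opensuse_v1/templates/kubecluster.yaml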

Next, add a bogus entry for discovery.etcd.io to /etc/hosts:

echo '127.0.0.1    discovery.etcd.io' >> /etc/hosts

That is all there is to our next act of sabotage.

Redeploy Cluster Once More

With discovery.etcd.io freshly unreachable, redeploy your cluster now:

Delete existing cluster

magnum cluster-delete k8s_cluster

Create a new cluster

magnum cluster-create \
  --name k8s_cluster \
  --cluster-template k8s_template \
  --node-count 1

Failure to Get Discovery URL

So let's see how our cluster is faring:

root@d52-54-77-77-01-01:~ # magnum cluster-show k8s_cluster
+---------------------+--------------------------------------------------------------------------+
| Property            | Value                                                                    |
+---------------------+--------------------------------------------------------------------------+
| status              | CREATE_FAILED                                                            |
| cluster_template_id | ee9d182c-6987-46f6-8e6a-c873e2b10f6c                                     |
| uuid                | 05203920-0c2b-4741-871a-8f298431a769                                     |
| stack_id            | -                                                                        |
| status_reason       | Failed to get discovery url from 'https://discovery.etcd.io/new?size=1'. |
| created_at          | 2017-05-17T15:04:01+00:00                                                |
| name                | k8s_cluster                                                              |
| updated_at          | -                                                                        |
| discovery_url       | -                                                                        |
| faults              | {}                                                                       |
| api_address         | -                                                                        |
| coe_version         | -                                                                        |
| master_addresses    | []                                                                       |
| create_timeout      | 60                                                                       |
| node_addresses      | []                                                                       |
| master_count        | 1                                                                        |
| container_version   | -                                                                        |
| node_count          | 1                                                                        |
+---------------------+--------------------------------------------------------------------------+

Oops. That failed fast. What Magnum tried to do here was to obtain a discovery
URL. This URL is randomly generated by discovery.etcd.io. Had that been
successful, Magnum would have passed the discovery URL to all cluster members,
which would have used it to register themselves as members of the cluster's
etcd cluster. Without a discovery URL this is not possible.

So what do we do about this?

Static etcd Cluster to the Rescue

Since some of our customers have clouds without access to the Internet (or only
heavily restricted access), we came up with various mechanisms for allowing
Magnum to deploy a cluster in this sort of environment. Normally it is a bit
hipster and assumes that it can reach various third-party services such as the
public etcd registry or Docker Hub. We came up with local solutions for the
things these services are used for to allow Magnum to function without Internet
access. These mechanisms are only available in our OpenSUSE Magnum driver, so
clusters on Fedora, CoreOS or Ubuntu images will continue to require Internet
access.

One of the things we came up with as part of these efforts is discoveryless
etcd operation. Rather than registering cluster membership through a discovery
URL obtained from discovery.etcd.io, every instance in the etcd cluster gets a
list of all the other instances' IP addresses.

This will always work but it causes a bit of a problem when updating clusters:
since this list is recomputed every time an instance is added or removed, and
then inserted into every instance's user data payload, all these instances
will be rebuilt (in order to cause the updated user data payload to run). So
if someone assumes they can simply add nodes to a cluster they have active
workloads on, they are going to lose these workloads if they actually perform a
cluster-update operation.

Redeploy Without Discovery URL

Let's redeploy our cluster with --discovery-url none now.

Delete existing cluster

magnum cluster-delete k8s_cluster

Create a new cluster

magnum cluster-create \
  --name k8s_cluster \
  --cluster-template k8s_template \
  --node-count 1 \
  --discovery-url none

Undo etcd Sabotage

As usual, please undo the etcd sabotage:

sed -i /discovery.etcd.io/d /etc/hosts

Also, there are various other ways for etcd to break. For instance, Magnum may
be able to request a discovery URL but it may then become unreachable at some
much later point. Upstream covers most of these problems in their Troubleshooting Guide.

Since most of the solutions in there revolve around simply restarting etcd
and flannel until the cluster is back in working order, I'll spare you the
tedium. Also, this is hard to sabotage reliably. Just remember that upstream
troubleshooting guide for when you encounter this kind of problem.

Finally we have a bonus error condition for you. This one does not require any
sabotage since our default configuration triggers it. It is also fairly easy to
debug and fix, but let me give you a quick rundown on its history before we get
to the actual debugging part:

We recently fixed a bit of a security issue in Magnum. Before our patch, Magnum
would always pass Keystone credentials into the instances that allowed full
OpenStack API access in the project the cluster was created in, as the creating
user and with all that user's roles in the project. That's a lot of out-of-band
access, especially if the admin user is foolish enough to create a cluster in
the admin tenant (that would hand somebody able to compromise an instance the
keys to the kingdom).

Our patch modified Magnum not to require these credentials in most cases, but
there are two things that continue to use OpenStack APIs from inside the
instances: the local Docker registry, which stores its images in Glance, and
the rexray Docker volume driver, which uses Cinder as its storage backend. If
either is used, we need to revert to the old, insecure behaviour. For us this
is mostly relevant in our offline scenario, since that requires
registry_enabled to be true in the ClusterTemplate.

The key to this little act of sabotage is using a ClusterTemplate with
registry_enabled=true, so let's create one:

magnum cluster-template-create \
  --registry-enabled  \
  --name k8s_registry \
  --image-id openstack-magnum-k8s-image \
  --keypair-id default \
  --external-network-id floating \
  --dns-nameserver 8.8.8.8 \
  --flavor-id m1.magnum \
  --master-flavor-id m1.magnum \
  --docker-volume-size 5 \
  --network-driver flannel \
  --coe kubernetes \
  --tls-disabled

Now let's get rid of the old cluster...

magnum cluster-delete k8s_cluster

...and create a new one using our shiny new template:

magnum cluster-create \
  --name k8s_cluster \
  --cluster-template k8s_registry \
  --node-count 1

And now let's do another magnum cluster-show on it...

Error: cluster_user_trust=false

Oops, that didn't go too well. So what happened here?

If we attempt to create a cluster with registry-enabled set in its template on
Cloud 7 with default settings, we will get this error message:

This cluster can only be created with trust/cluster_user_trust = True in magnum.conf

To make the offline scenario work in this case you will need to set the raw
mode setting trustee["cluster_user_trust"] to True in Crowbar. This
restores the old, insecure behaviour for clusters with registry-enabled or
volume_driver=Rexray. With this configuration change in place you will be
able to create the cluster now.
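
Through the raw mode this ends up as a plain magnum.conf option; the resulting
configuration is the one the error message asks for (the [trust] section name
is how Magnum groups this option):

[trust]
cluster_user_trust = True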

It is advisable to run each cluster with registry-enabled in a project of its
own with all quotas set to exactly match the cluster's resource consumption.
This way, an attacker who manages to gain access to the cluster can at worst
destroy the cluster itself but will neither be able to affect other projects'
OpenStack resources nor will they be able to create resources beyond those
allocated for the cluster itself.