Nomad CSI on AWS

I decided to write this documentation because the official documentation is outdated and, in places, wrong. This is my journey configuring CSI on AWS.

What’s wrong with the current documentation?

A number of things, but in particular some of the configuration is no longer valid because some APIs have changed. For example, the official configuration for EBS defines the volume file as follows:

# volume registration
type = "csi"
id = "mysql"
name = "mysql"
external_id = "${aws_ebs_volume.mysql.id}"
access_mode = "single-node-writer"
attachment_mode = "file-system"
plugin_id = "aws-ebs0"

The current volume definition must include at least one capability block for each volume:

capability {
  access_mode     = "single-node-reader-only"
  attachment_mode = "file-system"
}
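Putting the two together, a complete volume file in the current format looks like the sketch below. The IDs and the Terraform reference are carried over from the example above; the top-level access_mode and attachment_mode attributes move into the capability block:

```hcl
# volume registration (current format)
type        = "csi"
id          = "mysql"
name        = "mysql"
external_id = "${aws_ebs_volume.mysql.id}"
plugin_id   = "aws-ebs0"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
```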

The shopping list

In order for CSI to work, we need the following:

  • Nomad cluster on AWS (not documented here)
  • AWS role attached to the EC2 instances
  • Deployment of CSI controller and CSI node
  • Volume creation and attachment

AWS Setup

All the EC2 instances of the Nomad cluster must be able to attach and detach volumes. For that, we need to create an IAM role that will be attached to the EC2 instances. This role must have a policy with at least the following permissions:

data "aws_iam_policy_document" "mount_ebs_volumes" {
  statement {
    effect = "Allow"

    actions = [
      "ec2:DescribeInstances",
      "ec2:DescribeTags",
      "ec2:DescribeVolumes",
      "ec2:AttachVolume",
      "ec2:DetachVolume",
    ]
    resources = ["*"]
  }
}
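For completeness, here is a minimal Terraform sketch that turns the policy document above into a role and an instance profile that can be attached to the instances. The resource and profile names (demoCSI_profile, etc.) are illustrative:

```hcl
# Role that EC2 instances can assume
resource "aws_iam_role" "mount_ebs_volumes" {
  name = "demoCSI_role" # illustrative name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Attach the policy document from above to the role
resource "aws_iam_role_policy" "mount_ebs_volumes" {
  name   = "mount-ebs-volumes"
  role   = aws_iam_role.mount_ebs_volumes.id
  policy = data.aws_iam_policy_document.mount_ebs_volumes.json
}

# Instance profile to associate with the EC2 instances
resource "aws_iam_instance_profile" "demoCSI_profile" {
  name = "demoCSI_profile"
  role = aws_iam_role.mount_ebs_volumes.name
}
```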

Once the role is created and attached, we can check the attachment with awscli:

$ aws ec2 describe-iam-instance-profile-associations
{
    "IamInstanceProfileAssociations": [
        {
            "AssociationId": "iip-assoc-0d69e1589bec21b29",
            "InstanceId": "i-050749c3a248c0937",
            "IamInstanceProfile": {
                "Arn": "arn:aws:iam::111285186890:instance-profile/demoCSI_profile",
                "Id": "AIPART2I5GFFGWMJ4LISF"
            },
            "State": "associated"
        }
    ]
}

Once that’s done, we can deploy the controller and the node jobs. This is pretty straightforward so no need to explain further. The important part is that both jobs must be running in a healthy state:

$ nomad job status
ID                         Type     Priority  Status   Submit Date
plugin-aws-csi-controller  service  50        running  2022-07-20T10:37:37-04:00
plugin-aws-csi-nodes       system   50        running  2022-07-20T10:38:01-04:00
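For reference, the controller job can be sketched as below. This is a minimal outline, not a production spec; the image tag is an assumption, so pick a current release of the EBS CSI driver:

```hcl
# plugin-aws-csi-controller.nomad -- minimal sketch
job "plugin-aws-csi-controller" {
  type = "service"

  group "controller" {
    task "plugin" {
      driver = "docker"

      config {
        image = "amazon/aws-ebs-csi-driver:v1.6.2" # assumed tag; use a current one
        args = [
          "controller",
          "--endpoint=unix://csi/csi.sock",
          "--logtostderr",
        ]
      }

      # Registers this task with Nomad as a CSI controller plugin
      csi_plugin {
        id        = "aws-ebs0"
        type      = "controller"
        mount_dir = "/csi"
      }
    }
  }
}
```

The node job is nearly identical: type = "system" so it runs on every client, the first argument becomes "node", and the csi_plugin block uses type = "node".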

That will also enable the CSI plugin:

$ nomad plugin status 
Container Storage Interface
ID        Provider         Controllers Healthy/Expected  Nodes Healthy/Expected
aws-ebs0  ebs.csi.aws.com  1/1                           1/1

Now that everything is healthy, we can run the nomad volume register volume.hcl command to register the volume with Nomad (the EBS volume itself already exists; registering makes it available to the cluster). If all goes well, we’ll be able to see the following:

$ nomad volume status
Container Storage Interface
ID        Name      Plugin ID  Schedulable  Access Mode
demo_csi  demo_csi  aws-ebs0   true         single-node-writer
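Once the volume shows as schedulable, a job can claim it with a volume block and mount it into a task. A sketch, where the job, image, and mount path are illustrative:

```hcl
job "mysql" {
  group "db" {
    # Claim the registered CSI volume for this group
    volume "data" {
      type            = "csi"
      source          = "demo_csi"
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

    task "mysql" {
      driver = "docker"

      # Mount the claimed volume inside the container
      volume_mount {
        volume      = "data"
        destination = "/var/lib/mysql" # illustrative path
      }

      config {
        image = "mysql:8"
      }
    }
  }
}
```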

Show me the code!

Most of what I did here is part of a private repository, but I made some of it available on a public gist. Feel free to copy, paste and comment if you modify it!