Kubernetes on AWS: EKS Production Best Practices

Kubernetes on AWS: A Practical Guide to EKS Production Best Practices

In this guide, we’ll explore the best practices to focus on when working with Amazon Elastic Kubernetes Service (EKS) and how to optimize application workloads, harden security configurations, and simplify cluster operations while making the most out of AWS’s powerful cloud infrastructure.

Network Security Best Practices:
✅ Block SSH/RDP remote access to EKS cluster node groups

Disabling SSH/RDP remote access to your EKS cluster node groups removes a common entry point for unauthorized access to worker nodes and reduces the risk of a breach.
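
As a minimal sketch (the cluster name, node role ARN, and subnet IDs below are placeholders), a managed node group created without the --remote-access option never has an SSH key or remote-access security group attached to its workers:

    # Create a managed node group with no remote access configuration.
    # Omitting --remote-access means no SSH key or extra security group
    # is ever associated with the worker nodes.
    aws eks create-nodegroup \
      --cluster-name my-cluster \
      --nodegroup-name private-workers \
      --node-role arn:aws:iam::111122223333:role/eksNodeRole \
      --subnets subnet-0abc1234 subnet-0def5678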

✅ Block Public Access to EKS Cluster Endpoint

When you launch a new EKS cluster, a public endpoint for the Kubernetes API server is created automatically so that Kubernetes management tools (e.g. kubectl) can communicate with your cluster. As a best practice, disable this public access by setting endpointPublicAccess=false with the update-cluster-config command, and set endpointPrivateAccess=true to keep private access to the cluster from within its VPC.
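
For example (the cluster name is a placeholder), both endpoint settings can be changed with a single call:

    # Disable the public API server endpoint and keep the private one.
    aws eks update-cluster-config \
      --name my-cluster \
      --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true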

✅ Enable Envelope Encryption for EKS Kubernetes Secrets using AWS KMS Service

By default, all Kubernetes secrets are stored in plain text in the cluster's backend datastore, etcd, so anyone with access to the control plane backend can read them. To add an extra layer of security, implement envelope encryption (i.e. encrypting a key with another key) for these Kubernetes secrets using KMS keys. With this enabled, each plaintext secret is encrypted with a data encryption key (DEK), and the DEK itself is encrypted with kms:Encrypt before being stored in etcd. KMS supports customer-managed keys (CMKs), AWS-managed keys, and AWS-owned keys for encryption, and in general CMKs are the recommended option.
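
On an existing cluster, envelope encryption of secrets can be turned on with associate-encryption-config; the cluster name and KMS key ARN below are placeholders:

    # Encrypt Kubernetes secrets in etcd using a customer-managed KMS key.
    aws eks associate-encryption-config \
      --cluster-name my-cluster \
      --encryption-config '[{"resources":["secrets"],"provider":{"keyArn":"arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"}}]'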

✅ Restrict unnecessary ingress traffic using EKS Security Groups

Avoid opening all ports in EKS security groups; attackers can use port scanning and probing to identify running applications and services and then launch malicious activity such as brute-force attacks. In most cases, permitting inbound traffic only on TCP port 443 (HTTPS) is sufficient.
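
As an illustrative sketch (the security group ID and source CIDR are placeholders), an ingress rule limited to HTTPS looks like this; broader rules should be removed rather than added to:

    # Allow inbound HTTPS only; open no other ports on this security group.
    aws ec2 authorize-security-group-ingress \
      --group-id sg-0123456789abcdef0 \
      --protocol tcp \
      --port 443 \
      --cidr 10.0.0.0/16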

✅ Harden IAM Role Policies of EKS Cluster Node Groups

An IAM role is assigned to every worker node in an EKS cluster node group so that the kubelet can run and interact with the AWS APIs it needs. This IAM role eliminates the need for individual credentials on each node and simplifies granting fine-grained permissions. Ensure these roles carry only the permissions necessary for the tasks they perform, following the principle of least privilege.
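
A lean node role usually carries little more than the AWS-managed worker node policy plus the CNI and ECR read-only policies covered below. The role name here is a placeholder:

    # Review what is attached and keep only what the nodes actually need.
    aws iam list-attached-role-policies --role-name eksNodeRole
    aws iam attach-role-policy \
      --role-name eksNodeRole \
      --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy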

✅ Restrict Kubernetes RBAC

Limit permissions not only in IAM but also in Kubernetes RBAC, reducing the attack surface and adhering to the principle of least privilege. In particular, minimize the permissions granted through the aws-auth ConfigMap and through Kubernetes Roles and ClusterRoles to reduce the impact of compromised credentials.
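
For instance, instead of binding a team to cluster-admin, a namespaced read-only role can be bound to a group mapped in aws-auth; the namespace, group, and resource names below are illustrative:

    # A read-only Role scoped to one namespace, bound to a mapped group.
    kubectl apply -f - <<'EOF'
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: pod-reader
      namespace: team-a
    rules:
    - apiGroups: [""]
      resources: ["pods", "pods/log"]
      verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: pod-reader-binding
      namespace: team-a
    subjects:
    - kind: Group
      name: team-a-viewers
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: pod-reader
      apiGroup: rbac.authorization.k8s.io
    EOF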

✅ Authenticate Kubernetes API calls by integrating with an OpenID Connect identity provider

OpenID Connect (OIDC) provides a secure and flexible way to authenticate and authorize users within applications and systems. OIDC providers can be used as an alternative to IAM: after configuring authentication for the EKS cluster, create Kubernetes Roles and ClusterRoles to define permissions, then bind them to OIDC identities using RoleBindings and ClusterRoleBindings. Note that you can associate only one OIDC identity provider with your cluster. For instructions, read more about authenticating users for your cluster from an OpenID Connect identity provider.
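
A sketch of associating an external provider (the config name, issuer URL, client ID, and claims are placeholders for your identity provider's values):

    # Associate an external OIDC identity provider with the cluster.
    aws eks associate-identity-provider-config \
      --cluster-name my-cluster \
      --oidc identityProviderConfigName=my-oidc-idp,issuerUrl=https://idp.example.com,clientId=kubernetes,usernameClaim=email,groupsClaim=groups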

✅ Use EKS CNI policy (AWS-managed) to access networking resources

Attach the AmazonEKS_CNI_Policy AWS-managed policy to EKS cluster node groups to effectively manage networking resources. This policy allows the Kubernetes Container Network Interface (CNI), implemented by the Amazon VPC CNI plugin (amazon-vpc-cni-k8s), to perform essential tasks such as listing, describing, and modifying VPC Elastic Network Interfaces (ENIs) on behalf of the cluster, ensuring proper networking functionality and communication within the EKS environment. For additional instructions, read more about configuring the Amazon VPC CNI plugin for Kubernetes.
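
Attaching the managed policy to the node group's IAM role is a single call; the role name is a placeholder:

    # Let the VPC CNI plugin manage ENIs and pod IP addresses.
    aws iam attach-role-policy \
      --role-name eksNodeRole \
      --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy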

✅ Use ECR read-only policy (AWS-managed) to access ECR repositories

Attach the AmazonEC2ContainerRegistryReadOnly AWS-managed policy to EKS cluster node groups so that nodes can only read and pull container images from ECR repositories, without being allowed any other operations on ECR.
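
The same attach-role-policy pattern applies, again with a placeholder role name:

    # Allow nodes to pull images from ECR and nothing more.
    aws iam attach-role-policy \
      --role-name eksNodeRole \
      --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly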

✅ Use EKS Cluster policy (AWS-managed) to manage AWS resources

Attach the AmazonEKSClusterPolicy AWS-managed policy to the EKS cluster role to give Kubernetes the permissions it requires to manage resources on your behalf. It ensures secure access control and cluster operations, seamless integration with AWS services, and regular updates from AWS.
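
Note that this policy belongs on the cluster's IAM service role rather than on the node role; the role name below is a placeholder:

    # Attach the cluster policy to the EKS cluster service role.
    aws iam attach-role-policy \
      --role-name eksClusterRole \
      --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy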

Enable Logging And Monitoring:
✅ Set up EKS control plane logging

Ensure control plane logging is enabled for all EKS clusters so that API, audit, authenticator, controller manager, and scheduler logs are published to Amazon CloudWatch Logs.
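
All five log types can be enabled in one update-cluster-config call (the cluster name is a placeholder):

    # Publish every control plane log type to CloudWatch Logs.
    aws eks update-cluster-config \
      --name my-cluster \
      --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'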

✅ Set up EKS Audit Log Monitoring in GuardDuty

GuardDuty EKS Audit Log Monitoring consumes Kubernetes audit log events directly from the Amazon EKS control plane logging feature and captures chronological activity from users, from applications using the Kubernetes API, and from the control plane.
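
Assuming GuardDuty is already enabled in the account, EKS audit log monitoring is switched on as a detector feature; the detector ID below is a placeholder, and the feature name reflects the current GuardDuty API:

    # Enable EKS audit log monitoring on an existing GuardDuty detector.
    aws guardduty update-detector \
      --detector-id 12abc34d567e8fa901bc2d34e56789f0 \
      --features '[{"Name":"EKS_AUDIT_LOGS","Status":"ENABLED"}]'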

✅ Set up CloudTrail logging for EKS API calls

Ensure that CloudTrail logging is enabled so that all Amazon EKS API calls made through the console, CLI, or SDKs (for example, CreateCluster or UpdateClusterConfig) are captured and recorded. In-cluster Kubernetes API activity is covered by the control plane audit logs described above.
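
If the account has no trail yet, a basic multi-region trail that records management events will capture these calls; the trail and bucket names are placeholders:

    # Create and start a multi-region trail for management events.
    aws cloudtrail create-trail \
      --name eks-audit-trail \
      --s3-bucket-name my-cloudtrail-bucket \
      --is-multi-region-trail
    aws cloudtrail start-logging --name eks-audit-trail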

Maintain a Healthy EKS Cluster:
✅ Enable readiness and liveness probes for all pods

Readiness probes determine if a pod is ready to serve traffic. When a pod is not ready, it’s removed from service, but it remains running. Readiness probes are crucial for avoiding sending traffic to pods that are still initializing or experiencing issues. Liveness probes verify if a pod is alive and functioning correctly. If a liveness probe fails, Kubernetes restarts the pod. Liveness probes are essential for detecting and recovering from situations where a pod becomes unresponsive or enters a faulty state while running.
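
A minimal sketch of a Deployment with both probes (the image, port, path, and timings are illustrative only):

    # Readiness gates traffic; liveness restarts the container if the
    # health endpoint stops responding.
    kubectl apply -f - <<'EOF'
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: nginx:1.25
            ports:
            - containerPort: 80
            readinessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 5
              periodSeconds: 10
            livenessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 15
              periodSeconds: 20
    EOF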

✅ Enable pod anti-affinity to spread pod replicas across multiple worker nodes

Deploying workloads with multiple replicas spread across multiple worker nodes is crucial for ensuring high availability and fault tolerance in Kubernetes clusters. With Kubernetes pod anti-affinity rules, the scheduler places replicas on different worker nodes, minimizing the risk that a single node failure takes down all of an application's pods.
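
A sketch of a podAntiAffinity rule keyed on the node hostname, which tells the scheduler to place replicas of the same app on different nodes (labels and image are illustrative):

    # Require replicas of the same app to land on different worker nodes.
    kubectl apply -f - <<'EOF'
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: api
                topologyKey: kubernetes.io/hostname
          containers:
          - name: api
            image: nginx:1.25
    EOF

Using preferredDuringSchedulingIgnoredDuringExecution instead makes the rule a soft preference, which avoids unschedulable pods when spare nodes are scarce.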

✅ Enable CPU & Memory resource requests and limits for pods

Applying appropriate resource requests and limits to every pod is vital for optimizing resource utilization and maintaining cluster stability in AWS EKS. Without proper allocation, resource waste accumulates over time, leading to inefficiencies and performance bottlenecks. The Kubernetes Vertical Pod Autoscaler (VPA) can help automate this process by adjusting resource requests based on historical usage data. While VPA currently requires pod eviction to apply changes, upcoming Kubernetes releases aim to address this limitation with in-place resizing. Complementing Kubernetes autoscaling with tooling that analyzes real-time capacity utilization further improves resource management and the overall performance and scalability of your EKS clusters.
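
At its simplest, every container in the pod spec should declare requests and limits; the values below are examples, not sizing recommendations:

    # Requests are what the scheduler reserves; limits are the hard ceiling.
    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: sized-pod
    spec:
      containers:
      - name: app
        image: nginx:1.25
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
    EOF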

✅ Deploy worker nodes across multiple Availability Zones

Configuring worker nodes to deploy across multiple Availability Zones is critical for enhancing the resilience and availability of AWS EKS clusters. By spreading worker nodes across zones, the impact of a single zone outage is mitigated, preventing complete cluster downtime. This is achieved by configuring AWS Auto Scaling Groups (ASGs) to span multiple Availability Zones.
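
For managed node groups, passing subnets from different Availability Zones is enough, because the node group's underlying Auto Scaling Group then spans those zones; all identifiers below are placeholders:

    # Subnets in three different AZs: the node group's ASG spans all three.
    aws eks create-nodegroup \
      --cluster-name my-cluster \
      --nodegroup-name multi-az-workers \
      --node-role arn:aws:iam::111122223333:role/eksNodeRole \
      --scaling-config minSize=3,maxSize=6,desiredSize=3 \
      --subnets subnet-0aaa1111 subnet-0bbb2222 subnet-0ccc3333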

✅ Keep the Kubernetes version of the EKS cluster up-to-date

Ensure all EKS clusters run the latest stable Kubernetes version supported by EKS. This approach provides access to the latest features, design updates, bug fixes, enhanced security, and improved performance. Ideally, these version checks should happen regularly (e.g. a few times a year, since Kubernetes releases a new minor version roughly every four months). For the Kubernetes versions compatible with EKS, read more about Amazon EKS Kubernetes versions.
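
Control plane upgrades move one minor version at a time; the cluster name and target version below are placeholders, so check the versions currently supported by EKS first:

    # Check the current version, then upgrade the control plane one minor.
    aws eks describe-cluster --name my-cluster --query 'cluster.version'
    aws eks update-cluster-version --name my-cluster --kubernetes-version 1.29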

✅ Match the CoreDNS add-on version with the EKS cluster’s Kubernetes version

When a new EKS cluster is launched, two CoreDNS replicas are deployed by default (regardless of node count) for high availability. Because these CoreDNS pods provide cluster DNS and name resolution for every pod in the cluster, keep their version up to date and compatible with the cluster's Kubernetes version.
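
When CoreDNS is managed as an EKS add-on, compatible versions can be listed and the add-on updated in place; the cluster name, Kubernetes version, and add-on version below are placeholders:

    # List CoreDNS add-on versions compatible with the cluster's Kubernetes
    # version, then update the add-on to a matching release.
    aws eks describe-addon-versions --addon-name coredns --kubernetes-version 1.29
    aws eks update-addon \
      --cluster-name my-cluster \
      --addon-name coredns \
      --addon-version v1.11.1-eksbuild.4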

Conclusion:

By following these guidelines, you ensure your EKS environment is secure, highly available, and optimized for performance. Over the past years, the AWS team has been super innovative and has released a plethora of new features on the EKS ecosystem. Embrace these practices to unlock the full potential of AWS EKS and drive success in your cloud-native journey.

For any technical assistance or further inquiries, whether for business or personal use, we are here to help. Feel free to reach out to us at info@regitsfy.com.
