Migrating Servers Serverlessly — Part 2

Michael Kelly

Principal Engineer

June 5, 2024

The second in a two-part guide on migrating servers on AWS

In part one of this article, I set the foundation by reviewing some of the fundamental services available in AWS for re-hosting servers as part of lift-and-shift migrations. As mentioned, Application Migration Service, Application Discovery Service and Cloud Migration Factory provide both the means for a lift-and-shift migration and the governance needed to keep it on track.

The great thing about each of these services is that they can be driven through the AWS API, which lets us automate steps that a migration engineer would classically manage by “click-ops”. Automated together end-to-end, these services can perform parallel lift-and-shift migrations and enforce cloud best practices, so applications are positioned as well as possible in the new tenancy.

Versent has built this automation as a collection of scripts we call the migration factory. The solution has gone through multiple iterations and is now a tool we regularly deploy to customers to augment engineering teams and accelerate cloud migrations. Some of the best-practice features we have added to the factory include:

High-level architecture of the Serverless Migration Factory for rehosts, image by author.

In this article I hope to showcase some of the latest architectural extensions we use at Versent to orchestrate lift-and-shift deployments and help customers move into the cloud rapidly.

As such, the article will cover:

An architectural pattern to make use of AWS serverless technology and orchestrate parallel re-hosts in a way that reduces cost, alleviates technical debt for our customers and improves the lives of our engineers working with the solution,
A method for managing end-to-end encryption of machine images for re-host migrations to meet non-negotiable security compliance requirements for organisations,
Architecture to bootstrap hosts leveraging MGN SSM post-launch scripts to improve auditability and bootstrap flexibility.

Step Function Orchestration

In earlier iterations of the factory, lift-and-shift migrations were orchestrated using a secured EC2 command server that hosted the automation scripts used to enact the host deployments. This pattern gets the job done; however, it has the following issues:

The host becomes another instance in the fleet that needs to be managed,
It needs to be secured,
Rigour needs to be put around it to deal with DR scenarios that could impact the migration engagement timeline.

The automation scripts did not perform any tasks that were especially intensive on CPU, RAM or disk; they primarily orchestrated network-bound tasks. Given these conditions, the automation was a great candidate for refactoring onto AWS Step Functions.

An example of a state machine migration. Left: managing multiple host migrations in parallel. Right: the process for managing a single host migration, image from AWS Step Function Dashboard.

Moving to Step Functions provided the following benefits to migration:

Easier to track pain points in a migration. If the state machine fails, the operator can click through to the Lambda function and get an understanding of the nature of the issue,
Cheaper to operate,
Multiple state machines can be deployed side by side, allowing migration teams to work in parallel and autonomously without affecting other engineers,
Improved security with isolated Lambda role boundaries.
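To make the orchestration pattern concrete, here is a minimal sketch of how a wave could be described in Amazon States Language, built as a Python dictionary. It is not the factory's actual definition: the step names, the Lambda ARNs, the `$.hosts` input path and the retry settings are all assumptions chosen for illustration. A Map state fans out over the hosts in the wave, and each host runs through a short chain of Lambda tasks.

```python
import json


def build_wave_state_machine(step_lambda_arns: dict, max_concurrency: int = 5) -> dict:
    """Build an Amazon States Language definition for one migration wave.

    step_lambda_arns maps per-host step names to Lambda ARNs, in run order.
    All names and ARNs are hypothetical placeholders.
    """
    step_names = list(step_lambda_arns)  # dicts preserve insertion order

    per_host_states = {}
    for index, name in enumerate(step_names):
        state = {
            "Type": "Task",
            "Resource": step_lambda_arns[name],
            # Retry a failed task a few times before the failure surfaces.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
        }
        if index + 1 < len(step_names):
            state["Next"] = step_names[index + 1]
        else:
            state["End"] = True
        per_host_states[name] = state

    return {
        "Comment": "Sketch: migrate every host in a wave in parallel",
        "StartAt": "MigrateHosts",
        "States": {
            "MigrateHosts": {
                "Type": "Map",
                "ItemsPath": "$.hosts",  # assumed shape of the execution input
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": step_names[0],
                    "States": per_host_states,
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    definition = build_wave_state_machine({
        "VerifyReplication": "arn:aws:lambda:ap-southeast-2:111122223333:function:verify-replication",
        "LaunchTestInstance": "arn:aws:lambda:ap-southeast-2:111122223333:function:launch-test-instance",
        "ValidateHost": "arn:aws:lambda:ap-southeast-2:111122223333:function:validate-host",
    })
    print(json.dumps(definition, indent=2))
```

Keeping each step in its own Lambda function is what allows a separate, narrowly scoped IAM role per step, and `MaxConcurrency` caps how many hosts are processed at once.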

While state machines provide an excellent top-down management tool for lift-and-shift migrations, they are not the only way we can improve the migration automation to benefit customers. Managing the encryption of host resources from a centralised account improves the orchestration of migrations without compromising security. In the next section I describe an architecture that manages encryption across the account boundary with EventBridge.

EventBridge Encryption Manager

As mentioned when reviewing MGN, re-hosts are enabled by the service's ability to create machine images of hosts in a VPC via replication. One requirement of MGN that isn’t immediately obvious is that machine images are encrypted by default. As engineers working with AWS compute may have experienced, you can’t create an unencrypted machine image from an encrypted one; you can, however, re-encrypt an image with a different key. This sounds easy enough, but re-encryption can take several hours, depending on the size of the image.

Encryption can be performed with KMS keys shared from the production environment in the landing zone where the host will live. The time penalty of re-encrypting the machine image is then incurred early in the migration process, not at the time of cut-over. This matters if you plan to run several test cut-overs to refine the process before the production cut-over. This is where the EventBridge AMI encryption automation comes in.
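Sharing a KMS key from the target account generally means adding statements to that key's policy that let the migration account use it. The sketch below follows the cross-account pattern from the AWS KMS documentation; the function name and the account ID are hypothetical, and the exact statements used in the factory are not described in this article.

```python
import json


def cross_account_key_statements(migration_account_id: str) -> list:
    """Key policy statements letting another account use a KMS key.

    Add these to the existing key policy in the target account. The
    migration_account_id value is a hypothetical placeholder.
    """
    principal = {"AWS": f"arn:aws:iam::{migration_account_id}:root"}
    return [
        {
            "Sid": "AllowMigrationAccountUseOfKey",
            "Effect": "Allow",
            "Principal": principal,
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey",
            ],
            "Resource": "*",
        },
        {
            # EC2/EBS use grants to work with encrypted volumes and snapshots.
            "Sid": "AllowMigrationAccountGrantsForAwsResources",
            "Effect": "Allow",
            "Principal": principal,
            "Action": ["kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant"],
            "Resource": "*",
            "Condition": {"Bool": {"kms:GrantIsForAWSResource": "true"}},
        },
    ]


if __name__ == "__main__":
    print(json.dumps(cross_account_key_statements("111122223333"), indent=2))
```

The key policy is only half of the arrangement: the roles in the migration account also need IAM permissions that allow these actions on the key's ARN.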

Bundled with our migration factory is automation that uses EventBridge rules bound to AWS Lambda to detect new re-hosts and trigger image re-encryption, easing migration into a target account while maintaining best-practice security guardrails.

EventBridge-based AMI encryption manager, Image by Versent.
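As a rough illustration of the idea, the sketch below pairs an EventBridge event pattern with a Lambda handler that copies an image using a target-account key. It is not the factory's implementation. The rule pattern (the EC2 "AMI State Change" event), the `TARGET_KMS_KEY_ARN` environment variable and the `-reencrypted` name suffix are all assumptions; the only AWS calls used are the EC2 `describe_images` and `copy_image` operations.

```python
import os

# Assumed rule pattern: fire whenever an AMI becomes available.
AMI_AVAILABLE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 AMI State Change"],
    "detail": {"State": ["available"]},
}

COPY_SUFFIX = "-reencrypted"  # hypothetical marker used to avoid copy loops


def image_id_from_event(event: dict) -> str:
    """Extract the AMI ID from an ARN such as arn:aws:ec2:region::image/ami-123."""
    return event["resources"][0].rsplit("/", 1)[-1]


def build_copy_image_request(image_name: str, image_id: str,
                             region: str, kms_key_arn: str) -> dict:
    """Arguments for EC2 CopyImage that re-encrypt with the given key."""
    return {
        "Name": image_name + COPY_SUFFIX,
        "SourceImageId": image_id,
        "SourceRegion": region,
        "Encrypted": True,
        "KmsKeyId": kms_key_arn,
    }


def handler(event, context, ec2_client=None):
    """Lambda entry point; ec2_client can be injected for local testing."""
    region = event["region"]
    if ec2_client is None:
        import boto3  # available in the Lambda Python runtime
        ec2_client = boto3.client("ec2", region_name=region)

    image_id = image_id_from_event(event)
    image = ec2_client.describe_images(ImageIds=[image_id])["Images"][0]

    # The copy raises its own "available" event, so skip images we created.
    if image.get("Name", "").endswith(COPY_SUFFIX):
        return {"source_image_id": image_id, "copied_image_id": None}

    request = build_copy_image_request(
        image.get("Name", image_id), image_id, region,
        os.environ["TARGET_KMS_KEY_ARN"],
    )
    response = ec2_client.copy_image(**request)
    return {"source_image_id": image_id, "copied_image_id": response["ImageId"]}
```

Because the handler reacts to events rather than being called by the engineer, the re-encryption starts as soon as the image exists, which is what moves the time penalty away from cut-over.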

Some of the benefits of the EventBridge encryption management automation:

Confidence that the privacy of customer application data at rest is enforced throughout the server re-host pipeline,
Maintains the velocity of the migration process without compromising the ability to rapidly troubleshoot and iterate on the host cut-over process,
Happens transparently to the engineer, so no additional work is required,
Allows migrations to continue to be managed centrally from a migration staging account, reducing complexity for the migration engineers involved,
The key used to encrypt a host’s AMI/EBS resources is from the target AWS account, such that once a migration has concluded and the factory and its AWS account are decommissioned, the associated encryption keys exist in the target account, and only those with the role access to the given account have access to the encrypted resources.

Beyond resource encryption, auditability of deployments and consistency of deployed instances are also features in demand when lift-and-shifts are performed. With this in mind, we have extended the host bootstrap process implemented in MGN and structured it to use SSM.

SSM-Based Re-host Bootstrap

As more hosts are migrated into new landing zones, the demand to have each aligned with the operational, platform and security guidelines of the organisation has also increased. The intuition is easy to understand: organisations do not want to inherit the technical debt of the old environment in the new space. Being able to remediate hosts when they are redeployed, and to deploy a standardised suite of tools uniformly, is valuable. Being able to remove legacy tools that are no longer required is also desirable.

Showing off some available documents to assist with migrations, AWS SSM Document Dashboard.

Previous iterations of the migration factory made use of CloudEndure, which allowed post-launch scripts to be executed when hosts were brought up in the new environment. MGN, the newest iteration of the replication tooling, takes this further by supporting custom post-launch actions and installing the AWS SSM agent by default.

Those familiar with AWS SSM know its reputation for providing a unified method for managing the configuration of hosts. SSM automation documents can be added to meet the operational requirements for hosts going into the new cloud environment. SSM documents can also be deployed using CloudFormation, which promotes infrastructure-as-code best practices, and they can be versioned, so different iterations of your scripts can coexist as you continuously improve your migration bootstrap toolkit.

Common examples of post-launch behaviours include:

Installing non-negotiable agents that need to be present (e.g. Inspector, CloudWatch),
Removing software agents that are no longer applicable once in the new environment,
Domain join for Active Directory or LDAP configuration changes,
AWS also provides a set of predefined post-launch scripts for common host configuration behaviours.
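To give a feel for what such a bootstrap document can look like, here is a minimal sketch of an SSM Command document (schema version 2.2) built as a Python dictionary. The step names and the example commands are hypothetical and are not taken from the factory; the `precondition` on `platformType` is the standard SSM mechanism for running a step only on Linux or only on Windows.

```python
import json


def build_bootstrap_document(linux_commands: list, windows_commands: list) -> dict:
    """Build an SSM Command document with one step per operating system.

    Each step carries a precondition, so only the step matching the
    host's platform runs. The commands are supplied by the caller.
    """
    return {
        "schemaVersion": "2.2",
        "description": "Sketch: post-launch bootstrap for re-hosted servers",
        "mainSteps": [
            {
                "action": "aws:runShellScript",
                "name": "bootstrapLinux",
                "precondition": {"StringEquals": ["platformType", "Linux"]},
                "inputs": {"runCommand": list(linux_commands)},
            },
            {
                "action": "aws:runPowerShellScript",
                "name": "bootstrapWindows",
                "precondition": {"StringEquals": ["platformType", "Windows"]},
                "inputs": {"runCommand": list(windows_commands)},
            },
        ],
    }


if __name__ == "__main__":
    document = build_bootstrap_document(
        # Hypothetical commands; substitute whatever your hosts require.
        linux_commands=["echo 'install required agents here'"],
        windows_commands=["Write-Output 'install required agents here'"],
    )
    print(json.dumps(document, indent=2))
```

The resulting JSON can be used as the content of a document resource in a CloudFormation template, which keeps the bootstrap logic under version control alongside the rest of the migration tooling.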

Making use of SSM-based host bootstrap provides the following benefits:

The SSM agent is installed by default by MGN and is available for all operating systems that support the replication agent.
Provides an audit trail of host bootstrapping, with logs sent to CloudWatch Logs. This prevents logs from getting lost if the host is terminated or you can’t access the host (a common problem when moving a domain-joined host).
Executing a list of post-launch SSM scripts is more organised, makes potential faults in the bootstrap easier to see and can assist with rapid debugging.
Hosts with the same operating system are handled the same way across environments. Hosts with different operating systems can be managed using different SSM documents, as the solution is operating-system aware.

A final note on SSM-based post-launch actions: be sure to correctly configure the permissions required by your bootstrap scripts. An IAM policy (AWSApplicationMigrationSSMAccess) exists to ensure the MGN service can interact with SSM and the associated documents. Where you also require access to other services (e.g. you need to retrieve a Systems Manager parameter at run time), make certain you have also allowed the associated permissions.
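For the parameter example, the extra permission could look something like the policy sketched below. The function name, region, account ID and parameter path are hypothetical, and the policy needs to be attached to whichever role actually runs the bootstrap script in your setup.

```python
import json


def parameter_read_policy(region: str, account_id: str, parameter_path: str) -> dict:
    """IAM policy document allowing reads of Systems Manager parameters.

    parameter_path may end in a wildcard, e.g. /migration/bootstrap/*.
    """
    # Parameter ARNs drop the leading slash of the parameter name.
    name = parameter_path.lstrip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadBootstrapParameters",
                "Effect": "Allow",
                "Action": ["ssm:GetParameter", "ssm:GetParameters"],
                "Resource": f"arn:aws:ssm:{region}:{account_id}:parameter/{name}",
            }
        ],
    }


if __name__ == "__main__":
    policy = parameter_read_policy(
        "ap-southeast-2", "111122223333", "/migration/bootstrap/*"
    )
    print(json.dumps(policy, indent=2))
```

If the parameters are SecureString values encrypted with a customer-managed key, the same role would also need permission to decrypt with that key.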

Final Thoughts

As illustrated here, it is possible to leverage the best of AWS serverless technology to orchestrate migration waves. We believe that by bringing the latest AWS services together with serverless orchestration, we can:

Increase the number of hosts manageable by an engineering team,
Decrease the time it takes to get waves of workloads into AWS,
Reduce the ongoing cost of migration resources,
Make the migration process more auditable,
Reduce some of the operational and security debt brought over from an incumbent environment,
Align to the AWS Well-Architected guidelines,
Leave the customer in the best place possible.

Thank you to both Jesse Aranki and Adrian Cservak for their work in architecting and implementing these features.