FSx NetApp ONTAP Remounted

Michael Kelly

Principal Engineer

July 30, 2025

Reviewing Second Generation Cloud Filesystems

Cloud storage is a critical capability of most top-tier enterprises, one we have all used and take for granted. These solutions enable team collaboration, planning, and content sharing. Given that many business enterprises can be storage-hungry, being able to scale elastically is important. With a presence spanning two decades, NetApp ONTAP is a renowned network storage solution recognised for its reliability and performance. Its adoption into the AWS FSx family of services in 2021 combined the scale and automation of the AWS Cloud with the reliability of NetApp, creating a top-tier cloud storage solution.

Over the last year, the FSx NetApp ONTAP offering has undergone several notable improvements that scale it past some of its earlier limitations, enabling it to meet customer demand and address some complex scale challenges. The architecture of large-scale cloud storage solutions is a professional interest of mine, so I'd like to review the new enhancements and help you understand how to leverage the newly available scale.

In this article, we will:

  • Get up to speed with the latest solution offerings of the new generation of NetApp filesystems in AWS.
  • Review the recent uplifts and changes between first and second generation FSx NetApp ONTAP filesystems.
  • Investigate HA-pairs and how to scale to petabytes of storage and tens of GBps of throughput.
  • Review the caveats of using second-generation file systems, including considerations for management when using multiple HA pairs.

For a review of the value of the FSx NetApp ONTAP solution, check out the official technical documentation, and for a discussion on FSx ONTAP migration considerations, check out this article from 2022.

Figure 1. Single-AZ HA-Pair Architecture for FSx NetApp ONTAP, supported by AWS Backup for Disaster Recovery

Second Generation ONTAP Filesystems

On July 9th of last year, AWS announced the release of second-generation file systems, rolling these out to the Northern Virginia, Ohio, Oregon, Ireland & Sydney regions, quickly followed by California and Frankfurt, as well as Stockholm and Singapore. Most recently, support for CloudFormation has been added. This article in the AWS Storage Blog, written by Charles Inglis, covers the upgrades provided by second-generation single and multi-availability-zone filesystems. The enhancements include the following improvements over first-generation filesystems:

  • First-generation filesystems allowed up to 4 GBps of throughput; second-generation filesystems provide up to 6 GBps.
  • The maximum performance tier size has increased from 192 TiB to 512 TiB.
  • Maximum SSD IOPS has increased from 160K to 200K, with 3 IOPS allocated per GiB of performance tier, scaling linearly.
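To make the IOPS scaling concrete, here is a minimal sketch of the 3 IOPS/GiB baseline and the 200K per-pair cap; the performance tier size is a hypothetical example:

```shell
# Baseline SSD IOPS for one HA pair: 3 IOPS per provisioned GiB of
# performance tier, capped at the 200K per-pair maximum.
ssd_gib=65536                         # hypothetical 64 TiB performance tier
baseline_iops=$((ssd_gib * 3))
if [ "$baseline_iops" -gt 200000 ]; then
  baseline_iops=200000                # cap applies above ~65 TiB of SSD
fi
echo "$baseline_iops"                 # 196608 for this example
```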

Beyond limit increases, backup retrieval times were reduced, with AWS citing a 17-fold improvement over first-generation filesystems; critical files can now be accessed first during a restore, enabling the repair of critical components in a disaster recovery scenario. NVMe-over-TCP protocol support was also included as an alternative to iSCSI block storage, reducing protocol overhead and improving throughput, latency, and available IOPS relative to iSCSI.

Most interestingly, the second-generation release enables single-AZ second-generation filesystems to easily add extra single-AZ HA-pairs, scaling beyond the limits of a single pair. With appropriate planning and architecture, it is possible to achieve massive scale by deploying up to 12 HA-pairs concurrently, multiplying the per-pair numbers above:

  • Maximum throughput increases to 72 GBps (12 x 6 GBps),
  • Maximum SSD (performance tier) capacity of 1 PiB across the filesystem,
  • Maximum SSD IOPS increases to 2.4 M (12 x 200K).

Given the general recommendation to provision around 20% of your total storage in the performance tier, and given that the capacity tier has no defined limit, architectures with multiple HA-pairs can support multi-petabyte datasets, easily meeting the demands of enterprise environments.
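The per-pair and filesystem-wide limits above can be combined in a quick back-of-the-envelope calculation; the 5x multiplier below simply inverts the 20% performance-tier guideline:

```shell
# Aggregate limits across 12 HA pairs on a second-generation filesystem.
pairs=12
total_throughput=$((pairs * 6))       # GBps: 6 GBps per pair -> 72
total_iops=$((pairs * 200000))        # SSD IOPS: 200K per pair -> 2400000
ssd_cap_tib=1024                      # filesystem-wide SSD cap: 1 PiB
# If the performance tier holds ~20% of the dataset, the addressable
# dataset (SSD + capacity tier) is roughly 5x the SSD footprint:
dataset_tib=$((ssd_cap_tib * 5))      # ~5 PiB
echo "${total_throughput} GBps, ${total_iops} IOPS, ~${dataset_tib} TiB"
```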

Figure 2. Multi-HA-pair configuration scaling beyond the limitations of a single HA-pair architecture. Note the Single-AZ placement; the solution supports up to five SVMs, which manage volumes balanced across the user base.

Architecture of ONTAP HA-Pairs

As shown in the previous section, second-generation ONTAP filesystems provide an enormous amount of scale out of the box; however, it is possible to scale beyond this with careful planning and the right management. Second-generation FSxN filesystems enable this through the deployment of up to 12 HA-pairs.

What is a High-Availability Pair in NetApp?

High availability pairs (HA-pairs) are pairs of EC2 instances that host the ONTAP solution, set up in an active-passive configuration. If one instance becomes unavailable, the other can take its place and continue serving traffic. Each HA-pair is configured in the NetApp solution to expose a file system aggregate (a logical pool of disks), which abstracts away the performance tier and capacity tier (SSD and HDD, respectively). In second-generation filesystems, as HA pairs are added to the filesystem, more aggregates become available in the console. They can be managed relative to other HA pairs through the NetApp interfaces (CLI/API). This complexity is hidden from the end user, who is provided a DNS endpoint and a share path to mount.
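From the end user's perspective, all of this reduces to a single mount against the SVM's endpoint. A sketch for NFS, where the DNS name and junction path are hypothetical placeholders:

```shell
# Mount an FSxN volume over NFS via the SVM's DNS endpoint.
# The endpoint and junction path below are placeholders.
sudo mkdir -p /mnt/fsx
sudo mount -t nfs -o nfsvers=4.1 \
  svm-0123456789abcdef0.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com:/vol1 \
  /mnt/fsx
```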

Moving from the first generation to the second

As mentioned in the references above, first-generation filesystems cannot add HA-pairs in place; the feature is not backward compatible. There are several migration paths to second-generation filesystems that let you leverage the benefits of the new platform, including:

  • NetApp SnapMirror: This is the native NetApp solution built into the ONTAP file system. SnapMirror is the most efficient method for transferring data between two ONTAP filesystems, but it requires some setup, an intermediate understanding of the ONTAP CLI, and only works between two ONTAP environments. For more information, see the AWS documentation on establishing a SnapMirror relationship. In my experience, pursuing this path needs to be a strictly orchestrated process with run sheets and/or automation scripts, as a missed step can send you down a rabbit hole of troubleshooting.
  • AWS DataSync: This is the AWS-native method for transferring data to and from file shares in AWS. DataSync is a good way to manage a migration to new filesystems if you are familiar with the AWS ecosystem, as it offers considerable flexibility in selecting source and destination locations for data. If you are familiar with CloudFormation, it can be deployed as a stack for faster and cleaner setup and teardown. It is worth noting that there are price considerations associated with using DataSync, as it requires dedicated agent hosts to manage the transfer (multiple hosts are needed if you aim to achieve maximum throughput). Also note the maximum number of files each task can process: 50 million for transfers from on-premises and 25 million between AWS storage services. The maximum of 64 GB of memory must be allocated to agents to support this, and a maximum of four parallel agents can be used on the same task in standard mode. For enterprise shares, this means planning ahead and breaking transfers into chunks divisible across the filesystem and agents.
    For more information on how to execute this, see the documentation on FSxN transfer using DataSync and consider DataSync limitations.
  • AWS Backup: Restoring a backup into a new filesystem is also a possible migration path.
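To give a flavour of the SnapMirror path, a minimal ONTAP CLI sequence on the destination filesystem looks roughly like the following; the SVM, volume, and aggregate names are placeholders, and cluster/SVM peering is assumed to be already established:

```shell
# Run on the destination (second-generation) filesystem's ONTAP CLI.
# Create a data-protection (DP) volume to receive the mirror.
vol create -vserver svm_dst -volume vol1_dst -aggregate aggr1 \
  -size 10TB -type DP

# Create and initialise the mirror relationship, then monitor it.
snapmirror create -source-path svm_src:vol1 \
  -destination-path svm_dst:vol1_dst -type XDP -policy MirrorAllSnapshots
snapmirror initialize -destination-path svm_dst:vol1_dst
snapmirror show -destination-path svm_dst:vol1_dst
```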

Scaling up with HA-Pairs

When HA-pairs are added to a filesystem, it is worth having a solid model of how the solution is configured, so as to avoid one-way-door decisions.

  • Adding extra HA-pairs to an existing filesystem does not impact performance while the new HA-pair(s) are being provisioned. This is important for production environments where downtime SLAs need to be respected.
  • Adding HA pairs can be done using the AWS Console or BlueXP Workload Factory, but once an HA pair has been added, it cannot be removed. To scale back down again, a migration to a smaller filesystem will be necessary. Refer to the previous section for some of the available migration options.
  • New HA pairs will have the same throughput and storage capacity as existing HA pairs. This means giving some consideration to future scale when creating the initial filesystem, as it can save on the number of HA pairs that need to be provisioned later.
  • When adding HA pairs, you can still have a maximum of five Storage Virtual Machines (SVMs) associated with the filesystem. This can have downstream effects on how you manage aggregates and volumes, and should be considered as part of the design.
  • If the iSCSI or NVMe-over-TCP protocol is in use, six HA-pairs is the maximum allowed for the filesystem.
  • Finally, for performance gains to be realised, volumes need to be balanced across the HA-pairs, and clients need to be balanced across those volumes.
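For those working from the AWS CLI rather than the console or BlueXP, scaling out looks roughly like this; the file system ID is a placeholder, and the HAPairs parameter reflects my reading of the FSx UpdateFileSystem API, so verify against the current API reference:

```shell
# Scale out a second-generation Single-AZ filesystem to 4 HA pairs.
# Remember: HA pairs can be added but never removed.
aws fsx update-file-system \
  --file-system-id fs-0123456789abcdef0 \
  --ontap-configuration 'HAPairs=4'
```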

Balancing connections across the Filesystem

A central consideration when working with petabyte-scale ONTAP filesystems is balancing I/O and data between HA-pairs. Under certain situations, datasets can attract more traffic, skewing I/O and disk usage toward the high-traffic data and throttling that HA-pair's resources.

  • The general guidance for HA-pairs is to keep average aggregate capacity utilisation at or below 80%, and at or below 50% for all other performance limits, ensuring proper operation across filesystems where multiple HA-pairs are used.
  • In cases of data imbalance, it is possible to increase the filesystem's primary storage capacity. This increases the storage capacity of each aggregate, which is beneficial when data capacity exceeds 80% and SVM operations are affected: usage comes back under 80%, allowing data to be rebalanced between HA-pairs.
  • If increasing capacity does not resolve the imbalance, move volumes between aggregates instead.
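Where moving volumes is the appropriate lever, ONTAP's non-disruptive volume move does the work; the SVM, volume, and aggregate names below are placeholders:

```shell
# Move a hot volume to a less-utilised aggregate (i.e. another HA pair).
# The move is non-disruptive; clients stay connected while data migrates.
volume move start -vserver svm1 -volume vol_hot -destination-aggregate aggr2
volume move show -vserver svm1 -volume vol_hot   # monitor progress
```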

HA-pair imbalance issues can be diagnosed using CloudWatch Metrics, the ONTAP CLI, or the ONTAP API. When diagnosing problems with volumes, aggregates, and SVMs in the ONTAP CLI, you can use commands such as:

  • statistics show: Collects SMB/NFS statistics to get performance details to inform rebalancing.
  • volume show-footprint: Provides a breakdown of where the volume's data is tenanted, compression details, and other details that can help with rebalancing.
  • volume show-space: Gives detail on where space is being used in the volume, e.g. user data %, snapshots, inodes.
  • df -h fsxvol_xxx: Gets human-readable mounts and total volume sizes.
  • volume show [-instance]: Shows the volumes available to the filesystem. Adding -instance gives a detailed breakdown of each volume's configuration, including its aggregate, which is useful for rebalancing.
  • storage aggregate show-space: Gives a breakdown of where space is being used in an aggregate.
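For example, to see which aggregate each volume lives on before planning a move (the SVM name is a placeholder):

```shell
# List each volume's hosting aggregate, size, and utilisation
# to identify candidates for rebalancing.
volume show -vserver svm1 -fields aggregate,size,percent-used
```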

Clients can be rebalanced by unmounting the volume from the client and remounting it using the DNS name of the storage virtual machine's share endpoint. See this reference for more information.
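A client-side rebalance is then just an unmount and remount against the target SVM's endpoint; the paths and DNS names below are placeholders:

```shell
# Rebalance a client onto another SVM/volume endpoint.
sudo umount /mnt/fsx
sudo mount -t nfs -o nfsvers=4.1 \
  svm-0fedcba9876543210.fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com:/vol1 \
  /mnt/fsx
```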

Conclusion

As seen with the latest release, second-generation FSx ONTAP filesystems smooth out the sharp edges of the first FSx offering. As expected with a petabyte-scale file system, there are management considerations to ensure that load is evenly distributed across HA-pairs, but NetApp and AWS provide cloud storage experts with everything they need for the ongoing maintenance and growth of the solution.
