How Crusoe Built a Scalable, Climate-aligned AI Cloud

Crusoe sped up their Crusoe Cloud platform while minimizing costs with powerful block storage from Lightbits.

It’s been a year since ChatGPT, the chatbot that creates humanlike conversational dialogue, was released and put the spotlight on generative AI as a disruptive technology. Today, businesses in almost every sector are investigating how generative AI can give them a competitive edge in everything from content creation to personalized customer experiences to solving difficult business problems, such as identifying suspicious activity to detect and prevent fraud, forecasting customer volume to optimize staffing, or predicting when a piece of machinery is likely to malfunction.

Riding this wave are AI technology companies such as Microsoft, Meta, and OpenAI, which have the knowledge and resources to build the complex, demanding infrastructure that drives AI. Training large AI models is compute-intensive, requiring expensive GPUs and even more electricity than mining cryptocurrency. Along with massive specialized compute, AI also demands high-performance storage to feed the data that drives training and fine-tuning and to deliver results faster. Successful AI initiatives rely on a model's access to data during training: the more data the model can access, and the faster it receives that data, the better the AI outcome. Many enterprises are not equipped to do this on their own because of the IT infrastructure required, particularly the high-performance storage.

Cloud-based AI services reduce barriers to entry 
Instead, enterprises are turning to cloud-based AI services so they can capitalize on AI without a large capital investment. Hyperscale cloud providers have been beefing up their offerings in response to the demand, but their pricing is prohibitively expensive for many would-be customers. To fill that gap, specialty AI cloud service providers have emerged. They offer more economical pricing and in some cases greater scalability due to superior relationships with GPU suppliers. 

For years, Lightbits has been the go-to performance storage vendor for cloud service providers who bring their expertise to market for specific sectors or use cases. They turn to Lightbits for its software-defined, disaggregated storage platform because it gives them the performance they need, with the flexibility and economics to succeed as they scale their services and their business. That track record of proven value is what led Crusoe’s cloud team to Lightbits.

Turning stranded energy into AI power 
When Crusoe saw methane being flared in oil fields, they saw potential. Crusoe pioneered their Digital Flare Mitigation system, which taps into wasted energy to power advanced computing systems. Their Crusoe Cloud provides a platform for machine learning training, real-time inference, and other AI processes, with the goal of being half the cost of hyperscale alternatives while lowering the environmental impact of running these compute-intensive workloads.


Storing massive datasets and delivering fast, direct access for performance-sensitive AI/ML workloads demanded storage with low latency and high throughput. Because of the size of the datasets, Crusoe needed a solution with a cost model that scaled linearly with capacity. Because their cloud is operated by a small team consisting mostly of software engineers, not storage specialists, they needed a software-defined solution that would be easy to operate. They also needed the backing of a high-quality, developer-focused support team from the storage vendor.

How Crusoe found–and selected–Lightbits 
To help find the right solution for Crusoe Cloud, Crusoe consulted with Alexey Stolyar, CEO of International Computer Concepts (ICC), a leading system integrator with broad expertise in standard and customized servers, workstations, storage, and networking solutions for a wide variety of industries and markets, including GPU-accelerated AI cloud computing and high-performance computing (HPC).

Stolyar recommended Lightbits for its performance and data protection capabilities, which would give Crusoe Cloud a data platform that could not only deliver the required performance but also ensure availability, assist in user customization, and be easy for their team to maintain.

Crusoe determined that the Lightbits data platform would scale performance and storage capacity to meet their growth targets for Crusoe Cloud while maintaining sub-millisecond latency. Lightbits would also meet their high availability and data protection goals through its fast, efficient snapshotting technology across multiple availability zones. Lightbits would enable them to support custom operating system images for their customers, plus allow them to share data and extend their storage more easily. 

In their blog post “How We Built It, Block Storage for AI/ML Workloads, Powered by Lightbits,” Crusoe shared benchmarks and explained the advantages of Lightbits. We’ve excerpted key sections of their blog below. For more detail, read the full article.

How Crusoe Assessed Lightbits Performance 
Crusoe considered several potential storage options, including other vendors and open-source projects such as Ceph. They performed extensive performance testing across these solutions operating on the same hardware, which revealed Lightbits was superior in terms of IOPS, with notably lower latencies. From the Crusoe blog:

In a test vs the leading open-source software option, Lightbits demonstrates up to 4x performance advantage in terms of bandwidth, particularly notable for smaller IO sizes. Additionally, as Lightbits scales IOPS with increased load, it consistently maintains latencies under 500 microseconds, outperforming the competition in terms of latency boundaries, especially for random accesses. The competition faces challenges in scaling IOPS, leading to latency exceeding 2.5ms under random access.


Lightbits vs Competition, 4kB writes

From data preprocessing to real-time inference, the advantages of lower and more consistent latency, higher throughput, and linear scalability make Lightbits-backed block storage an excellent offering for Crusoe Cloud customers to optimize their AI workflows. 
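To put the quoted latency figures in context, here is a minimal back-of-the-envelope sketch of how latency, IOPS, and bandwidth relate for small block I/O. It is not Crusoe's benchmark methodology and does not reproduce their measured results. The queue depth of 32 is a hypothetical value chosen for illustration, the function names are our own, and only the 4 KiB block size and the 500-microsecond and 2.5 ms latency figures come from the excerpt above.

```python
# Back-of-the-envelope storage math. Every input is an illustrative assumption
# except the block size and the two latency figures, which come from the
# Crusoe excerpt quoted in the article.

BLOCK_SIZE_BYTES = 4 * 1024   # 4 KiB I/O size, matching the 4kB write test
QUEUE_DEPTH = 32              # hypothetical number of outstanding I/Os


def iops_at_latency(queue_depth: int, latency_s: float) -> float:
    """Little's Law: sustained IOPS = outstanding I/Os / average latency."""
    return queue_depth / latency_s


def bandwidth_mib_s(iops: float, block_size_bytes: int) -> float:
    """Bandwidth is IOPS multiplied by the I/O size, reported in MiB/s."""
    return iops * block_size_bytes / (1024 * 1024)


if __name__ == "__main__":
    for label, latency_s in [("500 us", 500e-6), ("2.5 ms", 2.5e-3)]:
        iops = iops_at_latency(QUEUE_DEPTH, latency_s)
        mib_s = bandwidth_mib_s(iops, BLOCK_SIZE_BYTES)
        print(f"{label}: {iops:,.0f} IOPS, {mib_s:.1f} MiB/s")
```

The point of the sketch is the relationship, not the absolute numbers: at a fixed number of outstanding requests, a lower average latency translates directly into proportionally more IOPS, and for small I/O sizes that also determines the achievable bandwidth.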

Crusoe also detailed data protection and operational advantages of Lightbits for their AI customers in their blog. 

Lightbits fills performance and operational gaps that other block storage solutions struggle to address for these workloads and does so while providing a comprehensive set of enterprise functionality to help cloud builders like ourselves operate the system at scale. 

Snapshots/backups. Lightbits' advanced snapshot and backup capabilities provide crucial data protection and recovery options for our users. With the ability to take storage-efficient snapshots and perform rapid data restoration, Crusoe Cloud users will soon be able to periodically snapshot their data to safeguard against unexpected or unintended loss. 

Operational improvements for Crusoe. Customer operating system (OS) images are now stored in the shared Lightbits cluster, enabling us to free up stopped VMs, as well as enabling customers to resize their VMs. Additionally, customers can now consume persistent and high-performance storage from the Lightbits cluster in the form of persistent disks. 

Curated/custom images. Building on top of our existing OS images, we're able to leverage HashiCorp Packer templates to build OS images, which are then stored within Lightbits, and served to customers on-demand. In the near future, customers will be able to leverage the same pipelines to generate their own workload-specific images, e.g. LLM training with Jax or Generative AI with Stable Diffusion. This flexibility allows our users to choose the most optimized environment for their research and development or production workload needs. This functionality is a relatively minor lift, due in large part to Lightbits' robust API surface area.

Finally, Crusoe cited how their close collaboration with Lightbits translates to customer success for the AI cloud service. 

Crusoe Cloud's close collaboration with Lightbits has leveled up our GPU cloud platform, enabling us to provide unparalleled performance and scalability to meet the needs of AI and ML, scientific computing, and graphics customers. 

Lightbits' high-performance block storage solution has addressed our storage challenges, unlocking the full potential of Crusoe Cloud's infrastructure and empowering users to pursue innovative research and development in the field of climate science and AI. Together, Crusoe Cloud and Lightbits are driving the future of climate-focused computing toward a more sustainable and efficient tomorrow. 

Thank you to ICC for their strategic partnership and to Crusoe for sharing how you built Crusoe Cloud on the Lightbits cloud data platform! 

To learn more about how you can build a high-performance cloud for your clients, request a demo with Lightbits experts today.


