Maintainers of the open source Apache Cassandra Project today announced an update that can stream data up to 5 times faster during scaling operations while providing up to 25% faster throughput on reads and writes. Version 4.0 of the Apache Cassandra database has also been optimized for deployment in the cloud, as well as on Kubernetes clusters, said Ekaterina Dimitrova, a software engineer at DataStax, which provides a curated instance of Cassandra to enterprise IT organizations.

Other added capabilities include the ability to keep data replicas synchronized to optimize incremental repairs, audit logs that track user access and activity with minimal impact on workload performance, simpler configuration settings, enhanced compression, and lower latency achieved by reducing the pause times of the garbage collector that cleans up memory.

Finally, the Apache Cassandra Project maintainers announced today they are now shifting to a yearly release cycle, with each major release to be supported for three years.

Apache Cassandra database update a long time coming

The latest version of the Apache Cassandra database has been in development for more than three years. The goal is to simplify the migration process by providing a highly stable upgrade instead of a platform that might otherwise be viewed as a work in progress, said Dimitrova. “There have been more than 1,000 bug fixes,” she said.

As part of that effort, the Apache Cassandra community adopted several testing and quality assurance (QA) projects and methodologies that enabled the maintainers and contributors to generate reproducible real-life workloads that could be tested without having to pause a workload.

As a NoSQL database, Apache Cassandra has gained traction as an alternative to relational databases that were not designed to process massive amounts of unstructured data. Originally developed by Facebook, Cassandra is a wide-column store that can efficiently handle that kind of data at thousands of writes per second with no single point of failure. Facebook donated the database to the Apache Software Foundation in 2009.

Organizations that make use of Cassandra today include Apple, which has deployed more than 160,000 instances storing over 100PB of data across more than 1,000 clusters, and Netflix, which has deployed more than 10,000 instances storing 6PB of data across more than 100 clusters that process more than 1 trillion requests per day. Similarly, Bloomberg serves up more than 20 billion requests per day across a nearly 1PB dataset spanning more than 1,700 Cassandra nodes.

Other organizations that have adopted Apache Cassandra include Activision, Backblaze, BazaarVoice, Best Buy, CERN, Constant Contact, Comcast, DoorDash, eBay, Fidelity, Hulu, ING, Instagram, Intuit, Macy’s, Macquarie Bank, McDonald’s, the New York Times, Monzo, Outbrain, Pearson Education, Sky, Spotify, Target, Uber, Walmart, and Yelp.

Cassandra learning curve is long

The challenge advocates of Cassandra continue to face is that deploying and managing a Cassandra database requires a significant amount of expertise. In many cases, applications only find their way off an open source document database once they run out of headroom. Developers don’t always know to what degree their applications might one day need to scale, and many of them can configure a document database without help from a database administrator (DBA).

However, a database that can scale up to process petabytes of unstructured data may eventually be required. The good news is that after an organization encounters that issue the first time, it’s more likely to bring some level of Cassandra expertise to bear on the next application that needs to be refactored to run on a database designed to scale.
