Francis Nana-Dabankah

Cluster Topology and Design

  • A single Cassandra instance is called a node
  • Cassandra supports horizontal scalability, achieved by adding more than one node to a Cassandra cluster
  • Cassandra uses a peer-to-peer architecture in which each node is connected to all other nodes
  • Each Cassandra node performs all database operations and can serve client requests without a need for a master node
  • Nodes in a cluster communicate with each other via Seeds and Gossip
  • Seeds - Each node configures a list of seeds which is simply a list of other nodes. A seed node is used to bootstrap a node when it is first joining a cluster
  • Gossip - Gossip is the protocol Cassandra nodes use for peer-to-peer communication. Gossip informs a node about the state of all other nodes
  • A cluster is subdivided into racks and data centers. These terms are Cassandra's representation of a real-world rack and data center
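The way gossip spreads state through a cluster can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not Cassandra's actual protocol: each node keeps a map of node name to heartbeat version, and in every round it exchanges that map with one randomly chosen peer. The names `merge`, `gossip_round` and `simulate` are made up for this illustration.

```python
import random


def merge(local, remote):
    """Copy into `local` every entry of `remote` that is new or has a higher version."""
    for node, version in remote.items():
        if version > local.get(node, -1):
            local[node] = version


def gossip_round(views, rng):
    """Each node bumps its own heartbeat, then swaps views with one random peer."""
    names = list(views)
    for name in names:
        views[name][name] += 1
        peer = rng.choice([n for n in names if n != name])
        merge(views[peer], views[name])  # push our view to the peer
        merge(views[name], views[peer])  # pull the peer's view back


def simulate(names, max_rounds=100, seed=0):
    """Run gossip rounds until every node has heard of every other node.

    Returns (rounds_needed, views); rounds_needed is None if the bound is hit.
    """
    rng = random.Random(seed)
    # At start-up each node only knows about itself.
    views = {name: {name: 0} for name in names}
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(set(view) == set(names) for view in views.values()):
            return round_no, views
    return None, views


if __name__ == "__main__":
    rounds, views = simulate(["node1", "node2", "node3", "node4"])
    print("rounds until every node knows every other node:", rounds)
```

No node acts as a coordinator here: every node runs the same logic, which mirrors the masterless design described above.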

Database Structures

  • Cassandra stores data in tables, where each table is organised in rows and columns much like a table in a relational database
  • Tables are grouped in keyspaces. A keyspace could be used to group tables that serve a similar purpose from a business perspective, such as all transactional tables, metadata tables, or user information tables
  • Each table has a defined primary key. The primary key is divided into partition key and clustering columns
  • The partition key is used by Cassandra to index the data. All rows which share a common partition key make a single data partition which is the basic unit of data partitioning, storage and retrieval in Cassandra
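The relationship between the partition key and the clustering columns can be sketched in a few lines of Python. This is an in-memory illustration, not how Cassandra stores data on disk: rows sharing a partition key are grouped into one partition, and rows inside a partition are ordered by the clustering columns. The table layout (`sensor_id`, `reading_time`, `value`) and the function `build_partitions` are hypothetical, and a single-column partition key is assumed.

```python
from collections import defaultdict


def build_partitions(rows, partition_key, clustering_columns):
    """Group rows by partition key and sort each group by its clustering columns."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for partition_rows in partitions.values():
        partition_rows.sort(key=lambda r: tuple(r[c] for c in clustering_columns))
    return dict(partitions)


# Hypothetical table with PRIMARY KEY ((sensor_id), reading_time)
rows = [
    {"sensor_id": "s1", "reading_time": 3, "value": 20.5},
    {"sensor_id": "s2", "reading_time": 1, "value": 18.0},
    {"sensor_id": "s1", "reading_time": 1, "value": 19.9},
]

partitions = build_partitions(rows, "sensor_id", ["reading_time"])
for key, partition_rows in partitions.items():
    print(key, [r["reading_time"] for r in partition_rows])
```

Reading all rows for `s1` touches a single partition, which is why queries that specify the partition key are the efficient ones.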

Partitioning

  • A partition key is converted to a token by a partitioner
  • With the default Murmur3Partitioner, tokens are signed 64-bit integers from -2^63 to 2^63-1; this span is referred to as the token range
  • Each Cassandra node owns a portion of this range and primarily owns the data whose tokens fall within that portion. A token is used to precisely locate data among the nodes and within the storage of the owning node
  • Here is a simplified example of token range assignment. Suppose a three-node Cassandra cluster uses only 100 tokens (0-99). Each node is assigned roughly a third of them: node1: 0-33, node2: 34-66, node3: 67-99
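The simplified 100-token example can be turned into a small lookup sketch. The assumptions are loud ones: `toy_token` uses an MD5 hash reduced modulo 100 as a stand-in partitioner purely for illustration (it is not what Cassandra uses), and `owner_of` is a hypothetical helper that finds the node whose range contains a token.

```python
import bisect
import hashlib

# Inclusive upper bound of each node's range, taken from the example:
# node1: 0-33, node2: 34-66, node3: 67-99
RANGE_ENDS = [33, 66, 99]
NODES = ["node1", "node2", "node3"]


def toy_token(partition_key: str) -> int:
    """Stand-in partitioner: map a partition key to a token in 0..99."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 100


def owner_of(token: int) -> str:
    """Return the node whose token range contains `token`."""
    if not 0 <= token <= 99:
        raise ValueError("token must be in the range 0..99")
    # bisect_left finds the first range whose upper bound is >= token.
    return NODES[bisect.bisect_left(RANGE_ENDS, token)]


if __name__ == "__main__":
    for key in ["alice", "bob", "carol"]:
        token = toy_token(key)
        print(f"{key!r} -> token {token} -> {owner_of(token)}")
```

Because the token depends only on the partition key, any node can compute which peer owns a given row without consulting a central directory.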