Cassandra Interview Questions
The most frequently asked Cassandra interview questions and answers are given below.
1) What is Cassandra?
Cassandra is a popular NoSQL database management system used to handle large amounts of data. It is a free and open source distributed database that provides high availability with no single point of failure.
2) In which language is Cassandra written?
Cassandra is written in Java. It was originally designed at Facebook, has a flexible schema, and is highly scalable for big data.
3) Who was the original author of Cassandra?
The original authors of Cassandra are Avinash Lakshman and Prashant Malik. It was initially developed at Facebook to power the Facebook inbox search feature.
4) Which query language is used in Cassandra database?
Cassandra introduced its own Cassandra Query Language (CQL). CQL is a simple interface for accessing Cassandra, as an alternative to the traditional Structured Query Language (SQL).
5) What are the benefits/advantages of Cassandra?
- Cassandra delivers real-time performance simplifying the work of Developers, Administrators, Data Analysts and Software Engineers.
- It provides elastic scalability and can easily be scaled up or down as requirements change.
- Data can be replicated to several nodes for fault-tolerance.
- Because it is a distributed system, it has no single point of failure.
- Every node in a cluster holds different data, yet any node can serve any request.
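6) Where does Cassandra store its data?
Cassandra stores its data on disk in SSTable files under its configured data directories. Each write is first recorded in the commit log and the in-memory memtable before it reaches an SSTable.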
7) What was the design goal of Cassandra?
The main design goal of Cassandra was to handle big data workloads across multiple nodes without a single point of failure.
8) How many types of NoSQL databases are there? Give some examples.
There are mainly 4 types of NoSQL databases:
- Document store types (MongoDB and CouchDB)
- Key-Value store types (Redis and Voldemort)
- Column store types (Cassandra)
- Graph store types (Neo4j and Giraph)
9) Mention some important components of the Cassandra data model.
These are some key components of the Cassandra data model:
- Table (collection of columns)
10) What are the other components of Cassandra?
Some other components of Cassandra are:
- Data Center
- Commit log
- Bloom Filter
11) What is a keyspace in Cassandra?
In Cassandra, a keyspace is a namespace that determines how data is replicated across nodes. A cluster typically contains one keyspace per application.
12) What are the different composite keys in Cassandra?
In Cassandra, composite keys are used to define a key or a column name as a concatenation of values of different types. There are two types of composite key in Cassandra:
- Row Key
- Column Name
13) What is data replication in Cassandra?
Data replication is the electronic copying of data from a database on one computer or server to a database on another, so that all users share the same information. Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. The replication strategy decides the nodes where replicas are placed.
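As a rough illustration of how a replication strategy can choose nodes, the sketch below places replicas on consecutive nodes of a ring, similar in spirit to Cassandra's SimpleStrategy. The node names, the MD5-based `token` function and the `replicas_for` helper are assumptions made for this example; they are not Cassandra's actual partitioner or API.

```python
import hashlib
from bisect import bisect_left

def token(value):
    """Map a string to a ring position (illustrative hash, not Cassandra's)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

def replicas_for(partition_key, nodes, replication_factor):
    """Walk the ring clockwise from the key's token and return the next
    `replication_factor` distinct nodes (capped at the cluster size)."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect_left(tokens, token(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
    print(replicas_for("user:42", nodes, replication_factor=3))
```

With a replication factor of 3 on a four-node ring, each key lands on three distinct nodes, so losing one node still leaves two copies.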
14) What is a node in Cassandra?
In Cassandra, a node is the place where data is stored.
15) What do you mean by data center in Cassandra?
A data center is a collection of related nodes within a cluster.
16) What do you mean by commit log in Cassandra?
In Cassandra, commit log is a crash-recovery mechanism. Every write operation is written to the commit log.
17) What do you mean by column family in Cassandra?
A column family is comparable to a table in an RDBMS: it contains an ordered collection of rows.
18) What do you mean by consistency in Cassandra?
Consistency in Cassandra describes how up to date and synchronized a row of data is across all of its replicas.
19) How many types of tunable consistency are supported in Cassandra?
It supports two kinds of consistency: eventual consistency and strong consistency.
Eventual consistency means that, if no new updates are made to a given data item, all accesses eventually return the last updated value. Systems with eventual consistency are said to have achieved replica convergence.
Cassandra provides strong consistency when the following condition holds:
R + W > N
N: Number of replicas
W: Number of nodes that need to agree for a successful write
R: Number of nodes that need to agree for a successful read
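The condition above can be checked with a few lines of Python. This is only a sketch: the helper names `is_strongly_consistent` and `quorum` are made up for this example, and `quorum` assumes the usual majority rule of floor(N / 2) + 1 replicas.

```python
def is_strongly_consistent(n, w, r):
    """True when the read and write replica sets must overlap (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return r + w > n

def quorum(n):
    """Majority of N replicas: floor(N / 2) + 1."""
    return n // 2 + 1

if __name__ == "__main__":
    n = 3
    # Quorum writes plus quorum reads: 2 + 2 > 3, so reads see the latest write.
    print(is_strongly_consistent(n, w=quorum(n), r=quorum(n)))  # True
    # One replica for writes and one for reads: 1 + 1 <= 3, eventual only.
    print(is_strongly_consistent(n, w=1, r=1))                  # False
```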
20) What is tunable consistency in Cassandra?
Tunable consistency is a characteristic of Cassandra that makes it a popular choice. Consistency refers to how up to date and synchronized data rows are across all their replicas. Cassandra's tunable consistency lets users select the consistency level best suited to their use case.
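21) What is the syntax to create a keyspace in Cassandra?
A keyspace is created with the CQL CREATE KEYSPACE statement, for example: CREATE KEYSPACE keyspace_name WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};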
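A minimal sketch of a CREATE KEYSPACE statement, built as a string in Python. The function name `create_keyspace_cql` and the keyspace name `demo_ks` are hypothetical, and the name check is only a rough approximation of CQL's identifier rules. The resulting text could be pasted into cqlsh or passed to a driver.

```python
def create_keyspace_cql(name, replication_factor=1, durable_writes=True):
    """Build a CREATE KEYSPACE statement that uses SimpleStrategy."""
    # Rough guard against odd names; CQL's real identifier rules are stricter.
    if not name.isidentifier():
        raise ValueError(f"invalid keyspace name: {name!r}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {int(replication_factor)}}} "
        f"AND durable_writes = {str(bool(durable_writes)).lower()};"
    )

if __name__ == "__main__":
    print(create_keyspace_cql("demo_ks", replication_factor=3))
```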
22) What is a column family in Cassandra?
In Cassandra, a collection of rows is referred to as a “column family”.
23) How does Cassandra perform a write?
Cassandra performs a write in two steps:
- The write is first appended to the commit log on disk and then applied to an in-memory structure known as the memtable.
- When both steps have been applied successfully, the write is complete.
- When the memtable is full, its contents are flushed to disk as an SSTable (sorted string table).
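The write path described above can be sketched as a toy model in Python. The class `TinyWritePath` and its `memtable_limit` parameter are invented for this example. A real node keeps these structures on disk and in off-heap memory and recycles commit log segments after a flush, which the sketch does not attempt.

```python
class TinyWritePath:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # in-memory writes, keyed by row key
        self.sstables = []     # immutable, sorted snapshots "on disk"
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1) record for crash recovery
        self.memtable[key] = value            # 2) apply to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a sorted, immutable SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None

if __name__ == "__main__":
    db = TinyWritePath(memtable_limit=2)
    db.write("b", 1)
    db.write("a", 2)      # memtable is full, so it is flushed here
    print(db.sstables)    # [(('a', 2), ('b', 1))]
```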
24) What is memtable?
A memtable is an in-memory, write-back cache that holds content in key and column format. Data in a memtable is sorted by key, and each column family has its own memtable, from which column data is retrieved by key. It stores writes until it is full and is then flushed to disk.
25) What is SSTable?
SSTable is short for ‘Sorted String Table’. It is the data file to which Cassandra regularly flushes memtables. SSTables are stored on disk and exist for each Cassandra table.
26) How is an SSTable different from a relational table?
SSTables are immutable: once written, no data items can be added or removed. For each SSTable, Cassandra also creates three separate files: a partition index, a partition summary and a bloom filter.
27) What are the management tools in Cassandra?
DataStax OpsCenter: It is a web-based management and monitoring solution for Cassandra clusters and DataStax. It is free to download and includes an additional edition of OpsCenter.
SPM: SPM primarily monitors Cassandra metrics along with various OS and JVM metrics. Besides Cassandra, it also monitors Hadoop, Spark, Solr, Storm, ZooKeeper and other big data platforms.
28) Mention some important features of SPM in Cassandra?
The main features of SPM are:
- Correlation of events and metrics
- Distributed transaction tracing
- Creating real-time graphs with zooming
- Detection and heartbeat alerting
29) What is a cluster in Cassandra?
In Cassandra, the cluster is the outermost container for keyspaces. It arranges the nodes in a ring and assigns data to them. Data on each node is replicated to other nodes, which take over if that node fails.
30) What is the role of ALTER KEYSPACE?
ALTER KEYSPACE is used to change the properties of an existing keyspace, such as its replication settings and the value of DURABLE_WRITES.
31) What do you mean by Cassandra-Cqlsh?
Cqlsh is a Cassandra query language shell used to execute the commands of CQL (Cassandra query language).
32) What are the differences between a node, a cluster, and datacenter in Cassandra?
Node: A node is a single machine running Cassandra.
Cluster: A cluster is a collection of nodes that contains similar types of data together.
Datacenter: A datacenter is a useful component when serving customers in different geographical areas. Different nodes of a cluster can be grouped into different data centers.
33) What is the use of Cassandra CQL collection?
A CQL collection stores multiple values in a single column, where every element of the collection has the same data type. CQL provides three collection types:
- SET: It is a collection of unique elements.
- LIST: It is a collection of elements arranged in an order and can contain duplicate values.
- MAP: It is a collection of key-value pairs in which each key is unique.
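Python's built-in containers behave much like the three collection types listed above, which makes them a convenient way to see the differences. The variable names and sample values below are made up for illustration; this is an analogy, not CQL itself.

```python
# Like a CQL set: duplicates are silently dropped.
emails = {"a@example.com", "b@example.com", "a@example.com"}

# Like a CQL list: order is kept and duplicates are allowed.
scores = [10, 20, 10]

# Like a CQL map: each key appears once, so a second write replaces the value.
phones = {"home": "555-0100"}
phones["home"] = "555-0199"

if __name__ == "__main__":
    print(len(emails))   # 2
    print(scores)        # [10, 20, 10]
    print(phones)        # {'home': '555-0199'}
```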
34) What is the use of Bloom Filter in Cassandra?
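When data is requested, the Bloom filter checks, before any disk I/O is done, whether the requested row may exist in an SSTable. It can report false positives but never false negatives, so a negative answer lets Cassandra skip that SSTable.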
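To show the idea, here is a small Bloom filter in Python. It is a sketch under simple assumptions (SHA-256 based hashing, one byte per bit, arbitrary default sizes) and does not reflect how Cassandra implements or sizes its own filters. The class and method names are chosen for this example.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: 'no' is always right, 'yes' may be a false positive."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

if __name__ == "__main__":
    sstable_filter = BloomFilter()
    sstable_filter.add("row-1")
    print(sstable_filter.might_contain("row-1"))  # True: the key was added
```

If `might_contain` returns False, the SSTable can be skipped without touching the disk; if it returns True, the SSTable still has to be read to confirm.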
35) How does Cassandra delete data?
Cassandra does not remove data from disk immediately. To delete a row, it writes a special marker called a tombstone against the affected columns; the data and the tombstone are physically removed later, during compaction.
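The tombstone approach can be sketched in Python as follows. The `TOMBSTONE` sentinel and the `delete`, `read` and `compact` helpers are hypothetical names for this example. The grace period mirrors the idea behind Cassandra's `gc_grace_seconds` table option, but the logic here is heavily simplified.

```python
TOMBSTONE = object()  # sentinel standing in for Cassandra's deletion marker

def delete(table, key, timestamp):
    """A delete is just another write: it stores a tombstone."""
    table[key] = (TOMBSTONE, timestamp)

def read(table, key):
    """Rows covered by a tombstone look as if they do not exist."""
    entry = table.get(key)
    if entry is None or entry[0] is TOMBSTONE:
        return None
    return entry[0]

def compact(table, now, gc_grace_seconds):
    """Return a new table without tombstones older than the grace period."""
    return {
        key: (value, ts)
        for key, (value, ts) in table.items()
        if not (value is TOMBSTONE and now - ts >= gc_grace_seconds)
    }

if __name__ == "__main__":
    table = {"k1": ("v", 0), "k2": ("w", 0)}
    delete(table, "k1", timestamp=100)
    print(read(table, "k1"))   # None, although the marker is still stored
    table = compact(table, now=100 + 864000, gc_grace_seconds=864000)
    print(sorted(table))       # ['k2']
```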
36) What is SuperColumn in Cassandra?
In Cassandra, a SuperColumn is a special element that groups a collection of related columns. It is a key-value pair whose value is a set of columns.
37) What is the difference between Column and SuperColumn?
Difference between Column and SuperColumn:
- The values in Columns are strings, while the values in SuperColumns are maps of Columns with different data types.
- Unlike Columns, Super Columns do not contain the third component of timestamp.
38) What are Hadoop, HBase, Hive and Cassandra? Specify similarities and differences among them.
Hadoop, HBase, Hive and Cassandra are all Apache products.
Apache Hadoop supports file storage and grid compute processing via MapReduce. Apache Hive is a SQL-like interface on top of Hadoop. Apache HBase follows column family storage modeled on Bigtable. Apache Cassandra also follows column family storage modeled on Bigtable, with Dynamo-style topology and consistency.
39) What is the usage of “void close()” method?
In Cassandra, the void close() method is used to close the current session instance.
40) Which command is used to start the cqlsh prompt?
The cqlsh command is used to start the cqlsh prompt.
41) What is the usage of the “cqlsh --version” command?
The “cqlsh --version” command prints the version of cqlsh you are using.
42) Does Cassandra work on Windows?
Yes, earlier Cassandra releases run on both Linux and Windows. Windows support was removed in Cassandra 4.0, so recent versions target Linux and other Unix-like systems.
43) What is Kundera in Cassandra?
In Cassandra, Kundera is an object-relational mapping (ORM) implementation which is written using Java annotations.
44) What do you mean by Thrift in Cassandra?
Thrift is the RPC framework that legacy clients use to communicate with the Cassandra server.
45) What is Hector in Cassandra?
Hector was one of the early Cassandra clients. It is an open source project written in Java and released under the MIT license.