Week 8 Answers
1. Identify the correct Kafka component for each of the following descriptions:
P: It is the distributed, durable equivalent of Unix pipes. Use it to connect and compose your large-scale data applications.
Q: It plays the role of the commands in your Unix pipelines. Use it to transform data stored in Kafka.
R: It is the I/O redirection in your Unix pipelines. Use it to get your data into and out of Kafka.
a) P: Kafka Streams, Q: Kafka Connect, R: Kafka Core
b) P: Kafka Core, Q: Kafka Connect, R: Kafka Streams
c) P: Kafka Streams, Q: Kafka Core, R: Kafka Connect
d) P: Kafka Core, Q: Kafka Streams, R: Kafka Connect
Answer: D
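The pipe analogy in question 1 can be illustrated with a small in-memory sketch. This is a toy model only: ToyCore, source_connector, stream_transform and sink_connector are hypothetical names invented for illustration and are not part of any Kafka API.

```python
# Toy model of the Unix-pipe analogy. None of these names are real Kafka APIs.
from collections import defaultdict


class ToyCore:
    """Stands in for Kafka Core: named, append-only topics (the 'pipe')."""

    def __init__(self):
        self.topics = defaultdict(list)

    def append(self, topic, record):
        self.topics[topic].append(record)

    def read(self, topic):
        # Return a copy so callers cannot mutate the stored records.
        return list(self.topics[topic])


def source_connector(core, topic, lines):
    """Stands in for Kafka Connect as input redirection: load external data."""
    for line in lines:
        core.append(topic, line)


def stream_transform(core, src, dst, fn):
    """Stands in for Kafka Streams as the 'command': transform topic to topic."""
    for record in core.read(src):
        core.append(dst, fn(record))


def sink_connector(core, topic):
    """Stands in for Kafka Connect as output redirection: export data."""
    return core.read(topic)


core = ToyCore()
source_connector(core, "raw", ["alpha", "beta"])   # like: < input.txt
stream_transform(core, "raw", "upper", str.upper)  # like: | tr a-z A-Z
result = sink_connector(core, "upper")             # like: > output.txt
print(result)
```

The source topic is left untouched by the transform, which mirrors the idea that the "pipe" stores data durably while the "commands" read from it and write elsewhere.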
2. Kafka runs as a cluster of one or more servers, each of which is called a __________.
a) cTakes
b) Chunks
c) Broker
d) None of the mentioned
Answer: C
3. Kafka maintains feeds of messages in categories called ______________.
a) Chunks
b) Domains
c) Messages
d) Topics
Answer: D
4. Each Kafka partition has one server which acts as the _________.
a) Leader
b) Followers
c) Stater
d) None of the mentioned
Answer: A
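The leader and follower roles in question 4 can be sketched with a toy model. It assumes the default behavior in which writes go to the leader, followers copy the leader's log, and a follower takes over if the leader fails. ToyPartition and its methods are hypothetical names, and the replication here is deliberately simplified.

```python
# Toy model of one partition replicated across brokers. Not a real Kafka API.
class ToyPartition:
    def __init__(self, broker_ids):
        # One copy of the log per broker; the first broker starts as leader.
        self.logs = {b: [] for b in broker_ids}
        self.leader = broker_ids[0]

    def followers(self):
        return [b for b in self.logs if b != self.leader]

    def write(self, record):
        # Writes are handled by the leader only.
        self.logs[self.leader].append(record)
        # Followers then replicate the leader's log (simplified: full copy).
        for b in self.followers():
            self.logs[b] = list(self.logs[self.leader])

    def fail_leader(self):
        # Drop the failed leader and promote a remaining follower.
        # Assumes at least one follower exists.
        del self.logs[self.leader]
        self.leader = next(iter(self.logs))


partition = ToyPartition([1, 2, 3])
partition.write("a")
partition.fail_leader()   # broker 1 fails, broker 2 is promoted
partition.write("b")
print(partition.leader, partition.logs)
```

Because the followers already hold a copy of the log, the record written before the failure is still available after the new leader takes over.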
5. Which type of processing can Apache Spark handle?
a) Stream Processing
b) Batch Processing
c) Graph Processing
d) All of the mentioned
Answer: D
6. Which is not a component built on top of Spark Core?
a) Spark Streaming
b) Spark RDD
c) MLlib
d) None of the mentioned
Answer: B
7. In Spark, a ______________________ is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.
a) Spark Streaming
b) Resilient Distributed Dataset (RDD)
c) FlatMap
d) Driver
Answer: B
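The "can be rebuilt if a partition is lost" part of question 7 can be sketched without Spark. The toy class below remembers which parent dataset and function produced it, so a lost partition can be recomputed on demand. ToyRDD, from_partitions, compute and lose_partition are hypothetical names for illustration and are not the PySpark API.

```python
# Toy model of a read-only, partitioned dataset with lineage. Not PySpark.
class ToyRDD:
    def __init__(self, source=None, parent=None, fn=None):
        self.source = source   # tuple of tuples for a base dataset
        self.parent = parent   # dataset this one was derived from
        self.fn = fn           # function applied to each parent element
        self.cache = {}        # partition index -> computed tuple

    @classmethod
    def from_partitions(cls, partitions):
        # Store tuples so the data cannot be modified in place.
        return cls(source=tuple(tuple(p) for p in partitions))

    def num_partitions(self):
        if self.parent is None:
            return len(self.source)
        return self.parent.num_partitions()

    def map(self, fn):
        # A transformation returns a new dataset; this one is unchanged.
        return ToyRDD(parent=self, fn=fn)

    def compute(self, i):
        # Recompute partition i from lineage if it is not available.
        if i not in self.cache:
            if self.parent is None:
                self.cache[i] = self.source[i]
            else:
                self.cache[i] = tuple(self.fn(x) for x in self.parent.compute(i))
        return self.cache[i]

    def lose_partition(self, i):
        # Simulate losing the computed copy of partition i.
        self.cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]


base = ToyRDD.from_partitions([[1, 2], [3, 4]])
doubled = base.map(lambda x: x * 2)
before = doubled.collect()
doubled.lose_partition(1)   # the computed copy of partition 1 is gone
after = doubled.collect()   # rebuilt from base using the recorded function
print(before, after)
```

The same sketch also illustrates the immutability asked about in question 10: map leaves base untouched and returns a new dataset.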
8. ______________ is a distributed machine learning framework on top of Spark. Its goal is to make practical machine learning scalable and easy.
a) MLlib
b) Spark Streaming
c) GraphX
d) RDDs
Answer: A
9. Which of the following is true about Apache Kafka?
a) Kafka is a message queuing system that stores messages in queues.
b) Kafka uses a distributed commit log to enable high throughput and fault tolerance.
c) Kafka is a real-time data processing framework used for complex transformations.
d) Kafka does not support message retention.
Answer: B
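The commit-log idea behind the answer to question 9 can be sketched as an append-only list addressed by offset. Reading does not remove records, so several consumers can read the same data independently, and old records disappear only through retention. ToyLog and its methods are hypothetical names, and the count-based retention below is a stand-in assumption for illustration.

```python
# Toy model of an append-only log with offsets and retention. Not a Kafka API.
class ToyLog:
    def __init__(self):
        self.records = []
        self.base_offset = 0   # offset of records[0]

    def append(self, record):
        # Records are only ever added at the end; return the new offset.
        self.records.append(record)
        return self.base_offset + len(self.records) - 1

    def read(self, offset, max_records=10):
        # Reading is non-destructive; each consumer tracks its own offset.
        start = max(offset - self.base_offset, 0)
        return self.records[start:start + max_records]

    def apply_retention(self, keep_last):
        # Discard the oldest records, keeping only the last keep_last.
        drop = max(len(self.records) - keep_last, 0)
        self.records = self.records[drop:]
        self.base_offset += drop


log = ToyLog()
offsets = [log.append(r) for r in ["a", "b", "c"]]
consumer_1 = log.read(0)   # reads from the beginning
consumer_2 = log.read(1)   # a second consumer at a different offset
log.apply_retention(keep_last=2)
after_retention = log.read(0)
next_offset = log.append("d")
print(offsets, consumer_1, consumer_2, after_retention, next_offset)
```

Offsets keep increasing after retention removes old records, which is why the record appended last receives offset 3 rather than 2.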
10. In Apache Spark, which of the following is true about Resilient Distributed Datasets (RDDs)?
a) RDDs are immutable and cannot be modified after creation.
b) RDDs are automatically partitioned across nodes without any fault tolerance.
c) RDDs require explicit management of memory and storage.
d) RDDs can only perform operations in a batch processing mode.
Answer: A