A Case Study with Flink and Kafka (plus Spring and Cassandra and Kubernetes and Docker) – coooool stuff

Even though I have already mentioned the project in my last blog, I am so happy how the case study turned out that I decided to write about it also in this dedicated blog post. All this description together with exact instructions how to run the project and obviously – a complete source code – you will find on my GitHub, here.

Project Description

This Case Study draws from the mobile network operators (MNO) domain. The goal of the project is to implement ongoing, live tracking of data usage by mobile subscribers. As soon as data used within a billing period exceeds max data usage defined in the data plan – the system should generate appropriate notification. In this case study, we expect that appropriate messages are published to data-records-aggregates Kafka topic. Later these messages could be used for different purposes, like to inform the subscriber they exceeded the data plan or to lower the data transfer speed for the data used outside of the plan.

The source data for the project are data record files (tracking data used by mobile phone users) and subscribers’ agreements defined by MNO.

Architecture

The project uses Flink to import the files into the system, and as a streaming platform for detecting the moment of exceeding the data plan.

An auxiliary Spring backend project helps in generating test data:

  • 1️⃣ subscribers’ agreements that define max data in the data plan, and
  • 2️⃣ data record files, that would be coming from MNO in a real project.

The project uses Kafka as the messaging platform and Cassandra database to store ingested data records. Spring backend publishes generated agreements to agreements Kafka topic, and stores generated data record files in /mobilecs/incoming-data-records folder.

There are two main processes in the Case Study.

Importing data records process
  • Implemented by 3️⃣ Incoming Data Records Importer Flink job
  • Reads raw data record files received from MNO (from /mobilecs/incoming-data-records folder)
  • Enriches them with an identifier that allows to uniquely identify any record globally in the system (regardless of identification provided by MNO)
  • Publishes them to 4️⃣ incoming-data-records Kafka topic
Ingesting data records process
  • Implemented by 5️⃣ Incoming Data Records Ingester Flink job
  • Reads imported data records from the incoming-data-records Kafka topic
  • Reads subscribers’ agreements from the agreements Kafka topic
  • Matches incoming data records with agreements (using Flink RichCoFlatMapFunction), creating ingested data records, from now on referred to as simply data records
  • Stores resulting data records in 6️⃣ Cassandra database, in mobilecs.data_record table
  • Defines 7️⃣ a tumbling window that corresponds to the billing period (with a custom assigner, a custom trigger, and a process window function)
  • Aggregates data records within the window and publishes appropriate message to 8️⃣ data-records-aggregates topic twice in the window lifetime:
    • as soon as the data used in the period exceeds the data plan (DATA_PLAN_EXCEEDED)
    • at the end of the billing period (BILLING_PERIOD_CLOSED)

Running the project

To run the project have a look at the instructions on my GitHub, here. There, you will find a step-by-step guide both for docker compose and for Kubernetes. Have fun!

Leave a reply:

Your email address will not be published.

Site Footer

Sliding Sidebar

About Me

About Me

My name is Bartosz Kaminski. I am a software engineer, looking for simple and elegant software design achieved by great teams. My professional interests revolve around distributed systems, microservices, DDD, motivation and building great teams.

Social Profiles