Red Hat OpenShift Serverless: Knative Kafka component: Skipped initial events in new KafkaSource or new subscription on a KafkaChannel

Updated

This issue has been fixed as part of the OpenShift Serverless 1.18 release. Release notes can be found here

Issue

  • The initial messages after a KafkaSource or a Subscription on a KafkaChannel is created may be skipped if no messages have ever been delivered to the intended sink and the underlying data plane pod of the source or channel is restarted.

  • More generally, the first messages sent after these resources are created are sometimes skipped.

Root Cause

This is due to both the Kafka source and channel implementations setting the consumer group offset before the first event is fully transmitted. If the underlying pod of the Kafka source or channel is restarted for any reason after the initial events have been sent but before the offset is set, the newly created pod starts consuming from the latest offset, so some events may be skipped and never transmitted to the intended sink.
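To check whether a given channel or source is affected, you can inspect the consumer groups on the broker with the standard Kafka tooling. A minimal sketch, assuming a broker pod named my-cluster-kafka-0 in a kafka namespace with a listener on localhost:9092 (both assumptions; adjust the names for your cluster):

```shell
# List the consumer groups that the Knative Kafka data plane has created
# (group names vary by release; pick the one matching your channel/source).
oc exec -it my-cluster-kafka-0 -n kafka -- \
  bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Describe the group: compare CURRENT-OFFSET against the events you know
# were produced; a committed offset ahead of what the sink received
# indicates skipped events.
oc exec -it my-cluster-kafka-0 -n kafka -- \
  bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group <group_id> --describe
```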

Resolution

This issue is scheduled to be resolved in Red Hat OpenShift Serverless 1.18.0.

NOTE
This has been resolved as part of Red Hat OpenShift Serverless Release 1.18. Release notes can be found here

Workaround

Workaround for this issue in earlier versions:

After creating a KafkaSource or a Subscription on a KafkaChannel, but before sending any production traffic, either:

  • Manually reset the consumer group offsets that the Kafka channels and sources have created. After you have successfully reset the initial offsets in the consumer groups, it is safe to continue using your Kafka channels and sources as usual.
  • Or, if possible, send a few test events to confirm that events are not being skipped. After you have confirmed that the test events are not being lost, it is safe to continue using your Kafka channels and sources as usual.
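The offset reset in the first option can be done with the standard Kafka tooling. A hedged sketch, again assuming a broker pod my-cluster-kafka-0 in a kafka namespace; the group id and topic name depend on your channel or source and can be discovered with --list, and note that Kafka only allows resetting offsets while the group has no active members:

```shell
# Find the consumer group created for your channel or source.
oc exec -it my-cluster-kafka-0 -n kafka -- \
  bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Preview the reset to the beginning of the topic; replace --dry-run with
# --execute once the printed plan looks right.
oc exec -it my-cluster-kafka-0 -n kafka -- \
  bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group <group_id> --topic <topic_name> \
  --reset-offsets --to-earliest --dry-run
```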

Recovery

In the case that messages have already been skipped and you want to retransmit them:

The Knative Kafka components do not remove data from the topic, so the messages remain there until the configured retention time is reached. To retransmit the missing data to the intended sink of the Kafka channel or source:

  • Identify the start of the missing data period or offset.
  • Delete the KnativeKafka resource in the knative-eventing namespace and all Kafka source deployments in your project namespaces. This makes the consumer groups inactive on the Kafka cluster, so their offsets become writable. The Kafka channels and sources will stop operating.
  • Modify the offsets of the appropriate consumer group on the Kafka topic, for example:
$ oc exec -it my-cluster-kafka-0 -n kafka -- /bin/sh
sh-4.4$ bin/kafka-consumer-groups.sh --bootstrap-server <kafkahost:port> \
  --group <group_id> \
  --topic <topic_name> \
  --reset-offsets \
  --to-datetime <XXXX-XX-XXTXX:XX:XX.XXX> \
  --execute
  • Recreate the KnativeKafka resource in the knative-eventing namespace.
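Put together, the recovery steps above can be sketched as follows. The KnativeKafka resource name (knative-kafka), the saved manifest file name, and the deployment placeholder are assumptions — substitute the names used in your cluster:

```shell
# 1. Save and delete the KnativeKafka resource so the data plane stops and
#    the consumer groups become inactive (their offsets become writable).
oc get knativekafka knative-kafka -n knative-eventing -o yaml > knativekafka.yaml
oc delete knativekafka knative-kafka -n knative-eventing

# 2. Delete the Kafka source data plane deployments in your project namespaces.
oc delete deployment <kafkasource_deployment> -n <project_namespace>

# 3. Rewind the consumer group to the start of the missing data period.
oc exec -it my-cluster-kafka-0 -n kafka -- \
  bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group <group_id> --topic <topic_name> \
  --reset-offsets --to-datetime <XXXX-XX-XXTXX:XX:XX.XXX> --execute

# 4. Recreate the KnativeKafka resource; the channels and sources resume
#    consuming from the rewound offsets and retransmit the missing events.
oc apply -f knativekafka.yaml
```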

When the KafkaSource resources and the Subscription resources on the KafkaChannel become ready, they will retransmit all messages to their intended sink or subscriber, starting from the point in time you provided (which includes the previously missing messages).
