Service to retrieve data to update search index

ONSdigital/dp-search-data-extractor

dp-search-data-extractor

Service to retrieve published data to be used to update a search index. This service calls the /publisheddata endpoint on zebedee and the metadata endpoint on the dataset API.

This service listens to the content-updated Kafka topic for events of type contentUpdatedEvent (see the schemas package). You can also read our AsyncAPI specification.

This service takes the URI from the consumed event and calls either:

  1. the /publisheddata endpoint on zebedee, passing the URI as a query parameter, e.g. http://localhost:8082/publisheddata?uri=businessindustryandtrade
  2. the /datasets/{id}/editions/{edition}/versions/{version}/metadata endpoint on the dataset API, e.g. http://localhost:22000/datasets/CPIH01/editions/timeseries/versions/1/metadata
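As a rough illustration of the two calls above, the following Go sketch builds both request URLs from their parts. The function names are hypothetical and are not taken from the service's code; the base URLs are the defaults from the configuration section, and the example values are the ones used in the list above.

```go
package main

import (
	"fmt"
	"net/url"
)

// publishedDataURL builds the zebedee request URL for a page URI.
// The URI is sent as a query parameter, so it is escaped via url.Values.
// (Hypothetical helper, for illustration only.)
func publishedDataURL(zebedeeURL, uri string) string {
	q := url.Values{}
	q.Set("uri", uri)
	return zebedeeURL + "/publisheddata?" + q.Encode()
}

// datasetMetadataURL builds the dataset API metadata URL from the dataset ID,
// edition and version, escaping each path segment.
// (Hypothetical helper, for illustration only.)
func datasetMetadataURL(datasetAPIURL, id, edition, version string) string {
	return fmt.Sprintf("%s/datasets/%s/editions/%s/versions/%s/metadata",
		datasetAPIURL,
		url.PathEscape(id),
		url.PathEscape(edition),
		url.PathEscape(version))
}

func main() {
	fmt.Println(publishedDataURL("http://localhost:8082", "businessindustryandtrade"))
	fmt.Println(datasetMetadataURL("http://localhost:22000", "CPIH01", "timeseries", "1"))
}
```

Escaping the URI as a query value means a URI containing slashes is percent-encoded in the request rather than altering the path.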

See search service architecture docs here

Getting started

  • Run make debug
  • Run make help to see the full list of make targets

The service runs in the background consuming messages from Kafka. An example event can be created using the helper script, make produce.

Dependencies

  • golang 1.20.x
  • Running instance of zebedee
  • Requires running…
  • No further dependencies other than those defined in go.mod

To run make validate-specification you need Node v20.x and @asyncapi/cli installed:

   npm install -g @asyncapi/cli

Configuration

| Environment variable | Default | Description |
| --- | --- | --- |
| BIND_ADDR | localhost:25800 | The host and port to bind to |
| DATASET_API_URL | http://localhost:22000 | The URL for the Dataset API |
| ENABLE_DATASET_API_CALLBACKS | false | Feature flag to enable dataset API callbacks |
| ENABLE_SEARCH_CONTENT_UPDATED_HANDLER | false | Feature flag to enable search-content-updated topic processing |
| ENABLE_ZEBEDEE_CALLBACKS | false | Feature flag to enable zebedee callbacks |
| GRACEFUL_SHUTDOWN_TIMEOUT | 5s | The graceful shutdown timeout in seconds (time.Duration format) |
| HEALTHCHECK_INTERVAL | 30s | Time between self-healthchecks (time.Duration format) |
| HEALTHCHECK_CRITICAL_TIMEOUT | 90s | Time to wait until an unhealthy dependent propagates its state to make this app unhealthy (time.Duration format) |
| KAFKA_ADDR | localhost:9092 | The address of Kafka (accepts list) |
| KAFKA_OFFSET_OLDEST | true | Start processing Kafka messages in order from the oldest in the queue |
| KAFKA_VERSION | 1.0.2 | The version of Kafka |
| KAFKA_NUM_WORKERS | 1 | The maximum number of parallel Kafka consumers |
| KAFKA_SEC_PROTO | unset | If set to TLS, Kafka connections will use TLS (TLS is the only supported value) |
| KAFKA_SEC_CLIENT_KEY | unset | PEM [2] for the client key (optional, used for client auth) [1] |
| KAFKA_SEC_CLIENT_CERT | unset | PEM [2] for the client certificate (optional, used for client auth) [1] |
| KAFKA_SEC_CA_CERTS | unset | PEM [2] of CA cert chain if using private CA for the server cert [1] |
| KAFKA_SEC_SKIP_VERIFY | false | Ignore server certificate issues if set to true [1] |
| KAFKA_CONTENT_UPDATED_GROUP | dp-search-data-extractor | The consumer group this application uses to consume content-updated messages |
| KAFKA_CONTENT_UPDATED_TOPIC | content-updated | The name of the topic to consume messages from |
| KAFKA_PRODUCER_TOPIC | search-data-import | The name of the topic to produce messages to |
| KEYWORDS_LIMITS | -1 | The maximum number of keywords allowed (-1 means no limit) |
| SERVICE_AUTH_TOKEN | unset | The service auth token for the dp-search-data-extractor |
| STOP_CONSUMING_ON_UNHEALTHY | true | Application stops consuming Kafka messages if it is in an unhealthy state |
| TOPIC_TAGGING_ENABLED | false | Enable topic tagging using the topic cache |
| TOPIC_CACHE_UPDATE_INTERVAL | 30m | The time interval to update the topics cache (time.Duration format) |
| TOPIC_API_URL | http://localhost:25300 | The URL for the Topic API |
| ZEBEDEE_URL | http://localhost:8082 | The URL for Zebedee |

Notes:

  1. For more info, see the kafka TLS examples documentation
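To show how environment variables with defaults and time.Duration values might be read, here is a minimal Go sketch covering a handful of the settings above. It is not the service's own configuration loader, which this README does not show; the Config struct, its field names and loadConfig are assumptions made for illustration, and only the standard library is used.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config holds a small, illustrative subset of the documented settings.
// The struct and field names are assumptions, not the service's own types.
type Config struct {
	BindAddr               string
	ZebedeeURL             string
	HealthcheckInterval    time.Duration
	KafkaNumWorkers        int
	EnableZebedeeCallbacks bool
}

// loadConfig reads each variable through lookup and falls back to the
// documented default when the variable is unset or empty.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}

	// Durations use Go's time.Duration syntax, e.g. "30s" or "30m".
	interval, err := time.ParseDuration(get("HEALTHCHECK_INTERVAL", "30s"))
	if err != nil {
		return Config{}, fmt.Errorf("HEALTHCHECK_INTERVAL: %w", err)
	}
	workers, err := strconv.Atoi(get("KAFKA_NUM_WORKERS", "1"))
	if err != nil {
		return Config{}, fmt.Errorf("KAFKA_NUM_WORKERS: %w", err)
	}
	zebedee, err := strconv.ParseBool(get("ENABLE_ZEBEDEE_CALLBACKS", "false"))
	if err != nil {
		return Config{}, fmt.Errorf("ENABLE_ZEBEDEE_CALLBACKS: %w", err)
	}

	return Config{
		BindAddr:               get("BIND_ADDR", "localhost:25800"),
		ZebedeeURL:             get("ZEBEDEE_URL", "http://localhost:8082"),
		HealthcheckInterval:    interval,
		KafkaNumWorkers:        workers,
		EnableZebedeeCallbacks: zebedee,
	}, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid configuration:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the parsing logic easy to exercise with a plain map in tests.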

Healthcheck

The /health endpoint returns the current status of the service. Dependent services are health checked on an interval defined by the HEALTHCHECK_INTERVAL environment variable.

On a development machine a request to the health check endpoint can be made by:

curl localhost:25800/health
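The same request can be made from Go. The sketch below assumes the service is running locally on the default BIND_ADDR; healthURL and checkHealth are hypothetical helper names, and the sketch reports only the HTTP status code because the response body format is not described here.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// healthURL builds the health check URL from a host:port bind address.
// (Hypothetical helper, for illustration only.)
func healthURL(bindAddr string) string {
	return "http://" + bindAddr + "/health"
}

// checkHealth issues a GET request and returns the HTTP status code.
func checkHealth(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// A short timeout avoids hanging if the service is not running.
	client := &http.Client{Timeout: 5 * time.Second}
	status, err := checkHealth(client, healthURL("localhost:25800"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "health check failed:", err)
		os.Exit(1)
	}
	fmt.Println("health status code:", status)
}
```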

Contributing

See CONTRIBUTING for details.

License

Copyright © 2024, Office for National Statistics (https://www.ons.gov.uk)

Released under the MIT license, see LICENSE for details.