DataHub

DataHub is an open-source metadata platform for the data stack: a modern data catalog built to enable end-to-end data discovery, data observability, and data governance. It supports a range of data sources, including PostgreSQL.

Because YugabyteDB's YSQL API is wire-compatible with PostgreSQL, DataHub can connect to YugabyteDB as a data source using the PostgreSQL plugin.
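
To illustrate what connecting through the PostgreSQL plugin can look like, the following is a minimal sketch of an ingestion recipe written from a shell helper. The helper name write_recipe, the file name yb-recipe.yml, the localhost addresses, and the default yugabyte credentials are all assumptions for illustration; the sink address assumes the DataHub metadata service is reachable at http://localhost:8080. Adjust them to your deployment.

```shell
#!/usr/bin/env bash
# Sketch: write a DataHub ingestion recipe that reads metadata from YugabyteDB
# through the PostgreSQL source. All connection values below are assumptions.

write_recipe() {
  # $1: path of the recipe file to create (hypothetical name)
  local recipe_path="$1"
  cat > "$recipe_path" <<'EOF'
source:
  type: postgres
  config:
    host_port: localhost:5433   # YSQL port, not the PostgreSQL default 5432
    database: yugabyte
    username: yugabyte
    password: yugabyte
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080   # assumed DataHub metadata service address
EOF
}

# Usage (requires the DataHub CLI with the postgres plugin installed):
#   write_recipe yb-recipe.yml
#   datahub ingest -c yb-recipe.yml
```

The only YugabyteDB-specific detail in this sketch is the port: the source type stays postgres because the plugin talks to the YSQL API exactly as it would to PostgreSQL.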

Setup

You can run the Docker Compose quickstart example provided in the DataHub GitHub repository against YugabyteDB with the following changes:

  • Replace the MySQL Docker image with that of YugabyteDB.
  • Specify the entrypoint command for the YugabyteDB Docker container.
  • Change the port from 5432 (the PostgreSQL default) to 5433 (the YSQL default).
  • Change username and password to yugabyte.
  • Change the driver to org.postgresql.Driver.

Make changes in the following files:

  • In docker/quickstart/docker-compose-without-neo4j.quickstart.yml, make the following changes:

    • Change the EBEAN_DATASOURCE configuration [lines 80-84 and 126-130] as follows:

      EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
      EBEAN_DATASOURCE_HOST=yugabyte:5433
      EBEAN_DATASOURCE_PASSWORD=yugabyte
      EBEAN_DATASOURCE_URL=jdbc:postgresql://yugabyte:5433/yugabyte
      EBEAN_DATASOURCE_USERNAME=yugabyte

    • Change mysql-setup to postgres-setup [line 123].

    • Replace the mysql and mysql-setup containers [lines 197-231] with the yugabyte and postgres-setup containers as follows:

      yugabyte:
        container_name: yugabyte
        hostname: yugabyte
        image: yugabytedb/yugabyte:latest
        command: /bin/bash /home/yugabyte/docker-entrypoint-initdb.d/yb-init.sh
        environment:
          POSTGRES_USER: ${POSTGRES_USER:-yugabyte}
          POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-yugabyte}
        ports:
          - '5433:5433'
        volumes:
          - ./yb-setup/:/home/yugabyte/docker-entrypoint-initdb.d/
        healthcheck:
          test: bin/ysqlsh -h `hostname -i` -U yugabyte -tAc 'select 1' -d yugabyte
          interval: 10s
          timeout: 5s
          retries: 20
      postgres-setup:
        container_name: postgres-setup
        depends_on:
          yugabyte:
            condition: service_healthy
        environment:
          - POSTGRES_HOST=yugabyte
          - POSTGRES_PORT=5433
          - POSTGRES_USERNAME=yugabyte
          - POSTGRES_PASSWORD=yugabyte
          - DATAHUB_DB_NAME=yugabyte
        hostname: yugabyte-setup
        image: ${DATAHUB_POSTGRES_SETUP_IMAGE:-acryldata/datahub-postgres-setup}:${DATAHUB_VERSION:-head}

  • Create a directory named yb-setup in docker/quickstart/, and in it create a script file named yb-init.sh (that is, docker/quickstart/yb-setup/yb-init.sh) with the following content. The script runs when the container starts and launches the YugabyteDB cluster.

    bin/yugabyted start
    sleep 5
    bin/ysqlsh -h `hostname -i` -f /home/yugabyte/docker-entrypoint-initdb.d/init.sql
    tail -f /dev/null

  • Copy the file docker/postgres/init.sql to docker/quickstart/yb-setup/.
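
The two file-preparation steps (creating yb-setup with yb-init.sh, and copying init.sql) can be scripted. The following is a minimal sketch, not part of the DataHub repository: prepare_yb_setup is a hypothetical helper, and its argument is assumed to be the root of a local DataHub checkout.

```shell
#!/usr/bin/env bash
# Sketch: create docker/quickstart/yb-setup/ containing yb-init.sh and init.sql.
# prepare_yb_setup is a hypothetical helper; $1 is the root of a DataHub checkout.

prepare_yb_setup() {
  local repo_root="$1"
  local setup_dir="$repo_root/docker/quickstart/yb-setup"

  mkdir -p "$setup_dir" || return 1

  # Quoted heredoc: the backticks are written literally and only run in the container.
  cat > "$setup_dir/yb-init.sh" <<'EOF'
bin/yugabyted start
sleep 5
bin/ysqlsh -h `hostname -i` -f /home/yugabyte/docker-entrypoint-initdb.d/init.sql
tail -f /dev/null
EOF

  # Reuse the PostgreSQL schema script shipped with DataHub.
  cp "$repo_root/docker/postgres/init.sql" "$setup_dir/" || return 1
}

# Usage, from the directory that contains the checkout (path is an assumption):
#   prepare_yb_setup ./datahub
```

The script does not need to be marked executable, because the container command invokes it explicitly through /bin/bash.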

Run the example

Run the example from the docker/quickstart/ directory using the following command:

docker compose -f docker-compose-without-neo4j.quickstart.yml up -d

After all the containers are running, you can ingest some demo data by running ./datahub/docker/ingestion/ingestion.sh, or head to http://localhost:9002 (username: datahub, password: datahub) to access the UI.
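
Because the containers start in the background, it can help to wait until the yugabyte container reports healthy before ingesting data. The following is a minimal sketch under stated assumptions: wait_until and yugabyte_is_healthy are hypothetical helpers, and the check relies on the healthcheck defined for the yugabyte container in the compose file.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds, similar in spirit to the
# healthcheck's interval/retries settings. wait_until is a hypothetical helper,
# not part of Docker or DataHub.

wait_until() {
  # $1: maximum attempts, $2: seconds between attempts, rest: command to run
  local retries="$1" interval="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1
}

yugabyte_is_healthy() {
  # Reads the status reported by the healthcheck on the container named yugabyte.
  [ "$(docker inspect --format '{{.State.Health.Status}}' yugabyte 2>/dev/null)" = "healthy" ]
}

# Usage:
#   wait_until 20 10 yugabyte_is_healthy && echo "YugabyteDB is ready"
```

The retry count and interval in the usage line simply mirror the values used in the compose healthcheck; any values that suit your environment work.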