How I built a Meltano target within 1 hour
In this post, I briefly share why I decided to build an integration with Meltano, the major steps to build a new target, and how to use and test it. There is official documentation on how to build a tap (source) for Meltano, but not much on building a target. Hopefully, this post will help you in your own development journey!
Motivation
Let me start with this interesting photo someone took at Data Council 2023 (shameless plug for my blogs: day0, day1, day2, day3)
This is Douwe Maan, Founder & CEO at Meltano. I saw this smiling face every day in that week, almost every 2 hours.
I am not completely new to Meltano. I joined their Community Slack on 11/22/22 (probably at 11 😅) and was impressed by the 3000+ members there.
Meltano, if you are not familiar with it yet, was originally built at GitLab, as an internal framework for better managing their Singer-based custom taps and targets. It became an open-source project in 2018 and quickly attracted attention from a growing group of upstream contributors and enterprise users.
What is amazing is there are over 500 source connectors publicly available on Meltano Hub (repo), plus many custom connectors for niche or internal sources, built with the Meltano SDK.
You may or may not know I work as a co-founder at Timeplus, essentially a streaming database, plus a nice UI for real-time charts and alerts. There are a few built-in sources, such as Apache Kafka, Apache Pulsar, Redpanda, Ably, etc.
Once I was back from Data Council, one item on my TODO list was to build a sink connector for Meltano. By doing so, we can Extract data from 500+ source connectors in the Meltano ecosystem, and Load such data into Timeplus, for streaming or batch Transformation. See? A perfect story for ELT, in the streaming and modern way.
Use cases? There could be thousands of combinations, e.g.
- Load thousands of CSV files from AWS S3 into Timeplus, apply filters and clean up, and create high-quality dimension/lookup tables in Timeplus to enrich real-time data feeds (reference)
- Load your HubSpot data into Timeplus in minutes and create near real-time live dashboards to show latest deal pipelines or email events.
- Build better products by turning your real-time user data on Amplitude into meaningful insights and real-time actions in Timeplus.
What I built
The full source code is available at
It’s listed on https://hub.meltano.com/loaders/target-timeplus
Technically it’s a Singer target for Timeplus, built with the Meltano Target SDK.
The easiest way to test it is:
git clone https://github.com/timeplus-io/target-timeplus.git
cd target-timeplus
python3.8 -m pip install --user pipx
pipx install meltano
meltano install
# create a workspace and api key for Timeplus
# https://docs.timeplus.com/docs/quickstart-ingest-api
meltano config target-timeplus set --interactive
meltano elt tap-smoke-test target-timeplus
This will load some sample data from the tap-smoke-test extractor and send it to your Timeplus workspace immediately.
Then you can query them in Timeplus with SQL and build charts and alerts.
With the rich contextual data from other data sources via Meltano, you can write SQL to query a variety of data in a single engine, e.g.
SELECT time,sensor_id,sensor_temperature
FROM iot
INNER JOIN table(sensor_spec_csv) as spec USING(sensor_id)
INNER JOIN table(sensor_inventory_jira) as inventory USING(sensor_id)
WHERE sensor_temperature > spec.threshold AND inventory.remaining>5
How I built it
Okay, I lied. I spent more than 1 hour on this target-timeplus. The actual coding time was just 30 minutes. However, I spent 2 more hours setting up the development environment. This is exactly why I decided to write a blog about this, to save you time if you want to build a new target without knowing too much about Meltano/Python.
I highly recommend you check https://sdk.meltano.com/en/latest/dev_guide.html first. It’s a nice guide. The only problem is that it doesn’t give you much help if you are about to build a target instead of a tap.
There were a few annoying errors when I installed `tap-smoke-test`. I was stuck for almost an hour and ended up downgrading from Python 3.11.3 to 3.8.16. There was also an issue that `tap-smoke-test` was not available on Meltano Hub. I shared that feedback in their Slack and was happy to see they took action immediately to address it.
I haven’t verified those fixes yet. Anyway, here are the key steps to build the new target.
1. Make sure you have Python 3.8 installed.
python3.8 --version
Python 3.8.16
2. Install pipx
python3.8 -m pip install --user pipx
3. Update zsh/bash profile to add the extra path
vim ~/.zshrc
export PATH=$PATH:/Users/jove/Library/Python/3.8/bin
4. Install Meltano via pipx
pipx install meltano
installed package meltano 2.17.1, installed using Python 3.8.16
These apps are now globally available
- meltano
done! ✨ 🌟 ✨
meltano --version
(2.17.1)
5. Install extra tools via pipx
pipx install cookiecutter
pipx install poetry
pipx install tox
6. Create the skeleton code for your new target via cookiecutter 🍪 🔪
cookiecutter https://github.com/meltano/sdk --directory="cookiecutter/target-template"
destination_name [MyDestinationName]: Timeplus
admin_name [FirstName LastName]: Jove Zhong
target_id [target-timeplus]:
library_name [target_timeplus]:
variant [None (Skip)]:
Select serialization_method:
1 - Per record
2 - Per batch
3 - SQL
Choose from 1, 2, 3 [1]: 2
cd target-timeplus
poetry install
7. Edit target_<name>/target.py to declare the required/optional configurations for your target.
config_jsonschema = th.PropertiesList(
    th.Property(
        "endpoint",
        th.StringType,
        description="Timeplus workspace endpoint",
        default="https://us.timeplus.cloud/wsId1234",
        required=True
    ),
    th.Property(
        "apikey",
        th.StringType,
        secret=True,  # Flag config as protected.
        description="Personal API key",
        required=True
    ),
).to_dict()
For example, in my case, there are only 2 config items: the server endpoint and the API key. You can use default to set a default value and mark sensitive information with secret=True.
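To make the effect of default, required and secret more concrete, here is a minimal stdlib-only sketch. It does not use the Meltano SDK: the SETTINGS table and the resolve_config and redact helpers are hypothetical stand-ins that mirror the two properties above, and the SDK itself works from the generated JSON Schema rather than from code like this.

```python
# Hypothetical stand-in for the two settings declared above; not the SDK's API.
SETTINGS = {
    "endpoint": {"required": True, "default": "https://us.timeplus.cloud/wsId1234"},
    "apikey": {"required": True, "secret": True},
}


def resolve_config(user_config):
    """Fill in defaults, then fail if a required setting is still missing."""
    resolved = {}
    for name, spec in SETTINGS.items():
        if name in user_config:
            resolved[name] = user_config[name]
        elif "default" in spec:
            resolved[name] = spec["default"]
        elif spec.get("required"):
            raise ValueError(f"missing required setting: {name}")
    return resolved


def redact(config):
    """Mask secret values so the config is safe to print in logs."""
    return {
        name: "****" if SETTINGS.get(name, {}).get("secret") else value
        for name, value in config.items()
    }


# The user only supplies the API key; the endpoint falls back to its default.
config = resolve_config({"apikey": "my-api-key"})
print(redact(config))
```

The point of the sketch is only the division of labor: a default spares the user from typing a common value, required turns a forgotten setting into an early error, and secret keeps the value out of anything that gets displayed.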
8. Add extra SDKs to your target by changing pyproject.toml
[tool.poetry.dependencies]
..
timeplus = "1.1.1"
Then you need to run poetry update to update the Python dependencies.
9. Do the real coding in target_<name>/sinks.py
You can take https://github.com/timeplus-io/target-timeplus/blob/main/target_timeplus/sinks.py as an example.
A few key functions:
- __init__ to validate the connection to your server. You can use self.config["apikey"] to access the user settings. I chose to create a new stream in Timeplus based on the schema, if the stream doesn’t exist.
- start_batch to initialize something for a new batch. In my case, I set rows as an empty array.
- process_record to add records one by one to your batch.
- process_batch to commit the batch. In my case, I just need to call the ingest API to send multiple rows to the Timeplus endpoint without repeating the column names.
Stream(env=self.env).name(self.stream_name).ingest(self.columns, self.rows)
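The four functions above fit together roughly as in the sketch below. This is a plain-Python stand-in, not the code in target-timeplus: the class SketchBatchSink does not subclass the SDK's BatchSink, and the send callable is a hypothetical replacement for the Timeplus ingest call so the example runs without a workspace. The method names and the context argument follow the SDK's batch sink interface.

```python
class SketchBatchSink:
    """Plain-Python stand-in for a batch sink (does not use the Meltano SDK)."""

    def __init__(self, stream_name, schema, send):
        self.stream_name = stream_name
        # Column order is taken from the stream's JSON schema properties.
        self.columns = list(schema["properties"].keys())
        # Hypothetical callable standing in for the real ingest API.
        self.send = send
        self.rows = []

    def start_batch(self, context):
        # Start every batch with an empty list of rows.
        self.rows = []

    def process_record(self, record, context):
        # Store values only, in column order, so names are not repeated per row.
        self.rows.append([record.get(col) for col in self.columns])

    def process_batch(self, context):
        # Commit the whole batch in a single call.
        self.send(self.stream_name, self.columns, self.rows)


sent = []
sink = SketchBatchSink(
    "animals",
    {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}},
    send=lambda stream, columns, rows: sent.append((stream, columns, rows)),
)
context = {}
sink.start_batch(context)
sink.process_record({"id": 1, "name": "cat"}, context)
sink.process_record({"id": 2, "name": "dog"}, context)
sink.process_batch(context)
print(sent)
```

Sending the column names once and the rows as plain value lists keeps the payload small, which is what "without repeating the column names" refers to.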
10. Test your awesome target
You can directly edit `meltano.yml` to add the tap-smoke-test sample tap.
extractors:
- name: tap-smoke-test
  namespace: tap_smoke_test
  pip_url: git+https://gitlab.com/meltano/tap-smoke-test.git
  executable: tap-smoke-test
  config:
    streams:
    - stream_name: animals
      input_filename: https://gitlab.com/meltano/tap-smoke-test/-/raw/main/demo-data/animals-data.jsonl
    - stream_name: page_views
      input_filename: https://gitlab.com/meltano/tap-smoke-test/-/raw/main/demo-data/pageviews-data.jsonl
Then run meltano install to install both tap-smoke-test and your new target.
You need to configure the target in the current profile (such as test) by running meltano config target-timeplus set --interactive
Finally, run the end-to-end ELT test via
meltano elt tap-smoke-test target-timeplus
The run prints an overwhelming amount of stdout/stderr output, so check your destination system to see whether the data arrived as you expected.
That’s all. Except for the tricky Python version and compatibility issues, the overall development experience was quite smooth. The Meltano team are good listeners and I am sure it will just get better and better.
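One way to know what to expect in the destination is to count the records the tap emits. The sketch below assumes you have captured the tap's raw Singer output as JSON lines (for example by redirecting the output of meltano invoke tap-smoke-test to a file); the count_records helper and the sample lines are made up for illustration and are not real tap-smoke-test output.

```python
import json
from collections import Counter


def count_records(lines):
    """Count Singer RECORD messages per stream, ignoring everything else."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip log lines that are not Singer messages
        if isinstance(message, dict) and message.get("type") == "RECORD":
            counts[message["stream"]] += 1
    return dict(counts)


# Made-up sample lines in the Singer message format.
sample = [
    '{"type": "SCHEMA", "stream": "animals", "schema": {"properties": {}}, "key_properties": []}',
    '{"type": "RECORD", "stream": "animals", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "animals", "record": {"id": 2}}',
    '{"type": "RECORD", "stream": "page_views", "record": {"id": 1}}',
    '{"type": "STATE", "value": {}}',
    "not json",
]
print(count_records(sample))
```

Comparing these per-stream counts with a row count in the destination is a quick sanity check that nothing was dropped along the way.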
For your reference, a few other good examples of targets: