ETL Tools List: Overview and Pricing - DZone Big Data

Last updated: 04-15-2018



Managing data from a massive stack of cloud applications has become a new challenge for many companies. The solution is to use an ETL tool to pipe all that data into a data warehouse that organizes and stores it in one place.

There are a lot of great enterprise-grade tools such as Informatica, SAS, ODI, and Pentaho, as well as open-source ones like Apache NiFi and StreamSets Data Collector. For the scope of this post, though, we’ll provide an ETL tools list that highlights tools that have proven to work well for our customers: fast-growing startups.

In our list, you’ll find an overview of the ETL tools with pricing, supported data sources and data warehouses, and other valuable info.

Stitch is a simple and powerful solution, which is great for cheaply loading data from application databases. It connects to databases such as MongoDB and MySQL and to SaaS tools like Zendesk and Salesforce.

Because Stitch is built on open-source Singer, users can create their own integrations with a standard JSON-based format and run them. Automated monitoring and alerting come with a simple UI where you can check the number of rows synced per data source, receive immediate notifications, and get activity reports.

Segment is both an ETL and a data collection tool, which can collect events from your mobile apps, websites, and servers. You can read about how Segment routes events from different sources in our overview of the best data collection tools for event data.

On the ETL side, Segment is a good option for extracting data from cloud apps like Stripe, Salesforce, or Intercom. Note that it doesn’t support application databases (such as MySQL, MongoDB, or PostgreSQL). Segment can be used alongside Stitch, with Stitch extracting data from application databases and Segment extracting data from cloud apps.

*Monthly Tracked Users: The number of anonymous and logged-in visitors that you track with Segment

Blendo focuses on integrating data from SaaS services. It supports around 30 different data sources. It can be used to handle data from all SaaS services or complement Segment. Blendo has several sources that are not supported by Segment or Stitch. All popular destinations, such as Redshift, BigQuery, or Snowflake, are supported.

Pricing is similar to Stitch, and you can pay as you go: there is no need for an annual commitment, even at the high end.

Fivetran is an ETL tool that offers data connectors to extract data from databases and cloud services and load it into a data warehouse. That way, business users gain access to up-to-date, row-level data.

Fivetran allows business users to manage many integrations without having to write or maintain any code. The list of integrations includes PostgreSQL, MySQL, Redshift, and many more.

Pricing is available on request.

Alooma provides a data pipeline as a service. You can use almost any input data source, with outputs such as BigQuery, Redshift, and Snowflake. Each connector is optimized for its particular data source, which gives high throughput and helps avoid data loss and duplicates even when third-party services fail.

Alooma offers a set of features that give visibility and control over the whole ETL process. Among them are real-time visualizations, the ability to query data streams, a code engine, a mapper, and a restream feature for catching any errors.

Pricing is available only on request and is based on your particular needs.

Improvado targets marketing data sources, such as Google AdWords, Facebook Ads, YouTube, and more. You can explore the full catalog of supported integrations. Of the tools covered so far, it has the largest list of marketing data sources.

Improvado can load data into their managed warehouse, which is built on PostgreSQL, or into your own data warehouse.

Pricing is available on request.

FlyData is an ETL tool that can load data only into Amazon Redshift. It supports Amazon RDS, MySQL, PostgreSQL, MariaDB, Percona, and logs in CSV/TSV/JSON as data sources.

It’s a good choice if you want to move your data into a modern DB suited for aggregate processing. Often, AWS introduces FlyData to their prospects to help companies migrate to Amazon Redshift without any technical difficulties.

Singer is an open-source project for ETL integrations. There are more than 20 open-source integrations to data sources (so-called “taps”), and more are being built all the time. If you need to develop a tap for a specific source, you can reuse code from the existing taps and helper utilities.

Taps extract data from any source and write it to standard output in a JSON-based format. Targets consume data from taps and load it into a file, API, or database. For now, targets for BigQuery, Stitch, Rakam, and some others are available.
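To make the tap/target split more concrete, here is a minimal sketch in plain Python that does not use the Singer helper library. It assumes the message shapes described in the Singer specification (SCHEMA, RECORD, and STATE messages written as one JSON object per line); the function names `run_tap`, `run_target`, and `demo`, the `users` stream, and the `last_id` bookmark are hypothetical and chosen only for illustration.

```python
import io
import json
import sys


def run_tap(rows, stream, out=sys.stdout):
    """Emit Singer-style messages, one JSON object per line (hypothetical tap)."""
    schema_message = {
        "type": "SCHEMA",
        "stream": stream,
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "email": {"type": "string"},
            },
        },
    }
    out.write(json.dumps(schema_message) + "\n")

    last_id = None
    for row in rows:
        record_message = {"type": "RECORD", "stream": stream, "record": row}
        out.write(json.dumps(record_message) + "\n")
        last_id = row["id"]

    # A STATE message lets a later run resume from where this one stopped.
    state_message = {"type": "STATE", "value": {stream: {"last_id": last_id}}}
    out.write(json.dumps(state_message) + "\n")


def run_target(lines):
    """Consume Singer-style messages and collect records per stream.

    Collecting into a dict stands in for loading into a file, API, or database.
    """
    loaded = {}
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
        # SCHEMA messages are ignored here; a real target would validate against them.
    return loaded, state


def demo():
    """Connect the tap to the target through an in-memory buffer instead of a shell pipe."""
    users = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": "b@example.com"},
    ]
    pipe = io.StringIO()
    run_tap(users, "users", out=pipe)
    pipe.seek(0)
    return run_target(pipe)
```

In practice a tap and a target are usually separate programs joined with a shell pipe, so the tap's standard output becomes the target's standard input. The in-memory buffer in `demo` only imitates that connection so the sketch stays self-contained.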

If this ETL tools list seems too extensive, start by defining the most important criteria for your company’s needs. These can include support for data extraction, cleansing, aggregation, transformation, or calculation; the type and number of supported data streams; and the balance between price and the scope of provided services.

