Omniduct

An interface for extracting data from various data sources
By Daniel Frank, Matthew Wardrop

Omniduct provides uniform interfaces for connecting to and extracting data from a wide variety of (potentially remote) data stores, including HDFS, Hive, Presto, and MySQL.

Documentation: http://omniduct.readthedocs.io
Source: https://github.com/airbnb/omniduct
Bug reports: https://github.com/airbnb/omniduct/issues

It provides:

  • A generic plugin-based programmatic API to access data in a consistent manner across different services (see Supported protocols).
  • A framework for lazily connecting to data sources and maintaining these connections during the entire lifetime of the relevant Python session.
  • Automatic port forwarding of remote services over SSH where connections cannot be made directly.
  • Convenient IPython magic functions for interfacing with data providers from within IPython and Jupyter Notebook sessions.
  • Utility classes and methods to assist in maintaining registries of useful services.
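To make the first two points concrete, here is a minimal sketch of a plugin lookup keyed by protocol name, combined with a connection that is opened on first use and then reused. It does not use Omniduct's actual API: the names `LazyClient`, `get_plugin`, `FakeSqlClient`, and the `fakesql` protocol are all hypothetical and exist only for illustration.

```python
class LazyClient:
    """Hypothetical base class (not Omniduct's API): connects on first use."""

    PROTOCOLS = ()   # protocol names a subclass handles
    _plugins = {}    # protocol name -> subclass

    def __init_subclass__(cls, **kwargs):
        # Register each subclass under the protocol names it declares.
        super().__init_subclass__(**kwargs)
        for name in cls.PROTOCOLS:
            LazyClient._plugins[name] = cls

    @classmethod
    def get_plugin(cls, protocol):
        return cls._plugins[protocol]

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self._connection = None
        self.connect_count = 0

    def _connect(self):
        raise NotImplementedError

    @property
    def connection(self):
        # Open the connection only when it is first needed, then keep it.
        if self._connection is None:
            self._connection = self._connect()
            self.connect_count += 1
        return self._connection

    def query(self, statement):
        return self.connection.execute(statement)


class _FakeConnection:
    """Stand-in for a real database connection."""

    def __init__(self, label):
        self.label = label

    def execute(self, statement):
        return f"{self.label}: {statement}"


class FakeSqlClient(LazyClient):
    PROTOCOLS = ("fakesql",)

    def _connect(self):
        return _FakeConnection(f"{self.host}:{self.port}")


client = LazyClient.get_plugin("fakesql")(host="db.example.com", port=3306)
first = client.query("SELECT 1")   # connection is opened here
second = client.query("SELECT 2")  # same connection is reused
```

In this sketch, constructing the client does no network work; the connection is created by the first call to `query` and held for as long as the client object lives.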

Omniduct is designed to be convenient to use either directly (each user configures their own service definitions) or via another package (which can provide a library of pre-defined services, for example for a company). For more information on how to deploy Omniduct, refer to Deployment.
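The two usage modes can be pictured with a small sketch: a wrapper package ships a registry of pre-defined services, and an individual user adds their own definitions on top. This is an illustration only and not Omniduct's API; `ServiceRegistry`, `company_registry`, the service names, and the hosts are hypothetical.

```python
class ServiceRegistry:
    """Hypothetical registry mapping service names to connection settings."""

    def __init__(self, definitions=None):
        self._definitions = dict(definitions or {})

    def register(self, name, **config):
        # Add a service definition, or replace an existing one.
        self._definitions[name] = config

    def lookup(self, name):
        # Return a copy so callers cannot mutate the registry by accident.
        return dict(self._definitions[name])

    def names(self):
        return sorted(self._definitions)


def company_registry():
    """What a wrapper package might ship: services defined once for everyone."""
    return ServiceRegistry({
        "warehouse": {"protocol": "presto", "host": "presto.example.com", "port": 8080},
        "app_db": {"protocol": "mysql", "host": "mysql.example.com", "port": 3306},
    })


# A user starts from the shared definitions and adds a personal service.
registry = company_registry()
registry.register("scratch", protocol="mysql", host="localhost", port=3306)
```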

Links