diogenes.read package

Submodules

diogenes.read.read module

This module provides functions that convert data stored in external formats, such as SQL databases and csv files, to NumPy structured arrays.

class diogenes.read.read.SQLConnection(conn_str, allow_caching=False, tmp_dir='.', parse_datetimes=[], allow_pgres_copy_optimization=True)

Bases: object

Connection to SQL that returns NumPy structured arrays. Intended to loosely implement DB-API 2.0 (PEP 249).

Parameters:
  • conn_str (str) – SQLAlchemy connection string (http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html)
  • allow_caching (bool) – If True, diogenes will cache the results of each query and return the cached result if the same query is performed twice. If False, each query will be sent to the database.
  • tmp_dir (str) – If allow_caching is True, the cached results will be stored in tmp_dir. This is also the directory where csv files are stored when connecting to PostgreSQL servers.
  • parse_datetimes (list of col names) – Columns that should be interpreted as datetimes
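The caching behavior described by allow_caching and tmp_dir can be illustrated with a minimal sketch. This is not diogenes' actual implementation: the helper `cached_query`, the SHA-1 hash used to name cache files, and the `.npy` storage format are assumptions chosen for illustration, and `run_query` stands in for whatever actually talks to the database.

```python
import hashlib
import os
import tempfile

import numpy as np


def cached_query(query, run_query, allow_caching=False, tmp_dir="."):
    """Hypothetical helper: return run_query(query), memoized on disk.

    When allow_caching is True, the result array is saved in tmp_dir under a
    file name derived from a hash of the query text (an assumption), and a
    later call with the identical query loads that file instead of calling
    run_query again. When allow_caching is False, run_query is always called.
    """
    if not allow_caching:
        return run_query(query)
    key = hashlib.sha1(query.encode("utf-8")).hexdigest()
    path = os.path.join(tmp_dir, key + ".npy")
    if os.path.exists(path):
        return np.load(path)
    result = run_query(query)
    np.save(path, result)
    return result


# Stand-in "database" that records every query it actually receives.
calls = []


def fake_db(query):
    calls.append(query)
    return np.array([(1, 2.5)], dtype=[("id", "i8"), ("score", "f8")])


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_query("SELECT id, score FROM t", fake_db,
                         allow_caching=True, tmp_dir=cache_dir)
    second = cached_query("SELECT id, score FROM t", fake_db,
                          allow_caching=True, tmp_dir=cache_dir)
uncached = cached_query("SELECT id, score FROM t", fake_db)
```

Here the second cached call is served from disk, while the call without caching goes back to the stand-in database.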
execute(exec_str, invalidate_cache=False)

Executes a query

Parameters:
  • exec_str (str) – SQL query to execute
  • invalidate_cache (bool) – If this SQLConnection object was initialized with allow_caching=True, identical queries will always return the same result. If invalidate_cache is True, this behavior is overridden and the query will be re-executed.
Returns:

Results of the query as a NumPy structured array

Return type:

numpy.ndarray
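As a rough illustration of the kind of value execute returns, the sketch below runs a query through Python's built-in `sqlite3` module (a DB-API 2 driver) and packs the rows into a structured array whose field names come from `cursor.description`. The helper `query_to_structured` is hypothetical; diogenes takes an SQLAlchemy connection string instead and may build its arrays differently.

```python
import sqlite3

import numpy as np


def query_to_structured(conn, sql):
    """Hypothetical helper: run sql on a DB-API 2 connection and return the
    rows as a structured array whose field names are the result columns."""
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    names = [col[0] for col in cur.description]
    # Build each column separately so NumPy infers a dtype per field.
    columns = [np.array([row[i] for row in rows]) for i in range(len(names))]
    dtype = [(name, col.dtype) for name, col in zip(names, columns)]
    return np.array(rows, dtype=dtype)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, height REAL)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [("Ann", 34, 1.62), ("Bob", 41, 1.80)])
arr = query_to_structured(conn, "SELECT name, age, height FROM people ORDER BY name")
print(arr.dtype.names)
```

Columns are then accessed by name, for example `arr["age"]`.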

exception diogenes.read.read.SQLError

Bases: exceptions.Exception

diogenes.read.read.connect_sql(con_str, allow_caching=False, tmp_dir='.', parse_datetimes=[], allow_pgres_copy_optimization=True)

Creates an SQLConnection object, which runs SQL queries and returns their results as structured arrays

Parameters:
  • con_str (str) – SQLAlchemy connection string (http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html)
  • allow_caching (bool) – If True, diogenes will cache the results of each query and return the cached result if the same query is performed twice. If False, each query will be sent to the database.
  • tmp_dir (str) – If allow_caching is True, the cached results will be stored in tmp_dir. This is also the directory where csv files are stored when connecting to PostgreSQL servers.
  • parse_datetimes (list of col names) – Columns that should be interpreted as datetimes
Returns:

Object that executes SQL queries and returns numpy structured arrays

Return type:

SQLConnection
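SQLAlchemy connection strings follow the general pattern `dialect+driver://username:password@host:port/database`. The standard-library snippet below pulls one apart only to show which piece is which. The credentials, host, and database name are placeholders, and this is not how diogenes or SQLAlchemy handle the string (SQLAlchemy has its own URL parser).

```python
from urllib.parse import urlsplit

# Placeholder connection string; substitute real credentials and host.
conn_str = "postgresql+psycopg2://analyst:secret@db.example.com:5432/warehouse"
parts = urlsplit(conn_str)

# The scheme carries the dialect and, optionally, the DB-API driver.
dialect, _, driver = parts.scheme.partition("+")
database = parts.path.lstrip("/")

print(dialect, driver, parts.username, parts.hostname, parts.port, database)
```

For a local SQLite file the string has no host part and looks like `sqlite:///path/to/file.db`.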

diogenes.read.read.open_csv(path, delimiter=', ', header=True, col_names=None, parse_datetimes=[])

Creates a structured array from a local .csv file

Parameters:
  • path (str) – path of the csv file
  • delimiter (str) – Character used to delimit csv fields
  • header (bool) – If True, assumes the first line of the csv has column names
  • col_names (list of str or None) – If header is False, this list will be used for column names
  • parse_datetimes (list of col names) – Columns that should be interpreted as datetimes
Returns:

Structured array corresponding to the csv. If header is False and col_names is None, diogenes will assign arbitrary column names.

Return type:

numpy.ndarray
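Comparable behavior can be approximated with NumPy alone using `numpy.genfromtxt` with `names=True` and `dtype=None`, which reads field names from the header row and infers a type for each column. The sketch below follows that approach and then converts the columns listed in parse_datetimes to `datetime64`. The function name `csv_to_structured` and the choice of second resolution are assumptions; diogenes' own open_csv may parse files differently.

```python
import io

import numpy as np


def csv_to_structured(source, delimiter=",", header=True, col_names=None,
                      parse_datetimes=()):
    """Hypothetical open_csv-like helper built on numpy.genfromtxt.

    source may be a path or a file-like object.
    """
    # True makes genfromtxt take field names from the first line;
    # otherwise the caller-supplied list of names is used.
    names = True if header else col_names
    arr = np.genfromtxt(source, delimiter=delimiter, names=names,
                        dtype=None, encoding="utf-8")

    if not parse_datetimes:
        return arr

    # Field dtypes cannot be changed in place, so build a new array in which
    # the requested columns are datetime64 (second resolution, an assumption).
    new_dtype = [(name, "datetime64[s]" if name in parse_datetimes else arr.dtype[name])
                 for name in arr.dtype.names]
    out = np.empty(arr.shape, dtype=new_dtype)
    for name in arr.dtype.names:
        col = arr[name]
        out[name] = col.astype("datetime64[s]") if name in parse_datetimes else col
    return out


sample = io.StringIO(
    "name,when,count\n"
    "alpha,2016-01-05,3\n"
    "beta,2016-02-10,7\n"
)
table = csv_to_structured(sample, parse_datetimes=["when"])
print(table.dtype)
```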

diogenes.read.read.open_csv_url(url, delimiter=', ', header=True, col_names=None, parse_datetimes=[])

Creates a structured array from a csv file located at a URL

Parameters:
  • url (str) – url of the csv file
  • delimiter (str) – Character used to delimit csv fields
  • header (bool) – If True, assumes the first line of the csv has column names
  • col_names (list of str or None) – If header is False, this list will be used for column names
  • parse_datetimes (list of col names) – Columns that should be interpreted as datetimes
Returns:

Structured array corresponding to the csv. If header is False and col_names is None, diogenes will assign arbitrary column names.

Return type:

numpy.ndarray
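Reading a csv from a URL can be reduced to downloading the text and parsing it as if it were a local file. In the sketch below, `csv_url_to_structured` and `fetch_text` are hypothetical names; the `fetch` argument is injectable so the parsing step can be exercised without a network connection, and the example URL is a placeholder that is never actually requested.

```python
import io
import urllib.request

import numpy as np


def fetch_text(url, encoding="utf-8"):
    """Download url with the standard library and decode the body to text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)


def csv_url_to_structured(url, delimiter=",", fetch=fetch_text):
    """Hypothetical open_csv_url-like helper: fetch a csv, then parse it.

    Field names come from the header row; column types are inferred.
    """
    text = fetch(url)
    return np.genfromtxt(io.StringIO(text), delimiter=delimiter,
                         names=True, dtype=None, encoding="utf-8")


# Offline usage: a stand-in fetcher returns fixed csv text instead of
# making an HTTP request to the placeholder URL.
def fake_fetch(url):
    return "label,value\nx,1\ny,2\n"


data = csv_url_to_structured("https://example.com/data.csv", fetch=fake_fetch)
print(data)
```

Keeping the download separate from the parsing also makes it easy to swap in a different HTTP client without touching the csv handling.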

Module contents