The Read Module¶
The diogenes.read
module provides tools for reading data from
external sources into Diogenes’ preferred Numpy structured
array format.
The module can read from either:
- A local CSV file
- A remote CSV file
- Any sort of SQL database
Local CSV Files¶
We can read local CSV files using diogenes.read.read.open_csv()
.
import diogenes
sample_csv_text = 'id,name,age\n0,Anne,57\n1,Bill,76\n2,Cecil,26\n'
with open('sample.csv', 'w') as csv_in:
csv_in.write(sample_csv_text)
sample_table = diogenes.read.open_csv('sample.csv')
sample_table
is a structured array (more specifically, a record
array).
print sample_table.dtype
(numpy.record, [('id', '<i8'), ('name', 'O'), ('age', '<i8')])
print sample_table
[(0, 'Anne', 57) (1, 'Bill', 76) (2, 'Cecil', 26)]
Remote CSV files¶
We read remote CSV files with diogenes.read.read.open_csv_url()
,
using a url
remote_csv = diogenes.read.open_csv_url('http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv',
delimiter=';')
print remote_csv.dtype
(numpy.record, [('fixed acidity', '<f8'), ('volatile acidity', '<f8'), ('citric acid', '<f8'), ('residual sugar', '<f8'), ('chlorides', '<f8'), ('free sulfur dioxide', '<f8'), ('total sulfur dioxide', '<f8'), ('density', '<f8'), ('pH', '<f8'), ('sulphates', '<f8'), ('alcohol', '<f8'), ('quality', '<i8')])
print remote_csv[:10]
[(7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8, 6)
(6.3, 0.3, 0.34, 1.6, 0.049, 14.0, 132.0, 0.9940000000000001, 3.3, 0.49, 9.5, 6)
(8.1, 0.28, 0.4, 6.9, 0.05, 30.0, 97.0, 0.9951, 3.26, 0.44, 10.1, 6)
(7.2, 0.23, 0.32, 8.5, 0.057999999999999996, 47.0, 186.0, 0.9956, 3.19, 0.4, 9.9, 6)
(7.2, 0.23, 0.32, 8.5, 0.057999999999999996, 47.0, 186.0, 0.9956, 3.19, 0.4, 9.9, 6)
(8.1, 0.28, 0.4, 6.9, 0.05, 30.0, 97.0, 0.9951, 3.26, 0.44, 10.1, 6)
(6.2, 0.32, 0.16, 7.0, 0.045, 30.0, 136.0, 0.9949, 3.18, 0.47, 9.6, 6)
(7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8, 6)
(6.3, 0.3, 0.34, 1.6, 0.049, 14.0, 132.0, 0.9940000000000001, 3.3, 0.49, 9.5, 6)
(8.1, 0.22, 0.43, 1.5, 0.044000000000000004, 28.0, 129.0, 0.9938, 3.22, 0.45, 11.0, 6)]
SQL databases¶
We read from, and write to, databases using
diogenes.read.read.connect_sql()
. When we pass an SQLAlchemy
connection
string to
connect_sql, we get an instance of
diogenes.read.read.SQLConnection
, which can run SQL queries
with diogenes.read.read.SQLConnection.execute()
in a way that
resembles (but does not strictly adhere to) DBAPI
2.0.
conn = diogenes.read.read.connect_sql('sqlite://')
conn.execute('CREATE TABLE sample_table (id INT, name TEXT, age INT)')
for row in sample_table:
conn.execute('INSERT INTO sample_table (id, name, age) VALUES (?, ?, ?)', row)
An important difference between most Python SQL libraries and Diogenes is that Diogenes returns queries in structured arrays.
sql_result = conn.execute('SELECT * FROM sample_table')
print sql_result.dtype
(numpy.record, [('id', '<i8'), ('name', 'O'), ('age', '<i8')])
print sql_result
[(0, u'Anne', 57) (1, u'Bill', 76) (2, u'Cecil', 26)]