What is shipyard?

Shipyard is a module to process data in a format inspired by email headers (RFC 2822).

The goal of shipyard is to provide a simple, human-readable and human-writable replacement for CSV that copes better with long values and many rows, and that doesn’t need complicated escaping rules for special characters.
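
To make the idea concrete, here is a minimal sketch of how such a file could be read by hand. It is not shipyard's implementation: the parse_records helper is hypothetical, and the rules it applies (a blank line between records, "key: value" lines, indented continuation lines as in RFC 2822 header folding, lines starting with '#' skipped) are assumptions based on the writer output shown further down this page.

```python
def parse_records(text):
    """Yield one dict per record from header-style text (hypothetical helper)."""
    record, last_key = {}, None
    for line in text.splitlines():
        if not line.strip():
            # Assumption: a blank line ends the current record.
            if record:
                yield record
            record, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # Assumption: an indented line continues the previous value.
            record[last_key] += " " + line.strip()
        elif line.startswith("#"):
            # Assumption: lines starting with '#' (such as a coding line) are skipped.
            continue
        else:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            last_key = key.strip()
            record[last_key] = value.strip()
    if record:
        yield record


sample = """\
name: Martin Chalfie
country: United States

name: Osamu Shimomura
country: United States
"""
records = list(parse_records(sample))
```

Because values are only split at the first colon and records end at a blank line, commas, quotes and further colons inside a value need no escaping in this sketch.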

It’s called shipyard because that word contains py and doesn’t seem to be taken yet.


Shipyard is alpha software. So far it seems to work for me (I use it to create these pages), but it may have severe bugs I haven’t noticed yet. Use it at your own risk.

Shipyard is still under development and the API may change in the future.


Shipyard only needs the Python standard library.


Obviously we need to import shipyard:
>>> import shipyard
First we open the file:
>>> input = open('')
Then we create a parser object:
>>> reader = shipyard.Parser(keep_linebreaks=False,
...                          keys=['id', 'discipline', 'year',
...                                'name', 'country', 'rationale'])

For every record the given keys are initialized with None.
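
In plain Python terms, this behaviour could be sketched as follows. The new_record helper is hypothetical and only illustrates the idea; shipyard's internals may well differ.

```python
KEYS = ['id', 'discipline', 'year', 'name', 'country', 'rationale']


def new_record(fields, keys=KEYS):
    """Return a dict holding every expected key, defaulting to None."""
    record = dict.fromkeys(keys)  # every key starts out as None
    record.update(fields)         # fields found in the input overwrite the default
    return record


# A record that only carries two of the six fields.
partial = new_record({'id': '0', 'name': 'Martin Chalfie'})
missing = partial['country']  # None, because the field was not in the input
```

The practical benefit is that code reading a record can use record['country'] without guarding against a KeyError for sparse records.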

Now we can iterate through the records:

>>> for record in reader.parse(input):    
...     print record['country']
United States
United States
Instead of iterating we may want to get a list of dicts:
>>> lod = reader.get_list(input)
>>> print lod     
[{u'discipline': u'Chemistry', u'name': u'Martin Chalfie', ...}, {u'discipline': u'Chemistry', u'name': u'Osamu Shimomura', ...}, ...]
Sometimes we need a dict of dicts (using the ‘id’ field as key):
>>> dod = reader.get_dict(input, key='id')
>>> print dod.keys()
[u'11', u'10', u'1', u'0', u'3', u'2', u'5', u'4', u'7', u'6', u'9', u'8']
>>> print dod[u'5'][u'rationale']
for the discovery of the mechanism of spontaneous broken symmetry in subatomic physics
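
Conceptually, a dict of dicts is just the list of records re-indexed by one field. A rough equivalent in plain Python, using a hypothetical index_by helper and two hand-written sample records (this does not call shipyard):

```python
def index_by(records, key):
    """Build a dict of records keyed on one field (hypothetical helper)."""
    return {record[key]: record for record in records}


# Illustrative records; only two fields are included to keep the example short.
records = [
    {'id': '0', 'name': 'Martin Chalfie'},
    {'id': '2', 'name': 'Roger Y. Tsien'},
]
by_id = index_by(records, 'id')
```

In this sketch a later record silently replaces an earlier one with the same key; how shipyard treats duplicate keys is not stated here.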
If we don’t want dicts we can use the ‘factory’ parameter:
>>> los = reader.get_list(input, factory = lambda **keys: ', '.join(keys.values()))
>>> print los[0]
Chemistry, Martin Chalfie, United States, for the discovery and development of the green fluorescent protein, GFP, 2008, 0
Of course a class works as a factory, too:
>>> class Laureate(object):
...     def __init__(self, id, discipline, year, name, country, rationale):
...         self.name = name
>>> doo = reader.get_dict(input, key='id', factory = Laureate)
>>> print doo[u'2']      
<Laureate object at ...>
>>> print doo[u'2'].name
Roger Y. Tsien
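
The factory examples suggest that the callable receives the fields of each record as keyword arguments. Assuming that is the case, any callable accepting those keywords would do, for example a collections.namedtuple. The sketch below imitates the call with plain dicts and does not use shipyard itself; build is a hypothetical stand-in for what get_list might do internally, and LaureateTuple is a made-up name.

```python
from collections import namedtuple

# A lightweight record type whose field names match the record keys.
LaureateTuple = namedtuple('LaureateTuple', ['id', 'name', 'country'])


def build(records, factory=dict):
    """Call factory(**record) for each record (assumed calling convention)."""
    return [factory(**record) for record in records]


records = [{'id': '0', 'name': 'Martin Chalfie', 'country': 'United States'}]
laureates = build(records, factory=LaureateTuple)
first = laureates[0]  # fields are now available as first.name, first.country, ...
```

With the default factory=dict the sketch simply returns copies of the original dicts, which mirrors the behaviour seen when no factory is given.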

Now let’s write a Shipyard file.

First we create a StringIO (any other file-like object will do, too):
>>> import StringIO
>>> output = StringIO.StringIO()
Next we need a Writer object:
>>> writer = shipyard.Writer(keys=('foo', 'bar'), coding='utf-8')
Now we can use write() to write a single record:
>>> writer.write(output, {'foo': 1, 'bar': 2})
>>> print output.getvalue()
foo: 1
bar: 2

Using write_many() we can write a list of records:
>>> output = StringIO.StringIO()
>>> d = [dict((('foo', i), ('bar', 2*i))) for i in range(3)]
>>> writer.write_many(output, d)
>>> print output.getvalue()
foo: 0
bar: 0

foo: 1
bar: 2

foo: 2
bar: 4

To get an encoding line we use write_coding():
>>> output = StringIO.StringIO()
>>> writer.write_coding(output)
>>> print output.getvalue()
#-*- coding: utf-8 -*-

Now let’s do everything at once using write_full():
>>> output = StringIO.StringIO()
>>> writer.write_full(output, d)
>>> print output.getvalue()
#-*- coding: utf-8 -*-

foo: 0
bar: 0

foo: 1
bar: 2

foo: 2
bar: 4
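
The output format is simple enough to reproduce by hand. The following sketch writes the same text as the write_full() call above using only the standard library. It is written for Python 3 (io.StringIO expects text there), the write_records helper is hypothetical, and it is not how shipyard's Writer is implemented.

```python
import io


def write_records(stream, records, keys, coding=None):
    """Write records as 'key: value' lines with a blank line after each record."""
    if coding:
        # Assumption: the coding line has the form shown in the output above.
        stream.write('#-*- coding: %s -*-\n\n' % coding)
    for record in records:
        for key in keys:
            # Keys are written in the order given, not in dict order.
            stream.write('%s: %s\n' % (key, record[key]))
        stream.write('\n')


output = io.StringIO()
data = [{'foo': i, 'bar': 2 * i} for i in range(3)]
write_records(output, data, keys=('foo', 'bar'), coding='utf-8')
text = output.getvalue()
```

Passing the keys explicitly fixes the field order in the file, which is presumably why the Writer takes a keys argument as well.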


This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.


Get it using easy_install:

easy_install shipyard

or download it here:

Last modified: 2008-10-19 18:04:19
GPG signature: shipyard-0.02.tar.gz.gpg
Last modified: 2008-10-19 18:01:06
GPG signature: shipyard_doc-0.02.tar.gz.gpg

The documentation tarball contains an offline version of the API documentation.