Saturday, October 9, 2010

Transferring data from Fortran to Python and vice versa

There are plenty of reasons to transfer data from Fortran to Python. First of all, Python has a number of excellent visualization tools such as matplotlib, MayaVi, and others. Sometimes it is simply easier to process your data there.
The question is how to transfer, say, a 1 GB matrix of doubles to Python. Parsing text files is clearly not an option for files that big. After googling a little, I found that Fortran's usual sequential unformatted output inserts extra record-length markers into your otherwise flat binary file, which makes parsing it a bit trickier. You can read more about it here.
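For the curious, here is a minimal sketch of what reading one record of such a sequential unformatted file looks like in Python. It assumes the common case of 4-byte little-endian record markers; the marker size and endianness are compiler-dependent, and read_fortran_record is just a name I made up for the sketch.

import struct

def read_fortran_record(f):
    # Read one record from a Fortran sequential unformatted file,
    # assuming 4-byte little-endian length markers around the payload.
    head = f.read(4)
    if not head:
        return None                        # end of file
    nbytes, = struct.unpack('<i', head)    # leading length marker
    payload = f.read(nbytes)               # the actual record data
    tail, = struct.unpack('<i', f.read(4))
    if tail != nbytes:                     # trailing marker must match
        raise IOError('corrupt record')
    return payload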
Luckily, at least for x86 and x64 ifort users, there is another option: if you specify FORM = 'BINARY' in the OPEN statement (an ifort extension; the standard Fortran 2003 equivalent is ACCESS = 'STREAM'), you get normal C-like binary output with no record markers.
open(UNIT = 88, file = 'array1.bin', status = 'unknown', FORM = 'BINARY')
    do k = 1, nz
        do j = 1, ny
            ! implied do loop: the x index varies fastest in the file
            write(88) (test_array(i,j,k), i = 1, nx)
        end do
    end do
close(88)
Here is the Fortran code for reading such a file:
open(UNIT = 77, file = 'arr.bin', status = 'old', FORM = 'BINARY')
    do k = 1, nz
        do j = 1, ny
            read(77) (test_arr(i,j,k), i = 1, nx)
        end do
    end do
close(77)

Here are the Python functions for reading and writing such files, implemented with two different modules, struct and numpy:
import struct
import sys
import numpy as np
def readarray(fname, nx, ny, nz, datatype = 'float'):
    if datatype.lower() == 'float':
        dtype_str = 'f'    # struct format character, 4-byte float
        dtype_int = 4
    elif datatype.lower() == 'double':
        dtype_str = 'd'    # struct format character, 8-byte double
        dtype_int = 8
    else:
        print "Unknown datatype"
        sys.exit()
    data = np.zeros((nx, ny, nz), np.float)
    fmt = str(nx) + dtype_str    # unpack a whole x-row at once, e.g. '128f'
    with open(fname, 'rb') as f:
        for k in xrange(nz):
            for j in xrange(ny):
                buf = f.read(nx * dtype_int)
                data[:, j, k] = struct.unpack(fmt, buf)
    return data

def readarray2(fname, nx, ny, nz, datatype = 'float'):
    if datatype.lower() == 'float':
        dtype_np = np.float32
        dtype_int = 4
    elif datatype.lower() == 'double':
        dtype_np = np.float64
        dtype_int = 8
    else:
        print "Unknown datatype"
        sys.exit()
    data = np.zeros((nx, ny, nz), np.float)
    with open(fname, 'rb') as f:
        for k in xrange(nz):
            for j in xrange(ny):
                buf = f.read(nx * dtype_int)
                # let numpy interpret the raw bytes directly
                data[:, j, k] = np.fromstring(buf, dtype = dtype_np, count = nx)
    return data

def writearray(fname, data):
    # choose the struct format character from the element size
    if data.itemsize == 4:
        datatype = 'f'
    elif data.itemsize == 8:
        datatype = 'd'
    else:
        print "Unknown datatype"
        sys.exit()
    with open(fname, 'wb') as f:
        for k in xrange(data.shape[2]):
            for j in xrange(data.shape[1]):
                s = struct.pack(datatype * data.shape[0], *data[:, j, k])
                f.write(s)

def writearray2(fname, data):
    with open(fname, 'wb') as f:
        for k in xrange(data.shape[2]):
            for j in xrange(data.shape[1]):
                # tofile writes the raw bytes of one x-row
                data[:, j, k].tofile(f)
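A quick round-trip sanity check of the functions above (the shape and file name here are arbitrary):

nx, ny, nz = 64, 32, 16
a = np.random.rand(nx, ny, nz)    # float64 array
writearray2('arr.bin', a)
b = readarray2('arr.bin', nx, ny, nz, datatype = 'double')
print np.allclose(a, b)           # should print True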

As you can check, the numpy-based approach is faster. Here are the timings I got for a 0.8 GB binary file:
Reading with struct: 20489 function calls in 80.889 CPU seconds
Reading with numpy: 20489 function calls in 44.707 CPU seconds
Writing with struct: 30726 function calls in 101.300 CPU seconds
Writing with numpy: 10246 function calls in 18.771 CPU seconds
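Since the file contains no record markers, the whole array can in principle also be read back in a single call. A sketch, assuming the same x-fastest layout as above (readarray3 is just a name for this sketch):

def readarray3(fname, nx, ny, nz, dtype = np.float64):
    # read the flat stream in one shot and restore the
    # x-fastest (column-major, i.e. Fortran) ordering
    return np.fromfile(fname, dtype = dtype).reshape((nx, ny, nz), order = 'F')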

I would appreciate any comments on this.