Binary Sonar Data Formats
Overview - (How PyHum reads Humminbird files)
PyHum reads data from binary files created by a Humminbird unit (using the ‘Record’ function). Humminbird units write out the following file formats:
- *.DAT files, which contain basic information about the sonar, time, position and sonar settings. The unit writes this file on the first ping, so the time and position refer only to the instant recording starts. The full list of parameters in this file is given below
- *.SON files which contain the 8-bit sonar data (echograms)
- *.IDX files (1 per SON file) which contain indices of successive pings in the corresponding SON file
One set of data in PyHum consists of up to 4 *.SON files and 1 *.DAT file. If the *.IDX files are present, PyHum uses them to read the echogram data from the *.SON files more efficiently.
How PyHum reads the binary data
The program that reads the data is ‘pyread’, which is statically compiled from Cython source code (_pyread.pyx) for speed. You can compile this module independently using the ‘cython’ command and a C compiler. For example, using the gcc compiler on Linux:
cython pyread.pyx
gcc -c -fPIC -I/usr/include/python2.7/ pyread.c
gcc -shared pyread.o -o pyread.so
The pyread module gets called by the read module in the following way:
import pyread
data = pyread.pyread(sonfiles, humfile, c, model, cs2cs_args)
where
- ‘sonfiles’ is a list of strings containing the paths to the *.SON files
- ‘humfile’ is a string containing the file path to the *.DAT file
- ‘c’ is the estimated speed of sound (typically 1500 m/s for salt water, 1450 m/s for freshwater)
- ‘model’ is the Humminbird unit (units currently known to be supported by PyHum: 798, 898, 998, 1198, and 1199)
- ‘cs2cs_args’ is a string specifying a projected coordinate system (for example, “epsg:26949” is Arizona Central State Plane). See the documentation for the ‘pyproj’ module for more details.
The output variable, ‘data’, is a Python object whose methods return the sonar data.
data.gethumdat() returns the data in the .DAT file as a Python dictionary with the following keys:
- ‘water_code’: 0 = ‘fresh’ (freshwater), 1 = ‘deep salt’ (deep saltwater), 2 = ‘shallow salt’ (shallow saltwater), otherwise = ‘unknown’
- ‘sonar_name’: a numeric code reported by the instrument
- ‘unix_time’: unix (epoch) time in seconds
- ‘utm_x’: UTM x coordinate
- ‘utm_y’: UTM y coordinate
- ‘filename’: string containing the name of the DAT file
- ‘numrecords’: number of pings in the SON files
- ‘recordlens_ms’: length of time between successive records
- ‘linesize’: number of bytes in a line (ping and associated info)
- ‘water_type’: a string corresponding to the water code (see above)
- ‘lat’: latitude, WGS84 degrees
- ‘lon’: longitude, WGS84 degrees
data.getportscans() returns compiled scans from the port-side sidescan sonar. This is a 2D (time, distance) numeric array composed of 16-bit floats.
data.getstarscans() returns compiled scans from the starboard-side sidescan sonar. This is a 2D (time, distance) numeric array composed of 16-bit floats.
data.getlowscans() returns compiled scans from the low-frequency (e.g. 83 kHz) downward-looking sonar. This is a 2D (time, distance) numeric array composed of 16-bit floats. Note that a downward scan is taken only every other sidescan ping, so these data contain half as many pings as the sidescan data.
data.gethiscans() returns compiled scans from the high-frequency (e.g. 200 kHz) downward-looking sonar. This is a 2D (time, distance) numeric array composed of 16-bit floats. Note that a downward scan is taken only every other sidescan ping, so these data contain half as many pings as the sidescan data.
data.getmetadata() returns metadata compiled per ping, as a Python dictionary with the following keys. Each value is a numeric array of floats.
- ‘lat’: latitude, WGS84 degrees
- ‘lon’: longitude, WGS84 degrees
- ‘spd’: vessel speed, in metres per second
- ‘time_s’: unix (epoch) time in seconds
- ‘e’: easting coordinate in metres, in projection given by input variable “cs2cs_args”
- ‘n’: northing coordinate in metres, in projection given by input variable “cs2cs_args”
- ‘dep_m’: depth in metres
- ‘caltime’: time elapsed in seconds since start of recording
- ‘heading’: course-over-ground heading in degrees
Decoding the .DAT file
All data are big-endian.
- byte 1 = spacer
- bytes 2-5 = integer, water code
- bytes 6-7 = spacer
- bytes 8-11 = character, sonar name
- bytes 12-23 = spacer
- bytes 24-27 = character, unix time
- bytes 28-31 = character, utm x coordinate
- bytes 32-35 = character, utm y coordinate
- bytes 36-45 = character, filename
- bytes 46-47 = spacer
- bytes 48-51 = character, number of records
- bytes 52-55 = character, record length, milliseconds
- bytes 56-59 = character, line size
- bytes 60-65 = spacer
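The byte table above can be sketched with Python's standard ‘struct’ module. This is a minimal illustration rather than PyHum's own reader: the function name ‘decode_dat’ is hypothetical, and it assumes that the 4-byte fields the table labels “character” hold signed 32-bit integers.

```python
import struct

# Layout follows the byte table above (1-indexed, big-endian).
# Assumption: 4-byte fields are signed 32-bit integers; "x" skips spacer bytes.
DAT_FORMAT = ">x i 2x i 12x i i i 10s 2x i i i"
DAT_KEYS = ("water_code", "sonar_name", "unix_time", "utm_x", "utm_y",
            "filename", "numrecords", "recordlens_ms", "linesize")
WATER_TYPES = {0: "fresh", 1: "deep salt", 2: "shallow salt"}


def decode_dat(buf):
    """Decode bytes 1-59 of a .DAT buffer into a dictionary (hypothetical helper)."""
    size = struct.calcsize(DAT_FORMAT)  # 59 bytes; the trailing spacer is ignored
    values = struct.unpack(DAT_FORMAT, buf[:size])
    dat = dict(zip(DAT_KEYS, values))
    # The filename field is 10 raw bytes; strip any padding.
    dat["filename"] = dat["filename"].decode("ascii", errors="replace").strip("\x00 ")
    # Map the numeric water code to its descriptive string.
    dat["water_type"] = WATER_TYPES.get(dat["water_code"], "unknown")
    return dat
```

In practice the buffer would come from reading the .DAT file in binary mode, for example open(humfile, "rb").read().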
A note on Humminbird positions
Humminbird records position in ‘World Mercator Meters’ (EPSG code 3395), with no UTM zone. With X the easting and Y the northing, the following formulas convert to WGS84 latitude and longitude in degrees (latitude depends on the northing, longitude on the easting):
latitude = atan(tan(atan(exp(Y / 6378388.0)) * 2.0 - 1.570796326794897) * 1.0067642927) * 57.295779513082302
longitude = X * 57.295779513082302 / 6378388.0
where ‘atan’ is the inverse tangent (returning radians), ‘tan’ is the tangent (of an angle in radians) and ‘exp’ is the exponential function. Note that coordinate transforms using standard geospatial libraries such as Proj.4 (pyproj) and GDAL are, for reasons that remain unclear, always too inaccurate to use.
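As a rough sketch, the conversion can be written with Python's ‘math’ module. The function and constant names are hypothetical; the numeric constants are those quoted in this section. Here latitude is derived from the northing and longitude from the easting, as a Mercator projection requires.

```python
import math

RADIUS_M = 6378388.0              # radius used in the formula in this section
RAD_TO_DEG = 57.295779513082302   # 180 / pi
LAT_SCALE = 1.0067642927          # constant taken from the formula in this section


def mercator_to_wgs84(easting, northing):
    """Return (latitude, longitude) in degrees from Humminbird Mercator metres."""
    # Latitude depends only on the northing.
    angle = math.atan(math.exp(northing / RADIUS_M)) * 2.0 - 1.570796326794897
    lat = math.atan(math.tan(angle) * LAT_SCALE) * RAD_TO_DEG
    # Longitude depends only on the easting.
    lon = easting * RAD_TO_DEG / RADIUS_M
    return lat, lon
```

A quick sanity check is that an easting and northing of zero map to the intersection of the equator and the prime meridian.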
A note on Humminbird depths
PyHum applies a time-varying gain to estimated depth soundings:
tvg = ((8.5*10**-5)+(3/76923)+((8.5*10**-5)/4))*c
where ‘c’ is the speed of sound in water. Corrected depth then becomes:
depth = tan(0.4363323) * depth - tvg
where 0.4363323 radians is 25 degrees.
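A minimal sketch of this correction follows. It rests on one assumption that should be checked against the PyHum source: the raw depth is scaled by tan(0.4363323 rad), that is tan(25 degrees), before the time-varying gain is subtracted. The function names are hypothetical.

```python
import math

ANGLE_RAD = 0.4363323  # 25 degrees, constant quoted in this section


def time_varying_gain(c):
    """Gain term for a sound speed c in m/s, as given in this section.

    Note: 3 / 76923 is true division here (Python 3); under Python 2
    integer division would make that term zero.
    """
    return ((8.5 * 10**-5) + (3 / 76923) + ((8.5 * 10**-5) / 4)) * c


def correct_depth(depth_m, c=1450.0):
    """Assumed reading: tan(25 deg) * depth, minus the time-varying gain."""
    return math.tan(ANGLE_RAD) * depth_m - time_varying_gain(c)
```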
Decoding the .SON files
Record structure
The fastest approach is to use the byte indices in the IDX files as the start and stop positions (in bytes) of each ping, and PyHum does this by default. If the IDX files are absent or corrupted, PyHum falls back to a slightly slower method that locates the start of each record from its header code. Records are always preceded by the sequence of byte values [192,222,171,33,128], which the program finds using the Knuth-Morris-Pratt string-matching algorithm.
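To illustrate the fallback method, here is a small pure-Python Knuth-Morris-Pratt search for the header code in a byte buffer. It is a sketch of the technique, not PyHum's compiled implementation, and the function names are hypothetical.

```python
HEADER = bytes([192, 222, 171, 33, 128])  # header code that precedes each record


def build_failure_table(pattern):
    """For each position, length of the longest proper prefix that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def find_record_starts(data, pattern=HEADER):
    """Return the byte offset of every occurrence of pattern in data."""
    table = build_failure_table(pattern)
    starts = []
    k = 0  # number of pattern bytes matched so far
    for i, byte in enumerate(data):
        while k > 0 and byte != pattern[k]:
            k = table[k - 1]  # fall back without re-reading earlier data bytes
        if byte == pattern[k]:
            k += 1
        if k == len(pattern):
            starts.append(i - k + 1)
            k = table[k - 1]  # keep scanning for further matches
    return starts
```

Successive offsets returned by ‘find_record_starts’ delimit the records, so the bytes between one offset and the next make up one ping and its header.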
Records are composed of ‘header’ data (containing positions, etc.) followed by ‘ping’ data (a time series of echo levels stored as 8-bit integers). To separate the two correctly, you need to know how many bytes the header contains. Unfortunately, this length varies with instrument: so far, we have encountered data files with between 67 and 72 bytes per header packet. You therefore pass PyHum a variable giving the model of Humminbird unit the data come from, which it uses in the following way:
if model==798:
    headbytes=72
elif model==1199:
    headbytes=68
else: # tested so far: 998, 1198
    headbytes=67
SON file header packet
First, we describe the data format of the header packet. All data are big-endian.
Common to all units
- byte 1 = spacer
- bytes 2-5 = character, record number
- byte 6 = spacer
- bytes 7-10 = character, time in milliseconds
- byte 11 = spacer
- bytes 12-15 = character, UTM x coordinate
- byte 16 = spacer
- bytes 17-20 = character, UTM y coordinate
- byte 21 = spacer
- bytes 22-23 = short integer, GPS quality flag (0 = good, 1=bad)
- bytes 24-25 = short integer, heading in tenths of a degree
- byte 26 = spacer
- bytes 27-28 = short integer, GPS quality flag (0 = good, 1=bad)
- bytes 29-30 = short integer, speed in cm / s
- bytes 31-36 = spacer
- bytes 37-40 = character, depth in cm
- byte 41 = spacer
- byte 42 = integer, beam number (=0, 50 or 83 kHz; =1, 200 kHz; =2, SI Port; =3, SI Starboard)
- byte 43 = spacer
- byte 44 = integer, volt scale
- byte 45 = spacer
- bytes 46-49 = character, frequency in Hz
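The common part of the header can be unpacked in one call with the standard ‘struct’ module. The sketch below follows the byte table above and is not PyHum's own code: the key names and ‘decode_son_header’ are hypothetical, 4-byte “character” fields are assumed to be signed 32-bit integers, and 2-byte fields are assumed to be unsigned 16-bit integers.

```python
import struct

# Bytes 1-49 of the header, following the table above (1-indexed, big-endian).
# "x" skips a spacer byte; field types are assumptions (see the lead-in).
SON_COMMON_FORMAT = ">x i x i x i x i x H H x H H 6x i x B x B x i"
SON_COMMON_KEYS = ("record_number", "time_ms", "utm_x", "utm_y",
                   "gps_flag_1", "heading_tenths_deg", "gps_flag_2", "speed_cm_s",
                   "depth_cm", "beam", "volt_scale", "frequency_hz")
BEAM_NAMES = {0: "50 or 83 kHz", 1: "200 kHz", 2: "SI port", 3: "SI starboard"}


def decode_son_header(record):
    """Decode the fields common to all units from the start of one record."""
    size = struct.calcsize(SON_COMMON_FORMAT)  # 49 bytes
    values = struct.unpack(SON_COMMON_FORMAT, record[:size])
    fields = dict(zip(SON_COMMON_KEYS, values))
    # Convert to the units used in the per-ping metadata.
    fields["heading_deg"] = fields["heading_tenths_deg"] / 10.0
    fields["speed_m_s"] = fields["speed_cm_s"] / 100.0
    fields["depth_m"] = fields["depth_cm"] / 100.0
    fields["beam_name"] = BEAM_NAMES.get(fields["beam"], "unknown")
    return fields
```

The model-specific bytes that follow byte 49 are not decoded here.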
Then, the data structure is different for different models.
1199 model
- bytes 50-64 = spacer
- bytes 65-68 = character, sentence length
- byte 69 = spacer
798 model
- byte 50 = spacer
- bytes 51-62 = character, spacer
- byte 63 = spacer
- bytes 64-67 = character, sentence length
- bytes 68-73 = spacer
898, 998 or 1198 models
- bytes 50-54 = spacer
- bytes 55-58 = character, spacer
- bytes 59-63 = spacer
- bytes 64-67 = character, sentence length
- byte 68 = spacer
SON file sonar data packet
The sonar data are all the bytes in the packet that follow the header bytes. The data are big-endian unsigned chars (8-bit integers); because each sample is a single byte, byte order does not affect how the samples are read.
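As a simple illustration, splitting one record into its header and echo levels needs only the header length for the unit in question. The helper name ‘split_record’ is hypothetical, and ‘headbytes’ is the model-dependent header length described earlier.

```python
def split_record(record, headbytes):
    """Split one SON record into (header bytes, list of echo levels 0-255)."""
    header = record[:headbytes]
    # Iterating over bytes yields unsigned 8-bit integers.
    echoes = list(record[headbytes:])
    return header, echoes
```

For example, a record from a 998 unit would be split with headbytes=67.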