Buoy System Handbook

Processing Overview
There are currently four data sources for the buoy system: cell phone, GOES, Bigelow Laboratory, and NDBC.
Cell Phone. Data telemetered via cell phone constitutes the backbone of the buoy data ingest.
Data is transmitted back to the PhoG group's Windows PC, named gorry, on an hourly basis. This does not include the Jordan Basin buoy M (this will change as of M0103), but does include unprocessed optics data.
MATLAB runs within a cron job on the Linux workstation micmac, reading buoy-specific files from gorry.
Data is split out into constituent data streams, which may include met, doppler, and others. A list of the datastreams and a description of the structure of each can be found elsewhere.
As the Campbell data is read in, it is also archived so that it may be manually examined later. Bigelow currently uses these ASCII archives to perform their own processing.
Each data stream is processed and archived to NetCDF files describing the time series data. The raw data is archived separately from the processed data; these streams are known as sensor-raw and realtime, respectively. The most recent observation for most parameters is used to update a MySQL database table that is accessed by the web server. These entries are overwritten upon each new observation.
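The overwrite-on-new-observation behaviour amounts to an upsert keyed on buoy and parameter. The sketch below illustrates the idea; the table name and columns are hypothetical (the real MySQL schema is not documented here), and SQLite stands in for MySQL:

```python
import sqlite3

# Hypothetical schema; SQLite stands in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE latest_obs (
        buoy      TEXT NOT NULL,
        parameter TEXT NOT NULL,
        value     REAL,
        obs_time  TEXT,
        PRIMARY KEY (buoy, parameter)
    )
""")

def update_latest(buoy, parameter, value, obs_time):
    """Overwrite the most recent observation for this buoy/parameter."""
    conn.execute(
        "INSERT OR REPLACE INTO latest_obs VALUES (?, ?, ?, ?)",
        (buoy, parameter, value, obs_time),
    )

update_latest("A01", "air_temperature", 8.4, "2004-03-01T12:00:00Z")
update_latest("A01", "air_temperature", 8.9, "2004-03-01T13:00:00Z")

row = conn.execute(
    "SELECT value FROM latest_obs "
    "WHERE buoy='A01' AND parameter='air_temperature'"
).fetchone()
print(row[0])  # the 13:00 observation replaced the 12:00 one
```

In MySQL itself the same effect comes from REPLACE INTO (or INSERT ... ON DUPLICATE KEY UPDATE) against the primary key.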
GOES. As a backup to the cell phone system, many buoys are now outfitted with GOES transmitters. This constitutes an equivalent datastream to that provided by the cell phones. It is a smaller datastream due to bandwidth limitations, but the most important geophysical parameters are usually transmitted. For more information on the GOES satellite system, see [ Nestlebush ]. The script that initiates the GOES processing is currently ${GOMOOS_ROOT}/bin/process_goes.sh. The basic processing scenario is as follows...
Every hour a buoy transmits to the GOES satellite system. Transmissions follow a schedule, with each buoy usually transmitting in a window of 30 seconds. As soon as this transmission window has passed, it is safe to initiate a download from UMaine.
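The "safe to download once the window has passed" check can be sketched as simple clock arithmetic. The schedule offsets below are purely illustrative, not the real transmission schedule:

```python
from datetime import datetime, timedelta, timezone

WINDOW_SECONDS = 30
# Hypothetical schedule: seconds past the top of the hour at which each
# buoy's 30-second transmission window opens (illustrative values only).
SCHEDULE = {"A01": 120, "B01": 200}

def window_has_passed(buoy, now):
    """True once this hour's transmission window is over, i.e. it is
    safe to initiate the download for this buoy."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    window_end = top_of_hour + timedelta(
        seconds=SCHEDULE[buoy] + WINDOW_SECONDS)
    return now >= window_end

now = datetime(2004, 3, 1, 12, 3, 0, tzinfo=timezone.utc)  # 12:03:00
print(window_has_passed("A01", now))  # window ended 12:02:30 -> True
print(window_has_passed("B01", now))  # window ends 12:03:50 -> False
```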
The download from the Wallops Data Center is accomplished via a Perl script. The Expect Perl module is employed to automate this process. Each buoy has its own unique id, referred to as a platform_id. The data is retrieved in a binary format. The Perl script which accomplishes this task is currently ${GOMOOS_ROOT}/bin/retrieve_goes_buffer.pl.
A MATLAB session launches, decodes the binary input, and classifies the data into the various buffer streams. After this point, the processing is indistinguishable from the cell phone processing. However, the data retrieved via GOES is archived separately from the cell phone data so as to correctly distinguish its origins. These are referred to as the goes-sensor-raw and goes-realtime datastreams.
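The decode-and-classify step can be illustrated in miniature. The record framing below (a one-byte buffer id followed by a big-endian float) and the buffer-id mapping are invented for the sketch; the real GOES buffer layout is not documented here:

```python
import struct

# Illustrative buffer ids only; the real mapping is not shown here.
BUFFER_NAMES = {1: "met", 2: "doppler"}

def classify(payload):
    """Split a decoded binary payload into per-buffer value streams.
    Hypothetical framing: 1-byte buffer id + 4-byte big-endian float."""
    streams = {}
    for offset in range(0, len(payload), 5):
        buf_id, = struct.unpack_from(">B", payload, offset)
        value, = struct.unpack_from(">f", payload, offset + 1)
        name = BUFFER_NAMES.get(buf_id, "unknown")
        streams.setdefault(name, []).append(value)
    return streams

payload = (struct.pack(">Bf", 1, 8.5)
           + struct.pack(">Bf", 2, 1.25)
           + struct.pack(">Bf", 1, 9.0))
streams = classify(payload)
print(sorted(streams))  # ['doppler', 'met']
```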
Bigelow Laboratory. The general scheme followed upon receipt of any data is to archive it "as-is" into NetCDF format (in the "sensor-raw" files). This is done with the optics data as well. But after the initial ingest of data via the cell phones, the data is made available to Bigelow Laboratory in its original Campbell format. Bigelow Laboratory initiates a pull on this data (specifically the optics and SBE37 buffers) via HTTP (soon via rsync on the NetCDF files?) and runs their own processing there. This data is then made available via HTTP and pulled back down to micmac for NetCDF archival (in the "realtime" files). Some minor quality control checks are also performed.
NDBC. Data from the last 45 days for the buoys 44005, 44007, 44011, 44013, 44018 and C-MAN stations IOSN3, MISM1, and MDRM1 are retrieved via HTTP from NDBC. NOAA is also in the process of putting out drifters, which can be imported as well (44585)...
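NDBC's realtime files are plain whitespace-delimited text with two header rows (names, then units) and "MM" marking missing values. A minimal parser sketch; the sample below is illustrative, not actual station data, and values are kept as strings for simplicity:

```python
# Illustrative NDBC-style realtime text: two header rows, "MM" = missing.
SAMPLE = """\
#YY  MM DD hh mm WDIR WSPD  WVHT  ATMP  WTMP
#yr  mo dy hr mn degT  m/s     m  degC  degC
2004 03 01 12 50  200  5.0   1.5   8.5   6.0
2004 03 01 11 50  210  6.0    MM   8.2   6.1
"""

def parse_ndbc(text):
    """Parse NDBC-style realtime text into a list of dicts,
    mapping the missing-value marker 'MM' to None."""
    lines = text.splitlines()
    columns = lines[0].lstrip("#").split()  # first header row: names
    records = []
    for line in lines[2:]:                  # skip both header rows
        fields = [None if f == "MM" else f for f in line.split()]
        records.append(dict(zip(columns, fields)))
    return records

records = parse_ndbc(SAMPLE)
print(records[0]["WVHT"], records[1]["WVHT"])  # 1.5 None
```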
There is a core of processing routines that all of these sources feed into. For example, all Aanderaa data regardless of source feeds into a routine called aanderaa_processing_stream.m. The other instrument data streams have similarly named core processing routines. Therefore, in order to add Aanderaa data from a hypothetical new data stream, all one would have to do would be to package up the input data to match that which is expected by aanderaa_processing_stream.m.
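The adapter pattern described above can be sketched as follows. The record shape (timestamp, parameter, value) and the adapter function are hypothetical stand-ins for the actual MATLAB interfaces:

```python
def aanderaa_processing_stream(records):
    """Stand-in for the core MATLAB routine: accepts a list of
    (timestamp, parameter, value) tuples regardless of data source.
    Here it simply drops records with no value."""
    return [(t, p, v) for t, p, v in records if v is not None]

def adapt_new_source(raw_rows):
    """Repackage a hypothetical new source's rows into the shape the
    core routine expects; nothing downstream needs to change."""
    return [(row["time"], row["name"], row.get("val"))
            for row in raw_rows]

raw = [{"time": "12:00", "name": "temp", "val": 8.1},
       {"time": "12:00", "name": "cond", "val": None}]
result = aanderaa_processing_stream(adapt_new_source(raw))
print(result)  # [('12:00', 'temp', 8.1)]
```

Only the adapter is source-specific; the core routine stays untouched, which is the point of the design.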
Quality control is guided by the valid_range attribute that is present for each measured parameter in its NetCDF file. For example, temperature on the Aanderaa has a valid range of [-0.5, 30] °C. Each datum is checked against its valid range by a single MATLAB routine. Datums falling outside this range have their quality flags subsequently modified. Each day, a separate process examines how many times a parameter exceeded its valid range and reports it. This information can be used to determine whether or not a valid range needs to be changed. The valid range attribute is a part of the standard set of metadata for NetCDF files and need not be the same across all NetCDF files. For example, it may be determined that a different valid range exists for temperatures at 50 meters depth as opposed to temperatures at 2 meters depth.
Parameters can also be marked as completely invalid and without need of any range checking. Bypassing range checking can be important if sensor values are fluctuating wildly. Some of those fluctuating values might well be within the stated valid range, but are nonetheless questionable. Marking such a parameter as invalid (done with the is_dead flag) deals with this situation easily.
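The valid_range check and the is_dead override together can be sketched as below. The Aanderaa temperature range of [-0.5, 30] °C is from the text; the flag values and function are illustrative, not the real NetCDF flag scheme:

```python
GOOD, BAD = 0, 1  # illustrative flag values only

def qc_flags(values, valid_range, is_dead=False):
    """Flag each datum: everything is flagged bad if the parameter is
    marked dead; otherwise anything outside [lo, hi] is flagged bad."""
    lo, hi = valid_range
    if is_dead:
        return [BAD] * len(values)
    return [GOOD if lo <= v <= hi else BAD for v in values]

temps = [12.3, -2.0, 31.5, 8.0]
print(qc_flags(temps, (-0.5, 30)))                # [0, 1, 1, 0]
print(qc_flags(temps, (-0.5, 30), is_dead=True))  # [1, 1, 1, 1]
```

Note that with is_dead set, even the in-range values (12.3 and 8.0) are flagged, which is exactly the wildly-fluctuating-sensor case described above.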