Process Details

Description:

The general flow of the system starts with its sources. Three sources exist: the event-based Local Data Manager (LDM), ASCII socket-based packets, and XML HTTP GET requests. The ASCII socket and XML request sources each have an independent Python daemon: one listens constantly on a socket, and the other uses a threaded timer to request an XML resource. When a valid observation is found on either feed, the script parses the observation and constructs a message for the ‘INFLOW’ exchange.

The LDM integrates with multiple upstream providers (NWS, NOAA, Unidata). When a new version of a requested dataset is available, the LDM daemon is notified and the dataset is downloaded. The only authentication for LDM is IP-based whitelisting, so the requesting IP address determines whether a connection is allowed. Once the file is downloaded, the LDM daemon executes a Python script that parses the resource and constructs a message for the ‘INFLOW’ exchange.

The message queue server is configured to fan out ‘INFLOW’ into two queues, ‘Rtqueue’ and ‘ToBeQc’, for the two product lines, “Real-time” Data Files and Quality Controlled Data Files respectively.
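The parse-and-publish step and the fanout behaviour can be illustrated with a minimal, self-contained sketch. It is not the actual daemon code: the packet layout ("STATION,TIME,TEMP_C"), the function name parse_ascii_packet, the station ID and the in-memory FanoutExchange class are all assumptions made for illustration, standing in for the real feed format and the real message queue server.

```python
import json
from collections import deque


def parse_ascii_packet(line):
    """Parse one hypothetical 'STATION,TIME,TEMP_C' packet into a dict.

    Returns None when the packet is malformed so the caller can skip it.
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    station, timestamp, temp = parts
    try:
        return {"station": station, "time": timestamp, "temp_c": float(temp)}
    except ValueError:
        return None


class FanoutExchange:
    """In-memory stand-in for a fanout exchange: every bound queue gets a copy."""

    def __init__(self, name):
        self.name = name
        self.queues = {}

    def bind(self, queue_name):
        self.queues[queue_name] = deque()

    def publish(self, body):
        for queue in self.queues.values():
            queue.append(body)


inflow = FanoutExchange("INFLOW")
inflow.bind("Rtqueue")
inflow.bind("ToBeQc")

# One valid packet and one malformed packet (both made up for this example).
for raw in ["STN01,2000-01-01T00:00:00Z,21.5", "garbage"]:
    obs = parse_ascii_packet(raw)
    if obs is not None:  # only valid observations become messages
        inflow.publish(json.dumps(obs))
```

After the loop, both ‘Rtqueue’ and ‘ToBeQc’ hold one copy of the same message, which is the property the two product daemons rely on.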

A product daemon is set up as a consumer for each of the two products and is configured with a callback that runs as packets arrive in its queue. The “Real-time” Data File product pulls the message and inserts it directly into a NetCDF file via the netCDF4 Python module. The Quality Controlled Data File product pulls the last 5 measurements for the given station and performs a quick quality control check on the incoming packet. Additional quality tags are added to the packet, and the packet is passed to the ‘OUTFLOW’ exchange. A final daemon stays subscribed to the ‘Final’ queue, taking the quality-controlled messages and constructing another NetCDF file. To speed up the QC process, the latest 5 observations are also stored in an additional file, with the file name indexed by station ID.
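The quality control step can be sketched as follows. The text does not say which check is applied, so the rule here (flag a value that differs from the mean of the stored history by more than a fixed step) is an assumption, as are the names quality_control, qc_flag and MAX_STEP_C and the in-memory history that stands in for the per-station file.

```python
from collections import defaultdict, deque

HISTORY_SIZE = 5    # from the text: the latest 5 observations per station
MAX_STEP_C = 10.0   # hypothetical threshold, not taken from the source

# In-memory stand-in for the per-station history file.
history = defaultdict(lambda: deque(maxlen=HISTORY_SIZE))


def quality_control(packet):
    """Return a copy of the packet with a 'qc_flag' tag, then record the value."""
    recent = history[packet["station"]]
    if not recent:
        flag = "unchecked"  # nothing to compare against yet
    else:
        mean = sum(recent) / len(recent)
        flag = "ok" if abs(packet["temp_c"] - mean) <= MAX_STEP_C else "suspect"
    recent.append(packet["temp_c"])  # deque drops the oldest value past 5
    return dict(packet, qc_flag=flag)


# Made-up readings for one station; the last one is a large jump.
flags = [
    quality_control({"station": "STN01", "temp_c": t})["qc_flag"]
    for t in [20.0, 20.5, 21.0, 45.0]
]
```

Keeping only a bounded history per station means each incoming packet is compared against a handful of values and does not require reading back the full NetCDF file.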

Once the files are created, they are immediately available via the OPeNDAP interface and web portal. Additional derivative products such as models, plots, and animations can be constructed in most programming languages from this unified data source.

Limitations:

Current System: Intel Core 2 Duo E8400 @ 3 GHz, RAM: 4 GB, all services on one machine

CPU: Average total usage 73% over a 24-hour period.

Inflow: Up to 200/s via local messaging. The rate is not uniform; typical peaks reach 4-5k when the LDM feeds come downstream. We are only looking at a corner of Illinois, so we throw out a lot of data to keep the process manageable.
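Discarding data outside the area of interest can be pictured as a simple bounding-box filter applied before a message is published. The coordinates below are hypothetical (the text only says "a corner of Illinois"), as are the field names lat and lon and the station IDs.

```python
# Hypothetical bounding box; the real region limits are not given in the text.
LAT_MIN, LAT_MAX = 37.0, 38.5
LON_MIN, LON_MAX = -90.0, -88.0


def in_region(obs):
    """True when the observation falls inside the bounding box."""
    return LAT_MIN <= obs["lat"] <= LAT_MAX and LON_MIN <= obs["lon"] <= LON_MAX


# Made-up observations: one inside the box, one outside.
observations = [
    {"station": "STN01", "lat": 37.7, "lon": -89.2},
    {"station": "STN02", "lat": 41.9, "lon": -87.6},
]
kept = [obs for obs in observations if in_region(obs)]
```

Filtering this early keeps out-of-region observations from ever reaching the ‘INFLOW’ exchange, so neither product daemon spends time on them.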

Outflow: Quality Controlled 60/s; No Quality Control 150/s
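To put the inflow peaks next to the outflow rates, the short calculation below estimates how long a burst takes to drain. It assumes that "4-5k" is a burst of 4,000 to 5,000 queued messages and that no new messages arrive while the queue drains; neither assumption is stated in the figures above.

```python
def drain_seconds(backlog, rate_per_s):
    """Time to empty a backlog at a constant outflow rate, with no new arrivals."""
    return backlog / rate_per_s


for label, rate in [("Quality Controlled", 60), ("No Quality Control", 150)]:
    for backlog in (4000, 5000):
        seconds = drain_seconds(backlog, rate)
        print(f"{label}: {backlog} messages at {rate}/s -> {seconds:.0f} s")
```

Under these assumptions the quality-controlled path needs roughly 67 to 83 seconds to clear a peak, and the path without quality control roughly 27 to 33 seconds.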