bufrpy¶
Bufrpy is a pure-Python BUFR message decoder. BUFR messages are typically used to transmit meteorological observations. Bufrpy was developed to work with NWCSAF RDT and HRW data, but there should not be any reason why it could not be used with any BUFR messages, with the following limitations:
- Messages with multiple data subsets are not supported
- Compressed messages are not supported
- Operator descriptors are not supported
- Sequence descriptors are not supported
Adding support for any of the above should be feasible, but we have not encountered such data in our use of the library.
Examples¶
Usage examples
Examples¶
RDT Contour Extraction¶
import bufrpy
from bufrpy.template import safnwc
import re
import itertools
import sys
def to_rings(bufrmsg):
""" Return a list of cloud contours as closed rings, in lat/lon ordinate order"""
out = []
subsets = bufrmsg.section4.subsets
for subset in subsets:
for cloudsystem in subset.values[19]: # cloud systems are at index 19
contour = cloudsystem[0] # contour is at index 0 of cloud system
point_vals = list(itertools.chain(contour, itertools.islice(contour,1))) # close contour
points = []
for (lat, lon) in point_vals:
points.append((lat.value, lon.value)) # extract values
out.append(points)
return out
def to_wkt(rings):
out = []
for ring in rings:
# Change ordinate order, convert to strings, join and stick in a POLYGON
points = [" ".join(map(str, reversed(latlon))) for latlon in ring]
edge = ",".join(points)
out.append("POLYGON((" + edge + "))")
return "\n".join(out)
if __name__ == '__main__':
template = safnwc.read_template(open(sys.argv[1], 'rb'))
msg = bufrpy.decode_file(open(sys.argv[2], 'rb'), template)
rings = to_rings(msg)
wkt = to_wkt(rings)
print wkt
Data Model¶
Data model
Data Model¶
Bufrpy data model matches that of the BUFR messages fairly closely, though not as closely as libbufr or the Python libbufr wrappers. Typically BUFR messages are stored in a file so that the file contains a single BUFR message. Some applications (e.g. SAFNWC HWR) store multiple messages in a single file, requiring some tricks to read them.
The top-level data structure in bufrpy is Message that corresponds to one BUFR message. A BUFR message contains 6 sections, some of which contain metadata and some the actual data. The most important sections are numbers 3 and 4. They contain data descriptors and the data itself respectively. Logically BUFR files contain at least one subset of data. Each subset follows the same structure, defined by the data descriptors in section 3. In bufrpy, sections are attributes of Message. Section 4 contains the data subsets as a list of BufrSubset objects.
Logically each subset is a linear sequence of data values. All metadata about the data is stored in the descriptors. However, there are certain looping constructs (replication descriptors) that allow repetition of a sequence of data items. This leads to hierarchical logical structure. Physically the data is still stored in a linear sequence (or in interleaved fashion if a message has multiple data subsets). In bufrpy the data of each dataset is represented by a list of BufrValue objects. A replicated sequence in the data is represented by a list of lists of BufrValue objects so that the outer list contains one list for each repetition, and the inner lists are equivalent to values of a BufrSubset. Hierarchical data is thus naturally represented as nested lists.
API Documentation¶
API documentation covers the public API
API¶
Reading BUFR messages¶
- bufrpy.decode_file(f, b_table)¶
Decode BUFR message from a file into a Message object.
Parameters: - f (file) – File that contains the bufr message
- b_table (Mapping|Template) – Either a mapping from BUFR descriptor codes to descriptors or a Template describing the message
- bufrpy.decode(stream, b_table, skip_data=False)¶
Decode BUFR message from stream into a Message object.
See WMO306_vl2_BUFR3_Spec_en.pdf for BUFR format specification.
Parameters: - stream (ByteStream) – Stream that contains the bufr message
- b_table (Mapping|Template) – Either a mapping from BUFR descriptor codes to descriptors or a Template describing the message
- skip_data (bool) – Skip decoding data? Can be used to get only extract metadata of a file to e.g. analysis of decoding errors.
- bufrpy.decode_all(stream, b_table)¶
Decode all BUFR messages from stream into a list of Message objects and a list of decoding errors.
Reads through the stream, decoding well-formed BUFR messages. BUFR messages must start with BUFR and end with 7777. Data between messages is skipped.
Parameters: - stream (ByteStream) – Stream that contains the bufr message
- b_table (Mapping|Template) – Either a mapping from BUFR descriptor codes to descriptors or a Template describing the message
BUFR message representation¶
BUFR Message¶
BUFR Message is represented as a Message.
- class bufrpy.Message¶
Represents a complete BUFR message, of either edition 3 or edition 4.
Variables: - section0 (Section0) – Section 0, start token and length
- section1 (Section1v3|Section1v4) – Section 1, time and source metadata
- section2 (Section2) – Section 2, optional metadata, not processed
- section3 (Section3) – Section 3, message structure
- section4 (Section4) – Section 4, message contents
- section5 (Section5) – Section 5, end token
The Message contains several sections. Of these, sections 1, 3 and 4 are the most interesting. Section 1 contains message-level metadata, section 3 describes message structure and section 4 contains the actual data. Note that section 1 comes in multiple variants. The variant used depends on which edition of the BUFR specification was used to code the message.
- class bufrpy.Section1v3¶
Section 1 of a BUFR edition 3 message.
Variables: - length (int) – Length of Section 1
- master_table_id (int) – Master table identifier
- originating_centre (int) – Originating/generating centre
- originating_subcentre (int) – Originating/generating subcentre
- update_sequence_number (int) – Update sequence number
- optional_section (int) – 0 if optional section not present, 1 if present
- data_category (int) – Data category (identifies Table A to be used)
- data_subcategory (int) – Data subcategory
- master_table_version (int) – Master table version
- local_table_version (int) – Local table version
- year (int) – Year (originally of century, converted to AD by adding 1900)
- month (int) – Month
- day (int) – Day
- hour (int) – Hour
- minute (int) – Minute
- class bufrpy.Section1v4¶
Section 1 of a BUFR edition 4 message.
Variables: - length (int) – Length of Section 1
- master_table_id (int) – Master table identifier
- originating_centre (int) – Originating/generating centre
- originating_subcentre (int) – Originating/generating subcentre
- update_sequence_number (int) – Update sequence number
- optional_section (int) – 0 if optional section not present, 1 if present
- data_category (int) – Data category (identifies Table A to be used)
- data_subcategory (int) – Data subcategory
- local_subcategory (int) – Local subcategory
- master_table_version (int) – Master table version
- local_table_version (int) – Local table version
- year (int) – Year (four digits)
- month (int) – Month
- day (int) – Day
- hour (int) – Hour
- minute (int) – Minute
- second (int) – Second
- class bufrpy.Section3¶
Section 3 of a BUFR message.
Section 3 contains descriptors that describe the actual data format. Descriptors are instances of one of the descriptor classes: descriptors.ElementDescriptor, descriptors.ReplicationDescriptor, descriptors.OperatorDescriptor, descriptors.SequenceDescriptor.
Variables:
BUFR Template¶
BUFR Template is represented as a Template.
- class bufrpy.template.Template[source]¶
BUFR message template
Describes a BUFR message by listing the descriptors that are present in the message. The same descriptors must be present in the same order in Section 3 of the actual message. The template includes descriptions, length, units and other information required to interpret the descriptor references in the message.
Possible descriptor types are ElementDescriptor, ReplicationDescriptor, SequenceDescriptor and OperatorDescriptor.
Variables: - name – Name of the template
- descriptors – List of descriptors
BUFR Descriptors¶
BUFR descriptors describe the data of the message. Atomic data elements are described by ElementDescriptor, multiple repetitions of a set of descriptors by ReplicationDescriptor, fixed sequences of elements by SequenceDescriptor and more complex structures by OperatorDescriptor.
- class bufrpy.descriptors.ElementDescriptor[source]¶
Describes single value
Data element described with an ElementDescriptor is decoded into a BufrValue, with either textual or numeric value.
Variables: - code (int) – Descriptor code
- length (int) – Length of data, in bits
- scale (int) – Scaling factor applied to value to get scaled value for encoding
- ref (float) – Reference value, subtracted from scaled value to get encoded value
- significance (str) – Semantics of the element
- unit (str) – Unit of the element, affects interpretation of the encoded data in case of e.g. textual vs. numeric values
- class bufrpy.descriptors.ReplicationDescriptor[source]¶
Describes a repeating collection of values
Data described with a ReplicationDescriptor is decoded into a list of lists of values. The outer list has one element per replication and the inner lists one element per replicated field.
Variables:
- class bufrpy.descriptors.SequenceDescriptor[source]¶
Describes a fixed sequence of elements in compact form
Similar to a replication with count 1, but the encoded form is more compact, since the sequence of fields is implicit. Except that at least in NWCSAF Templates the constituent elements of the sequence are also present in the template.
Variables: - code (int) – Descriptor code
- length (int) – Length of data, sum of lengths of constituent descriptors
- descriptor_codes (int) – Sequence containing constituent descriptor codes
- significance (str) – Meaning of the sequence, always empty string
- descriptors – Sequence containing constituent descriptors.
BUFR Values¶
BUFR values are represented a instances of BufrValue. The values of one BUFR message subset are contained in a BufrSubset.
- class bufrpy.value.BufrValue[source]¶
Contains single value
Contains single value, both in raw and decoded form, plus a link to its descriptor.
Variables: - raw_value (str|int) – Raw value. Either as hex-encoded string for textual values or an unsigned integer for numeric values
- value (str|int|float|None) – Decoded value. Value decoded according to its descriptor. Textual values are strings and numeric values floats or ints. Missing value is indicated by None.
- descriptor (ElementDescriptor) – The descriptor of this value
Reading BUFR tables¶
- bufrpy.table.libbufr.read_tables(b_line_stream, d_line_stream=None)[source]¶
Read BUFR table(s) in from libbufr text file(s).
The return value is a dict that combines the tables read.
Parameters: - b_line_stream – Iterable of lines, contents of the B-table file
- d_line_stream – Iterable of lines, contents of the D-table file
Returns: Mapping from FXY integers to descriptors
Return type: dict
Raises: - NotImplementedError – if the table contains sequence descriptors
- ValueError – if the table contains descriptors with illegal class (outside range [0,3])
Reading BUFR templates¶
- bufrpy.template.safnwc.read_template(line_stream)[source]¶
Read SAFNWC message template into a Template
SAFNWC template lines look as follows:
1 001033 0 0 8 Code table Identification of originating/generating centre
Parameters: line_stream – Lines of SAFNWC template file Returns: the template as Template Return type: Template Raises ValueError: if the template contains a desccriptor outside range [0,3]
JSON encoding/decoding¶
- bufrpy.to_json(msg)¶
Convert a BUFR message into JSON-encodable form.
The conversion preserves descriptors in Sections 3 and data in Section 4 but not the other metadata.
The resulting JSON is considerably larger than BUFR messages themselves, but compresses to smaller size with e.g. gzip. The JSON is also faster to read and self-descriptive, i.e. a separate descriptor table is no longer necessary.
Parameters: msg (Message) – Message to convert Returns: Dict that can be converted to JSON using the standard json module Return type: dict
- bufrpy.from_json(json_obj)¶
Convert an object decoded from JSON into a BUFR message
The conversion reads partial contents of BUFR sections 3 and 4 from a dict generated by to_json().
The resulting bufrpy.Message contains only section 3 and 4, and the sections only contain descriptors and data, respectively.
Parameters: json_obj (dict) – JSON object to decode Returns: BUFR message with original descriptors and data Return type: Message