Data model

class firehose.model.Analysis

The Analysis class represents one invocation of a code analysis tool.

It corresponds to the <analysis> XML element, the top-level element of a Firehose XML document.

metadata

Metadata

results

A list of Result objects, representing the various issues, failures, and other information found during the analysis.

customfields

CustomFields or None

Here is the pertinent part of the XML schema:

  <start>
    <!-- Results from the invocation of an analysis tool -->
    <element name="analysis">
      <ref name="metadata-element"/>
      <element name="results">
        <zeroOrMore>
          <choice>
            <ref name="issue-element"/>
            <ref name="failure-element"/>
            <ref name="info-element"/>
          </choice>
        </zeroOrMore>
      </element>
      <optional>
        <ref name="custom-fields-element"/>
      </optional>
    </element>
  </start>

__init__(self, metadata, results, customfields=None)

classmethod from_xml(cls, fileobj)

Parse XML from fileobj, and return an Analysis instance representing the data seen there.

to_xml(self)

Generate an ET.ElementTree() representing the data within self.

to_xml_bytes(self)

Generate a bytes instance containing an XML serialization of the data within self.
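
As a rough illustration of the document shape that the schema excerpt above describes, the following sketch walks a skeletal Firehose-style document using the standard library's xml.etree.ElementTree rather than firehose itself. The sample XML is a deliberately stripped-down assumption: the real metadata, issue, failure and info elements have content that is omitted here, so it would not validate against the full schema. The helper name count_results is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Skeletal sample: the content of each element is omitted for brevity,
# so this is NOT a schema-valid Firehose document.
SAMPLE = b"""<analysis>
  <metadata/>
  <results>
    <issue/>
    <issue/>
    <failure/>
    <info/>
  </results>
</analysis>"""


def count_results(fileobj):
    """Count the children of <results> by element name (hypothetical helper)."""
    root = ET.parse(fileobj).getroot()
    if root.tag != "analysis":
        raise ValueError("expected <analysis> as the top-level element")
    results = root.find("results")
    return Counter(child.tag for child in results)


counts = count_results(io.BytesIO(SAMPLE))
# <custom-fields> is optional in the schema; it is absent from SAMPLE.
has_custom_fields = ET.fromstring(SAMPLE).find("custom-fields") is not None
print(dict(counts), has_custom_fields)
```

With firehose installed, Analysis.from_xml(fileobj) would be the natural way to load a real document; the sketch above only mirrors the element nesting.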

Results

class firehose.model.Result

Result is the base class for the entries in an Analysis's results list.

There are three subclasses:

  • an Issue represents a report from the analyzer about a possible problem with the software under test.
  • an Info represents additional kinds of information generated by an analyzer that are not problems per se, e.g. code metrics, licensing info, etc.
  • a Failure represents a report about a failure of the analyzer itself (e.g. if the analyzer crashed).

class firehose.model.Issue(Result)

An Issue represents a report from the analyzer about a possible problem with the software under test.

It corresponds to the <issue> XML element within a Firehose XML document.

cwe

(int or None): The Common Weakness Enumeration ID (see http://cwe.mitre.org/index.html ), e.g. 131, representing CWE-131, aka “Incorrect Calculation of Buffer Size” (http://cwe.mitre.org/data/definitions/131.html ).

testid

(str or None): Each static analysis tool potentially has multiple tests, with its own IDs for its own tests. These can be captured here, as free-form strings.

location

(Location): Where is the problem?

message

(Message): A message summarizing the problem.

notes

(Notes or None): Additional descriptive details.

trace

(Trace or None): An optional list of events that describe the circumstances leading up to a problem.

severity

(str or None): Each static analysis tool can potentially report a “severity”, which may be of use for filtering.

The precise strings are likely to vary from tool to tool. To avoid data-transfer issues, the severity is stored here as an optional free-form string.

See: http://lists.fedoraproject.org/pipermail/firehose-devel/2013-February/000001.html

customfields

(CustomFields or None): A given tool/testid may have additional key/value pairs that it may be useful to capture.

write_as_gcc_output(self, out)

Write the issue in the style of a GCC warning to the given file-like object.

>>> issue.write_as_gcc_output(sys.stderr)
examples/python-src-example.c:40:4: warning: ob_refcnt of '*item' is 1 too high [CWE-401]
was expecting final item->ob_refcnt to be N + 1 (for some unknown N)
due to object being referenced by: PyListObject.ob_item[0]
but final item->ob_refcnt is N + 2
examples/python-src-example.c:36:14: note: PyLongObject allocated at:         item = PyLong_FromLong(random());
examples/python-src-example.c:37:8: note: when PyList_Append() succeeds

get_cwe_str(self)

Get a string giving the CWE identifier, or None:

>>> issue.get_cwe_str()
'CWE-131'

get_cwe_url(self)

Get a string containing the URL of the CWE id, or None:

>>> issue.get_cwe_url()
'http://cwe.mitre.org/data/definitions/131.html'
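
The two doctest results above suggest a simple mapping from the integer cwe attribute to a label and a URL. The following is a minimal standalone sketch of that mapping. The function names cwe_str and cwe_url and the URL template constant are hypothetical, and this is not firehose's own implementation; it only reproduces the outputs shown above.

```python
# URL pattern taken from the example output above.
CWE_URL_TEMPLATE = "http://cwe.mitre.org/data/definitions/%i.html"


def cwe_str(cwe):
    """Return a label such as 'CWE-131', or None if no CWE ID is set."""
    if cwe is None:
        return None
    return "CWE-%i" % cwe


def cwe_url(cwe):
    """Return the definition URL for the CWE ID, or None if no ID is set."""
    if cwe is None:
        return None
    return CWE_URL_TEMPLATE % cwe


print(cwe_str(131))
print(cwe_url(131))
```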

class firehose.model.Info(Result)

An Info represents additional kinds of information generated by an analyzer that are not problems per se, e.g. code metrics, licensing info, cross-referencing information, etc.

It corresponds to the <info> XML element within a Firehose XML document.

infoid

(str or None): an optional free-form string identifying the kind of information being reported.

location

Location or None

message

Message or None

customfields

CustomFields or None

class firehose.model.Failure(Result)

A Failure represents a report about a failure of the analyzer itself (e.g. if the analyzer crashed).

If any of these are present then we don’t have full coverage.

For some analyzers this is an all-or-nothing affair: we either get issues reported, or a failure happens (e.g. a segfault of the analysis tool).

Other analyzers may be more fine-grained: able to report some issues, but choking on some subset of the code under analysis. For example, cpychecker runs once per function, so an unhandled Python exception affects only one function.

It corresponds to the <failure> XML element within a Firehose XML document.

failureid

(str or None): Each static analysis tool can potentially identify the kinds of ways in which it can fail.

For tools that do, those identifiers are captured here as (optional) free-form strings.

location

Location: Some analysis tools may be able to annotate a failure report by providing the location within the software-under-test that broke them.

For example, gcc-python-plugin has a gcc.set_location() method which can be used by a code analysis script to record what location is being analyzed, so that if an unhandled Python exception occurs, it is reported at that location. This is invaluable when debugging analysis failures.

message

Message: A summary of the failure.

customfields

CustomFields or None: Every type of failure seems to have its own kinds of data that are worth capturing:

  • stdout/stderr/returncode for a failed subprocess
  • traceback for an unhandled Python exception
  • verbose extra information about a cppcheck failure

etc. Hence we allow a <failure> to optionally contain extra key/value pairs, based on the failureid.
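
As an illustration of the first bullet above, the sketch below collects stdout, stderr and the return code of a failed subprocess into ordered key/value pairs. Since CustomFields is documented as an OrderedDict subclass, a plain collections.OrderedDict stands in for it here. The key names and the stand-in failing command are assumptions chosen for the example, not a convention defined by firehose.

```python
import subprocess
import sys
from collections import OrderedDict

# Stand-in for a failing analyzer: a child Python process that writes to
# stderr and exits with status 3.
cmd = [sys.executable, "-c",
       "import sys; sys.stderr.write('boom\\n'); sys.exit(3)"]
proc = subprocess.run(cmd, capture_output=True, text=True)

# Plain OrderedDict used in place of firehose's CustomFields;
# the key names below are illustrative only.
fields = OrderedDict()
fields["returncode"] = proc.returncode
fields["stdout"] = proc.stdout
fields["stderr"] = proc.stderr

for key, value in fields.items():
    print("%s: %r" % (key, value))
```

Keeping the pairs ordered means they can be written out and read back in the same sequence, which matters when the data is roundtripped through XML.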

Metadata

class firehose.model.Metadata

Holder for metadata about an analyzer invocation.

It corresponds to the <metadata> XML element within a Firehose XML document.

generator

Generator

sut

Sut or None

file_

File or None

stats

Stats or None

class firehose.model.Generator

name

str

version

str or None

class firehose.model.Stats

Stats is an optional field of Metadata for capturing stats about an analysis run.

wallclocktime

float: how long (in seconds) the analyzer took to run

Describing the software under test

Warning

This part of the schema may need more thought/work.

class firehose.model.Sut

Base class for describing the software-under-test.

class firehose.model.SourceRpm(Sut)

It corresponds to the <source-rpm> XML element within a Firehose XML document.

name

str

version

str

release

str

buildarch

str

class firehose.model.DebianBinary(Sut)

Internal Firehose representation of a Debian binary package. This object is very similar to a SourceRpm.

It corresponds to the <debian-binary> XML element within a Firehose XML document.

name

str: the binary package name.

version

str: should match the upstream version number.

release

str or None: should be the Debian package local version. This should only be omitted if the package is a Debian native package.

buildarch

str: valid entries for Debian include amd64, kfreebsd-amd64, armhf and hurd-i386, among others.

class firehose.model.DebianSource(Sut)

Internal Firehose representation of a Debian source package. This object is very similar to a SourceRpm, but does not include the buildarch attribute.

It corresponds to the <debian-source> XML element within a Firehose XML document.

name

str: should be the source package name

version

str: should match the upstream version number.

release

str or None: if given, should be the Debian package local version. This should only be omitted if the package is a Debian native package.
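
One way to obtain the version and release values from a full Debian version string is to split it at the last hyphen, following the Debian convention that the revision follows the final hyphen and that native packages have no revision. This is a sketch under assumptions: the helper name split_debian_version is hypothetical, mapping "release" to the Debian revision is an interpretation of the descriptions above, and any epoch prefix (such as "1:") is left attached to the version part.

```python
def split_debian_version(full_version):
    """Split a Debian version string into (version, release).

    The release is the part after the last hyphen. Native packages have
    no hyphen, so their release is None. Hypothetical helper.
    """
    upstream, sep, revision = full_version.rpartition("-")
    if not sep:
        # No hyphen at all: a native package with no local revision.
        return full_version, None
    return upstream, revision


print(split_debian_version("2.0.1-3"))
print(split_debian_version("1.5"))
```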

class firehose.model.Message

Summary text aimed at a developer. This is required for an Issue, but can also (optionally) be provided by a Failure or Info.

It corresponds to the <message> XML element within a Firehose XML document.

text

str

class firehose.model.Notes

Additional optional descriptive details for a Result or for a State.

It corresponds to the <notes> XML element within a Firehose XML document.

text

str

Describing source code

class firehose.model.Location

A particular source code location.

It corresponds to the <location> XML element within a Firehose XML document.

file

File

function

Function or None. The function (or method) containing the problem.

This is optional. Some problems occur in global scope, and unfortunately, some analyzers don’t always report which function each problem was discovered in. Given that function names are less likely to change than line numbers, this is something that we should patch in each upstream analyzer as we go.

We can refer to either a single point or a range of text within the file:

point

Point or None

range_

Range or None

class firehose.model.File

A description of a particular source file.

It corresponds to the <file> XML element within a Firehose XML document.

givenpath

str: the filename given by the analyzer.

This is typically the one supplied to it on the command line, which might be absolute or relative.

Examples:

  • “foo.c”
  • “./src/foo.c”
  • “/home/david/libfoo-1.0/src/foo.c”

abspath

(str or None): Optionally, a record of the absolute path of the file, to help deal with collating results from a build that changes working directory (e.g. recursive make).

hash_

(Hash or None)
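
The relationship between givenpath and abspath can be sketched as follows, assuming the collating code knows the working directory the analyzer was run from. The helper name to_abspath and its cwd parameter are illustrative assumptions, not part of firehose. posixpath is used so that the result does not depend on the host platform.

```python
import posixpath


def to_abspath(givenpath, cwd):
    """Resolve an analyzer-supplied path against the directory it ran in.

    An absolute givenpath is returned unchanged (after normalization).
    """
    return posixpath.normpath(posixpath.join(cwd, givenpath))


# The three example givenpath values above can all name the same file.
print(to_abspath("foo.c", "/home/david/libfoo-1.0/src"))
print(to_abspath("./src/foo.c", "/home/david/libfoo-1.0"))
print(to_abspath("/home/david/libfoo-1.0/src/foo.c", "/tmp"))
```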

class firehose.model.Hash

An optional value within File, allowing the report to specify a hash value for a particular file.

This can be used for tracking different versions of files when collating different reports, and for purposes such as caching file content in a UI.

It corresponds to the <hash> XML element within a Firehose XML document.

alg

str: the name of the hash algorithm.

TODO: what naming convention?

hexdigest

str: the hexadecimal value of the digest (lower-case hexdigits, without any leading 0x).
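
A pair of alg and hexdigest values in the form described above can be produced with the standard library's hashlib. Because the naming convention for alg is still an open question here, using hashlib's own algorithm name ("sha1") is only an assumption, and describe_hash is a hypothetical helper rather than part of firehose.

```python
import hashlib
import io


def describe_hash(fileobj, alg="sha1", chunk_size=65536):
    """Return (alg, hexdigest) for the bytes read from fileobj."""
    digest = hashlib.new(alg)
    # Read in chunks so that large source files need not fit in memory.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    # hexdigest() yields lower-case hex digits with no leading "0x".
    return alg, digest.hexdigest()


alg, hexdigest = describe_hash(io.BytesIO(b""))
print(alg, hexdigest)
```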

class firehose.model.Function

Identification of a particular function within source code.

It corresponds to the <function> XML element within a Firehose XML document.

name

str: the name of the function or method.

class firehose.model.Point

Identification of a particular line/column within a source file.

It corresponds to the <point> XML element within a Firehose XML document.

line

int: the 1-based number of the line containing the point

column

int: the 1-based number of the column containing the point

Note

GCC uses a 1-based convention for source columns, whereas Emacs’s M-x column-number-mode uses a 0-based convention.

For example, an error in the initial, left-hand column of source line 3 is reported by GCC as:

some-file.c:3:1: error: ...etc...

On navigating to the location of that error in Emacs (e.g. via next-error), the locus is reported in the Mode Line (assuming M-x column-number-mode) as:

some-file.c   10%   (3, 0)

i.e. 3:1: in GCC corresponds to (3, 0) in Emacs.
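
The conversion between the two conventions amounts to shifting the column by one while leaving the line number alone. A minimal sketch, using hypothetical helper names that are not part of firehose:

```python
def gcc_to_emacs(line, column):
    """Convert a GCC-style (1-based column) locus to Emacs's 0-based column."""
    return (line, column - 1)


def emacs_to_gcc(line, column):
    """Convert an Emacs-style (0-based column) locus to GCC's 1-based column."""
    return (line, column + 1)


print(gcc_to_emacs(3, 1))
print(emacs_to_gcc(3, 0))
```

Since Point.column is 1-based, values taken from a 0-based source need the second conversion before being stored.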

class firehose.model.Range

Identification of a range of text within a source file.

It corresponds to the <range> XML element within a Firehose XML document.

start

(Point)

end

(Point)

Capturing the circumstances leading up to a problem

class firehose.model.Trace

An optional list of events within an Issue that describe the circumstances leading up to a problem.

It corresponds to the <trace> XML element within a Firehose XML document.

See example of a trace.

states

list of State

class firehose.model.State

A state within a Trace.

location

Location

notes

Notes or None

Other data

class firehose.model.CustomFields(OrderedDict)

A big escape hatch in the data model: support for arbitrary, ordered key/value pairs for roundtripping data specific to a particular situation, e.g. debugging attributes for a particular failure.

It corresponds to the <custom-fields> XML element within a Firehose XML document.