We want to slurp the results from static code analysis into a database, which means coercing all of the results into some common interchange format, codenamed “firehose” (which could also be the name of the database).
The idea is a common XML format that every tool can emit and that:
- describes a warning
- gives the source-code location of the warning: filename, function, and line number
- optionally carries a CWE identifier
- can carry other IDs and URLs, e.g. the ID “SIG30-C” with URL https://www.securecoding.cert.org/confluence/display/seccode/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers
- optionally describes the code path leading to the warning (potentially interprocedural and across source files), potentially with “state” annotations (e.g. for a reference-counting bug, it’s useful to be able to annotate the changes to the refcount)
This comes together with a simple Python API for working with the format as a collection of Python objects (creating them, writing them to XML, reading them from XML, modifying them, etc.).
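To make the "collection of Python objects" idea concrete, here is a minimal sketch using only the standard library. It does not use the project's actual API or schema: the `Issue` class, its fields, and the XML element and attribute names are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Issue:
    """Hypothetical stand-in for one warning; not the real Firehose class."""
    message: str
    filename: str
    function: str
    line: int
    cwe: Optional[int] = None

    def to_xml(self) -> ET.Element:
        # Element and attribute names here are assumptions, not the real schema.
        elem = ET.Element("issue")
        if self.cwe is not None:
            elem.set("cwe", str(self.cwe))
        ET.SubElement(elem, "message").text = self.message
        loc = ET.SubElement(elem, "location")
        ET.SubElement(loc, "file", name=self.filename)
        ET.SubElement(loc, "function", name=self.function)
        ET.SubElement(loc, "point", line=str(self.line))
        return elem

    @classmethod
    def from_xml(cls, elem: ET.Element) -> "Issue":
        loc = elem.find("location")
        cwe = elem.get("cwe")
        return cls(
            message=elem.findtext("message"),
            filename=loc.find("file").get("name"),
            function=loc.find("function").get("name"),
            line=int(loc.find("point").get("line")),
            cwe=int(cwe) if cwe is not None else None,
        )


# Create an object, write it to XML, and read it back.
issue = Issue("NULL pointer dereference", "foo.c", "bar", 42, cwe=476)
xml_text = ET.tostring(issue.to_xml(), encoding="unicode")
restored_issue = Issue.from_xml(ET.fromstring(xml_text))
```

Using a dataclass gives equality comparison for free, which makes it easy to check that an object survives a trip through XML unchanged.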
The data can be round-tripped through both XML and JSON.
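As a rough illustration of what round-tripping through JSON means, the sketch below serializes a report, including a code path with refcount "state" annotations, and reads it back. The dictionary layout and key names are hypothetical and are not the project's real JSON representation; the warning itself is invented example data.

```python
import json

# Hypothetical report layout; the key names are illustrative only.
report = {
    "message": "ob_refcnt of new object is 1 too high",
    "location": {"file": "foo.c", "function": "make_list", "line": 17},
    "cwe": None,
    "trace": [
        {"line": 12, "note": "PyList_New() succeeds", "refcount": 1},
        {"line": 15, "note": "Py_INCREF(result)", "refcount": 2},
        {"line": 17, "note": "returning; caller expects refcount 1", "refcount": 2},
    ],
}

# Serialize to JSON text and parse it back into Python objects.
json_text = json.dumps(report, indent=2, sort_keys=True)
restored = json.loads(json_text)
```

A round trip is lossless when `restored` compares equal to `report`, which holds here because the data uses only strings, integers, lists, dictionaries, and `None`.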
There is a RELAX-NG schema for validating XML files.
References to source files in the format can include a hash of the source file itself (e.g. SHA-1), so that the exact version of the file being referred to can be identified unambiguously.
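Computing such a hash is straightforward with Python's `hashlib`. The sketch below is one way to do it; the `file_digest` helper and the shape of the `file_ref` dictionary are assumptions for illustration, not part of the project's API.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: hash a throwaway file standing in for a real source file.
with tempfile.NamedTemporaryFile(suffix=".c", delete=False) as tmp:
    tmp.write(b"int main(void) { return 0; }\n")
try:
    # Hypothetical shape for a file reference that carries its hash.
    file_ref = {
        "name": tmp.name,
        "hash": {"alg": "sha1", "hexdigest": file_digest(tmp.name)},
    }
finally:
    os.remove(tmp.name)
```

Reading in chunks keeps memory use flat even for large files.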
This format would be slurped into the DB for the web UI, but it can also be processed without needing a server. For example, it can be:
- converted to the textual form of a gcc compilation error, so that Emacs etc. can parse it and take you to the source
- turned into a simple HTML report locally on your workstation
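The gcc-style conversion can be sketched in a few lines. This is an illustration rather than the project's own converter: the `to_gcc_text` function and its parameters are hypothetical, and the column number is an assumption, since the format described here only promises filename, function, and line.

```python
def to_gcc_text(filename, line, column, message, function=None, cwe=None):
    """Format one warning in the style of a gcc diagnostic."""
    lines = []
    if function is not None:
        # gcc prefixes diagnostics with the enclosing function.
        lines.append(f"{filename}: In function '{function}':")
    suffix = f" [CWE-{cwe}]" if cwe is not None else ""
    lines.append(f"{filename}:{line}:{column}: warning: {message}{suffix}")
    return "\n".join(lines)


text = to_gcc_text("foo.c", 42, 5, "NULL pointer dereference",
                   function="bar", cwe=476)
print(text)
```

Because the `file:line:column:` prefix matches what compilers print, editors that already parse compiler output should be able to jump to the reported location.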
Projects using Firehose:
- mock-with-analysis can rebuild a source RPM, capturing the results of four different code analysis tools in Firehose format (along with all source files mentioned in any report).
- The “firehose” branch of cpychecker can natively emit Firehose XML reports.