Experimental ScanCode Integration
Starting with version 1.4.0, Solicitor can be integrated with the tool ScanCode to include detailed information gathered from the "deep license scan" performed by ScanCode. This includes detected licenses, copyrights and NOTICE files.
The current integration with ScanCode is experimental: the ScanCode parameters used, the interfacing and curation logic, and all parts of the data persistence are experimental, so the quality of the results might be insufficient. The current workflow and implementation are subject to change in future versions without further notice.
The general workflow when integrating with ScanCode consists of the following 3 steps:
Execute Solicitor in the "classic" way, i.e. based only on the data provided via the Readers as described in Reading License Information with Readers. Besides generating the normal reports/documents, this also creates scripts for downloading the required OSS source code and running ScanCode.
Download the source code and run ScanCode by executing the generated scripts. The downloaded sources and ScanCode results will be saved to a directory tree in the local file system.
Execute Solicitor a second time. For all ApplicationComponents where ScanCode information is available (stored in the local directory tree) the license data as obtained from the Readers is replaced by this information. The data model is enriched with the found copyright and notice file information. Reports (see Reporting and Creating output documents) are now based on the ScanCode data (where available).
The scripts generated by Solicitor to download sources and run ScanCode are written in Bash syntax. Either run them on a system that provides Bash natively (Linux) or install an appropriate environment (e.g. Git Bash) if you are using Windows.
Download and install ScanCode from https://github.com/nexB/scancode-toolkit/releases. Make sure that the executable is included in the search PATH for executables.
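Before running the generated scripts, it can be useful to confirm that the ScanCode executable is actually found on the search PATH. The following Python sketch does this with the standard library only. It assumes the executable is named scancode, which is the usual name of the ScanCode command-line entry point but may differ in your installation.

```python
import shutil


def find_executable(name: str = "scancode"):
    """Return the full path of `name` if it is on the search PATH, else None."""
    # shutil.which applies the same PATH lookup a shell would use.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("scancode not found on PATH; check your ScanCode installation")
    else:
        print(f"scancode found at {path}")
```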
As the ScanCode integration is still experimental, it is currently deactivated by default.
To enable it set system property
(See Built in Default Properties for information on how to do so.)
If this feature flag is not activated, Solicitor will not attempt to read ScanCode information from the local file system.
Solicitor 1st run
Execute Solicitor in a classic way. As part of the report creation step this will generate two scripts:
output/scancode_PROJECTNAME.sh (for downloading the sources, also calls
output/scancodeScan.sh (for running ScanCode on the downloaded sources)
The scripts include all ApplicationComponents except those where normalizedLicenseType was set to
Download Sources and run ScanCode
Change to directory output and execute
This will download all sources and process them via ScanCode.
This might take several hours to complete.
Results are stored in the subdirectory Sources of the directory output and are organized in a tree structure given by the PackageURL of the ApplicationComponents.
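To illustrate how a PackageURL can map onto such a tree, here is a minimal Python sketch. The exact mapping used by Solicitor is not spelled out in this section; the sketch assumes that the path consists of the literal prefix pkg followed by the decoded type, namespace and name segments and finally the version, which is consistent with the example path pkg/npm/@somescope/somepackage/1.2.3 used for curations. The function name purl_to_tree_path is hypothetical, and qualifiers and subpaths are simply ignored.

```python
from urllib.parse import unquote


def purl_to_tree_path(purl: str) -> str:
    """Map a PackageURL to a relative tree path (assumed layout, see lead-in)."""
    scheme, _, remainder = purl.partition(":")
    if scheme != "pkg" or not remainder:
        raise ValueError(f"not a PackageURL: {purl!r}")
    # Qualifiers (?...) and subpath (#...) are ignored in this sketch.
    remainder = remainder.split("#", 1)[0].split("?", 1)[0]
    # The version follows the last '@'; a scope such as %40somescope comes earlier.
    coordinates, separator, version = remainder.rpartition("@")
    if not separator or not coordinates or not version or "/" in version:
        raise ValueError(f"PackageURL has no usable version: {purl!r}")
    # Percent-decode each segment, e.g. %40somescope becomes @somescope.
    segments = [unquote(part) for part in coordinates.strip("/").split("/")]
    return "/".join(["pkg", *segments, unquote(version)])
```

For example, purl_to_tree_path("pkg:npm/%40somescope/somepackage@1.2.3") returns pkg/npm/@somescope/somepackage/1.2.3 under these assumptions.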
Solicitor 2nd run
Execute Solicitor a second time.
After reading the component/license information from the Readers (but before starting the rule engine), Solicitor tries to look up ScanCode information in the directory tree in output/Sources for all processed ApplicationComponents. If information is found for an ApplicationComponent, the following is done:
License information (including URL of license text) as obtained from the Readers is replaced by the license info found by ScanCode
Copyrights are taken from ScanCode results
Info on NOTICE file is taken from the ScanCode results
If the ScanCode results contain information about a project URL then this is stored as
The additional information obtained from ScanCode is currently used mainly by the new report Attributions_PROJECTNAME.html, which lists
all ApplicationComponents (excluding those which are not OSS licensed)
with all found copyrights
and all licenses
including all different license texts
and contents of all found NOTICE files
The data obtained from ScanCode might be affected by false positives (a license or copyright was detected wrongly) or false negatives (a license or copyright was missed). There are two mechanisms to compensate for such defects: applying curation information from a "curations" file, or changing the license information via the decision table rules.
To define curations, create a file output/curations.yaml with the following structure:
artifacts:
  - name: pkg/npm/@somescope/somepackage/1.2.3 (1)
    url: https://github.com/foo/bar (2)
    licenses: (3)
      - license: MIT (4)
        url: https://raw.githubusercontent.com/foo/bar/LICENSE (5)
    copyrights: (6)
      - (c) 2021 Donald Duck (7)
      - "(c) 2019 Mickey Mouse <http://mickey.mouse>" (8)
  - name: pkg/npm/@anotherscope/anotherpackage/4.5.6 (9)
  . . .
|1||Path of the package information as used in the file tree. Derived from the PackageURL.|
|2||URL of the project, will be stored as
|3||Licenses to set. Optional. If defined then all found licenses will be replaced by the list of licenses given here.|
|4||SPDX identifier of license.|
|5||URL pointing to license text.|
|6||Copyrights to set. Optional. If defined then all found copyrights will be replaced by the list of copyrights given here.|
|7||A single copyright.|
|8||Another copyright. Note that due to YAML syntax any string containing
|9||Further packages to follow.|
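The replacement semantics of the licenses and copyrights entries can be sketched in Python as follows. This is only an illustration under stated assumptions: the function apply_curation and the dictionary layout of the scanned data are hypothetical and do not reflect Solicitor's actual (Java) data model. The sketch covers only the two list-valued entries; handling of the url entry is left out.

```python
def apply_curation(component: dict, curation: dict) -> dict:
    """Return a copy of `component` with one curation entry applied.

    An entry that is present in the curation replaces ALL values found
    by the scan; an entry that is absent leaves the scanned values as is.
    """
    curated = dict(component)
    if "licenses" in curation:
        curated["licenses"] = [dict(entry) for entry in curation["licenses"]]
    if "copyrights" in curation:
        curated["copyrights"] = list(curation["copyrights"])
    return curated


# Hypothetical scan result for one package (illustrative values only).
scanned = {
    "licenses": [{"license": "Apache-2.0", "url": "https://example.org/LICENSE"}],
    "copyrights": ["(c) 2020 Someone"],
}
# Curation that sets licenses but says nothing about copyrights.
curation = {
    "name": "pkg/npm/@somescope/somepackage/1.2.3",
    "licenses": [
        {"license": "MIT", "url": "https://raw.githubusercontent.com/foo/bar/LICENSE"}
    ],
}
result = apply_curation(scanned, curation)
```

In this example the scanned Apache-2.0 entry is replaced by MIT, while the scanned copyright is kept because the curation defines no copyrights.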
Decision table rules
As with license information obtained from the Readers, the license information from ScanCode can also be altered using decision table rules. A new attribute origin was introduced in the RawLicense entity, as well as a condition field in decision table
The origin attribute in RawLicense contains either the string scancode, if the license information came from ScanCode, or the (lowercase) class name of the Reader used.
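The rule for the origin value can be pictured with a small Python sketch. It assumes that "class name" means the simple class name without any package prefix; ExampleReader is a hypothetical stand-in and not one of Solicitor's Readers, and the helper names are made up for illustration.

```python
SCANCODE_ORIGIN = "scancode"


class ExampleReader:
    """Hypothetical stand-in for a Reader implementation."""


def origin_of(source) -> str:
    """Return the origin string for a license source.

    Pass the string "scancode" for ScanCode results, or a Reader
    instance for licenses that were read by that Reader.
    """
    if source == SCANCODE_ORIGIN:
        return SCANCODE_ORIGIN
    # Assumption: the simple class name, converted to lowercase.
    return type(source).__name__.lower()


def is_from_scancode(origin: str) -> bool:
    """True if the origin marks license information found by ScanCode."""
    return origin == SCANCODE_ORIGIN
```

With these definitions, origin_of(ExampleReader()) yields examplereader, and is_from_scancode distinguishes ScanCode results from Reader results.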
Using the Extended comparison syntax it is possible to qualify whether a rule should apply for licenses found by ScanCode or not:
|Value of condition Origin||rule applies for …|
… licenses obtained from ScanCode information
… licenses obtained from normal Readers
… in both cases