The Apache-2.0 license, and the Apache Individual Contribution License Agreement, both remind contributors that they are responsible for disclosing any copyrighted materials in submitted contributions that are not their original creation. This is as true when using generative AI tooling, as it is when using materials from public websites or code from other open source projects.
When disclosing these materials, contributors should also identify the licensing for these materials. The ASF maintains a 3rd Party Licensing Policy that provides guidance on which licenses are acceptable, along with instructions on the treatment of 3rd Party Works.
While in general, content generated by a non-human (e.g., machine or monkey) is not copyrightable, if content consists of some portions generated by AI and other portions authored by a human, the portions authored by a human may be copyrightable.
As explained by the following U.S. Copyright Office Registration Guidance (3/16/2023):
“For example, a human may select or arrange AI-generated material in a sufficiently creative way that “the resulting work as a whole constitutes an original work of authorship.” Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are ‘independent of’ and do ‘not affect’ the copyright status of the AI-generated material itself.”
These portions authored by a human may simply come from the prompt the human provided or subsequent changes they make. However, a prominent concern with generative AI is the risk of reproducing portions of materials that they were trained on, some of which may be copyrightable subject matter. Thus, a recommended practice when using generative AI tooling is to use tools with features that identify any included content that is similar to parts of the tool’s training data, as well as the license of that content.
Given the above, code generated in whole or in part using AI can be contributed if the contributor ensures that:
When providing contributions authored using generative AI tooling, a recommended practice is for contributors to indicate the tooling used to create the contribution. This should be included as a token in the source control commit message, for example including the phrase “Generated-by:
Finally, please note that while the above seems like a reasonable set of guidelines in June 2023, this is a rapidly evolving area. Whatever we recommend to PMCs today, policies will need to be re-evaluated and updated in response to:
We will continue communicating with PMC and ASF members as updates to this FAQ get discussed and merged in.
The above text should apply for documentation as well. However, the most popular tooling for documentation, ChatGPT, has restrictive licensing, so caution should be applied.
As with documentation, the above principles would still apply. Though with images being a non textual form, the details quickly become complex. We expect this to continue to be a rapidly evolving area.
Refer to the 3rd Party Licensing Policy as with any other contribution.