AI model licensing

There are two main categories of open-source licenses. Permissive licenses, such as Apache 2.0 and MIT, give the licensee significant freedom to use the code, even as part of proprietary software that the licensee is developing [1]. Restrictive licenses, such as the GPL family, impose terms such as requiring derivative works to be licensed under the same terms, a requirement known as "copyleft" [2].

The two categories also share similarities. Open-source licenses allow the licensee to obtain the source code and to use it in creating derivative products [3]. The original owner retains ownership of the work, and the license and attribution information must be included with the software [4].

Restrictive licenses pose the greatest challenge when open-source code is incorporated into proprietary software, because it is uncertain whether a given piece of software "contains or is derived from" open-source software [5]. Software that is loosely coupled through dynamic linking or plug-ins is the most contentious case [6]. Challenges also arise when a project uses open-source code under several different copyleft licenses, since each license requires derivative code to be released under its own terms [7].
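The multi-license problem described above can be illustrated with a minimal Python sketch. It assumes a project keeps a mapping from dependency name to license identifier; the dependency names, the classification set, and the function copyleft_licenses_needing_review are all hypothetical. The sketch only flags a project for review when more than one distinct copyleft license appears. It does not decide whether the licenses are actually compatible, which depends on the specific license texts.

```python
# Hypothetical classification set for illustration only; a real audit
# would cover every license in the dependency tree.
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}


def copyleft_licenses_needing_review(dependencies):
    """Return the distinct copyleft licenses if more than one is present.

    `dependencies` maps a dependency name to its license identifier.
    An empty list means at most one copyleft license was found.
    """
    found = sorted({lic for lic in dependencies.values() if lic in COPYLEFT})
    return found if len(found) > 1 else []


# Hypothetical project with two differently licensed copyleft dependencies.
project = {"parser": "MIT", "codec": "GPL-2.0-only", "ui": "GPL-3.0-only"}
flagged = copyleft_licenses_needing_review(project)
if flagged:
    print("Review needed for:", ", ".join(flagged))
```

A project with only permissive dependencies, or with a single copyleft license, returns an empty list under this heuristic.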

OpenAI publishes the OpenAPI specification for its API under the MIT License [8]. This is likely so that developers can reuse and repackage it when writing applications that access the OpenAI API. However, OpenAI's model itself is not open source; it is offered under a Software as a Service model. Anyone wishing to use OpenAI's capabilities must therefore be connected to the Internet and may need to pay for tokens depending on usage volume.

In contrast, DeepSeek released its R1 model under the MIT License [9]. Anyone with sufficiently powerful hardware can therefore download the entire 640GB model and run it independently. The MIT License permits derivative work such as distillation, which attempts to reduce the size of the model so that it runs on less powerful computers while retaining as much of its capability as possible [10]. It also permits further training to extend the model's knowledge [11].
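To make the idea of distillation more concrete, the following is a minimal Python sketch of one common formulation, in which a small "student" model is trained to match the softened output probabilities of a large "teacher" model. It is a generic illustration and not a description of DeepSeek's own procedure; the logit values, the temperature, and the function names are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    The loss is zero when the student reproduces the teacher exactly and
    grows as the two distributions diverge.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three candidate next tokens.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

Training the student amounts to adjusting its parameters to reduce this loss over many examples. Doing so requires access to the teacher's outputs at scale, and for some variants its internal scores, which is why the license terms on the teacher model matter.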

Because OpenAI did not open source its model, others cannot perform distillation or further training on it in the way that is possible with DeepSeek's model. Furthermore, users are subject to any future business decisions by OpenAI, such as increases in token prices or the retirement of older models. Lastly, there is a lack of data privacy: users send their queries to OpenAI's service and have no foolproof means of preventing OpenAI from using that data for other purposes.

[1] Baker E and SR, Open-Source Software (Practical Law (Westlaw) 2024) https://uk.practicallaw.thomsonreuters.com/6-376-6421, p. 6

[2] ibid, p. 9

[3] ibid, p. 3

[4] ibid, p. 12

[5] ibid, p. 20

[6] ibid, p. 16

[7] Baker E and SR, Open-Source Software Governance (Practical Law (Westlaw) 2023) https://uk.practicallaw.thomsonreuters.com/3-501-0318, p. 1

[8] OpenAI, 'LICENSE' https://github.com/openai/openai-openapi/blob/master/LICENSE accessed 9 Feb 2025

[9] DeepSeek, 'DeepSeek R1' https://huggingface.co/deepseek-ai/DeepSeek-R1 accessed 9 Feb 2025

[10] ibid

[11] ibid