We Built An AI For GSA, And Then We Turned It Loose On Some Software License Agreements

September 23, 2020
John Janek

Dev Technology posted a press release yesterday about our first-place win in the GSA Artificial Intelligence and Machine Learning Challenge. In the following article, reposted from LinkedIn, I discuss why our solution matters and how it might be used. View the original article here.

In the government technology space, it is impossible to take two steps (or clicks in our remote-first world) without running into some comment about how AI is going to change everything. Across the private sector, advanced computing, including AI, is having a huge impact. AI today powers your virtual backgrounds, your finances, your retirement, likely your phone, and perhaps even your car. In fact, if there is a problem with AI today, it is simply that it covers far too broad a collection of technologies to be so casually lumped together.

While there have been efforts to bring established and emergent AI into government, one of the most interesting places where AI can play a tremendous role is in the effectiveness of the federal civilian workforce that executes the work on behalf of the taxpayer. Whenever you see AI put into the role of accelerator and assistant, moving low-value effort to high-value effort, there are tremendous follow-on advantages to the individuals who do the work, the office deploying the AI, and ultimately the taxpayer getting more value.

In July, GSA put together a unique challenge asking those of us outside government to use their data to improve the government contracting experience. How might the government better assess software license agreements and their acceptability to the government using AI and Machine Learning?

Why does the government care about EULAs?

The End User License Agreement, or EULA, is typically the blurb we all just blindly accept. For years it has been a dirty little secret in the tech industry. The EULA is where users give up their data rights, invite Big Tech into their lives, even agree to settle disputes through arbitration. In fact, if there’s a place where you’ve probably agreed to things you never would have said yes to in any normal circumstance, it was clicking the “I agree” button to get to the next page of whatever app you were signing up for.

If it is a problem for consumers, you can imagine it is a huge problem for larger organizations, especially organizations like the government that use that software to deal with the private information of millions of citizens daily. Assigning data rights that might be specifically prohibited by law creates all sorts of serious problems and legal wrangling. The whole issue is made even worse by the fact that every site, every app, and every update brings with it new language, and it is nearly impossible to keep any sort of reasonable pace with tech while waiting for legal to review agreements.

With the problem clearly understood and the challenge posted with some great guardrails, it was time for the Dev Technology DevLab to spin up a team to respond. Led by our Technical Director for AI/ML, Josh Powers, the team put together a great solution. With the prototype, called EULA Check, you enter some EULA language as text, a PDF, or a Word document, and, based on a robust training data set provided by GSA, EULA Check gives you insight into whether that language is likely to be acceptable to the government.

One important note: EULA Check is not a lawyer or contracting officer. It is a computer using data provided by GSA to give you a basic interpretation of inputs and should not be construed in any way to represent legal or contracting advice. Use it to supplement subject matter expertise or just for fun.

The shakedown cruise

Of course, it isn’t possible to create a value-add like EULA Check and not want to immediately start throwing things at it. What licenses should we check? Where should we start? Some proprietary licenses written by very big companies went into the machine, and a lot of them came back with more red than green. There had to be a license that was perfectly acceptable based on the government’s model. As it happens, there is: the MIT License.

The MIT License (https://www.mit-license.org) is perhaps one of the most famous EULAs ever invented because it is simple, open, and to the point. Plugging the MIT License into EULA Check produced exactly the results expected: green across the board, completely acceptable.

This, of course, triggered a whole other line of thinking: “How do the most popular open source licenses look to our tool?”

EULA Check has, out of the box, two great features. The first is the web interface. Even when running on a local workstation, it is fast and easy to use. You can copy and paste text right into the interface or feed it entire PDF or Word documents. It automatically parses and analyzes the text.

The other way to use EULA Check is via the command line, and this is where it becomes a very powerful tool for crunching raw data. Load up a bunch of documents in a directory and EULA Check parses them all, creating a single JSON file. With a little Python magic, it is easy to flip that JSON into an Excel spreadsheet and get a lot of data that is easily accessible and ready to consume, pass around, or put into a presentation.

From this JSON output:

To this formatted Excel spreadsheet:
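The "Python magic" itself is not shown here, so the following is only a minimal sketch of the idea. It assumes a hypothetical JSON layout (a mapping from document name to a list of clause results with `clause`, `label`, and `score` keys); the real EULA Check output may be shaped differently. It also writes CSV, which Excel opens directly, rather than a native .xlsx file, to stay within the standard library.

```python
import csv
import io
import json

# Hypothetical EULA Check output. The real schema is not shown in this post,
# so the document names, keys, labels, and scores below are illustrative only.
SAMPLE_JSON = """
{
  "example_a.txt": [
    {"clause": "Example clause text A1", "label": "acceptable", "score": 0.98}
  ],
  "example_b.txt": [
    {"clause": "Example clause text B1", "label": "acceptable", "score": 0.91},
    {"clause": "Example clause text B2", "label": "unacceptable", "score": 0.77}
  ]
}
"""

FIELDS = ["document", "clause_number", "label", "score", "clause"]


def flatten(results):
    """Turn {document: [clause dicts]} into one flat row per clause."""
    rows = []
    for document, clauses in results.items():
        for number, item in enumerate(clauses, start=1):
            rows.append({
                "document": document,
                "clause_number": number,
                "label": item["label"],
                "score": item["score"],
                "clause": item["clause"],
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten(json.loads(SAMPLE_JSON))
csv_text = to_csv(rows)
print(csv_text)
```

Saving `csv_text` to a file with a .csv extension gives something a spreadsheet program can open, sort, and filter. Producing a formatted .xlsx workbook would need a third-party library on top of this.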

It took about five minutes to download a selection of the most popular Open Source Software licenses, and less than a minute for EULA Check to produce an interpretation of the data.

The TL;DR results

EULA Check parsed and analyzed 7 Open Source Software licenses. Here are the results:

It should be noted that a clause, as determined by the software, may be a preamble or a single sentence: anything that is largely whole and complete.
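For readers who want to build a similar per-license summary from their own runs, here is a rough sketch of the tally. The document names and the `acceptable` / `unacceptable` labels are assumptions for illustration, not actual EULA Check output or results.

```python
from collections import Counter

# Hypothetical per-clause labels keyed by document name.
# These are made-up values, not results from EULA Check.
results = {
    "example_a.txt": ["acceptable", "acceptable", "acceptable"],
    "example_b.txt": ["acceptable", "unacceptable", "acceptable", "unacceptable"],
}


def summarize(labels_by_document):
    """Count acceptable and unacceptable clauses for each document."""
    summary = {}
    for document, labels in labels_by_document.items():
        counts = Counter(labels)
        total = len(labels)
        # Guard against empty documents to avoid dividing by zero.
        pct = round(100 * counts["acceptable"] / total, 1) if total else 0.0
        summary[document] = {
            "clauses": total,
            "acceptable": counts["acceptable"],
            "unacceptable": counts["unacceptable"],
            "pct_acceptable": pct,
        }
    return summary


summary = summarize(results)
for document, stats in summary.items():
    print(document, stats)
```

Because a "clause" can be as small as a single sentence, the raw counts depend heavily on how the text is split, so the percentage is usually the more comparable number across licenses of different lengths.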

The great news is that many of the open source licenses are ready for use in government today. They don’t contain any of the traps, assignments, flow-downs, or associations that create legal problems for the government. For the clauses that did cause problems, a critical analysis shows the likely culprit. For example, the flagged GPLv3 text assigns responsibility to the licensee to ensure enforcement and flow-down, and it technically sits outside the main license text (i.e., it was copied from the web page). The government doesn’t typically like to play that role, as the transfer of rights and responsibilities is largely outside the scope of the government’s purview as an acquisition mechanism.

Wow, this is amazing! How can I do this?

EULA Check is online. If you’d like a demo of the web interface, you can reach out to joshua.powers@devtechnology.com and set up a discussion. The online version will parse the first ten clauses of anything you give it. If you want to tackle more exhaustive analysis, you can grab the full source code as part of the GSA submission on GitHub here: https://github.com/GSA/ai-ml-challenge-2020/pull/17. Everything, including an excellent tuned RoBERTa model, is part of that submission and out in the open.

Of course, if you just want to see the Open Source License data, JSON, and XLSX files, you can go here: https://github.com/DevTechnology/eulacheck_analysis_public.

Give it a try today! We’d love to see what you come up with, what questions you have, and where you want to go next. Download our solution, set it up, run your own analysis, and push the JSON files to the repo. If enough people respond, a follow-up blog might be in order.

Special thanks to…

Joshua Powers, Sherri Elliott, Niroop Gonchikar, Michael Oduyebo, and Zach Lawrence for their hard work and GSA for putting together a great, fun challenge.

Epilogue…about those proprietary licenses

While previewing this blog with some colleagues, it was brought up that without some contrast, the advantage of the open source license might end up a buried lede. Given that the training data set EULA Check uses is based on thousands of evaluations by real GSA contracting specialists, we decided that it was useful to analyze some proprietary examples as well for context.

The Apple macOS “Catalina” and 2017 Microsoft Windows EULAs were used as our test cases. Here are the results.

At a glance, these two flagship proprietary licenses are longer than most of their open source counterparts. EULA Check also flagged significantly more of their clauses as unacceptable. Again, it is important to note that this isn’t an expert analysis of the text; rather, it is the application of the software to identify patterns that are likely problematic based on previous assessments. At the bare minimum, if you want software that the government is least likely to challenge, using an Open Source Software license may be the way to go.

John Janek

Dev Technology

Chief Technologist