In the legal services industry, Global-Regulation is using NLP and machine translation to build the most comprehensive world law search engine. I recently spoke with CTO Sebastian Dusterwald to discuss how Global-Regulation uses Watson NLP technology to translate laws into English.

Sean: Sebastian, tell us about Global-Regulation and what your team does.

Sebastian: At Global-Regulation, it is our mission to democratize access to laws from across the globe. We index, process, and translate nearly 2 million laws from nearly 100 countries, from Brazil to China to France to Italy and more, using machine translation. We help make laws searchable and accessible in English. We do all of this with a very small team, and none of it would be possible without the amazing AI-powered cloud services provided by the Watson platform.

Sean: Do you have any recent examples to share about how the team is using Watson?

Sebastian: Recently one of our clients approached us about adding categories to our law metadata, to make it easier for them to find the laws relevant to their business use case: monitoring specific types of laws (such as those in healthcare and cybersecurity) to maintain regulatory compliance. With so many laws in our database, discoverability is always an issue, so we thought this could be a great feature to add to our site. The problem is that very few of our sources provide any sort of categorization metadata, and those that do all use slightly different categories, so simply grabbing this data during indexing was out. We needed a system that could analyze and process our text data, and then categorize it into preset bins. IBM suggested that we try out the IBM Watson Natural Language Understanding (NLU) API. This does exactly what we want out of the box: it allows us to upload training data and then to classify natural language text based on that.

Sean: Interesting, so what did you do next?

Sebastian: Well, we went through our database to find several laws that we thought were representative of each category, from finance to cybersecurity to environment-based laws. We then went through each of those laws and picked out chunks of text that we thought were relevant to the category. This was the most complicated and labor-intensive part of the implementation process. Care had to be taken to choose chunks of text that were specific enough to train the NLP algorithm about the domain the law refers to, while being generic enough not to over-train the algorithm. This meant avoiding words such as specific names of countries or people, or dates. Including them would have risked training the algorithm on keywords that look very specific but have nothing to do with the category at hand. The training set was simply entered into a spreadsheet, and then uploaded to the IBM Watson NLU API. After a short wait for it to process the data, the API was ready to accept queries. Our approach was to use the first 1024 characters of a law to classify it. This generated quite good results, in part because the first 1024 characters of a law typically include its title, which tends to include a number of keywords that the algorithm can use.
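To make the data-preparation steps Sebastian describes a little more concrete, here is a minimal Python sketch. It is not Global-Regulation's code and it does not call the Watson API. The function names, the two-column (text, category) CSV layout, the year pattern, and the list of blocked terms are all assumptions made for illustration. It shows two ideas from the interview: taking the first 1024 characters of a law as the text to classify, and screening training chunks for dates or country names before they go into the spreadsheet.

```python
import csv
import io
import re

MAX_CHARS = 1024  # snippet length mentioned in the interview

# Assumption: a four-digit year between 1800 and 2099 stands in for "dates".
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def classification_snippet(law_text: str, max_chars: int = MAX_CHARS) -> str:
    """Return the leading part of a law's text, which usually includes its title."""
    return law_text[:max_chars]


def looks_too_specific(chunk: str, blocked_terms: set[str]) -> bool:
    """Flag a training chunk that mentions a year or a blocked single-word term."""
    if YEAR_PATTERN.search(chunk):
        return True
    words = {w.lower() for w in re.findall(r"[A-Za-z]+", chunk)}
    return bool(words & {t.lower() for t in blocked_terms})


def training_rows_to_csv(rows: list[tuple[str, str]], blocked_terms: set[str]) -> str:
    """Write (text, category) pairs as CSV, skipping chunks flagged as too specific."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for text, category in rows:
        if not looks_too_specific(text, blocked_terms):
            writer.writerow([text, category])
    return buffer.getvalue()
```

In practice the screening Sebastian describes was done by hand, and a simple filter like this would only catch the most obvious cases. The snippet returned by `classification_snippet` is what would be sent to whatever classifier has been trained.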