Reinforcement Learning for Legal Drafting

Reinforcement learning, as used here, means recording user activity and feeding it back into the system so that it improves over time. For example, if a user rejects a suggestion made by a language model, the system records that output and assigns it a negative score. A suggestion the user finds useful is recorded with a positive score.
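As a rough illustration of that feedback loop, the sketch below logs each accept or reject decision and adds up a score per suggestion. The `FeedbackEvent` class, the `score` function, and the +1 / -1 scale are all hypothetical names and values chosen for this example; only the signs of the scores come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackEvent:
    """One recorded user decision about one suggestion (hypothetical schema)."""
    suggestion: str  # text the language model proposed
    accepted: bool   # True if the user kept it, False if they rejected it


def score(event: FeedbackEvent) -> float:
    # Assumed reward scale: +1 for an accepted suggestion, -1 for a rejected one.
    return 1.0 if event.accepted else -1.0


def total_scores(events):
    """Sum the scores of all recorded events, grouped by suggestion text."""
    totals = defaultdict(float)
    for event in events:
        totals[event.suggestion] += score(event)
    return dict(totals)


# Made-up interaction log, for illustration only.
events = [
    FeedbackEvent("The Supplier shall indemnify the Customer.", accepted=True),
    FeedbackEvent("The Supplier shall indemnify the Customer.", accepted=True),
    FeedbackEvent("The Supplier may consider indemnifying the Customer.", accepted=False),
]
totals = total_scores(events)
print(totals)
```

A real system would store far more than the text and a boolean (who drafted, in what document, at what point), but the core record is the same: an output paired with a signed score.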

A reinforcement learning system for document drafting optimizes for the probability that a suggestion is accepted by the user (and is therefore probably useful). It uses the recorded interaction data to infer what makes a suggestion likely to be accepted, rejected, or modified. Critically, it should also learn when not to make a suggestion at all.
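One simple way to picture "knowing when not to make a suggestion" is to estimate the acceptance probability from past counts and stay silent when the estimate falls below a threshold. The sketch below is an assumption, not a description of any particular product: it uses a smoothed acceptance rate (the posterior mean of a Beta-Bernoulli model with a uniform prior), and the function names and the 0.6 threshold are hypothetical. Modified suggestions are left out to keep the example small.

```python
def acceptance_estimate(accepted: int, rejected: int) -> float:
    """Smoothed estimate of the probability that a suggestion is accepted.

    Adding 1 to each count (a uniform Beta(1, 1) prior) keeps the estimate
    at 0.5 when there is no data, instead of dividing by zero.
    """
    return (accepted + 1) / (accepted + rejected + 2)


def should_suggest(accepted: int, rejected: int, threshold: float = 0.6) -> bool:
    """Only surface a suggestion when it is likely enough to be accepted."""
    return acceptance_estimate(accepted, rejected) >= threshold


# Made-up counts for three kinds of suggestion.
print(should_suggest(accepted=8, rejected=2))  # estimate 9/12 = 0.75
print(should_suggest(accepted=1, rejected=4))  # estimate 2/7, about 0.29
print(should_suggest(accepted=0, rejected=0))  # estimate 0.5, no evidence yet
```

With these numbers only the first case clears the threshold. Raising the threshold makes the system quieter and more conservative; lowering it makes it more willing to interrupt the drafter.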

Reinforcement learning can make legal drafting more adaptive. Over thousands of interactions, the system learns to prioritize the phrasing and structure that align with the applicable drafting standards, which vary with (or are weighted by) the practice area, jurisdiction, and type of document being drafted. As a result, our suggestions incorporate these details intrinsically, rather than depending on the system detecting all relevant context and cramming it into the language model's input.
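To make this concrete, the sketch below keeps separate reward statistics for each combination of practice area, jurisdiction, and document type, and picks the phrasing with the best average reward for the current context. It uses epsilon-greedy selection, a basic bandit strategy chosen here for brevity and not taken from the text above. `DraftingContext`, `PhrasingBandit`, the candidate phrasings, and the rewards are all hypothetical.

```python
import random
from typing import NamedTuple


class DraftingContext(NamedTuple):
    practice_area: str
    jurisdiction: str
    document_type: str


class PhrasingBandit:
    """Epsilon-greedy choice among candidate phrasings, tracked per context."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon          # probability of trying a random phrasing
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.stats = {}                 # (context, phrasing) -> (times shown, total reward)

    def mean_reward(self, context: DraftingContext, phrasing: str) -> float:
        shown, total = self.stats.get((context, phrasing), (0, 0.0))
        return total / shown if shown else 0.0

    def choose(self, context: DraftingContext, candidates: list) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)  # explore
        # Exploit: best average reward seen so far in this context.
        return max(candidates, key=lambda p: self.mean_reward(context, p))

    def update(self, context: DraftingContext, phrasing: str, reward: float) -> None:
        shown, total = self.stats.get((context, phrasing), (0, 0.0))
        self.stats[(context, phrasing)] = (shown + 1, total + reward)


# Made-up feedback: the same two phrasings are received differently
# in two hypothetical contexts.
ny = DraftingContext("commercial", "New York", "services agreement")
uk = DraftingContext("commercial", "England and Wales", "services agreement")
candidates = ["best efforts", "reasonable endeavours"]

bandit = PhrasingBandit(epsilon=0.0)  # epsilon 0 disables exploration for the demo
for _ in range(20):
    bandit.update(ny, "best efforts", 1.0)
    bandit.update(ny, "reasonable endeavours", -1.0)
    bandit.update(uk, "best efforts", -1.0)
    bandit.update(uk, "reasonable endeavours", 1.0)

ny_choice = bandit.choose(ny, candidates)
uk_choice = bandit.choose(uk, candidates)
print(ny_choice, "|", uk_choice)
```

Because the statistics are keyed by context, nothing about jurisdiction or document type has to be spelled out in the prompt: the preference is carried by the recorded feedback itself.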

To change the output, some part of the language model workflow has to be adjusted. One option is to train the model directly, or "fine-tune" it, which can improve performance on specific tasks. However, fine-tuning can also degrade general performance in other areas, and effectively training the frontier models used for most tasks requires a very large quantity of data.

Reinforcement learning can tune for macro characteristics (like practice area) or micro characteristics (like the drafting style of an individual). In either case, it can significantly improve how closely our language model output aligns with the way users actually draft, and it is a major step towards a custom solution for legal document drafting.
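One possible way to combine the macro and micro levels is to start from the group-level acceptance rate (for example, a practice area) and let an individual's own history take over as it accumulates. The sketch below does this with a simple shrinkage formula. The function name, the `prior_strength` parameter, and all the counts are assumptions made for illustration.

```python
def blended_acceptance(
    user_accepted: int,
    user_total: int,
    group_accepted: int,
    group_total: int,
    prior_strength: float = 20.0,
) -> float:
    """Blend an individual's acceptance rate with their group's rate.

    The group rate acts like `prior_strength` extra observations, so a new
    user inherits the group's behaviour and an experienced user's own
    history dominates.
    """
    group_rate = group_accepted / group_total if group_total else 0.5
    return (user_accepted + prior_strength * group_rate) / (user_total + prior_strength)


# Made-up numbers: the practice area accepts this kind of suggestion 80% of the time.
new_user = blended_acceptance(0, 0, group_accepted=800, group_total=1000)
picky_user = blended_acceptance(5, 100, group_accepted=800, group_total=1000)
print(round(new_user, 3), round(picky_user, 3))
```

Here the new user is treated exactly like the group (0.8), while the individual who accepted only 5 of 100 suggestions is pulled down to 0.175 despite the group's preference.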