Abstract: Generative AI holds the promise of enabling a range of sought-after capabilities and revolutionizing workflows in various consumer and enterprise verticals. However, putting a model in production involves much more than just generating an output. It involves ensuring the model is reliable, safe, performant and also adheres to the policy of operation in a particular domain. Guardrails as a necessity for models has evolved around the need to enforce appropriate behavior of models, especially when they are in production. In this paper, we use education as a use case, given its stringent requirements of the appropriateness of content in the domain, to demonstrate how a guardrail model can be trained and deployed in production. Specifically, we describe our experience in building a production-grade guardrail model for a K-12 educational platform. We begin by formulating the requirements for deployment to this sensitive domain. We then describe the training and benchmarking of our domain-specific guardrail model, which outperforms competing open- and closed- instruction-tuned models of similar and larger size, on proprietary education-related benchmarks and public benchmarks related to general aspects of safety. Finally, we detail the choices we made on architecture and the optimizations for deploying this service in production; these range across the stack from the hardware infrastructure to the serving layer to language model inference optimizations. We hope this paper will be instructive to other practitioners looking to create production-grade domain-specific services based on generative AI and large language models.

What problem does this paper attempt to address?

This paper addresses how to ensure that generative AI models, once deployed in production, operate reliably, safely, and efficiently in a specific domain (here, education) and comply with that domain's operating policies. Specifically, the paper focuses on the following:

1. **Ensuring the safety and appropriateness of the model**: In the sensitive domain of education, the appropriateness and safety of content are crucial. For example, deployments must comply with data privacy regulations (such as FERPA and COPPA), and content must be harmless and suitable for students of different age groups.
2. **Establishing an efficient guardrail model**: The guardrail model ensures that the behavior of generative AI conforms to the norms and requirements of a specific domain. This includes defining clear requirements and expectations to guide the training and deployment of the model.
3. **Optimizing model performance**: To achieve low latency and high throughput in practice, the paper explores optimizing the model architecture, reducing the parameter count, and minimizing inference costs.
4. **Meeting service-level agreements (SLAs)**: The paper describes how to ensure that the model meets strict performance targets in production, such as queries per second (QPS), latency, and availability.

### Main contributions

- **Requirement definition**: The requirements of the safety and appropriateness system are clearly defined so that it can provide appropriate judgments for text inputs of different lengths.
- **Model optimization**: A method for optimizing large language models (LLMs) is proposed to make them more suitable for production use and to perform well on education-related benchmarks.
- **Deployment optimization**: The paper studies how to deploy LLMs efficiently on GPUs to meet different service-level agreements (SLAs) and to balance computing resources against performance.

### Conclusion

By building and deploying a production-grade guardrail model for a K-12 educational platform, the paper shows how to ensure the safety and appropriateness of generative AI in practical applications while maintaining high performance and reliability. This work is a useful reference for other practitioners who want to develop and deploy generative AI systems in specific domains.
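As a rough illustration of the SLA metrics mentioned above (QPS and latency), the following sketch checks a window of observed request latencies against target thresholds. It is not taken from the paper: the function names `p95` and `meets_sla`, the choice of the 95th percentile, and the threshold parameters are all assumptions made for illustration.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a list of latencies (milliseconds)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Nearest-rank method: rank = ceil(0.95 * n), computed with integers
    # to avoid floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_sla(latencies_ms, window_s, max_p95_ms, min_qps):
    """Return True if the window satisfies both hypothetical SLA targets.

    latencies_ms: latencies of requests completed in the window
    window_s:     length of the observation window in seconds
    max_p95_ms:   assumed latency target (95th percentile)
    min_qps:      assumed throughput target (queries per second)
    """
    qps = len(latencies_ms) / window_s
    return p95(latencies_ms) <= max_p95_ms and qps >= min_qps
```

For example, `meets_sla(samples, window_s=60.0, max_p95_ms=200, min_qps=50)` would report whether one minute of traffic stayed within a 200 ms p95 latency target while sustaining at least 50 queries per second; a real deployment would also track availability, which this sketch omits.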

Building a Domain-specific Guardrail Model in Production