Chapter 2 — AI Life Cycle Risk Management — Part C: AI Implementation, Maintenance, and Decommissioning

Part C: AI Implementation, Maintenance, and Decommissioning

After the AI model is properly trained and fine-tuned, it is ready for deployment into production and to be made available for use by the enterprise. However, oversight of the solution does not end with implementation and deployment. Ongoing maintenance of the system ensures the model and the solution continue to perform as expected and meet the goals of the organization. If necessary, further refinements should be made to the model. It is also important to determine when an AI model or solution has reached its end-of-life and needs to be decommissioned.

2.10 AI Deployment and Implementation

Deployment into live production involves piloting, checking compatibility with legacy systems, ensuring regulatory compliance, managing organizational change, conducting predeployment security assessments (e.g., penetration testing), and evaluating user experience. Actions taken during this phase help ensure that an AI solution is implemented per the previously established plan.

Piloting is a critical step in the deployment process, allowing the AI solution to be tested in a controlled environment before full-scale implementation. This phase helps identify potential issues, assess the solution’s performance, and gather feedback from users. It serves as a proof of concept, ensuring that the solution meets the desired requirements and can operate effectively under real-world conditions.

Compatibility with existing systems is another crucial aspect to consider. AI solutions often need to be integrated with an organization’s infrastructure and software. Ensuring interoperability with existing systems helps avoid disruptions and increases the solution’s overall usability. Successful integration may involve technical adjustments, custom integrations, or even updates to existing systems.

Compatibility and integration considerations should extend to third parties and downstream deployers of an AI solution. Providing clear and detailed instructions to those responsible for the deployment ensures that the AI solution is implemented correctly and efficiently. The instructions should cover all aspects of deployment, including installation, configuration, troubleshooting, and ongoing maintenance.

Detailed technical documentation not only helps with the issuance of instructions to third parties and downstream deployers but also demonstrates compliance. The documentation should be created throughout the development process and retained/maintained throughout the life cycle of the AI solution. Documentation can include instructions, risk assessment, regulatory compliance mapping, etc.

Throughout the entire process, the implementing organization should ensure that it manages organizational change using its established change management process. This includes staff training considerations to ensure new roles and responsibilities are understood and proper awareness/buy-in is achieved. Throughout the process, user experience should be monitored and evaluated through established feedback structures that trigger actions by AI solution owners to make applicable changes. Effective communication and change management strategies help ensure that the transition to a newly implemented AI solution is smooth and successful.

2.10.1 APIs and AI Solutions

While not a new technology, APIs gained popularity and interest with the rise of applications and e-commerce in the early 2000s.¹¹⁶ With the rise of AI technologies, including GenAI and AI agents, APIs have become essential in most AI deployments. APIs connect AI with the critical data and systems it needs to access to make decisions and complete tasks. They serve as bridges that pass credentials between the AI solution and the systems it needs to connect to. For example, they connect chatbots to customer data, giving the chatbot access to important details such as account history or inventory databases.¹¹⁷ APIs can increase the reliability and accuracy of AI results, especially in text or sentiment analysis,¹¹⁸ but weak API key management may expose sensitive AI functions. In short, APIs enable natural and context-driven interactions between AI systems and humans.

Risk associated with the use of APIs for AI implementations includes:¹¹⁹

Data leakage and network security—APIs serve as entry points to systems and software. Likewise, they also transmit sensitive data across the network. This can leave the network and connected systems and software vulnerable. Using secure by design and privacy by design principles can ensure that these concepts are considered when designing APIs and AI solutions. A zero trust approach to authentication, authorization, and security can also safeguard the system.
Ethical concerns—As with any AI-enabled technology, ethics must be considered when using APIs to transmit data. The use of APIs can complicate explainability when a biased or inaccurate decision is made, because it becomes more difficult to trace how the AI reached its decision.
Technical considerations—AI and APIs require different skillsets from developers. Ensuring the team has the right balance of API and AI skills is crucial.
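As an illustration of the API key management concern raised above, the following is a minimal Python sketch, not a definitive implementation. The environment variable name `AI_SERVICE_API_KEY`, the 32-character minimum, and the helper names are assumptions made for this example; they do not come from any particular product.

```python
import hmac
import os


def load_api_key(env_var: str = "AI_SERVICE_API_KEY") -> str:
    """Read the key from the environment instead of hard-coding it in source."""
    key = os.environ.get(env_var, "")
    if len(key) < 32:  # hypothetical minimum length for this sketch
        raise RuntimeError(f"{env_var} is missing or too short")
    return key


def is_authorized(presented_key: str, expected_key: str) -> bool:
    """Compare keys in constant time so timing does not leak key contents."""
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())


def redact(key: str) -> str:
    """Return a log-safe form of the key showing only the last four characters."""
    return "*" * (len(key) - 4) + key[-4:]
```

In practice an organization would typically pair this with a secrets manager and scheduled key rotation; the sketch only shows the basic habits of keeping keys out of source code and out of logs.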

2.11 Robustness and Scalability Considerations

Ensuring the robustness and scalability of AI systems is critical to maintaining their reliability, security, and effectiveness as usage grows or operational requirements evolve. Robustness refers to an AI system’s ability to maintain performance and safety despite unexpected inputs, environmental changes, or adversarial attacks. Scalability addresses the capacity of AI solutions to handle increasing workloads, data volumes, or complexity without degradation in service quality or security.

2.11.1 Robustness Against Failures and Attacks

Robust AI systems are designed to anticipate and manage unexpected conditions, including erroneous or malicious data inputs. Key techniques to enhance robustness include:

Data input validation and sanitization—Implementing strict validation checks and sanitization processes can help prevent data input attacks such as prompt injections, which can manipulate AI outputs or cause system failures. This is especially important for GenAI models that interact with user-generated prompts.
Resource management controls—During inference, AI systems should enforce limits and throttling mechanisms to avoid infinite processing loops or skewed resource consumption that could lead to denial of service (DoS) conditions.
Error handling and safe failure—Proactive error detection and handling mechanisms enable AI systems to fail safely when outputs do not meet defined accuracy, precision, recall, trust, or safety metrics. Safe failure prevents the generation of harmful or misleading outputs and supports system resilience.
Adversarial testing—Regular adversarial tests, including token manipulation and jailbreak attacks, help identify vulnerabilities and improve the system’s resistance to sophisticated threats. See 2.7.2 Adversarial Training for more information.
HITL oversight—Incorporating human judgment at critical decision points ensures that AI outputs are validated and verified, reducing the risk of undetected errors or unethical outcomes.
Monitoring latency and uptime—Continuous monitoring of latency and uptime helps ensure system availability.

These robustness measures contribute to the overall trustworthiness of AI systems by minimizing the risk of operational failures and security breaches that could impact users or the organization.
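Three of the techniques listed above (input validation, resource throttling, and safe failure) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the character limit, the blocked phrase, the confidence threshold, and all names are hypothetical, and a phrase blocklist on its own is not an adequate defense against prompt injection.

```python
import time
from dataclasses import dataclass, field

MAX_PROMPT_CHARS = 2000  # hypothetical limit for this sketch
BLOCKED_PHRASES = ("ignore previous instructions",)  # illustrative only
FALLBACK = "Unable to answer reliably; the request has been routed for human review."


def validate_prompt(prompt: str) -> str:
    """Strip non-printable characters and reject empty, oversized, or flagged input."""
    cleaned = "".join(ch for ch in prompt if ch.isprintable() or ch in "\n\t").strip()
    if not cleaned:
        raise ValueError("empty prompt")
    if len(cleaned) > MAX_PROMPT_CHARS:
        raise ValueError("prompt too long")
    if any(phrase in cleaned.lower() for phrase in BLOCKED_PHRASES):
        raise ValueError("prompt matches a blocked pattern")
    return cleaned


@dataclass
class TokenBucket:
    """Simple throttle: each request spends one token; tokens refill over time."""
    capacity: float
    refill_per_second: float
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def safe_answer(answer: str, confidence: float, min_confidence: float = 0.8) -> str:
    """Fail safely: return a fallback instead of a low-confidence output."""
    return answer if confidence >= min_confidence else FALLBACK
```

A serving layer might call `validate_prompt` first, check `TokenBucket.allow()` per caller before running inference, and pass the model output through `safe_answer` before returning it.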

2.11.2 Scalability of AI Solutions

As AI adoption expands, systems must scale effectively to accommodate increased demand and evolving business needs. Effective scalability planning ensures that AI systems continue to deliver value and maintain security and performance standards as organizational demands increase. See 2.1.2 Scalability Considerations for AI Solutions for more information.

2.12 Monitoring and Managing Model Drift

Drift detection is a core component of model life cycle management, ensuring that deployed models remain relevant and aligned with evolving business or regulatory needs. Model drift refers to the degradation of ML model performance due to changes in data or in the relationships between input and output variables. Model drift can negatively impact model performance, resulting in faulty decision making and bad predictions. Drift can also result from external factors such as changing societal norms, regulatory updates, or shifts in economic conditions that alter the relevance of training data.

2.12.1 Identification of Model Drift

Risk practitioners should be familiar with the different kinds of drift that can affect a system. Concept drift is a change in the relationship between inputs and outputs, whereas data drift is a change in the inputs alone. Regular monitoring of AI systems’ performance and trustworthiness enhances organizations’ ability to detect and respond to drift and thus sustain an AI system’s value once deployed. Ensuring a HITL is part of monitoring outputs helps to address these concerns proactively. There are AI drift detection and monitoring tools that automatically detect when a model’s accuracy decreases (drifts) below a preset threshold.¹²⁰
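One common way to quantify data drift for a single numeric feature is the population stability index (PSI), which compares the binned distribution of production inputs with a training-time baseline. The sketch below is illustrative only; PSI is one option among several, and the bin count and function name are assumptions for this example.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width when the baseline is constant

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A PSI of zero means the binned distributions match; larger values indicate a larger shift. The value at which an organization triggers review is a policy decision and should be set and documented alongside its other drift thresholds.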

To ensure that models behave as expected, it is important to understand how a model makes decisions. It is also essential to set quantitative thresholds (e.g., model accuracy below 85%) that trigger retraining or review protocols to address drift promptly. Questions to ask when monitoring for model drift include:

Which features are important for predicting?
What is the relationship between the input features and the target predictions?
Did the model learn anything unexpected?
Does the model specialize or learn something from a specific training data segment?
Does the model generalize?

Upon detection of drift, organizations must have predefined protocols to mitigate its impact and restore model accuracy. Common mitigation strategies include:

Model retraining—Updating the model with new, representative datasets that reflect current data distributions and patterns. Retraining can be scheduled periodically or triggered automatically when drift thresholds are exceeded.
Model adjustment or fine-tuning—Modifying model parameters or architectures to better capture evolving data characteristics without full retraining.
Data preprocessing updates—Revising data normalization, feature engineering, or input validation steps to align with changes in data sources or formats.
Human oversight and validation—Engaging domain experts to review model outputs and retraining results to ensure alignment with business objectives and ethical standards.
Integration with change management—Documenting drift events and mitigation actions within the AI change management framework to maintain traceability and track updates, improvements, or patches applied to the model over time.
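The threshold-triggered review or retraining described in this section can be sketched as a rolling accuracy monitor. This is a minimal illustration: the class name and window size are assumptions, and the 85% default simply mirrors the example threshold mentioned earlier.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent labeled predictions."""

    def __init__(self, threshold: float = 0.85, window: int = 100) -> None:
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)  # True where prediction was correct

    def record(self, prediction, actual) -> None:
        self.outcomes.append(prediction == actual)

    @property
    def accuracy(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def needs_review(self) -> bool:
        # Only judge once the window is full, to avoid noisy early alerts.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy < self.threshold
```

When `needs_review()` returns true, the organization's predefined protocol would take over, for example by opening a change record and scheduling retraining with human validation of the results.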

2.13 Change Management in AI Systems

Conventional software applications and AI systems share common goals and objectives to ensure that:¹²¹

All parties involved with and impacted by the change are informed of what is happening.
Impact on the quality and availability of the service is minimal.

These shared goals and objectives are met through shared commonalities for controlled and structured change processes and procedures, the role and importance of testing, rollback plans, and stakeholder management practices.

AI solutions bring new considerations, and additional controls specific to AI change management risk should be implemented. AI-specific changes that introduce risk include retraining models, updating parameters, and changing data pipelines.

2.13.1 Data Dependency

AI fundamentally depends on data to produce its core functionality and instructions. This makes AI systems very sensitive to changes in the underlying data on which they are trained, as well as to changes in data sources and preprocessing. The training data is curated with very specific requirements for its structure, quality, and relevance. As a result, the model expects specific data attributes as inputs for inference, and unexpected changes in those attributes can create problems for the AI system.

Organizations should document the data input requirements a model needs to ensure alignment with the data inputs fed to the model in production. Changes to the data input may result in additional preprocessing to standardize and normalize the data input within expected tolerances. Changes to the data source and data preprocessing steps could require additional attention to minimize the impact on AI system performance.
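Documented input requirements can also be expressed as a machine-checkable specification, so that production records are compared with what the model expects before inference. The following Python sketch assumes a hypothetical two-feature model; the feature names, types, and ranges are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    types: tuple  # accepted Python types for this feature
    minimum: float
    maximum: float


# Hypothetical input requirements for an example model.
INPUT_SPEC = (
    FeatureSpec("age", (int,), 18, 120),
    FeatureSpec("income", (int, float), 0, 10_000_000),
)


def check_record(record: dict, spec: tuple = INPUT_SPEC) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for feature in spec:
        if feature.name not in record:
            problems.append(f"missing feature: {feature.name}")
            continue
        value = record[feature.name]
        if not isinstance(value, feature.types):
            problems.append(f"wrong type for {feature.name}: {type(value).__name__}")
        elif not feature.minimum <= value <= feature.maximum:
            problems.append(f"out of range for {feature.name}: {value}")
    unexpected = set(record) - {feature.name for feature in spec}
    problems.extend(f"unexpected feature: {name}" for name in sorted(unexpected))
    return problems
```

Records that fail the check could be rejected, routed to additional preprocessing, or logged as evidence that an upstream data source has changed.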

Changes may occur that an organization did not initiate or control. These include changes to the underlying data due to real-world circumstances, such as seasonality, user preferences, or introduction of new data classes, that can impact the AI system and model performance. It is important for the organization to include data drift monitoring as part of a robust AI change management program.

2.13.2 AI Model Changes

The model or models used to make correlations or predictions, generate content, or provide other outputs are at the core of an AI system. The behavior of an AI model can be modified or tuned through parameters set at inference time. For example, GPT-4 models allow parameters such as temperature and penalties to be adjusted after training, without changing the model itself. Temperature alone can dramatically change how much variation the model allows in its output and lead to more “creative” answers. In conventional software applications, these parameters would be comparable to settings that the software allows the user to configure.
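To show why a temperature change can alter outputs so noticeably, the sketch below applies the standard temperature-scaled softmax, in which each raw model score (logit) is divided by the temperature before being converted to a probability. It is a generic illustration; how any specific vendor's model applies temperature internally is not confirmed here, and the function name is an assumption.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With the same logits, a low temperature concentrates probability on the top-scoring option, while a high temperature spreads probability across alternatives, which is why a change to this single setting should be tracked like any other change to the system.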

AI system behavior can be altered by changing the underlying model or models. Some AI systems may use only one AI model, but others use several models to accomplish the system’s overall goal. As AI researchers develop and train new model versions, models can be swapped in and out. In this situation, the output of a model can be dramatically altered. The opaqueness of AI models can vary significantly from one to another, based on data sources, training, or even responsible and ethical practice differences between AI developers of models.

Governance of the underlying AI model(s) should be included as part of any organizational change management program.

2.13.3 Regulatory and Societal Impact

Expansion of AI over the past few years has resulted in increased visibility and scrutiny. Regulations are only beginning to catch up to the advancement of AI technology and its application and impact on human lives. The regulatory environment continues to evolve around the understanding and use of prescribed guardrails to ensure the safety of AI. Some industries are more heavily affected than others by the use, adoption, and regulatory enforcement of AI.

Society has both embraced and eschewed the use of AI, and opinions continue to shift as individuals acclimate. Some fear the potential displacement of the workforce due to AI. While it is too early to tell if widespread adoption of AI will remain, the patterns of technological innovation through the decades have exhibited some form of material impact.¹²² For example, the rise of the personal computer eliminated clerical work but also created jobs in the new IT industry. One study noted that up to 3.5 million jobs were eliminated in the United States over the decades since the introduction of the PC and the rise of the internet, but more than 19 million jobs in computer-related industries were created.

Organizations should include workforce impact analyses in their AI system deployment and change management processes. Changes include reskilling employees to leverage AI and rethinking what work is to be done and how, which could lead to new career opportunities. Employees may need support and time to transition to new jobs, as workers did when prior technological innovations disrupted the status quo.

Change management practices should regularly monitor the rapidly evolving landscape of AI regulations and shifting societal sentiment regarding the use and adoption of AI. These external factors may influence an organization’s development and use of AI over time.

2.13.4 Emergency Changes

When a bug or vulnerability is found in a conventional software application, the developer can create an emergency fix or patch to address the specific bad section of code. Urgent changes to an AI system cannot be made as easily. The training of a new AI model may take hours to weeks. That kind of timeframe may not fit well into the hours to days allotted to address critical bugs or security vulnerabilities expected in conventional software applications. Emergency rollbacks should be tested periodically to ensure feasibility.

To address urgent or critical changes, an organization will need to consider alternatives, such as:

Roll back to an earlier AI model—If the discovery of a critical issue is correlated with a model change, rolling back to an earlier model could work, assuming the problem did not exist in the training dataset of the previous AI model.
Implement input/output validation—While the model may not be easily changed, the data input preprocessing and output postprocessing could be implemented to shield the model from problematic input and inoculate against bad output. This could be accomplished with additional software implemented outside of the AI model on a timely basis.
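The rollback alternative depends on earlier model versions being retained and easy to restore. A minimal in-memory sketch of that idea follows; the `ModelRegistry` class and its methods are hypothetical and stand in for whatever model registry or deployment tooling the organization actually uses.

```python
class ModelRegistry:
    """Keep registered model versions and the order in which they were promoted."""

    def __init__(self) -> None:
        self._versions = {}  # version label -> callable model
        self._history = []   # promotion order; the last entry is active

    def register(self, version: str, model) -> None:
        self._versions[version] = model

    def promote(self, version: str) -> None:
        if version not in self._versions:
            raise KeyError(f"unknown version: {version}")
        self._history.append(version)

    @property
    def active(self):
        return self._history[-1] if self._history else None

    def rollback(self) -> str:
        """Retire the active version and reactivate the one promoted before it."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]

    def predict(self, x):
        return self._versions[self.active](x)
```

Exercising `rollback()` in a test environment on a regular schedule is one way to confirm that emergency rollbacks remain feasible.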

Regardless of the solution selected, key stakeholders should review and approve all emergency changes. The approval process should mimic change management approval protocols that already exist in the organization.

2.13.5 Configuration Management

Configuration management can look very different for an AI system, due to some of the inherent differences that separate AI from conventional software applications. Tracking key configuration items and persisting their set values are important in both situations. AI systems have configurable postproduction parameters that should be tracked and monitored for deviations from the standard using tools such as MLflow or Kubeflow. AI systems and models are very sensitive to data changes. Maintaining consistency between the data preprocessing during the training phase and the data preprocessing during the postproduction inference stage is essential for reliable and consistent model performance. Beyond the configurable items, the same tokenization framework should be used in training and postproduction.
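One lightweight way to detect deviation from an approved configuration is to fingerprint it and compare the fingerprint in production with the approved baseline. The Python sketch below uses only the standard library and does not represent the MLflow or Kubeflow APIs; the configuration keys shown in the usage note are hypothetical.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def config_diff(baseline: dict, current: dict) -> dict:
    """Map each changed key to its (baseline, current) pair of values."""
    keys = set(baseline) | set(current)
    return {
        key: (baseline.get(key), current.get(key))
        for key in sorted(keys)
        if baseline.get(key) != current.get(key)
    }
```

For example, comparing an approved configuration of `{"temperature": 0.2, "tokenizer": "bpe-v1"}` with a production configuration in which only the temperature differs would yield a different fingerprint, and `config_diff` would report the temperature as the single changed item.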

2.13.6 Model Maintenance and Continuous Learning

Continuous learning enables AI systems to adapt to new information and changing conditions without requiring complete redevelopment. This approach supports the model’s ability to remain current and effective in dynamic operational contexts. However, continuous learning must be carefully managed and monitored to avoid unintended consequences, such as the introduction or reinforcement of bias or overfitting to recent data.

A key issue addressed by model maintenance is model drift, which occurs when the statistical properties of input data or the relationship between inputs and outputs change over time, leading to reduced model performance. Monitoring for drift and performance degradation is essential to trigger timely retraining or adjustment of the model and maintain its predictive quality. See 2.12 Monitoring and Managing Model Drift for more information.

Effective model maintenance also requires governance practices, including version control, documentation of changes, and integration with organizational change management processes. This ensures that updates are traceable, tested, and aligned with risk management frameworks. Additionally, robust access controls must be maintained to protect the model and its training data from threats such as data poisoning or unauthorized modifications. See 2.11.1 Robustness Against Failures and Attacks for more information.

2.14 Decommissioning AI Solutions

Decommissioning an AI solution marks the formal conclusion of its life cycle and requires careful planning and execution to ensure a safe and orderly phase-out. This process extends beyond simply shutting down the AI model’s inference service; it involves addressing derivative models and related components that may persist within the environment.

Reasons for decommissioning an AI solution include:¹²³

The solution reaches its end-of-life.
Identified risk exceeds enterprise risk tolerance or appetite.
Risk mitigation for identified or emerging risk is not feasible given the enterprise’s current capacities or regulatory requirements.

A comprehensive decommissioning plan should be developed leveraging the organization’s established change management framework. This plan must clearly outline the steps necessary to phase out the AI solution safely, including considerations for data migration or deletion, integration with other systems, and continuity of business operations during the transition period. Effective communication with all relevant stakeholders is critical to set expectations, manage concerns, and minimize operational disruption throughout the decommissioning process.

Key elements of a robust decommissioning process include:¹²⁴

Planning—Define the scope and timeline for retirement, identify dependencies and downstream impacts, and establish roles and responsibilities for personnel involved in the decommissioning activities.
Communication—Engage stakeholders early and continuously to inform them of the decommissioning schedule, potential impacts, and mitigation strategies. This includes users, business units, IT teams, and external partners, as applicable.
Execution—Implement the phased shutdown of AI components, ensuring that data is securely migrated or disposed of in accordance with organizational policies and regulatory requirements. Maintain operational continuity by coordinating with integrated systems and processes.
Risk management—Monitor and address any risk arising from the decommissioning, such as data loss, service interruptions, or compliance gaps. Document lessons learned to improve future decommissioning efforts.
Archiving—A comprehensive decommissioning plan must include provisions for archiving the final production model, its documentation, and its final performance metrics. This is important for regulatory inquiries, legal discovery, and historical accountability, ensuring the organization can explain a model’s decisions years after it has been retired.
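For the archiving element, one practical aid is a manifest that records a checksum for every archived artifact, so that the integrity of the archive can be demonstrated later. The following Python sketch is an assumption-laden illustration: the manifest fields and the function names are invented for this example and are not prescribed by any standard.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large model artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path, model_name: str) -> dict:
    """List every file under archive_dir with its SHA-256 checksum."""
    files = sorted(p for p in archive_dir.rglob("*") if p.is_file())
    return {
        "model": model_name,
        "archived_at": datetime.now(timezone.utc).isoformat(),
        "files": {str(p.relative_to(archive_dir)): sha256_of(p) for p in files},
    }
```

The resulting dictionary can be serialized and stored alongside the archived model, documentation, and final performance metrics, and re-verified whenever the archive is retrieved for a regulatory or legal inquiry.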

By proactively managing the retirement of AI solutions, organizations can optimize their technology investments, reduce exposure to obsolete or unsupported systems, and maintain compliance with applicable legal and regulatory obligations. Establishing clear policies and procedures for AI decommissioning also supports accountability and helps preserve organizational trustworthiness during system transitions.

2.14.1 Data Disposition and Security in Decommissioning

The decommissioning of AI systems necessitates careful management of all associated data to ensure compliance with regulatory requirements, maintain data privacy, and mitigate security risk. Data disposition during AI system retirement involves secure handling, migration, or deletion of data used by or generated from AI models, aligned with organizational policies and legal obligations. Figure 2.10 outlines some data disposal considerations for AI.

Figure 2.10—AI Data Disposal Considerations

Consideration	Description
Data handling and migration	Security and access controls should be applied to datasets, artifacts, and outputs created by the artificial intelligence (AI) solution being decommissioned. Encryption can help reduce data leakage until the data can be properly disposed of.
Data retention	Clear data retention policies should be established and followed as per the classification level assigned to the data.
Data destruction	Data destruction must be conducted in accordance with established retention schedules and organizational data governance frameworks. Secure deletion techniques should be employed to irreversibly remove data that is no longer required, minimizing residual risk from forgotten or orphaned data. Checkpoints should be removed from active storage and backups; cryptographic erasure should be documented where physical deletion is not possible. Documentation of data destruction activities is necessary to provide audit trails and demonstrate compliance.
Governance and accountability	Effective data disposition requires governance structures that oversee the decommissioning process, including policies addressing ancillary data such as model predictions, explanations, intermediate feature representations, and credentials. Organizations should track accountability for data handling decisions and consider the downstream impacts on individuals, groups, and communities affected by AI system retirement. AI ethics review boards may provide oversight on decommissioning impacts. Transparent communication with stakeholders and adherence to ethical frameworks support trust and compliance.