Glossary

Note

Because term definitions may evolve due to the changing technological environment, please see www.isaca.org/glossary for the most up-to-date terms and definitions.

A

Accuracy: The fraction of predictions that a classification model got right. In multiclass classification, accuracy is defined as correct predictions divided by total number of examples. In binary classification, accuracy is defined as true positives plus true negatives divided by total number of examples.
Active learning: A training approach in which the algorithm chooses some of the data from which it learns. Active learning is particularly valuable when labeled examples are scarce or expensive to obtain. Instead of blindly seeking a diverse range of labeled examples, an active learning algorithm selectively seeks the specific range of examples that it needs for learning.
AI Agent: In artificial intelligence (AI), an autonomous program or system designed to perceive and interact with its environment to make decisions and take actions to achieve specific goals. Also referred to as an intelligent agent.
AI designer: A professional who creates and implements artificial intelligence solutions
AI observability: The practice of monitoring and analyzing models to ensure they are reliable, effective, and correct
Algorithm: A finite set of well-defined, unambiguous rules for the solution of a problem in a finite number of steps. It is a sequence of operational actions that lead to a desired goal and is the basic building block of a program
Artificial general intelligence: A type of AI characterized by its capability to undertake any intellectual task that a human can perform, such as sensory perception, fine motor skills, problem solving, navigation, natural language understanding, creativity, and social-emotion engagement
Artificial intelligence (AI): An advanced computer system that can simulate human capabilities, such as analysis, based on a predetermined set of rules
Artificial narrow intelligence: A form of AI that is limited to a specific domain or area of knowledge and simulates human cognition
Average precision: A metric for summarizing the performance of a ranked sequence of results. Average precision is calculated by taking the average of the precision values for each relevant result in a ranked list (each result in the ranked list where the recall increases relative to the previous result).
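
The Accuracy and Average precision entries above can be illustrated with a short Python sketch. The function names and sample data are hypothetical, and the average precision here assumes that every relevant result appears somewhere in the ranked list.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def average_precision(ranked_relevance):
    """Mean of the precision values taken at each relevant result.

    ranked_relevance is a list of booleans ordered from the top-ranked
    result down; True marks a relevant result (a point where recall rises).
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical labels and predictions: 3 of 4 are correct.
labels = ["cat", "dog", "cat", "bird"]
predictions = ["cat", "dog", "dog", "bird"]
print(accuracy(labels, predictions))

# Relevant results at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision([True, False, True, False]))
```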

B

Backpropagation: An algorithm for iteratively adjusting the weights used in a neural network system. Backpropagation is often used to implement gradient descent.
Bayes’ Theorem: An equation for calculating the probability that something is true if something potentially related to it is true. If P(A) means “the probability that A is true” and P(A|B) means “the probability that A is true if B is true,” then Bayes’ Theorem tells us that P(A|B) = (P(B|A)P(A)) / P(B).
Bayesian network: Graphs that compactly represent the relationship between random variables for a given problem. These graphs aid in reasoning or decision-making in the face of uncertainty. These networks are usually represented as graphs in which the link between any two nodes is assigned a value representing the probabilistic relationship between those nodes.
Bias: In machine learning (ML), systemic errors or distortions in algorithms, data, or models that lead to prejudiced results
Boosting: A machine learning technique that iteratively combines a set of simple and not very accurate classifiers (referred to as “weak” classifiers) into a classifier with high accuracy (a “strong” classifier) by upweighting the examples the model is currently misclassifying
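
As a worked illustration of the Bayes’ Theorem entry above, the sketch below computes P(A|B) for a made-up spam-filter scenario. All probabilities are hypothetical, and P(B) is obtained from the law of total probability, which the entry itself does not cover.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: A = "email is spam", B = "email contains 'prize'".
p_a = 0.01               # P(A): 1% of emails are spam
p_b_given_a = 0.60       # P(B|A): 60% of spam contains the word
p_b_given_not_a = 0.02   # P(B|not A): 2% of non-spam contains the word

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(round(posterior, 3))  # probability that an email containing 'prize' is spam
```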

C

Calibration layer: A post-prediction adjustment typically used to account for prediction bias. The adjusted predictions and probabilities should match the distribution of an observed set of labels.
Checkpoint: Data that capture the state of the variables of a model at a particular time. Checkpoints enable exporting model weights and performing training across multiple sessions. Checkpoints also enable training to continue past errors (e.g., job preemption).
Chi-square test: An analysis technique used to estimate whether two variables in a cross-tabulation are correlated. A chi-square distribution varies from a normal distribution based on the “degrees of freedom” used to calculate it.
Classification: The identification of two or more categories in which an item belongs
Clustering: An algorithm for dividing data instances into groups—not a predetermined set of groups, but groups identified by the execution of the algorithm because of similarities found among the instances. The center of each cluster is known as the “centroid.”
Coefficient: A number or algebraic symbol prefixed as a multiplier to a variable or unknown quantity (e.g., x in x(y + z), 6 in 6ab)
Confidence interval: A range specified for an estimate to indicate margin of error, combined with a probability that a value will fall in that range
Continuous feature: A floating-point feature with an infinite range of possible values; contrasts with discrete feature
Continuous variable: A variable whose value can be any of an infinite number of values, typically within a particular range
Convenience sampling: The use of a dataset that is not gathered scientifically in order to run quick experiments. Later on, it is essential to switch to a scientifically gathered dataset.
Convergence: A state reached during training in which training loss and validation loss change very little or not at all with each iteration after a certain number of iterations
Correlation: The degree of relative correspondence between two sets of data. The correlation coefficient is a measure of how closely the two data sets correlate.
Covariance: A measure of the relationship between two variables whose values are observed at the same time. Whereas variance measures how a single variable deviates from its mean, covariance measures how two variables vary in tandem from their means.
Cross-validation: A mechanism for estimating how well a model will generalize to new data by testing the model against one or more nonoverlapping data subsets that are withheld from the training set
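
The covariance and correlation entries above can be made concrete with a few lines of Python. This is a minimal sketch that assumes the population form (dividing by n rather than n - 1) and the Pearson correlation coefficient; the data values are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation coefficient: covariance rescaled to [-1, 1]."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: as x rises, y falls along a perfect straight line.
x = [1.0, 2.0, 3.0, 4.0]
y = [8.0, 6.0, 4.0, 2.0]

print(covariance(x, y))   # negative: the variables move in opposite directions
print(correlation(x, y))  # close to -1 for a perfect inverse linear relationship
```

Note that the covariance of a variable with itself is its variance, which is why `covariance(xs, xs)` appears in the denominator.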

D

Data

Representations of facts, concepts or instructions in a manner suitable for communication, interpretation or processing by humans or by automated means. In the simplest terms, data are pieces of information. (ISACA)
Qualitative or quantitative-based information that can be recorded, communicated and analyzed (CMMI)

Data analysis

Obtaining an understanding of data by considering samples, measurement and visualization. Data analysis can be particularly useful when a data set is first received, before the first model is built, and is crucial for understanding experiments and debugging problems with the system.

Data augmentation

Artificially boosting the range and number of training examples by transforming existing examples to create additional examples

Data frame

A popular data type for representing data sets in pandas. A data frame is analogous to a table. Each column of the data frame has a name (a header), and each row is identified by a number.

Data mining

The use of computers to analyze large data sets to look for patterns that assist people in making business decisions

Data science

A new branch of science used to extract knowledge and insights from large and complex data sets. Data science work often requires knowledge of both statistics and software engineering.

Data structure

A particular arrangement of data units, such as an array or a tree

Data wrangling

The conversion of data, often through the use of scripting languages, to make data easier to manage

Decision boundary

The separator between classes learned by a model in binary or multiclass classification problems

Decision trees

A tree structure to represent a number of possible decision paths and an outcome for each path

Deep learning

A multilevel algorithm that gradually identifies things at higher levels of abstraction, e.g., image classification

Deep model

A type of neural network containing multiple hidden layers

Dependent variable

In artificial intelligence (AI), the outcome predicted by a model, which is influenced by other independent variables

Dimension reduction

A technique to extract one or more dimensions that capture as much of the variation in the data as possible

Dimensionality

In statistics, the number of attributes a dataset has

Discrete variable

A variable whose potential values must be one of a specific number of values. Also known as discrete feature.

Discriminative model

A model that predicts labels from a set of one or more features. A discriminative model defines the conditional probability of an output based on the features and weights.

Downsampling

Reducing the amount of information in a feature to train a model more efficiently

Dynamic model

A model that is trained online in a continuously updating fashion—that is, data are continuously entering the model

F

F1 score: A metric that measures the harmonic mean of precision and recall to ensure that both are equally represented, especially when dealing with imbalanced data classes. F1 score = 2 × (Precision × Recall) / (Precision + Recall)
False negative (FN): An example in which the model mistakenly predicted the negative class
False positive (FP): An example in which the model mistakenly predicted the positive class
Feature: The machine-learning expression for a piece of measurable information about something. For example, if researchers store the age, annual income and weight of a set of people, they are storing three features about them.
Feature cross: A synthetic feature formed by crossing (i.e., taking a Cartesian product of) individual binary features obtained from categorical data or from continuous features via bucketing. Feature crosses help represent nonlinear relationships.
Federated learning: A distributed machine-learning approach that trains machine-learning models using decentralized examples residing on devices such as smartphones
Feedforward neural network (FFN): A neural network without cyclic or recursive connections. For example, traditional deep neural networks are feedforward neural networks.
Few-shot learning: A machine-learning approach, often used for object classification, designed to learn effective classifiers from only a small number of training examples
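
The F1 score formula above can be evaluated directly from confusion counts. The sketch below assumes the standard definitions of precision, TP / (TP + FP), and recall, TP / (TP + FN), which this glossary does not define separately; the counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive/negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 = 2 x (Precision x Recall) / (Precision + Recall)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(precision, recall, round(f1, 3))
```

Because the harmonic mean is pulled toward the smaller of the two values, the F1 score here sits closer to the recall of 0.5 than to the precision of 0.8.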

G

Garbage in, garbage out (GIGO): The concept that nonsensical or flawed input data produces nonsensical or flawed output, especially as it relates to the computational sciences
Generalization: The ability of a model to make correct predictions on new, previously unseen data, as opposed to the data used to train the model
Generative artificial intelligence (AI): A branch of artificial intelligence (AI) that, by using models to learn underlying patterns and relationships within data, addresses the creation of new and diverse content, such as images, text, audio, or code
Gradient boosting: A machine-learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion, like other boosting methods, and generalizes them by allowing the optimization of an arbitrary differentiable loss function.

H

Hallucination: In generative artificial intelligence (AI), the generation of false or misleading information presented as fact
Hidden layer: A synthetic layer in a neural network between the input layer (the features) and the output layer (the prediction). Hidden layers typically contain an activation function (e.g., ReLU) for training. A deep neural network contains more than one hidden layer.
Human in the loop (HITL): A design approach in which humans are empowered to supervise, validate, or override automated or semi-automated systems or processes in order to achieve responsible outcomes
Hyperparameter: A parameter that specifies the details of the learning process

I

Inference: In machine learning, the process of making predictions by applying a trained model to unlabeled examples

K

K-means clustering: A data-mining algorithm to cluster, classify or group N objects based on their attributes or features into K groups (so-called clusters)
K-nearest neighbors: A machine-learning algorithm that classifies things based on their similarity to nearby neighbors. The algorithm execution is refined by picking how many neighbors to examine (k) and some notion of distance to indicate how near the neighbors are.
Keras: A popular Python machine-learning API
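
The K-nearest neighbors entry above can be sketched in a few lines. This is an illustrative version that assumes Euclidean distance and a simple majority vote; the function name and the training points are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)


def knn_classify(training, query, k=3):
    """Return the majority label among the k training points nearest to query.

    training is a list of (point, label) pairs, where each point is a tuple
    of numbers with the same length as query.
    """
    nearest = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional training data with two classes.
training = [
    ((1.0, 1.0), "red"),
    ((1.5, 2.0), "red"),
    ((2.0, 1.0), "red"),
    ((6.0, 6.0), "blue"),
    ((7.0, 7.5), "blue"),
    ((6.5, 8.0), "blue"),
]

print(knn_classify(training, (2.0, 2.0)))  # nearest neighbors are all "red"
print(knn_classify(training, (6.0, 7.0)))  # nearest neighbors are all "blue"
```

Changing `k` or swapping `dist` for another distance function corresponds to the two refinements the entry mentions.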

L

Label: In supervised learning, the answer or result portion of an example
Latent variable: A variable that is not directly observed, but rather inferred (through a mathematical model) from other variables that are observed (directly measured)
Linear regression: A mathematical technique to look for a linear relationship when starting with a set of data points that do not necessarily line up nicely. A linear relationship is one in which the relationship between two varying amounts, such as price and sales, can be expressed with an equation that can be represented as a straight line on a graph.
Logistic regression: A model similar to linear regression but where the potential results are a specific set of categories, instead of being continuous
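
The Linear regression entry above can be illustrated with ordinary least squares, one common way (though not the only one) to find the straight line that best fits scattered points. The price and sales figures below are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data points that do not line up exactly.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [9.8, 8.1, 6.2, 3.9, 2.0]

slope, intercept = fit_line(price, sales)
print(round(slope, 2), round(intercept, 2))

# Use the fitted line to estimate sales at a price that was not observed.
print(round(slope * 3.5 + intercept, 2))
```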

M

Machine learning (ML)

A program or system that builds (i.e., trains) a predictive model from input data

Machine learning model

The model artifact created by the machine learning training process. The process of training a machine learning model involves providing a machine learning model algorithm (that is, the learning algorithm) with training data.

Model

A method to describe a given set of components and how those components relate to each other to describe the main workings of an object, system, or concept
In machine learning (ML), the outcome of a training process. A trained model can automatically process data that was not used for its training to perform a specific set of tasks.

Model card

Files that accompany AI models and provide concise information about the model details (e.g., architecture, training dataset), performance metrics, and even use case limitations

N

N-gram

The analysis of sequences of “n” items (typically, words in natural language) to look for patterns. The value of “n” can be anything. An n-gram is used to construct statistical models of documents (e.g., when automatically classifying them) and to find positive or negative terms associated with a product name.
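
As a small illustration of the n-gram definition above, the sketch below extracts word-level n-grams and counts them. Splitting on whitespace is a simplifying assumption, and the sample sentence is made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical product review, tokenized naively on whitespace.
text = "the battery life is great and the battery charges fast"
tokens = text.split()

bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(1))  # the most frequent two-word sequence
```

Counts like these are the raw material for the statistical document models the definition describes.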

Naive Bayes classifier

A collection of classification algorithms based on Bayes’ Theorem. It is a family of algorithms that share a common principle that every feature being classified is independent of the value of any other feature.

Neural network

Networks, inspired by the structure of the human brain, that learn by processing data through three layers (input, hidden, and output). They can be trained to match any input to various outputs, including binary ones, making them versatile tools in deep learning (DL) for tasks like image recognition.

Neuromorphic computing

Also known as neuromorphic engineering, an approach to computing that mimics how the human brain works

Neuron

A neural network node that typically takes in multiple input values and generates one output value

Node (neural network)

A neuron in a hidden layer

Noise

Data transmission or data set disturbances, such as static, that cause messages to be misinterpreted by the receiver

Normal distribution

A probability distribution that, when graphed, is a symmetrical bell curve with the mean value at the center. The standard deviation value affects the height and width of the graph. Also known as “Gaussian distribution.”

Normalization

The elimination of redundant data
The process of converting an actual range of values into a standard range of values, typically -1 to +1 or 0 to 1
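
The second sense of normalization above (rescaling values into a standard range) is often implemented as min-max scaling. The following is a minimal sketch of that one approach; the function name and sample values are hypothetical.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale values that are all identical")
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


# Hypothetical ages, rescaled to the two standard ranges named in the definition.
ages = [20, 30, 40, 60]
print(min_max_scale(ages))             # range 0 to 1
print(min_max_scale(ages, -1.0, 1.0))  # range -1 to +1
```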

Null hypothesis

If the proposed model for a data set indicates that the value of “x” affects the value of “y,” then the null hypothesis—i.e., the model compared against the proposed model to check whether “x” really is affecting “y”—will find that the observations are all based on chance and that there is no effect. The smaller the P-value computed from the sample data, the stronger the evidence is against the null hypothesis.

O

Observation: The receipt of messages through electronic, sensory or vibrational signals and the human senses
Offline inference: The process of generating a group of predictions, storing those predictions and then retrieving those predictions on demand
One-shot learning: A machine-learning approach often used for object classification that is designed to learn effective classifiers from a single training example
Outlier: Extreme values that might be errors in measurement and recording or accurate reports of rare events
Overfitting: A model of training data that, by taking too many of the data quirks and outliers into account, is overly complicated and will not be as useful as it could be to find patterns in test data

P

P value: The probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed
Pandas: A Python library for data manipulation that is popular with data scientists
Perceptron: A neural network that approximates a single neuron with n binary inputs. It computes a weighted sum of its inputs and fires if that weighted sum is zero or greater.
Perplexity: One measure of how well a model is accomplishing its task
Predictive analytics: The analysis of data to predict future events, typically to aid in business planning. Predictive analytics incorporates predictive modeling and other techniques. Machine learning may be considered a set of algorithms to help implement predictive analytics.
Predictive modeling: The development of statistical models to predict future events
Preprocessing: Procedures and techniques for cleaning, transforming, and formatting data to enhance its quality and suitability for analysis and modeling
Principal component analysis: An algorithm that looks at the direction with the most variance and then determines that as the first principal component. This is very similar to how regression works in that it determines the best direction to map data.
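
The Perceptron entry above describes a weighted sum followed by a threshold at zero. The sketch below follows that rule and adds a bias term, which is an assumption not stated in the entry; the weights shown are hand-picked (not learned) so that the unit behaves like a logical AND.

```python
def perceptron(inputs, weights, bias=0.0):
    """Return 1 (fire) if the weighted sum plus bias is zero or greater, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0


# Hypothetical hand-picked parameters: the sum reaches zero only when both inputs are 1.
and_weights = [1.0, 1.0]
and_bias = -2.0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], and_weights, and_bias))
```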

Q

Quantum AI: AI solutions that use quantum computing to enhance machine learning and data processing through the use of quantum bits (qubits) to increase processing power

R

Random forest: An ensemble approach to finding the decision tree that best fits the training data by creating many decision trees and then determining the average one. The random part of the term refers to building each of the decision trees from a random selection of features; the forest refers to the set of decision trees.
Recurrent neural network: A neural network that is intentionally run multiple times, where parts of each run feed into the next run
Regression model: A type of model that outputs continuous (typically floating-point) values
Reinforcement learning: A class of machine-learning algorithms in which the process is not given specific goals to meet but, as it makes decisions, is instead given indications of whether it is doing well or not

S

Scalar: A quantity that has magnitude but no direction in space, such as volume or temperature
Scaling: A commonly used practice in feature engineering to tame the range of values of a feature to match the scale of other features in the data set
Sensitive attribute: A human attribute that may be given special consideration for legal, ethical, social or personal reasons
Serial correlation: The relationship between a variable and a lagged version of itself over various time intervals. Repeating patterns often show serial correlation when the level of a variable affects its future level.
Spatiotemporal data: Time series data that also include geographic identifiers, such as latitude-longitude pairs
Strata, stratified sampling: Sampling technique used to divide the units into homogeneous groups (strata) and draw a simple random sample from each group
Supervised learning: A type of machine learning algorithm in which a system is taught to classify input into specific, known classes

T

Tableau: A commercial data visualization package often used in data science projects
Telemetry: The process of recording and transmitting instrument readings
TensorFlow: A large-scale, distributed, machine-learning platform
Test set: The subset of the data set used to test a model after the model has gone through initial vetting by the validation set
Time series data: Measurements or observations that are accompanied by date and time stamps
Training: The process of determining the ideal parameters comprising a model

U

Underfitting: When a machine learning (ML) model is too simple to capture the complexities and underlying structure of the data, resulting in poor performance for both the training and test data
Unsupervised learning: A class of machine-learning algorithms designed to identify groupings of data without knowing what the groups will be in advance

V

Variance

The measure of how much a list of numbers varies from the mean (average) value. It is frequently used in statistics to measure how large the differences are in a set of numbers. It is calculated by averaging the squared difference of every number from the mean.
In artificial intelligence (AI), the measure of how much the model varies when it is trained on different subsets of the training data. A model with high variance is very sensitive to the training data and might tend to overfit.
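
The statistical sense of variance above (averaging the squared difference of every number from the mean) can be written out directly. This sketch divides by the number of values, as the definition states, which is the population variance; the sample numbers are hypothetical.

```python
def variance(values):
    """Average of the squared differences between each value and the mean."""
    if not values:
        raise ValueError("need at least one value")
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical list of numbers whose mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))
```

Python's standard library offers the same calculation as `statistics.pvariance`; `statistics.variance` instead divides by n - 1 to estimate variance from a sample.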

Vector

An ordered set of real numbers, each denoting a distance on a coordinate axis. These numbers may represent a series of details about a single person, movie, product or other entity being modeled.

Vector space

A collection of vectors, e.g., a matrix

W

Weight: A coefficient for a feature in a linear model or an edge in a deep network
Width: The number of neurons in a particular layer of a neural network