Paul Scott (Scott 1983) developed a Taxonomy of Machine Learning Systems (MLS).
According to Scott, a learning system, LS, is a system that changes its behavior in response to modifications to its memory. It operates within an environment and consists of several components. Essential to this definition is that a learning system has a memory which evolves over time. The memory is composed of a set of representations that organize the knowledge gained from its experiences. The learning system must be able to incorporate experiences based on its interpretation of events, its performance of tasks, and their subsequent evaluation. Otherwise, it functions as a purely deterministic automaton. Additionally, it may or may not interact with its environment to assess its performance based on the completion of tasks.
Scott identified both structural and functional components of an MLS. The structural components are:
- RS a representation scheme
- E a set of possible experiences
- Q a set of values of representations
- M a set of internal memory states
The functional components are:
- P a learning procedure
- G an experience or problem generator
- V a representation evaluator or critic
So, a learning system can be described by a 4-tuple:
LS = {S, P, G, V}
where S is the structural description of the learning system: S = { RS, E, Q, M }.
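The component lists above can be read as a nested record. The following is a minimal sketch of that reading, not Scott's own formalization: the class names `Structure` and `LearningSystem`, the field types, and every value in the toy instance are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet


@dataclass(frozen=True)
class Structure:
    """S = {RS, E, Q, M}: the structural description."""
    RS: str            # representation scheme (here just a label)
    E: FrozenSet[Any]  # set of possible experiences
    Q: FrozenSet[Any]  # set of values of representations
    M: FrozenSet[Any]  # set of internal memory states


@dataclass(frozen=True)
class LearningSystem:
    """LS = {S, P, G, V}."""
    S: Structure
    P: Callable[..., Any]  # learning procedure
    G: Callable[..., Any]  # experience or problem generator
    V: Callable[..., Any]  # representation evaluator or critic


# Toy instance; every value here is a placeholder for illustration only.
toy = LearningSystem(
    S=Structure(
        RS="lookup table",
        E=frozenset({"e1", "e2"}),
        Q=frozenset({0, 1}),
        M=frozenset({"m0", "m1"}),
    ),
    P=lambda event, memory: memory,  # trivial procedure: memory unchanged
    G=lambda: "e1",                  # always generates the same experience
    V=lambda event, memory: 1,       # always returns the same evaluation
)
```

Keeping S as a separate record mirrors the text, where the structural components are grouped under a single symbol inside the 4-tuple.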
With this description in mind, we can describe an eightfold model of learning systems.
The eight types can be organized along three functional components. The division along P leads to two groups:
- Self-organizing Group: (PVG, PG, PV, P)
- Teachable Group: (G, V, VG, Rote)
The eight types of learning systems are (with names assigned by Scott):
- PVG Discovery Systems
- PG Reinforcement Systems
- PV Conceptual Clustering
- P Classical Conditioning
- VG Skeptical Learning
- V Advice-taking Systems
- G Inquiry Systems
- () Passive Systems (learning by being told)
For the last category, we use the term rote learning.
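Because each type is labeled by a subset of the three functional components P, V, and G, the eight types are exactly the 2^3 subsets of {P, V, G}. The sketch below makes that count explicit; the function names `classify` and `group` are hypothetical, and the table simply restates the names listed above.

```python
from itertools import combinations

# Names as given in the list above, keyed by the component letters
# that appear in each type's label.
SCOTT_TYPES = {
    frozenset("PVG"): "Discovery Systems",
    frozenset("PG"): "Reinforcement Systems",
    frozenset("PV"): "Conceptual Clustering",
    frozenset("P"): "Classical Conditioning",
    frozenset("VG"): "Skeptical Learning",
    frozenset("V"): "Advice-taking Systems",
    frozenset("G"): "Inquiry Systems",
    frozenset(): "Passive Systems (rote learning)",
}


def classify(components):
    """Return Scott's name for a label such as "PG" or a set {"P", "G"}."""
    return SCOTT_TYPES[frozenset(components)]


def group(components):
    """Split along P: self-organizing if P is in the label, else teachable."""
    return "Self-organizing" if "P" in components else "Teachable"


# Every subset of {P, V, G} appears exactly once: 2**3 = 8 types.
all_subsets = [frozenset(c) for r in range(4) for c in combinations("PVG", r)]
```

For example, `classify("PG")` looks up the Reinforcement Systems entry, and `group("VG")` places Skeptical Learning in the Teachable Group.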
An event generator G is a procedure that creates a set of events for input to the learning procedure of the learning system. Let us assume that there is a generator G which creates events e_j for LS. G may be internal or external to LS. The event generator may be a human (teacher), a program, or an automated sensor or device.
A human generator may not know the intended objective of the learning activity, so that person may produce examples which do not contribute to the learning process. The event generator, be it human or program, may be distinct and independent from the critic who evaluates the results.
The four types for which G is an internal component are labeled G, VG, PG, and PVG in the above model.
G has the general form:
G: W x R_i* x M -> E x M’ (for all i)
where:
- W may be supervised or unsupervised
- M, M’ represent memory state sets
- E is the event history
- R_i* is a representation history
Each execution of G takes the current representation and the current memory, interprets the event, and determines whether the representation and the memory must be modified.
A learning procedure P is a procedure that constructs or modifies the representations contained in memory. The changes to representations in memory are made through transitions of the representations. P has the general form:
P: E x M -> M’
Given a set of events E, P takes the memory M from its current state to a new state M’. Elements in M may have one or more representations or structures. Typically, P consists of a set of operators that transform elements of M, e.g., p_i(e_j, m_k) -> m_k’.
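To make the signature P: E x M -> M’ concrete, here is a minimal sketch in which memory is a dictionary of counts and P applies a list of operators to each event. The memory layout, the operator `count_event`, and the function `learn` are all assumptions chosen for illustration; the source does not prescribe any particular representation.

```python
from typing import Callable, Dict, Iterable, List

Memory = Dict[str, int]  # hypothetical memory: element name -> count
Operator = Callable[[str, Memory], Memory]


def count_event(event: str, memory: Memory) -> Memory:
    """Hypothetical operator p_1: record one more observation of the event."""
    updated = dict(memory)  # copy, so the old state M is left intact
    updated[event] = updated.get(event, 0) + 1
    return updated


def learn(events: Iterable[str], memory: Memory,
          operators: List[Operator]) -> Memory:
    """P: E x M -> M'. Apply every operator to every event, in order."""
    for event in events:
        for op in operators:
            memory = op(event, memory)
    return memory


m0: Memory = {}
m1 = learn(["a", "b", "a"], m0, [count_event])
```

Each operator returns a new memory state rather than mutating the old one, so both M and M’ remain available afterwards, which is convenient if an evaluator later decides to reject the transition.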
An evaluation procedure V assesses the results of a transition and provides an evaluation. The evaluation may be performed internally by the learning system or externally by the world and provided to the learning system as an input. The evaluation is used to determine whether to accept or reject the transition. V may be loosely viewed as determining whether a particular change is “good” or “bad”. However, such evaluations need to be made in a larger context which must be explicitly specified in any description of V.
V has the general form:
V: E x M -> Z
where Z is a set of possible evaluations that V is able to make.
V can be simple or sophisticated. Some possibilities are:
- Z may be some binary measure equating to “good” or “bad”
- Z may be a numerical value – selected from a discrete or continuous interval
- Z may be a complex structure that is used in making the transition
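The three possibilities for Z can be sketched as three evaluators sharing the signature V: E x M -> Z. Everything below is illustrative: the count-based memory, the function names, and the 0.5 acceptance threshold are assumptions, not part of Scott's description.

```python
from typing import Dict

Memory = Dict[str, int]  # hypothetical memory: event name -> times observed


def v_binary(event: str, memory: Memory) -> bool:
    """Z = {good, bad}: 'good' (True) if the event is already represented."""
    return event in memory


def v_numeric(event: str, memory: Memory) -> float:
    """Z = [0, 1]: share of all observations that match the event."""
    total = sum(memory.values())
    return memory.get(event, 0) / total if total else 0.0


def v_structured(event: str, memory: Memory) -> dict:
    """Z = a structured verdict that a transition step could inspect."""
    score = v_numeric(event, memory)
    # The 0.5 threshold is an arbitrary choice for this sketch.
    return {"event": event, "score": score, "accept": score >= 0.5}


memory = {"a": 3, "b": 1}
```

With this memory, `v_numeric("a", memory)` evaluates to 0.75 and `v_structured("b", memory)` reports a score of 0.25 with `accept` set to `False`. The larger context that the text says V depends on is reduced here to the threshold alone.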
So, there are different types of learning systems. Think about this brief definition. Under which submodel would statistical ML fall?