Developing an ML System

If you decide to develop an ML system, you need to decide how to implement the components of Scott's model.

Representations for data and knowledge come in many forms. We have talked about a few of those over the past two weeks. The representation should be a natural fit for the domain data and the domain knowledge. As we noted in lecture 4, there are both quantitative and qualitative approaches, e.g., the numbers vs. symbols dichotomy, and the possibility of a hybrid approach.

Some representations include: decision trees, version spaces (Mitchell), kNN classification, rules of various types, first-order logic statements, graphs/networks, blackboard systems, tuples (such as RDF and XML), etc.

For the memory, you need to consider both a working or transient memory as well as a persistent memory. Working memory often holds temporary or intermediate results, while persistent memory holds accumulated knowledge including the solutions to specified problems.

G is the event generator that creates the inputs to the learning procedure. G may be internal or external to the MLS. In school, teachers represent external generators to the students - at least through high school and, perhaps, parts of college. For a sensor system, say a radar, it could be the periodic pulse-and-return event of the radar system. The return is the event that the MLS would see as an input and process. An internal example is a kmeans clusterer or a neural network. Both receive an initial external event, but subsequent events are the outcomes of applying the event generation procedure(s) internally to the system. For kmeans, the event is the succeeding recomputation of the clusters, while for a NN, it is the result of one pass through the network.
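The difference between an external and an internal G can be sketched in a few lines of Python. This is only an illustration: the function names, the radar values, and the halving step are all made up for the example and are not part of Scott's model.

```python
from typing import Callable, Iterator, List


def external_generator(returns: List[float]) -> Iterator[float]:
    """External G: every event comes from outside the system,
    e.g. one (hypothetical) radar return per pulse."""
    for value in returns:
        yield value


def internal_generator(
    seed: float, step: Callable[[float], float], n_events: int
) -> Iterator[float]:
    """Internal G: after one external seed event, each new event is
    produced by applying the system's own procedure to the previous one."""
    state = seed
    for _ in range(n_events):
        state = step(state)
        yield state


# Three events supplied from outside.
external_events = list(external_generator([0.8, 1.1, 0.9]))

# One seed event from outside, then three events generated internally
# (the halving step stands in for "recompute the clusters" or "one pass").
internal_events = list(internal_generator(16.0, lambda x: x / 2, 3))
```

In the internal case only the seed value crosses the system boundary; everything after that is the system feeding itself.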

The learning procedure P is the mechanism by which your system "learns" (if it does - see the initial sentence in the previous blog post). P, which may consist of multiple functional mechanisms, constructs or modifies the representations in memory (either one or both). In effect, P encompasses a set of transformations. One iteration of a neural network is an example of such a transformation, where the representation is the set of values and weights at each layer for that iteration.

The memory progresses through a sequence of states, with each state representing the values of the elements of the representation scheme. Capturing these different states can allow the user to "walk back" from the resulting solution to see how the solution was derived.
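One simple way to capture these states is to snapshot the memory after every change. The sketch below assumes the representation fits in a plain dictionary; the Memory class and its method names are hypothetical, chosen only for illustration.

```python
import copy
from typing import Any, Dict, List


class Memory:
    """Keeps the current working state plus a snapshot of every earlier state."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self.working: Dict[str, Any] = copy.deepcopy(initial)
        # The history starts with the initial state.
        self.history: List[Dict[str, Any]] = [copy.deepcopy(initial)]

    def update(self, **changes: Any) -> None:
        """Apply one transformation and record the resulting state."""
        self.working.update(changes)
        self.history.append(copy.deepcopy(self.working))

    def walk_back(self) -> List[Dict[str, Any]]:
        """Return the states from the final solution back to the start."""
        return list(reversed(self.history))


memory = Memory({"centroid": 0.0})
memory.update(centroid=1.5)
memory.update(centroid=2.0)
trace = memory.walk_back()  # newest state first, initial state last
```

Deep copies are used so that a later change cannot silently alter an earlier snapshot. For large representations you would probably store differences instead of full copies.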

The evaluation procedure V assesses the result of each stage of the learning process, e.g., the transformation of data as represented in memory. This evaluation can be performed internally or externally. In kmeans, an internal evaluation examines the set of clusters resulting from one iteration of the algorithm. A typical measure is the percentage of the elements in each cluster that have moved from another cluster. A typical stopping criterion is that no points have moved since the previous stage. An external evaluation could be the user entering a command to terminate or a value indicating how P did.
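Here is a minimal sketch of that internal evaluation for a one-dimensional kmeans. It simplifies the measure to the overall fraction of points that changed cluster (not a per-cluster percentage), and the function names and the max_iter safety cap are my own assumptions for the example.

```python
from typing import List, Tuple


def assign(points: List[float], centroids: List[float]) -> List[int]:
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
        for p in points
    ]


def recompute(points: List[float], labels: List[int],
              centroids: List[float]) -> List[float]:
    """Move each centroid to the mean of its members (keep it if empty)."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        updated.append(sum(members) / len(members) if members else old)
    return updated


def fraction_moved(previous: List[int], current: List[int]) -> float:
    """V: the fraction of points whose cluster changed in this iteration."""
    return sum(a != b for a, b in zip(previous, current)) / len(current)


def kmeans_1d(points: List[float], centroids: List[float],
              max_iter: int = 100) -> Tuple[List[float], List[int]]:
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = recompute(points, labels, centroids)
        new_labels = assign(points, centroids)
        moved = fraction_moved(labels, new_labels)
        labels = new_labels
        if moved == 0.0:  # stopping criterion: no point changed cluster
            break
    return centroids, labels


final_centroids, final_labels = kmeans_1d(
    [1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 2.0]
)
```

With these six points and starting centroids, two points change cluster in the first iteration and none in the second, so the loop stops there.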

Stopping criteria are often specified externally in order to avoid systems "running away", either by oscillating over results that are "almost correct, but not good enough yet" or by continuing to generate representations that never meet a threshold for algorithm or reasoning termination.
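As a rough sketch of an externally specified stopping criterion, the loop below takes both a threshold and an iteration cap from the caller. The run_until function, its parameters, and the two toy step functions are hypothetical; they only illustrate the idea.

```python
from typing import Callable, Tuple


def run_until(
    step: Callable[[float], float],
    evaluate: Callable[[float], float],
    state: float,
    threshold: float,
    max_iter: int,
) -> Tuple[float, int, str]:
    """Apply step repeatedly until evaluate(state) <= threshold or the
    externally supplied iteration cap is reached, whichever comes first."""
    for iteration in range(1, max_iter + 1):
        state = step(state)
        if evaluate(state) <= threshold:
            return state, iteration, "threshold met"
    return state, max_iter, "iteration cap reached"


# A procedure that converges: the error halves on every step.
converged = run_until(lambda x: x / 2, abs, 1.0, threshold=0.1, max_iter=50)

# A procedure that oscillates forever between 1.0 and -1.0;
# only the cap stops it.
capped = run_until(lambda x: -x, abs, 1.0, threshold=0.1, max_iter=10)
```

Returning the reason for stopping alongside the result lets the user tell a genuine solution from a run that was simply cut off.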

I suggest that this is a good way to start thinking about your project, whether you choose an MLS approach or some other type. If you have questions or need assistance working through this, I will be at Foundation about 8:30 AM each class day, and can probably stay after the class ends at 11:30 AM.
