How should an agent explore its environment? How can it learn to select appropriate actions when their effects are only apparent in the future? When is it useful for the agent to build a model of the dynamics of its world, rather than simply learning a reactive strategy? How can an agent take advantage of the idea that similar situations will require similar reactions? What happens if the agent is unable to completely perceive the state of its environment? In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. In many cases, we have developed new ways of viewing the problem that are, perhaps, more consistent with the AI perspective. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). We then outline a novel algorithm for solving POMDPs off-line and show how, in many cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a simple example.
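A central operation in POMDP solution methods is maintaining a belief state, a probability distribution over hidden states, updated by Bayes' rule after each action and observation. The following is a minimal sketch of that update; the model arrays `T` and `O`, their shapes, and the two-state example are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter update of a belief vector over hidden states.

    b: (S,) prior belief over states
    a: action index
    o: observation index
    T: (A, S, S) transition model, T[a, s, s'] = P(s' | s, a)
    O: (A, S, O) observation model, O[a, s', o] = P(o | s', a)
    """
    predicted = b @ T[a]                 # propagate belief through dynamics
    unnorm = predicted * O[a, :, o]      # weight by likelihood of observing o
    return unnorm / unnorm.sum()         # renormalize to a distribution

# Two-state toy model with a noisy sensor (values are made up):
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # one action
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])   # two possible observations
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=0, T=T, O=O)
print(b1)   # belief shifts toward the state more likely to emit o=0
```

Because the belief state is a sufficient statistic for the observation history, a POMDP can be recast as a fully observable MDP over beliefs, which is the continuous-state problem that off-line solution algorithms attack.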
Previous approaches to achieving robustness in natural-language systems rest, at their core, on two fundamental operating principles:
Since the proposed approach is based on an explicit management of the resources available for solving the disambiguation problems at hand, it ultimately even opens up the not entirely unfounded prospect that quite different robustness phenomena, such as tolerance of input errors and flexible reaction under external time pressure, can on this basis be traced back to very similar processing principles.
Intelligent models rest on a foundation of elementary logic. Intelligent systems call for organizationally closed networks of interacting processes. An interesting step in the evolution from intelligent models to intelligent systems is to approach logic itself as a system of autonomous elementary processes, called distinctions.
Distinction networks (dnets) represent logical expressions as directed graphs of nodes and links. Distinction nodes act as fine-grain logical agents with knowledge only of their local connectivity, behaving as a generalized NOR operator. Links represent logical dependency. Dnets differ from circuits in that logical values are mapped onto the existence of links between nodes rather than onto signals propagating over links. Nodes can then enact boolean reduction independently and asynchronously based on their local context of connectivity. The entire dnet operates in parallel to arrive at valid logical conclusions and reduced boolean functions. Dnet deduction is equifinal but non-deterministic. The map from a conventional basis for logic {if, true} onto distinctions is many-to-one, demonstrating that dnets are conceptually simpler than propositional logic. This simplicity is a consequence of confounding the existence of a communication channel with the message on that channel. The structure of the network embodies logical semantics, while the organization of the network is closed and homogeneous.
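The generalized-NOR behavior of a node can be sketched as follows, under the assumption that a node's value is the NOR of the nodes it links to, so that a node with no outgoing links evaluates to true (the NOR of an empty set). The dictionary representation and node names are illustrative; they are not the paper's actual dnet machinery, which reduces graphs rather than evaluating them.

```python
# A dnet as a mapping from each node to the list of nodes it links to.
# Value semantics (an assumption for this sketch): value(n) = NOR of the
# values of n's linked nodes; a link-free node is vacuously true.

def evaluate(dnet, node):
    """Evaluate a node as the generalized NOR over its linked nodes."""
    return not any(evaluate(dnet, child) for child in dnet.get(node, ()))

# root links to an empty node a: a is true, so root = NOR(true) = false.
dnet = {"root": ["a", "a"], "a": []}
print(evaluate(dnet, "root"))   # False

# root -> a -> b (b empty): b = true, a = NOR(true) = false,
# so root = NOR(false) = true, i.e. double negation.
dnet2 = {"root": ["a"], "a": ["b"], "b": []}
print(evaluate(dnet2, "root"))  # True
```

The sketch evaluates top-down for clarity; the abstract's point is that actual dnet nodes reduce the graph locally and asynchronously, with no global evaluator.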
The computational complexity of the Tautology problem for propositional calculus occurs in dnets as the necessity to identify duplicated graph structures. Even in this simple domain, nodes must cooperate with neighbors in order to identify common networks of connectivity. Cyclic loops in dnets introduce primitive models of time and recursion, generating a boolean waveform calculus and complex boolean values with strong similarities to complex numbers.
The problem of systematically transforming signals to symbols, and vice versa, has been an outstanding problem in both Robotics and Artificial Intelligence. The underlying observation is simple: we prescribe instructions and commands to autonomous robots in a symbolic way, yet the perception-action cycle is best modeled with continuous mathematics. The reason the perception-action cycle is modeled in a continuous fashion is that agents interacting with the physical external world form a dynamic system, and the formulation of this problem leads to nonlinear differential equations. During the last five years, progress has been made in understanding autonomous mobile agents and their control in the task of going to a goal position while avoiding obstacles. For a single goal, this problem can be formulated purely in terms of planning dynamics. The discrete aspects of navigational tasks derive from the requirement to traverse a given area while passing a set of prespecified landmarks. The sequential as well as the conditional nature of this problem can be described naturally via a task description language. The language captures the dynamic nature of individual processes as well as the interactions between them, and enables a systematic partition of the signal into symbols and an analysis of the composition of more complex behaviors. Within this framework we can specify a variety of cooperative behaviors and define different levels of coupling between multiple agents, the simplest being navigating individually to a set of prespecified locations, or navigating while keeping in formation. In the first scenario the agents treat each other as moving obstacles, while in the second case they have to communicate and share information about obstacles in their field of view. In this paper we present some results to this effect.
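The kind of continuous go-to-goal dynamics the abstract alludes to can be illustrated with a minimal point-agent sketch: attraction toward the goal combined with short-range repulsion from obstacles. The gains, the repulsion law, and the specific positions below are assumptions for illustration, not the paper's controller.

```python
import numpy as np

def step(pos, goal, obstacles, dt=0.05, k_att=1.0, k_rep=0.2):
    """One Euler step of simple attractor/repulsor planning dynamics."""
    v = k_att * (goal - pos)              # attraction toward the goal
    for obs in obstacles:
        d = pos - obs
        dist = np.linalg.norm(d) + 1e-9   # avoid division by zero
        v += k_rep * d / dist**3          # short-range obstacle repulsion
    return pos + dt * v

pos = np.array([0.0, 0.0])
goal = np.array([5.0, 0.0])
obstacles = [np.array([2.5, 0.1])]        # one obstacle slightly off-axis
for _ in range(400):
    pos = step(pos, goal, obstacles)
print(pos)   # settles near the goal after skirting the obstacle
```

Crossing from this continuous level to the symbolic level, e.g. declaring "landmark reached" when the agent enters a neighborhood of a goal, is exactly the signal-to-symbol partition that the task description language is meant to make systematic.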