Abstract
This work presents and discusses a model of a decision maker who uses feedback information and is subject to goal uncertainty. The decision maker chooses one of a finite number of courses of action and observes one of a finite number of possible outcomes resulting from that choice.
Two alternatives are available for 'learning' the best course of action. One is to assign a subjective score to each (course of action, outcome) pair and follow the courses of action that lead to the highest current estimates of the expected score. This approach quickly establishes a dominant course of action, which under certain circumstances may not be optimal. The other alternative forces much more switching between courses of action, and no dominant action arises.
These alternatives are combined in a two-phase approach to zeroing in on the optimal course of action. In phase one the decision maker cycles until enough information has accumulated. In phase two he follows the highest expected utilities. The algorithm is written out in detail in the appendix.
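The two-phase idea can be illustrated with a minimal Python sketch. This is not the algorithm from the appendix, which is not reproduced here; it only shows the general shape under stated assumptions. The name `two_phase`, the parameters `outcome_probs`, `scores`, `cycles` and `steps`, the fixed number of cycles standing in for "enough information", and the use of a running mean score as the estimate of expected utility are all hypothetical choices made for illustration.

```python
import random


def two_phase(outcome_probs, scores, cycles, steps, seed=0):
    """Illustrative two-phase learner (assumed design, not the paper's algorithm).

    outcome_probs[a][o]: hypothetical probability of outcome o under action a.
    scores[a][o]: subjective score assigned to the (action, outcome) pair.
    cycles: number of round-robin passes in phase one (assumed stopping rule).
    steps: number of greedy choices in phase two.
    """
    if cycles < 1:
        raise ValueError("phase one needs at least one full cycle")
    rng = random.Random(seed)
    n_actions = len(outcome_probs)
    n_outcomes = len(scores[0])
    totals = [0.0] * n_actions  # accumulated score per action
    counts = [0] * n_actions    # number of times each action was tried
    history = []                # sequence of actions taken

    def act(a):
        # Sample an outcome for action a and record its subjective score.
        o = rng.choices(range(n_outcomes), weights=outcome_probs[a])[0]
        totals[a] += scores[a][o]
        counts[a] += 1
        history.append(a)

    # Phase one: cycle through every action so that each one is sampled.
    for _ in range(cycles):
        for a in range(n_actions):
            act(a)

    # Phase two: follow the action with the highest estimated expected score.
    for _ in range(steps):
        best = max(range(n_actions), key=lambda a: totals[a] / counts[a])
        act(best)

    estimates = [totals[a] / counts[a] for a in range(n_actions)]
    return history, estimates


# Example call with two actions and two outcomes (made-up numbers).
history, estimates = two_phase(
    outcome_probs=[[0.7, 0.3], [0.2, 0.8]],
    scores=[[0.0, 1.0], [0.0, 1.0]],
    cycles=10,
    steps=20,
)
```

With a short phase one, the greedy phase may lock onto a suboptimal action, which mirrors the risk the abstract attributes to the pure score-following alternative; a longer phase one trades more switching for better estimates.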
