In the last years, temporal differences methods have been put forward as convenient tools for reinforcement learning. Techniques based on temporal differences, however, suffer from a serious drawback: as stochastic adaptive algorithms, they may need extensive exploration of the state-action space before convergence is achieved. Although the basic methods are now reasonably well understood, it is precisely the structural simplicity of the reinforcement learning principle – learning through experimentation – that causes these excessive demands on the learning agent. Additionally, one must consider that the agent is very rarely a tabula rasa: some rough knowledge about characteristics of the surrounding environment is often available. In this paper, I present methods for embedding a priori knowledge in a reinforcement learning technique in such a way that both the mathematical structure of the basic learning algorithm and the capacity to generalise experience across the state-action space are kept. Extensive experimental results show that the resulting variants may lead to good performance, provided a sensible balance between risky use of prior imprecise knowledge and cautious use of learning experience is adopted.