The entertainment industry's demands on personal computers call for ever more realistic gaming experiences. To meet them, manufacturers produce increasingly powerful graphics devices, whose compute capabilities have long since overtaken those of central processing units (CPUs). These capabilities are achieved through a large number of simple processing units, which lack sophisticated branch prediction but compensate by keeping a huge number of threads active. In this diploma thesis I exploit general-purpose programming on graphics devices for the needs of machine learning. The most important steps are the right choice of compute architecture and of algorithm. I chose the CUDA architecture, on which I implemented the backpropagation algorithm as a representative method from the field of artificial neural networks. In the graphics device implementation I applied several different optimization approaches to achieve faster execution. The purpose of the thesis is to make the implementation of a concrete learning algorithm on a graphics device as efficient and fast as possible, and thus to maximize the speedup over the CPU implementation. On the fastest graphics devices currently available, I achieved a speedup of more than 50 times over the CPU implementation.