The core contribution of this paper is a three-fold improvement of the Haar discrete wavelet transform (DWT). The transform is modified to (i) efficiently transform a multiclass-valued (rather than numerically valued) function, (ii) operate over a multidimensional (rather than low-dimensional) domain, and (iii) transform a multiclass-valued decision tree into other useful representations. We prove that this multidimensional, multiclass DWT uses dynamic programming to minimize, within its framework, the number of nontrivial wavelet coefficients needed to summarize a training set or decision tree. The algorithm is spatially localized and, after an initial sort, runs in time linear in the number of training samples. In our tests on benchmark training sets, convergence of the DWT seems to degrade with rising dimension, consistent with the view that high-dimensional wavelets are difficult to implement. The multiclass, multidimensional DWT has tightly coupled applications: learning "dyadic" decision trees directly from training data; rebalancing preexisting decision trees, or converting them to fixed-depth boolean or threshold neural networks (in effect parallelizing the evaluation of the trees); and learning rule/exception sets represented as a new form of tree, called an "E-tree", which could greatly aid the interpretation and visualization of a dataset.
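To make the central idea concrete, here is a minimal one-dimensional sketch of a multiclass Haar-style summarization, written for illustration only. It is not the paper's algorithm: the paper's transform is multidimensional and uses dynamic programming, while this toy version simply merges adjacent pairs of class labels over a dyadic grid, keeps the majority class of each subtree as the "smoothed" value, and counts a coefficient as nontrivial whenever the two children's majority classes disagree. The function name and the representation of nodes as label counters are assumptions made for this sketch.

```python
from collections import Counter

def multiclass_haar_1d(labels):
    """Toy 1-D multiclass Haar-style transform (illustrative sketch).

    Merges adjacent pairs of class labels level by level, keeping the
    majority class of each subtree as the parent's summary value, and
    counts a wavelet coefficient as 'nontrivial' whenever the two
    children's majority classes differ.  Returns the root class label
    and the number of nontrivial coefficients.
    """
    n = len(labels)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    # Each position starts as a leaf holding a count of one label.
    level = [Counter([x]) for x in labels]
    nontrivial = 0
    while len(level) > 1:
        merged = []
        for a, b in zip(level[::2], level[1::2]):
            # Children disagree on their majority class: the detail
            # coefficient at this node is nontrivial.
            if a.most_common(1)[0][0] != b.most_common(1)[0][0]:
                nontrivial += 1
            merged.append(a + b)  # parent accumulates both subtrees' counts
        level = merged
    return level[0].most_common(1)[0][0], nontrivial
```

A constant (single-class) input yields zero nontrivial coefficients, mirroring the sense in which the transform summarizes a training set compactly: the number of nontrivial coefficients grows only where the class-valued function actually changes.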