Abstract

We present an enhanced multi-precision squaring algorithm for low-end RISC microcontrollers, which typically have many general-purpose registers but a narrow data bus (8–32 bits). The proposed scheme combines a new technique, “lazy doubling,” with optimized computation sequences, which makes it significantly faster than previous algorithms. Mathematical analysis shows that the proposed algorithm requires about 67% of the clock cycles needed by the carry-catcher squaring algorithm; to the best of our knowledge, it is the fastest squaring algorithm reported to date. Experimental results on the ATmega128 microcontroller show that our algorithm is about 1.5 times faster than the carry-catcher squaring algorithm in terms of clock cycles. Because squaring is a key operation in public-key cryptography, the proposed algorithm can help lower power consumption in secure wireless sensor networks (WSNs) and secure embedded systems.
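The general idea behind deferring the doubling step in squaring can be illustrated with a small sketch. It assumes that “lazy doubling” means summing the distinct cross products a[i]·a[j] (i < j) of a column first and doubling that sum once, instead of doubling every product individually. This is our reading for illustration, not the authors' algorithm or assembly code. The 8-bit word size mirrors the ATmega128. The function names `square_lazy_doubling`, `to_words`, and `from_words` are hypothetical.

```python
WORD_BITS = 8  # assumption: 8-bit words, as on the ATmega128
MASK = (1 << WORD_BITS) - 1


def square_lazy_doubling(a):
    """Square a little-endian list of 8-bit words (hypothetical sketch).

    Column-wise (product-scanning) squaring in which the cross products of
    each column are accumulated first and doubled once ("lazy" doubling),
    rather than doubling each product as it is computed.
    """
    n = len(a)
    if n == 0:
        return []
    result = [0] * (2 * n)
    carry = 0  # carry propagated into the next column
    for k in range(2 * n - 1):
        cross = 0
        # Sum the distinct cross products a[i] * a[j] with i < j and i + j == k.
        for i in range(max(0, k - n + 1), (k + 1) // 2):
            cross += a[i] * a[k - i]
        acc = carry + (cross << 1)  # one doubling per column
        if k % 2 == 0:
            acc += a[k // 2] * a[k // 2]  # diagonal square, never doubled
        result[k] = acc & MASK
        carry = acc >> WORD_BITS
    result[2 * n - 1] = carry  # fits in one word because a**2 < 2**(16n)
    return result


def to_words(x, n):
    """Split a non-negative integer into n little-endian 8-bit words."""
    return [(x >> (WORD_BITS * i)) & MASK for i in range(n)]


def from_words(words):
    """Reassemble little-endian 8-bit words into an integer."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))
```

For example, `from_words(square_lazy_doubling(to_words(0xDEADBEEF, 4)))` equals `0xDEADBEEF ** 2`. Python's unbounded integers hide a detail that matters on real hardware: a column accumulator spans only a few registers, and doubling the column sum needs one extra bit of headroom. Managing that headroom and the register schedule is where a microcontroller implementation would differ from this sketch.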