For more than twelve years, I have been evaluating COTS architectures in order to redesign existing algorithms or design new ones for a given functionality. Here is a fairly exhaustive list of the architectures I have evaluated:
Hewlett Packard PA-RISC: PA7000, PA8200 and PA8500 for signal & image processing (Canny-Deriche and FGL IIR recursive filters)
Digital Equipment Corporation: for signal & image processing (Canny-Deriche and FGL IIR recursive filters)
Texas Instruments DSP TMS320C80 for motion detection and Markov Random Field regularization
Texas Instruments DSP TMS320C62 for signal & image processing (Canny-Deriche and FGL IIR recursive filters) and Connected Component Labeling (Light Speed Labeling)
IBM PowerPC G3, G4 and G5 for signal & image processing (Canny-Deriche and FGL IIR recursive filters) and wavelet transform (JPEG 2000)
Intel Pentium 1, MMX, Pro, 2, 3 and 4 for signal & image processing (Canny-Deriche and FGL IIR recursive filters)
Intel Core 2 Duo and Quad (Conroe, Penryn) and Core i7 (Nehalem, Bloomfield, Westmere) for signal & image processing (Harris point of interest detection)
AMD Athlon for signal & image processing (Canny-Deriche and FGL IIR recursive filters)
These benchmarks have given me expertise in improving implementations through better pipeline utilization and cache handling.
Geek attitude: just for fun, I have also benchmarked the following processors:
65C02 (Apple IIc) for prime number generation
ARM3 (Archimedes A540) for prime number generation
Saturn (Hewlett Packard HP-28S and HP-48S calculators) for XOR Vigenère cryptography