CUDA architecture. New CUDA-compatible GPUs are implemented as a set of multiprocessors. Each multiprocessor has several ALUs (Arithmetic Logic Units) that, at any given clock cycle, execute the same instruction on different data. Each ALU can read and write both the multiprocessor's shared memory and the device RAM.
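The memory layout described in this caption can be illustrated with a minimal sketch. It is not code from the paper: the kernel `add_left_neighbor`, the host helper `run_add_left_neighbor`, and the block size `kThreadsPerBlock` are hypothetical names chosen for this example. Each thread runs the same instructions on a different array element, first copying it from device RAM into the multiprocessor's shared memory and then reading a neighbouring thread's value from that shared memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed block size for this sketch; not a value taken from the paper.
constexpr int kThreadsPerBlock = 128;

// Every thread executes the same instructions on a different element.
__global__ void add_left_neighbor(const float* in, float* out, int n) {
    // Shared memory: visible to all threads of one block (one multiprocessor).
    __shared__ float tile[kThreadsPerBlock];

    int t = threadIdx.x;                   // index within the block
    int i = blockIdx.x * blockDim.x + t;   // index within the whole array

    // Read from device RAM (global memory) into shared memory.
    tile[t] = (i < n) ? in[i] : 0.0f;

    // Wait until every thread of the block has written its tile entry.
    __syncthreads();

    // Read the neighbour's value from shared memory, write to device RAM.
    // The first thread of each block has no left neighbour inside the tile.
    if (i < n) {
        out[i] = tile[t] + (t > 0 ? tile[t - 1] : 0.0f);
    }
}

// Hypothetical host helper: copies data to the device, launches the kernel,
// and copies the result back. Returns false if any CUDA call fails.
bool run_add_left_neighbor(const std::vector<float>& host_in,
                           std::vector<float>& host_out) {
    int n = static_cast<int>(host_in.size());
    host_out.assign(n, 0.0f);
    if (n == 0) return true;

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* dev_in = nullptr;
    float* dev_out = nullptr;
    if (cudaMalloc(&dev_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dev_out, bytes) != cudaSuccess) {
        cudaFree(dev_in);
        return false;
    }

    bool ok = cudaMemcpy(dev_in, host_in.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
        add_left_neighbor<<<blocks, kThreadsPerBlock>>>(dev_in, dev_out, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    if (ok) {
        // This blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(host_out.data(), dev_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dev_in);
    cudaFree(dev_out);
    return ok;
}
```

Calling `run_add_left_neighbor` on the input `{1, 2, 3, 4}` should fill the output with each element plus its left neighbour within the same block. The shared `tile` array is the design point here: neighbouring values are fetched from on-chip shared memory rather than by a second read of device RAM.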
Manavski and Valle BMC Bioinformatics 2008 9(Suppl 2):S10 doi:10.1186/1471-2105-9-S2-S10