Simplified view of the nVidia G80 Architecture. This figure, inspired by a similar figure in , shows how the GPU is organized into several (N) multiprocessors, each containing multiple (M) stream processors that execute the same instruction simultaneously. Each processor can read from the texture cache with low latency, whereas reads and writes to the onboard RAM have high latency.
Schatz et al. BMC Bioinformatics 2007 8:474 doi:10.1186/1471-2105-8-474