performance - Memory bandwidth achievable on a single core


On modern multi-core platforms, the parallel performance of memory-bandwidth-bound applications usually does not scale with the number of cores. Typically, a speedup is observed up to a certain number of cores, after which performance saturates. A synthetic example is the well-known STREAM benchmark, which is often used to report the achievable memory bandwidth, i.e., the memory bandwidth at the saturation point.

Consider the following results of the STREAM benchmark (Triad) on a single Xeon E5-2680 with a peak memory bandwidth of 42.7 GB/s (DDR3-1333):
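The 42.7 GB/s figure can be reproduced with a little arithmetic. The sketch below assumes four memory channels, each 64 bits (8 bytes) wide, running at 1333 MT/s; the channel count and width are not stated in the question and are assumptions about this platform. The function name `peak_bandwidth_gbs` is made up for illustration.

```python
def peak_bandwidth_gbs(transfers_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_s * bus_bytes * channels / 1e9


# Assumed configuration (not given in the question):
# DDR3-1333 = 1333 million transfers/s, 8-byte-wide channels, 4 channels.
E5_2680_PEAK = peak_bandwidth_gbs(1333e6, channels=4)
print(f"{E5_2680_PEAK:.1f} GB/s")  # prints "42.7 GB/s"
```

This is an upper bound on what the DRAM interface can deliver, not a prediction of what any benchmark will measure.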

1  core    16 GB/s
2  cores   30 GB/s
3+ cores   36 GB/s

STREAM scales well from 1 to 2 cores, but above 3 cores the performance is constant.
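To put numbers on that scaling behaviour, the following sketch computes the speedup over one core and the fraction of the quoted peak for each measurement. The values are taken from the question; the helper name `scaling_summary` is hypothetical.

```python
PEAK_GBS = 42.7  # peak bandwidth quoted in the question
TRIAD_GBS = {"1": 16.0, "2": 30.0, "3+": 36.0}  # measured Triad results from the question


def scaling_summary(measured: dict, peak: float) -> dict:
    """Map core count -> (speedup over 1 core, fraction of peak bandwidth)."""
    base = measured["1"]
    return {cores: (bw / base, bw / peak) for cores, bw in measured.items()}


for cores, (speedup, fraction) in scaling_summary(TRIAD_GBS, PEAK_GBS).items():
    print(f"{cores:>2} cores: speedup {speedup:.2f}x, {fraction:.0%} of peak")
```

Two cores give close to a 2x speedup, while three or more stop at 2.25x, which is roughly 84% of the quoted peak.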

My question is: what determines the memory bandwidth that can be achieved by a single CPU core? Since this question is too broad, let's narrow it down to the above-mentioned architecture: how can I predict that STREAM with 1 thread will give me 16 GB/s from the specs of the E5-2680, or by looking at hardware counters, etc.?

For a single core, the major factors are the CPU frequency and the CPU microarchitecture: the speed at which a single core can make requests to the bus, and how well the CPU can predict which memory location you're going to access. CPU designers go to great lengths to make things appear faster and to hide the effect of latencies, but if memory access is random and code execution depends on the data, you'll have to factor in memory access latency, whereas if you just read a bunch of data and add it up, you'll be limited by bandwidth. For a single core, the absolute ceiling is set by the clock speed.
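One rough way to make the latency-versus-bandwidth distinction concrete is Little's law: sustained throughput is about the number of bytes in flight divided by the latency. This sketch is a back-of-the-envelope model, not a description of the E5-2680. The 64-byte request size, the 80 ns latency and the count of ten overlapping requests are illustrative assumptions, and the function name is hypothetical.

```python
def sustained_bandwidth_gbs(outstanding_requests: int, bytes_per_request: int,
                            latency_ns: float) -> float:
    """Bytes in flight divided by latency; bytes per nanosecond equals GB/s."""
    return outstanding_requests * bytes_per_request / latency_ns


# Illustrative assumptions only: 64-byte requests, 80 ns memory latency.
dependent = sustained_bandwidth_gbs(1, 64, 80.0)    # each load waits for the previous one
overlapped = sustained_bandwidth_gbs(10, 64, 80.0)  # ten independent requests in flight

print(f"dependent accesses:  {dependent:.1f} GB/s")
print(f"overlapped accesses: {overlapped:.1f} GB/s")
```

With these made-up numbers, a chain of dependent loads gets 0.8 GB/s, while ten overlapping requests get 8 GB/s. This is why data-dependent random access feels the latency and a streaming read does not.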

For multi-threaded access, the bottleneck is the bus and the RAM architecture on the motherboard and north bridge, so it will depend on the motherboard. You can have DRAM that is 50% slower but run 4 modules in parallel and still achieve a speedup, or vice versa.
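The "slower but parallel" trade-off is simple arithmetic. Below is a minimal sketch that assumes accesses interleave perfectly across modules, which real systems only approximate. The 10 GB/s and 5 GB/s figures are hypothetical placeholders, as is the function name.

```python
def aggregate_bandwidth(per_module_gbs: float, modules: int) -> float:
    """Ideal aggregate bandwidth, assuming perfect interleaving across modules."""
    return per_module_gbs * modules


fast_single = aggregate_bandwidth(10.0, 1)  # hypothetical fast module on its own
slow_quad = aggregate_bandwidth(5.0, 4)     # modules 50% slower, four in parallel
speedup = slow_quad / fast_single

print(f"4 slow modules vs 1 fast module: {speedup:.1f}x")  # prints "... 2.0x"
```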

The question is rather broad. If you want to know more about memory from a programmer's perspective, look at "What Every Programmer Should Know About Memory". It has an in-depth description of the various factors.

It's an in-depth topic.

PS: as for prediction, it's not quite possible, or at least not quite practical. Measurement is better, unless you have access to very detailed specs of the CPU, chipset, motherboard and RAM, and even then it's an educated guess. You're better off measuring in real life, under your particular workload.
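As a starting point for measuring, here is a minimal single-threaded sketch that times a large buffer copy using only the Python standard library. It is not STREAM and will not reproduce the Triad numbers. The buffer size, the repeat count and the function name are arbitrary choices for illustration, and the result is only a rough indication.

```python
import time


def measure_copy_bandwidth_gbs(n_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Best observed single-thread copy bandwidth in GB/s (1 GB = 1e9 bytes)."""
    src = bytearray(b"\x01") * n_bytes  # non-zero fill so the source pages are really allocated
    dst = bytearray(n_bytes)
    dst[:] = src  # warm-up pass: touch every destination page before timing
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-size slice assignment, performed as a bulk copy
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from the timer
    # One copy reads n_bytes and writes n_bytes.
    return 2 * n_bytes / best / 1e9
```

Calling `measure_copy_bandwidth_gbs()` with a buffer much larger than the last-level cache gives a figure to compare against the theoretical peak. For numbers comparable to the ones in the question, run the actual STREAM benchmark with threads pinned to cores.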

