performance - Memory bandwidth achievable on a single core
On modern multi-core platforms, the parallel performance of memory-bandwidth-bound applications usually does not scale with the number of cores. Typically, a speedup is observed up to a certain number of cores, after which performance saturates. A synthetic example is the well-known STREAM benchmark, which is often used to report the achievable memory bandwidth, i.e., the memory bandwidth at the saturation point.
Consider the following results of the STREAM benchmark (Triad) on a single Xeon E5-2680 with a peak memory bandwidth of 42.7 GB/s (DDR3-1333):
1 core      16 GB/s
2 cores     30 GB/s
3+ cores    36 GB/s
STREAM scales well from 1 to 2 cores, but from 3 cores upward the performance stays roughly constant.
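To put the numbers above in perspective, here is a small sketch of where a 42.7 GB/s peak can come from and what fraction of it each measurement reaches. The four-channel, 8-bytes-per-transfer configuration is an assumption about the platform (the question only states the total and DDR3-1333), and the function names are made up for illustration.

```python
def peak_bandwidth_gbs(channels, transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * transfers_per_s * bytes_per_transfer / 1e9


# Assumption: 4 memory channels, DDR3-1333 (1333e6 transfers/s), 64-bit bus.
PEAK = peak_bandwidth_gbs(channels=4, transfers_per_s=1333e6)

# STREAM Triad results quoted in the question, in GB/s.
MEASURED = {"1 core": 16.0, "2 cores": 30.0, "3+ cores": 36.0}


def fraction_of_peak(measured_gbs, peak_gbs=PEAK):
    """Share of the theoretical peak that a measurement achieves."""
    return measured_gbs / peak_gbs


if __name__ == "__main__":
    print(f"peak: {PEAK:.1f} GB/s")
    for label, gbs in MEASURED.items():
        print(f"{label}: {gbs:.0f} GB/s = {fraction_of_peak(gbs):.0%} of peak")
```

Under these assumptions a single core reaches a bit more than a third of the theoretical peak, while the saturated value sits well above 80% of it.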
My question is: what determines the memory bandwidth that can be achieved by a single CPU core? Since this question is probably too broad, let me narrow it down to the architecture mentioned above: how can I predict that STREAM with 1 thread will give me 16 GB/s, either from the specs of the E5-2680 or by looking at hardware counters, etc.?
For a single core, the major factors are the CPU frequency and the CPU microarchitecture: how fast a single core can issue requests to the bus, and how well the CPU can predict the memory location you're going to access. CPU designers go to great lengths to make things appear faster and to hide the effect of latencies. Still, if memory access is random and code execution depends on the data, you'll have to factor in memory access latency, whereas if you just read a bunch of data and add it up, you'll be limited by bandwidth. For a single core, the absolute ceiling is the clock speed.
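One common way to reason about how latency and the rate of outstanding requests interact is Little's law: sustained bandwidth is roughly the number of cache lines in flight times the line size, divided by the memory latency. The sketch below is only a back-of-the-envelope model; the 10 lines in flight and the 80 ns latency are illustrative assumptions, not figures taken from the E5-2680 specs, and hardware prefetching can keep more lines in flight than such a simple count suggests.

```python
def concurrency_limited_bandwidth_gbs(lines_in_flight, latency_ns, line_bytes=64):
    """Little's law estimate: bytes in flight divided by latency.

    Bytes per nanosecond is numerically the same as GB/s (1 GB = 1e9 bytes).
    """
    return lines_in_flight * line_bytes / latency_ns


def lines_needed(target_gbs, latency_ns, line_bytes=64):
    """Cache lines that must be in flight to sustain a target bandwidth."""
    return target_gbs * latency_ns / line_bytes


if __name__ == "__main__":
    # Illustrative assumptions only: 10 outstanding lines, 80 ns latency.
    print(concurrency_limited_bandwidth_gbs(10, 80.0), "GB/s")
    # How many lines in flight would 16 GB/s need at the same latency?
    print(lines_needed(16.0, 80.0), "lines")
```

With these assumed inputs the model says that sustaining 16 GB/s at 80 ns would need about 20 lines in flight. Plugging in measured latency and counter data for your own machine would be the way to check whether the model fits.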
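For multi-threaded access, the bottleneck is the bus and the RAM architecture on the motherboard, together with the memory controller (the north bridge on older platforms; on a Xeon E5-2680 it is integrated into the CPU). This depends on the motherboard. You can have DRAM that is 50% slower but run 4 of them in parallel and still achieve a speedup, or vice versa.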
The question is quite broad. If you want to know more about memory from a programmer's perspective, look at "What Every Programmer Should Know About Memory" by Ulrich Drepper. It has an in-depth description of the various factors.
It's an in-depth topic.
PS: as for prediction, it's not quite possible, or at least not quite practical. Measurement is better. Unless you have access to very detailed specs of the CPU, chipset, motherboard and RAM, you can't predict it, and even then it's an educated guess. You're better off measuring in real life, under your particular workload.
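As a rough illustration of the "just measure it" advice, here is a minimal single-threaded sketch that times a large buffer copy. It is a copy kernel, not the STREAM Triad, so its numbers are not directly comparable to STREAM's; the function name and the default sizes are made up for illustration. It counts one read and one write per byte and ignores any extra write-allocate traffic, and buffers that fit in cache would measure the cache rather than DRAM.

```python
import time


def copy_bandwidth_gbs(n_bytes=64 * 1024 * 1024, repeats=5):
    """Best-of-N copy bandwidth in GB/s for one thread (1 GB = 1e9 bytes)."""
    # Fill the source with non-zero data so its pages are really backed by memory.
    src = bytearray(b"\x01") * n_bytes
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment: one bulk copy
        best = min(best, time.perf_counter() - start)
    # Each copied byte is read once and written once.
    return 2 * n_bytes / best / 1e9


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```

Taking the best of several repeats filters out the first pass, which also pays for page faults on the destination buffer. For numbers you intend to compare with published results, the real STREAM benchmark is the better tool.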