Ultimate Overclocking

By Illya Tsemenko.

Overclocking can take many forms and shapes, from a useful everyday boost of a few hundred MHz (think of tuning a car engine to get better acceleration performance) to insane cryo-cooled rigs for international OC competitions (drag-racing cars with jet engines, no less).

Overclocking is not limited to personal computers: the same concept applies to nearly any digital system, including mobile processors and embedded SoC (system-on-chip) devices, such as the Raspberry Pi 3.

One of the common ways to get performance gains is to increase the running frequency of the processor, memory, and storage interface. More frequency = faster computation. That comes at a price: increased power consumption and possibly reduced stability, as an erroneous operation is more likely under stress.

Overclocking the Pi

Where do we start? First, we study the overall system design: what components are present on the computer’s PCBA and how we can increase their performance.
The heart of the Pi is the Broadcom BCM2837 SoC, which has a quad-core Cortex-A53 ARM processor and a VideoCore IV GPU.

A separate 1GB LPDDR2 SDRAM memory chip – EDB8132B4PB-8D-F from Elpida (Micron) – is located on the underside. Memory is allocated dynamically between CPU and GPU use, depending on the settings in the raspi-config tool.

Connectivity is provided by the Broadcom BCM43438 WiFi/Bluetooth chip and SMSC LAN9514 USB/LAN hub. This is only important to us because we need to keep these chips alive during overclocking so we can keep communicating with the Pi.
Now it’s time to check the nominal clocks and voltages for each of the components. The nominal max CPU clock speed is 1200MHz with active power management; the nominal max GPU 3D-core clock is 300MHz (200MHz for the 2D clock); and the LPDDR2 memory clock is 400MHz (800 MT/s), 2.5 ns cycle time.

All clocks are generated internally by the PLL section of the SoC, which takes the 19.2MHz input clock from a tiny oscillator on the back of the board and multiplies that to get a higher frequency.

The Pi takes all its power from a single input – micro-USB connector ‘PWR IN’ – and has on-board regulators to generate low voltages required by the CPU, GPU, LPDDR2 memory, and peripherals. It’s important to know what these voltages are, as higher clocks may require higher voltages to maintain stability and error-free operation.
Now, with this knowledge about Pi 3 clocks and voltages, the first step would be to install some benchmarks to establish the baseline performance, after which we can try to use existing knobs to increase clocks/voltages to compare how much performance difference we gain. It’s important to test actual performance, not just reported MHz speed, because if the CPU overheats it could reduce actual running clock to lower values and we’d get worse performance than expected, often even lower than a non-overclocked result. The very same methods apply to ‘traditional’ PC overclocking and benchmarking.
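One way to check whether the firmware actually held the clocks back during a run is the throttling bitmask reported by `vcgencmd get_throttled`. The sketch below decodes such a value. It assumes the bit layout described in the Raspberry Pi firmware documentation (bits 0 to 2 for the current state, bits 16 to 18 for events since boot) and that your firmware is recent enough to support the command; the function name decode_throttled is our own.

```shell
# Decode a value such as "throttled=0x50005" into readable flags.
# The bit meanings are an assumption based on the firmware documentation.
decode_throttled() (
    raw=${1#throttled=}          # accept "throttled=0x..." or a bare "0x..."
    value=$(($raw))              # shell arithmetic understands the 0x prefix
    out=""
    [ $((value & 0x1)) -ne 0 ]     && out="$out under-voltage-now"
    [ $((value & 0x2)) -ne 0 ]     && out="$out freq-capped-now"
    [ $((value & 0x4)) -ne 0 ]     && out="$out throttled-now"
    [ $((value & 0x10000)) -ne 0 ] && out="$out under-voltage-occurred"
    [ $((value & 0x20000)) -ne 0 ] && out="$out freq-capped-occurred"
    [ $((value & 0x40000)) -ne 0 ] && out="$out throttled-occurred"
    out=${out# }                 # drop the leading space
    echo "${out:-ok}"
)
```

On a Pi, it could be called after each benchmark as `decode_throttled "$(/opt/vc/bin/vcgencmd get_throttled)"`; any result other than "ok" suggests the score was not obtained at the configured clock.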

An important note on the results from the samples we actually used: due to variation in manufacturing processes, every piece of silicon, be it processor core or memory, has a different margin (overclocking ability before data gets corrupted). This means that a hypothetical chip A may overclock to 1400MHz, but chip B of the same model and in the same condition could reach only 1350MHz. Chip C, on the other hand, may be capable of running at 1450MHz. To test this in practice, we’ll use not just one Pi 3 module, but five of them to find the best specimen!

Frequency and voltage control settings

To adjust clocks and voltages in a traditional computer, you have a special BIOS Setup interface. Boot settings, low-level device configurations, various memory settings, and power management settings are often available in the PC BIOS. The Pi has much less room and power in its internal bootloader, so actual overclocking settings, like clocks, are set by a special kernel configuration file, located on the FAT section of the Pi’s SD card: /boot/config.txt.
All these clocks are separate with their own clock generation, so they can be adjusted independently. There are also a few additional settings that can be tweaked.

Voltage | Source | Usage | Controller | Note
+5 VDC input | External PSU | Main power input | . | Can be alternatively sourced from pin header 38
+3.3 VDC | +5 VDC | Main I/O voltage | Diodes PAM2306 | Switching VREG Channel 1
+1.8 VDC | +5 VDC | DRAM/CPU/GPU voltage | Diodes PAM2306 | Switching VREG Channel 2
PLL, +3.3 VDC | SoC LDO | Internal PLL voltage | Internal from BCM2837 | Linear regulator for PLL clock power
CPU, +1.0 VDC | +5 VDC | Main CPU voltage | Richtek RT8088A | Switching VREG

* arm_freq_min – Minimum value of arm_freq used in low power state. Default is 600 for Pi 3.

* core_freq_min – Minimum value of core_freq used in low power state. Default is 250 for Pi 3.

* sdram_freq_min – Minimum value of sdram_freq used in low power state. Default is 400 for Pi 3.

* temp_limit – Thermal limit protection threshold. Sets clocks/voltages to default once reached. Default limit is 85°C.

* sdram_schmoo=0x02000020 – Memory training tweak.

* over_voltage – Processor logic voltage offset, in 25 mV steps. Allowed range is -16 to 8. At the maximum setting, 8 × 25 mV = 0.2 V on top of the default 1.200 V VDD_CORE gives 1.400 V.

* over_voltage_sdram_p – Memory cell level voltage offset. Same range maths as over_voltage.

* over_voltage_sdram_i – Memory I/O voltage offset. Same range maths as over_voltage.

* over_voltage_sdram_c – Memory logic level voltage offset. Same range maths as over_voltage.

* force_turbo – Disable dynamic low power states for the Raspberry Pi SoC. This setting also voids your warranty.

* boot_delay – Some owners have reported this to be helpful in the case of SD card data corruption when used with force_turbo.
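To make the range maths concrete, here is a minimal sketch that checks a pair of settings and prints the matching lines for /boot/config.txt. It assumes the limits quoted in this article (an over_voltage range of -16 to 8 in 25 mV steps from a 1.200 V default, and a 1600MHz ceiling for arm_freq); the function name oc_config is hypothetical, and the output is only a starting point, not a recommended configuration.

```shell
# Print candidate /boot/config.txt lines after sanity-checking the values.
# $1 = arm_freq in MHz, $2 = over_voltage setting.
oc_config() (
    arm_freq=$1
    over_voltage=$2
    if [ "$over_voltage" -lt -16 ] || [ "$over_voltage" -gt 8 ]; then
        echo "over_voltage must be between -16 and 8" >&2
        exit 1
    fi
    if [ "$arm_freq" -gt 1600 ]; then
        echo "arm_freq above 1600 is not expected to work on a Pi 3" >&2
        exit 1
    fi
    # 25 mV per step on top of the 1200 mV default
    core_mv=$((1200 + over_voltage * 25))
    echo "# expected core voltage: ${core_mv} mV"
    echo "arm_freq=${arm_freq}"
    echo "over_voltage=${over_voltage}"
)
```

For example, `oc_config 1500 8` prints a comment line with the expected 1400 mV core voltage followed by the two settings, while `oc_config 1500 9` is rejected.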

Warning: Overriding the temp_limit and force_turbo settings will void the warranty on a Raspberry Pi 3 computer.

Let’s download some benchmarks and run them to see if we can get the Pi to go fast, using these adjustment knobs.

Due to a PLL maximum clock range limitation at 3200MHz on the Pi 3, there is no known way of pushing real CPU frequency past 1600MHz. If one were to configure a higher value, it would result in the processor running with a much lower clock speed, as shown by a simple performance test:

--------- HWBOT Prime 0.8.3 ----------
Processor detected:
ARMv7 Processor rev 4 (v7l) BCM2835
Estimating speed... 4x 1,650MHz @ -86.187 C
976 MB memory
Running benchmark using 4 threads.
Starting benchmark...
Warm up phase:   .......................
Benchmark phase: .......................
All done! Current CPU temperature: -84.035 C
Score: 304.56.

If the 1650MHz clock were real, the score here should be around 520. Instead, it is about 31% lower than the 440.96 scored at the nominal 1200MHz clock, which is consistent with the CPU actually running at half the requested frequency: 825MHz.
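The half-frequency estimate can be checked with a quick calculation. The sketch below assumes the HWBOT Prime score scales linearly with CPU clock, which is a simplification, and uses the stock result of 440.96 at 1200MHz reported in the HWBOT Prime section of this article as the baseline. The function name effective_mhz is our own.

```shell
# Estimate the clock the CPU really ran at from a benchmark score,
# assuming the score is proportional to CPU frequency.
# $1 = observed score, $2 = baseline score, $3 = baseline clock in MHz
effective_mhz() {
    awk -v s="$1" -v b="$2" -v f="$3" 'BEGIN { printf "%.0f\n", f * s / b }'
}

effective_mhz 304.56 440.96 1200   # prints 829
```

The result lands within a few MHz of 825MHz, half of the requested 1650MHz.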

Benchmark software setup

First, download the latest Raspbian Stretch image. For all testing presented here, we used the version from 7 September 2017 with kernel 4.9.41.

Linux rpi-oc1 4.9.41-v7+ #1023 SMP Tue Aug 8 16:00:15 BST 2017 armv7l GNU/Linux

It’s worth setting the CPU power governor to performance mode, to favour less switching from idle state to full-performance state:

echo "performance" | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

A handy script to measure current CPU frequency and report die temperature:

root@rpi-oc1:/home/pi# cat ./check_cpu_speed_temp.sh
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
/opt/vc/bin/vcgencmd measure_temp

It also correctly reports negative temperatures (below 0 °C), which is very handy for extreme overclocking.

root@rpi-oc1:/home/pi# ./check_cpu_speed_temp.sh
600000
temp=-35.1'C

The first number shown is the CPU frequency; the second, temp value is the SoC thermal sensor reading. The frequency drops to 600MHz when the Pi 3 is idle, to save energy, but will jump to the max value once a workload is running. Now we’re ready to install the benchmarks.
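For longer runs, it can help to log these two readings side by side. The following is a rough sketch along the lines of the script above. It assumes the firmware prints the temperature with a plain apostrophe, as in temp=-35.1'C, and the helper names parse_temp, khz_to_mhz, and log_cpu are our own.

```shell
# Strip the wrapper from vcgencmd output such as "temp=-35.1'C", leaving "-35.1".
parse_temp() {
    t=${1#temp=}
    t=${t%"'C"}
    echo "$t"
}

# Convert a scaling_cur_freq reading (kHz) to MHz.
khz_to_mhz() {
    echo $(( $1 / 1000 ))
}

# Print one "MHz temperature" sample per second; stop with Ctrl+C.
log_cpu() {
    while true; do
        freq=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)
        temp=$(/opt/vc/bin/vcgencmd measure_temp)
        echo "$(khz_to_mhz "$freq") $(parse_temp "$temp")"
        sleep 1
    done
}
```

Running `log_cpu` in a second SSH session while a benchmark is in progress gives a simple record of whether the clock stayed at its maximum.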

HWBOT Prime benchmark

HWBOT Prime is a benchmark from the HWBOT.org website, which also hosts an overclocking guide for it. We can use it to measure performance in prime number calculation. To set up the benchmark, download the JAR file:

wget http://downloads.hwbot.org/hwbotprime.jar

To execute this Java-based benchmark, we first need to install OpenJDK (openjdk.java.net):

apt-get install openjdk-8-jre

Starting the benchmark is just a simple call for .jar from a Java environment:

java -jar hwbotprime.jar
--------- HWBOT Prime 0.8.3 ----------
Processor detected:
ARMv7 Processor rev 4 (v7l) BCM2835
Estimating speed... 4x 1,200MHz @ 61.224 C
970 MB memory
Running benchmark using 4 threads.
Starting benchmark...
Warm up phase: .................  done!
Benchmark phase:  ..............  done!
All done! Current CPU temperature: 72.522 C
Score: 440.96.

Here, our example score is 440.96. The utility also reports CPU temperature, which can be handy for checks.

Note that the OpenJDK library build version can have a very big impact on the score, so test results can be compared only when using the same package versions. You may see many faster results online, obtained using a different JDK version.
It’s a good idea to run the benchmark multiple times to make sure that scores are consistent. Some small variation, of a few percent, is normal.
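A quick way to judge consistency is to compare the spread of several runs with their mean. The sketch below does this for scores passed on the command line; the function name score_spread is our own, and the scores in the usage note are made-up placeholders, not measurements.

```shell
# Summarise repeated benchmark scores: mean, and spread (max - min)
# expressed as a percentage of the mean.
score_spread() {
    printf '%s\n' "$@" | awk '
        NR == 1 { min = $1; max = $1 }
        { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
        END {
            mean = sum / NR
            printf "mean=%.2f spread=%.1f%%\n", mean, (max - min) / mean * 100
        }'
}
```

For instance, `score_spread 100 102 98` reports `mean=100.00 spread=4.0%`. A spread much larger than a few percent hints at throttling or instability.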
If the module is unstable, you can get random lock-ups or kernel panic messages, or just bad scores. A few examples are presented in the logs below:

Message from syslogd@rpi-oc1 at Oct 25 17:41:40 ...
kernel:[   97.266669] 7fe0: 62442dfc 62442e08 00000000 76f19950 80000010 63208a94 55555d80 55555547
Message from syslogd@rpi-oc1 at Oct 25 17:41:40 ...
kernel:[   97.296319] Code: 0a00000a f57ff05b e2853028 f593f000 (e1932f9f)
............................... done!
All done! Current CPU temperature: 9.576 C
Score: 260.77.

In this case, the score is half what it’s supposed to be. Often tests just crash due to processor instability.

Sysbench benchmark

The benchmark utility sysbench (wiki.gentoo.org/wiki/Sysbench) allows you to benchmark processor, memory, file I/O, and mutex performance on Linux platforms. It runs in a command-line interface as a console tool. To install this benchmark in the Raspbian OS, we use the apt-get tool:

apt-get install sysbench

Once installation is successful, it can be executed using a single command with the desired test parameters. In this article, this benchmark will be used to test memory speed. Our test will allocate a memory buffer and then read/write from it. This is then repeated until the provided volume (--memory-total-size) is reached. Users can provide multiple threads (--num-threads), different sizes in buffer (--memory-block-size) and the type of requests (read or write, sequential or random).

sysbench --test=memory --num-threads=4 --memory-access-mode=rnd --memory-total-size=800M run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 4
Doing memory operations speed test
Memory block size: 1K
Memory transfer size: 800M
Memory operations type: write
Memory scope type: global
Threads started!
Done.
Operations performed: 819200 (2263497.25 ops/sec)
800.00 MB transferred (2210.45 MB/sec)

For comparison reasons, it’s important to keep the same settings across the benchmark, so we know we’re comparing apples with apples.

Clock domain | Parameter in /boot/config.txt | Minimum (MHz) | Default (MHz) | Maximum (MHz)
CPU/logic clock | arm_freq | 100 | 1200 | 1600
2D GPU/L2 cache clock | core_freq | 250 | 400 | 600+
Video decoder clock | h264_freq | . | 300 | .
Imaging pipeline clock | isp_freq | . | 300 | .
3D engine clock | v3d_freq | . | 300 | .
LPDDR2 memory clock | sdram_freq | 200 | 450 | 600+

OpenQuake graphics test

The graphics core in a Raspberry Pi is powerful enough to run a special version of the famous Quake 3 FPS! So we can use it to benchmark combined processor and graphics core performance.

wget http://www.berryterminal.com/dl/ioquake3_99.1.36-rpi01_armhf.deb
sudo dpkg -i ./ioquake3_99.1.36-rpi01_armhf.deb
sudo apt-get install openarena
sudo apt-get clean
cp /opt/vc/lib/libbrcmEGL.so /lib/libEGL.so
cp /opt/vc/lib/libbrcmGLESv2.so /lib/libGLESv2.so

Make sure you have set a GPU memory size of at least 224 MB in raspi-config, otherwise the game won’t start.

To start the benchmark, just run:

/usr/games/openarena

The Raspberry Pi 3 gets rather hot running bare metal, without any heat-sinks in still air. A few thermal images taken with a Fluke Ti32 camera reveal temperature gradients well. The memory does not get hot at all, barely differing from the board surface temperature. However, the Broadcom SoC runs around +47°C at idle, going up to a scorching +75°C under full load.

Based on these images, there is no need for a dedicated heat-sink on the memory chip, as it will be cooled by thermal conduction through the PCB once we get the main SoC colder.


Overclocking results

Our initial check was to find the maximum frequency at which the Pi can boot to the console. To perform all CPU and memory speed benchmarks, a plain headless configuration was used. That means the Pi was connected to a network over the Ethernet port, with sshd running to provide access to the console remotely.

To avoid limitations from power supply input, a high-end EVGA NEX 1500W PSU was used as a power source, which can supply a serious 25 A on +5V output. Measured voltage was +5.120 V at the Pi pins. The connection between the Pi and PSU was made using a short cable with AWG18 wires.

Now we know which Pi is the best, we can take that one for a full modification and cooling workout. So all further testing was performed on the promising unit #4. The thermal image of a Pi overclocked to 1500MHz reveals a hot spot at +92°C! With default throttling settings, that is 7°C over the throttle limit temperature, when the CPU speed drops to reduce stress.

Improving cooling – air

During typical operation without overclocking, the Raspberry Pi 3 does not require special heat-sinks or additional cooling. However, with the overclocked settings, especially with increased voltage for the processor, it will get too hot for reliable operation.

Attaching a simple aluminium-finned heat-sink and additional airflow from a fan can provide much better thermal conditions, securing better stability and overclocking headroom. The design of the PCB is quite friendly to this simple modification, as there are no tall components in close proximity to the processor. A thin sticky thermal pad for the heat-sink attachment will do the job.

With this simple heat-sink treatment, our Pi was able to run around 1500MHz in loops, stable enough to pass any performance benchmarks multiple times.
Thermals are now much better, with the SoC area reduced to around +57 °C, instead of over +90 °C.

Our best score in the HWBot Prime benchmark was around 15% faster, and the memory benchmark yielded a 26% performance increase. Can it go further?

Benchmark test | CPU Frequency | GPU/L2 Frequency | DRAM Frequency | Result | Temperature (°C)
HWBot Prime | Default | Default | Default | 440 | +69.3
Sysbench memory 800MB | Default | Default | Default | 2025.5 MB/sec | +49.3

Voltage modification

To improve stability under extreme operation with a 1600MHz processor clock, VDD_VCORE voltage was supplied externally from an EVGA EPOWER V module, programmed to 1.500 V. The EVGA EPOWER V can supply up to 2.000 V, which is plenty of headroom for our purposes.

This way we are also not limited by the 1.400 V maximum of the over_voltage range (setting 8) and can apply arbitrarily high voltages. To connect an external power source, you’ll need to hook a thick enough wire to the C163 positive terminal. It’s easy to spot by its connection to a little power inductor. This is confirmed by the schematic section as well.

The external source is connected to this point by AWG18 wire. Since nominal current is barely a few amps, just one wire would work well enough. Also, return ground wire is important, so we’ve used a large HDMI connector body to get a low-resistance ground connection.

Another benefit of using an external core voltage supply is that this voltage is not controlled by the Pi’s dynamic power management, so we will have constant and stable voltage on the rail, no matter whether the CPU is idle or busy crunching numbers.

Extreme cooling - liquid nitrogen

Now the heat-sink was replaced with a massive Kingpincooling.com F1 extreme cooling evaporator block. To make it fit the Raspberry Pi, we had to remove the J8 pin header and the J3 and J4 FPC and the J7 A/V connector. We also coated both sides of the PCBA with petroleum jelly to avoid water condensation and ice shorting components on the board.

Thermal grease was applied on top of the CPU, and the heavy copper block simply rested on top of it. The bottom side was supported by a small rubber mat to keep everything flat and steady.

The benefit of using a massive copper block for LN2 cooling instead of a smaller tube is the thermal response time of such a system. It will take minutes for a small CPU to warm such a large block of cold copper, so the operation of the whole jig is simple: no need to pour liquid nitrogen all the time to keep a stable temperature. Instead just give it a splash to keep temperatures within the desired window. Extreme overclockers use the same method to cool traditional PC processors and graphic cards, but at a faster rate, as thermal loading of a modern multi-core Intel/AMD CPU or Nvidia/AMD GPU can reach hundreds of watts.

Since the thermal image camera cannot capture temperatures below -30°C, we will have to use software temperature reporting as a base measure, together with a Fluke 52-II thermometer, to stabilise container temperature during the benchmark runs.
The Broadcom SoC chip has a cold-bug around -100°C (by software-monitoring value), which means that it will stop working once the temperature drops below that. Since the boiling point of liquid nitrogen is -196°C, we need to maintain variable temperature control. This can be done manually, by pouring an amount of liquid nitrogen onto the evaporator block. Experimentally, it was determined to keep the copper block temperature at around -120 to -140°C, maintaining a stable Pi temperature close to -80 to -90 °C.
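The pouring decision can be expressed as a simple check on the software-reported temperature. The sketch below only encodes the thresholds mentioned above (a target window of -80 to -90 °C and a cold bug around -100 °C); the function name ln2_advice and the wording of its messages are our own, and it is no substitute for watching the thermometer.

```shell
# Classify a software-reported SoC temperature (in degrees C) against
# the window used in this article. Thresholds come from the text above.
ln2_advice() {
    awk -v t="$1" 'BEGIN {
        if (t <= -100)     print "cold-bug risk: let the block warm up"
        else if (t < -90)  print "colder than needed: hold off"
        else if (t <= -80) print "in window"
        else               print "too warm: add a splash of LN2"
    }'
}
```

Fed with a reading such as -86.187, it reports "in window"; at -20 it asks for more nitrogen.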

Getting 1550MHz stable didn’t require very cold temperatures: the chip was functioning fine even at just -20°C. 1600MHz was stable once temperatures were below -45 to -50°C.

The Quake 3D game demo also ran flawlessly at the maximum CPU frequency. However, due to software configurations, there was no benchmark data to compare with. Fastest run logs are presented below.


HWBOT Prime test

This test completed without issues at the maximum 1600MHz clock, at a pretty impressive temperature of -88.34°C, and gave an overall score of 514.53. The memory test yielded 819 200 operations performed (2 839 246.15 ops/sec) with a total of 800.00MB transferred (2772.70 MB/sec).

Are these numbers worth the nitrogen used? Not really, but it was fun to see how it can run and what the limits of ARM-based computer overclocking are. If the CPU clock frequency could be pushed higher than 1600MHz, we would likely see a much bigger impact from going to extreme overclocking.
But until then, that’s all for now.

Benchmark test | CPU Frequency | GPU/L2 Frequency | DRAM Frequency | Result | Temperature (°C)
HWBot Prime | 1550MHz | 450 MHz | 550 MHz | 504 | -49.0
HWBot Prime | 1600MHz | 450 MHz | 550 MHz | 512 | -89.9
HWBot Prime | 1600MHz | 500 MHz | 550 MHz | 514 | -86.8
Sysbench CPU 20000 | 1550MHz | 450 MHz | 550 MHz | 71.5 sec | -25.2
Sysbench CPU 20000 | 1600MHz | 450 MHz | 550 MHz | 69.3 sec | -77.4
Sysbench memory 800MB | 1550MHz | 450 MHz | 550 MHz | 2582.1 MB/sec | -21.0
Sysbench memory 800MB | 1600MHz | 500 MHz | 600 MHz | 2772.7 MB/sec | -86.3
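To put these results in perspective, the gain over the default-clock results can be worked out directly. The sketch below compares the best liquid nitrogen figures with the default-clock figures reported earlier in this article (440 for HWBot Prime and 2025.5 MB/sec for the memory test); the function name pct_gain is our own.

```shell
# Percentage change from a baseline to a new result (higher is better).
# $1 = baseline result, $2 = new result
pct_gain() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new / base - 1) * 100 }'
}

pct_gain 440 514          # HWBot Prime, default vs 1600MHz: prints 16.8
pct_gain 2025.5 2772.7    # Sysbench memory in MB/sec: prints 36.9
```

The memory test benefits far more than the prime number test, which matches the larger relative increase in memory clock.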

Summary and conclusion

Overclocking is fun as a hobby, but it can also be useful in practical applications. Jack Zimmermann (hsmag.cc/heFhWO) demonstrates an excellent example of how overclocking a Raspberry Pi 3 in the role of a Stratum-1 NTP time server helps to reduce the uncertainty of GPS time synchronisation. Another possible use for an overclocked Raspberry Pi is running cross-platform emulators and game console emulators, where performance is never enough.

With the help of a little liquid nitrogen, we managed to overclock a Raspberry Pi 3 Model B to its maximum 1600MHz limit without much trouble. This article reveals a few basic bits about overclocking theory and methods. It’s not rocket science, so anyone can do it, and getting hold of cryogenic liquids is far from mandatory. In the end, it’s another way to have some fun with this capable little microcomputer.
