How To Access Registers In C++
This article discusses a C++ scheme for accessing hardware registers in an optimal way.
Exploiting C++'s features for efficient and safe hardware register access.
This article originally appeared in C/C++ Users Journal, and is reproduced by kind permission.
Embedded programmers traditionally use C as their language of choice. And why not? It's lean and efficient, and allows you to get as close to the metal as you want. Of course C++, used properly, provides the same level of efficiency as the best C code. But we can also leverage powerful C++ features to write cleaner, safer, more elegant low-level code. This article demonstrates this by discussing a C++ scheme for accessing hardware registers in an optimal way.
Demystifying Register Access
Embedded programming is often seen as black magic by those not initiated into the cult. It does require a slightly different mindset; a resource constrained environment needs small, lean code to get the most out of a slow processor or a tight memory limit. To understand the approach I present, we'll first review the mechanisms for register access in such an environment. Hardcore embedded developers can probably skip ahead; otherwise here's the view from 10,000 feet.
Most embedded code needs to service hardware directly. This seemingly magical act is not that hard at all. Some kinds of register need a little more fiddling to get at than others, but you certainly don't need an eye-of-newt or any voodoo dances. The exact mechanism depends on how your circuit board is wired up. The common types of register access are:
- Memory mapped I/O. The hardware allows us to communicate with a device using the same instructions as memory access. The device is wired up to live at memory address n; register 1 is mapped at address n, register 2 is at n+1, register 3 at n+2, and so on.
- Port mapped I/O. Certain devices present pages of registers that you have to map into memory by selecting the correct device 'port'. You might use specific input/output CPU instructions to talk to these devices, although more often the port and its selector are mapped directly into the memory address space.
- Bus separated. It's harder to control devices connected over a non-memory mapped bus. I2C and I2S are common peripheral connection buses. In this scenario you must either talk to a dedicated I2C control chip (whose registers are memory mapped), telling it what to send to the device, or you manipulate the I2C control lines yourself using GPIO [ 1 ] ports on another memory mapped device.
Each device has a data sheet that describes (amongst other things) the registers it contains, what they do, and how to use them. Registers are a fixed number of bits wide - this is usually determined by the type of device you are using. This is an important fact to know: some devices will lock up if you write the wrong width of data to them. With fixed width registers, many devices cram several bits of functionality into one register as a 'bitset'. The data sheet would describe this diagrammatically in a similar way to Figure 1.
Figure 1. Registers in a sample UART line driver device
So what does hardware access code look like? Using the simple example of a fictional UART line driver device presented in Figure 1, the traditional C-style schemes are:
- Direct memory pointer access. It's not unheard of to see register access code like Listing 1, but we all know that the perpetrators of this kind of monstrosity should be taken outside and slowly tortured. It's neither readable nor maintainable.
Pointer usage is usually made bearable by defining a macro name for each register location. There are two distinct macro flavours. The first macro style defines bare memory addresses (as in Listing 2). The only real advantage of this is that you can share the definition with assembly code parsed using the C preprocessor. As you can see, its use is long-winded in normal C code, and prone to error - you have to get the cast right each time. The alternative, in Listing 3, is to include the cast in the macro itself; far nicer in C. Unless there's a lot of assembly code this latter approach is preferable. (A rough sketch of these listing styles appears after this list.)
We use macros because they have no overhead in terms of code speed or size. The alternative, creating a physical pointer variable to describe each register location, would have a negative impact on both code performance and executable size. However, macros are gross, and C++ programmers are already smelling a rat here. There are plenty of problems with this fragile scheme. It's programming at a very low level, and the code's real intent is not clear - it's hard to spot all register accesses as you browse a function.
- Deferred assignment is a neat technique that allows you to write code like Listing 4, defining the register location values at link time. This is not commonly used; it's cumbersome when you have a number of large devices, and not all compilers provide this functionality. It requires you to run a flat (not virtual) memory model.
- Use a struct to describe the register layout in memory, as in Listing 5. There's a lot to be said for this approach - it's logical and reasonably readable. However, it has one big drawback: it is not standards-compliant. Neither the C nor C++ standards specify how the contents of a struct are laid out in memory. You are guaranteed the ordering of elements, but you don't know how the compiler will pad out non-aligned items. Indeed, some compilers have proprietary extensions or switches to determine this behaviour. Your code might work fine with one compiler and produce startling results on another.
- Create a function to access the registers and hide all the gross stuff in there. On less speedy devices this might be prohibitively slow, but for most applications it is perfectly adequate, especially for registers that are accessed infrequently. For port mapped registers this makes a lot of sense; their access requires complex logic, and writing all this out longhand is tortuous and easy to get wrong.
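Since Listings 1 to 5 aren't reproduced in this copy of the article, the sketch below shows roughly what those C-style approaches look like in practice. The device, addresses, and register names are invented for illustration, and the struct layout deliberately shows the padding assumption the article warns about.

#include <stdint.h>

/* Listing 2 style: a macro for the bare address - each caller supplies the cast */
#define UART_TXBUF_ADDR 0xfffe0004

/* Listing 3 style: the cast lives inside the macro - far nicer to use */
#define UART_TXBUF ((volatile uint32_t *)0xfffe0004)

/* Listing 4 style: deferred assignment - declare the register as an extern
   object and pin its address at link time (toolchain specific) */
extern volatile uint32_t UART_TXBUF_REG;

/* Listing 5 style: a struct overlay - relies on non-standard layout rules */
struct uart_regs
{
    volatile uint8_t  status;  /* offset 0x00 */
    volatile uint8_t  txctl;   /* offset 0x01 */
    volatile uint8_t  rxctl;   /* offset 0x02 */
    volatile uint8_t  pad;     /* offset 0x03 */
    volatile uint32_t txbuf;   /* offset 0x04 - if the compiler aligns it here */
};
#define UART ((volatile struct uart_regs *)0xfffe0000)

void send_byte(uint8_t b)      /* each line below shows one of the styles */
{
    *(volatile uint32_t *)0xfffe0004 = b;        /* Listing 1 style: raw pointer */
    *(volatile uint32_t *)UART_TXBUF_ADDR = b;   /* Listing 2 style */
    *UART_TXBUF = b;                             /* Listing 3 style */
    UART_TXBUF_REG = b;                          /* Listing 4 style */
    UART->txbuf = b;                             /* Listing 5 style */
}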
It remains for us to see how to manipulate registers containing a bitset. Conventionally we write such code by hand, something like Listing 6. This is an easy way to cause yourself untold grief, tracking down odd device behaviour. It's very easy to manipulate the wrong bit and get very confusing results.
#define UART_RX_BYTES 0x0e

uint32_t uart_read()
{
    while ((*UART_RXCTL & UART_RX_BYTES) == 0) // manipulate here
    {
        ; // wait
    }
    return *UART_RXBUF;
}
Listing 6
Does all this sound messy and error prone? Welcome to the world of hardware devices. And this is just addressing the device: what you write into the registers is your own concern, and part of what makes device control so painful. Data sheets are often cryptic or miss essential information, and devices magically require registers to be accessed in a certain order. There will never be a silver bullet and you'll always have to wrestle these demons. All I can hope is to make the fight less biased to the hardware's side.
A More Modern Solution
So having seen the state of the art, at least in the C world, how can we move into the 21st century? Being good C++ citizens we'd ideally avoid all that nasty preprocessor use and find a way to insulate us from our own stupidity. By the end of the article you'll have seen how to do all this and more. The real beauty of the following scheme is its simplicity. It's a solid, proven approach and has been used for the last five years in production code deployed in tens of thousands of units across three continents. Here's the recipe…
Step one is to junk the whole preprocessor macro scheme, and define the device's registers in a good old-fashioned enumeration. For the moment we'll call this enumeration Register. We immediately lose the ability to share definitions with assembly code, but this was never a compelling benefit anyway. The enumeration values are specified as offsets from the device's base memory address. This is how they are presented in the device's data sheet, which makes it easier to check for validity. Some data sheets show byte offsets from the base address (so 32-bit register offsets increment by 4 each time), whilst others show 'word' offsets (so 32-bit register offsets increment by 1 each time). For simplicity, we'll write the enumeration values however the data sheet works.
The next step is to write an inline regAddress function that converts the enumeration to a physical address. This function will be a very simple calculation determined by the type of offset in the enumeration. For the moment we'll assume that the device is memory mapped at a known fixed address. This implies the simplest MMU configuration, with no virtual memory address space in operation. This mode of operation is not at all uncommon in embedded devices. Putting all this together results in Listing 7.
static const unsigned int baseAddress = 0xfffe0000;

enum Registers
{
    STATUS = 0x00, // UART status register
    TXCTL  = 0x01, // Transmit control
    RXCTL  = 0x02, // Receive control
    ... and so on ...
};

inline volatile uint8_t *regAddress(Registers reg)
{
    return reinterpret_cast<volatile uint8_t*>(baseAddress + reg);
}
Listing 7
The missing part of this jigsaw puzzle is the method of reading/writing registers. We'll do this with two simple inline functions, regRead and regWrite, shown in Listing 8. Being inline, all these functions can work together to make great, readable register access code with no runtime overhead whatsoever. That's mildly impressive, but we can do so much more.
inline uint8_t regRead(Registers reg)
{
    return *regAddress(reg);
}

inline void regWrite(Registers reg, uint8_t value)
{
    *regAddress(reg) = value;
}
Listing 8
Different Width Registers
Up until this point you could achieve the same result in C with judicious use of macros. We've not yet written anything groundbreaking. But if our device has some 8-bit registers and some 32-bit registers we can describe each set in a different enumeration. Let's imaginatively call these Registers8 and Registers32. Thanks to C++'s strong typing of enums, we can now overload the register access functions, as demonstrated in Listing 9.
// New enums for each register width
enum Registers8
{
    STATUS = 0x00, // UART status register
    ... and so on ...
};
enum Registers32
{
    TXBUF = 0x04,  // Transmit buffer
    ... and so on ...
};

// Two overloads of regAddress
inline volatile uint8_t *regAddress(Registers8 reg)
{
    return reinterpret_cast<volatile uint8_t*>(baseAddress + reg);
}
inline volatile uint32_t *regAddress(Registers32 reg)
{
    return reinterpret_cast<volatile uint32_t*>(baseAddress + reg);
}

// Two overloads of regRead
inline uint8_t regRead(Registers8 reg)
{
    return *regAddress(reg);
}
inline uint32_t regRead(Registers32 reg)
{
    return *regAddress(reg);
}

... similarly for regWrite ...
Listing 9
Now things are getting interesting: we still need only type regRead to access a register, but the compiler will automatically ensure that we get the correct width register access. The only way to do this in C is manually, by defining multiple read/write macros and selecting the correct one by hand each time. This overloading shifts the onus of knowing which registers require 8 or 32-bit writes from the programmer using the device to the compiler. A whole class of error silently disappears. Marvellous!
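For instance, given the enumerations sketched in Listing 9, a minimal usage example looks like this - the compiler picks the 8-bit or 32-bit overload purely from the type of the register name:

uint8_t status = regRead(STATUS);   // Registers8 overload: an 8-bit access
regWrite(TXBUF, 0x000000ffu);       // Registers32 overload: a 32-bit access
// regWrite(TXBUF, someSmallValue) still compiles, but the access width is
// always correct for the register - that's the class of error that vanishes.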
Extending to Multiple Devices
An embedded system is composed of many separate devices, each performing its allotted task. Perhaps you have a UART for control, a network chip for communication, a sound device for audible warnings, and more. We need to define multiple register sets with different base addresses and associated bitset definitions. Some large devices (like super I/O chips) consist of several subsystems that work independently of one another; we'd also like to keep the register definitions for these parts distinct.
The classic C technique is to augment each block of register definition names with a logical prefix. For example, we'd define the UART transmit buffer like this:
#define MYDEVICE_UART_TXBUF ((volatile uint32_t *)0xfffe0004)
C++ provides an ideal replacement mechanism that solves more than just this aesthetic blight. We can group register definitions within namespaces. The nest of underscored names is replaced by :: qualifications - a better, syntactic indication of relationship. Because the overload rules honour namespaces, we can never write a register value to the wrong device block: it simply won't compile. This is a simple trick, but it makes the scheme incredibly usable and powerful.
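A minimal sketch of the namespace grouping follows; the device names, addresses, and registers here are invented for illustration, with the overload sets assumed to be defined as in Listing 9:

namespace Uart
{
    static const unsigned int baseAddress = 0xfffe0000;
    enum Registers8  { STATUS = 0x00, TXCTL = 0x01, RXCTL = 0x02 };
    enum Registers32 { TXBUF  = 0x04 };
    // ... regAddress/regRead/regWrite overloads exactly as in Listing 9 ...
}

namespace Network
{
    static const unsigned int baseAddress = 0xfffc0000;
    enum Registers32 { MODE = 0x00, TXDESC = 0x04 };
    // ... this device's own overload set ...
}

// Uart::regWrite(Network::MODE, 0) has no matching overload, so a register
// value can never be aimed at the wrong device block - it won't compile.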
Proof of Efficiency
Perhaps you think that this is obviously a good solution, or you're just presuming that I'm right. However, a lot of old-school embedded programmers are not so easily persuaded. When I introduced this scheme in one company I met a lot of resistance from C programmers who could just not believe that the inline functions resulted in code as efficient as the proven macro technique.
The only way to persuade them was with hard data - I compiled equivalent code using both techniques for the target platform (gcc targeting a MIPS device). The results are listed in the table below. An inspection of the machine code generated for each kind of register access showed that the code was identical. You can't argue with that! It's particularly interesting to note that the #define method in C is slightly larger than the C++ equivalent. This is a peculiarity of the gcc toolchain - the assembly listing for the two main functions is identical: the difference in file size is down to the glue around the function code.

| Register access method | Unoptimised (object file size in bytes) | Optimised (object file size in bytes) |
|---|---|---|
| C++ inline function scheme | 1087 | 551 |
| C++ using #defines | 604 | 551 |
| C using #defines | 612 | 588 |
Namespacing also allows us to write more readable code with a judicious sprinkling of using declarations within device setup functions. Koenig lookup (argument dependent lookup) combats excess verbiage in our code. If we have register sets in two namespaces DevA and DevB, we needn't qualify a regRead call, just the register name. The compiler can infer the correct regRead overload in the correct namespace from its parameter type. You just have to write:
uint32_t value = regRead(DevA::MYREGISTER); // note: not DevA::regRead(...)
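Inside a register-heavy setup function, a using directive (or individual using declarations) trims the qualification from the register names as well. A minimal sketch, assuming DevA contains the UART-style registers used earlier; the values written and the 'ready' bit are invented for illustration:

void setupDevA()
{
    using namespace DevA;                   // or: using DevA::TXCTL; etc.
    regWrite(TXCTL, 0x01);                  // hypothetical 'enable transmitter' value
    regWrite(RXCTL, 0x01);                  // hypothetical 'enable receiver' value
    while ((regRead(STATUS) & 0x80) == 0)   // hypothetical 'ready' bit
    {
        ; // wait for the device to settle
    }
}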
Variable Base Addresses
Not every operating environment is as simplistic as we've seen so far. If a virtual memory system is in use then you can't directly access the physical memory mapped locations - they are hidden behind the virtual address space. Fortunately, every OS provides a mechanism to map known physical memory locations into the current process's virtual address space.
A simple modification allows us to accommodate this memory indirection. We must change baseAddress from a simple static const to a real variable. The header file declares it as extern, and before any register accesses you must arrange to define and assign it in your code. The definition of baseAddress will necessarily be system specific.
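One possible shape of this, assuming a POSIX-style OS where physical memory can be mapped through /dev/mem; the function name, mapping size, and the use of uintptr_t are choices made for this sketch, not part of the original scheme:

// header: baseAddress becomes a real variable, defined in one source file
extern uintptr_t baseAddress;

// source file: map the device's physical registers into this process's
// virtual address space before any register access is made
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>

uintptr_t baseAddress = 0;

bool mapDeviceRegisters()
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return false;
    void *mapped = mmap(0, 0x1000,                  // one page of registers
                        PROT_READ | PROT_WRITE, MAP_SHARED,
                        fd, 0xfffe0000);            // physical base from Listing 7
    if (mapped == MAP_FAILED)
        return false;
    baseAddress = reinterpret_cast<uintptr_t>(mapped);
    return true;
}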
Other Usage
Here are a few extra considerations for the use of this register access scheme:
- Just as we use namespaces to separate device definitions, it's a good idea to choose header file names that reflect the logical device relationships. It's best to nest the headers in directories corresponding to the namespace names.
- A real bonus of this register access scheme is that you can easily substitute alternative regRead/regWrite implementations. It's easy to extend your code to add register access logging, for example (a sketch of a logging variant appears after this list). I have used this technique to successfully debug hardware problems. Alternatively, you can set a breakpoint on register access, or introduce a brief delay after each write (this quick change shows whether a device needs a pause to action each register assignment).
- It's important to understand that this scheme leads to larger unoptimised builds. Although it's remarkably rare to not optimise your code, without optimisation inline functions are not reduced and your code will grow.
- There are still ways to abuse this scheme. You can pass the wrong bitset to the wrong register, for example. But it's an order of magnitude harder to get anything wrong.
- A small sprinkling of template code allows us to avoid repeated definitions of bitRead/bitWrite. This is shown in Listing 11 (not reproduced in this copy; a sketch of the idea follows this list).
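To illustrate the substitution point above, a logging flavour of regWrite might look something like this - a minimal sketch, with printf standing in for whatever tracing facility your platform actually provides:

#include <cstdio>

inline void regWrite(Registers32 reg, uint32_t value)
{
    std::printf("regWrite: offset 0x%02x <- 0x%08x\n",
                static_cast<unsigned>(reg),
                static_cast<unsigned>(value));
    *regAddress(reg) = value;
}

And since Listing 11 itself isn't reproduced here, the sketch below shows one shape the template idea could take: the register enumeration type and the mask type are deduced, so a single definition of each helper serves every register width. The helper names follow the bitRead/bitWrite names mentioned above; the exact form in the original listing may differ.

template <typename RegisterType, typename ValueType>
inline bool bitRead(RegisterType reg, ValueType mask)
{
    return (regRead(reg) & mask) != 0;
}

template <typename RegisterType, typename ValueType>
inline void bitWrite(RegisterType reg, ValueType mask, bool setTo)
{
    ValueType value = regRead(reg);
    regWrite(reg, setTo ? static_cast<ValueType>(value | mask)
                        : static_cast<ValueType>(value & ~mask));
}

A call like bitWrite(TXCTL, 0x01, true) (with a hypothetical 'enable' bit) then reads the same whether TXCTL is an 8-bit or a 32-bit register.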
Conclusion
OK, this isn't rocket science, and there's no scary template metaprogramming in sight (which, if you've seen the average embedded developer, is no bad thing!) But this is a robust technique that exploits a number of C++'s features to provide safe and efficient hardware register access. Not only is it supremely readable and natural in the C++ idiom, it prevents many common register access bugs and provides extreme flexibility for hardware access tracing and debugging.
I have a number of proto-extensions to this scheme to make it more generic (using a healthy dose of template metaprogramming, amongst other things). I'll gladly share these ideas on request, but would welcome some discussion about this.
Do Overload readers see any ways that this scheme could be extended to make it simpler and easier to use?
[ 1 ] General Purpose Input/Output - assignable control lines not specifically designed for a particular data bus.
Source: https://accu.org/journals/overload/13/68/goodliffe_281/