Variable Data Types for Arduino/C++

Data Types in Arduino/C++

You’ll likely find yourself using the same three or four data types in most of your programs, but it’s important to at least be aware of the rest. Sometimes the choice of type depends on the kind of data the variable holds, sometimes on the size of that data, and sometimes a type is used simply because it is structured in a way that either optimizes your code or makes it look a lot cleaner. I will try to point out those “convenience” types, since knowing about them helps when deciding which data type to choose.

Here is a breakdown of the Arduino variable types and their sizes up front (sizes and ranges are for 8-bit ATmega-based boards such as the Uno). We’ll discuss the details afterward.

Type            Size                                    Range
boolean         1 byte                                  false (0) or true (1)
char            1 byte                                  -128 to 127
unsigned char   1 byte                                  0 to 255
byte            1 byte                                  0 to 255
int             2 bytes                                 -32,768 to 32,767
unsigned int    2 bytes                                 0 to 65,535
word            2 bytes                                 0 to 65,535
long            4 bytes                                 -2,147,483,648 to 2,147,483,647
unsigned long   4 bytes                                 0 to 4,294,967,295
float           4 bytes                                 -3.4028235E+38 to 3.4028235E+38
double          4 bytes                                 -3.4028235E+38 to 3.4028235E+38
string          # of chars + 1 byte (null terminator)   N/A
array           sizeOfType * # of elements              N/A
enum            N/A                                     N/A
struct          N/A                                     N/A
pointer         N/A                                     N/A
void            N/A                                     N/A

 

Numeric

Numeric type variables are specific in that they can only hold numbers… go figure. The thing to know here is that there are two kinds of numbers: integers and floating point numbers. Integers are whole numbers with no decimal portion, while floating point numbers have one (the floating point). When choosing a numeric type, keep in mind that the result of arithmetic between two integers (division in particular), or any value stored in an integer type, is truncated “toward zero”. All that means is that the decimal portion of the value is lopped off: 2.4 or 2.6 will be stored as 2, and -2.4 or -2.6 will become -2.
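To see the “toward zero” rule in action, here is a minimal sketch in standard C++ (the same expressions behave identically in an Arduino sketch). The helper names divideInts and toInt are hypothetical, used only for illustration.

```cpp
// Integer division discards the fractional part, rounding toward zero.
int divideInts(int a, int b) {
  return a / b;  // 7 / 2 -> 3, -7 / 2 -> -3
}

// Converting a floating point value to an integer also truncates toward zero.
int toInt(double value) {
  return static_cast<int>(value);  // 2.6 -> 2, -2.6 -> -2
}
```

Note that -7 / 2 gives -3, not -4: the decimal part is simply dropped rather than rounded down.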

Going forward we are going to be talking about bytes and bits, so if you get confused, take a look at the article about Bytes, bits and nibbles and hopefully that will clear some things up for you. As a quick refresher, a byte is traditionally the amount of memory needed to store a single character, and a byte is made up of bits. On Arduino (and virtually every modern system) a byte is 8 bits, and each bit has two possible values (0 or 1), so a byte can hold 2^8 = 256 different values, from 0 to (2^8)-1, or 0 to 255.

Integer

One thing to note here is that a few types have an unsigned option. As previously mentioned, a byte can hold values from 0 to 255, but a signed type (which is what you get by default for int, long, and char on ATmega-based boards) also needs to represent negative numbers. One of the 8 bits effectively goes to the sign, which leaves 7 bits for the magnitude, or 2^7 = 128 values. The 256 possible bit patterns are therefore split in half: 128 for the values 0 to 127 and 128 for the negative values -128 to -1. The reason why we get -128 to 127 rather than -127 to 127 is a topic for another day, but essentially it comes down to this: you wouldn’t want both +0 and -0 as options, since those are equal. With the scheme used to store signed numbers (two’s complement), the pattern that would have been -0 becomes the one extra value you see in the negative range. A quick way to remember the ranges: counting every possible bit pattern gives you the unsigned range (0 to 2^n - 1 for n bits), and for the signed range you split those patterns in half, giving -2^(n-1) to 2^(n-1) - 1. Therefore, an unsigned byte has a range from 0 to 255 but a signed byte has a range from -128 to 127.
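The range formulas above can be checked with a small sketch. It uses 64-bit math so nothing overflows while computing, and the function names are hypothetical. The last function shows the two’s complement idea directly; it assumes a two’s complement compiler such as avr-gcc or gcc, where the all-ones pattern read as a signed byte is -1.

```cpp
#include <cstdint>

// Range of an n-bit integer (n from 1 to 32), computed with 64-bit math
// so the intermediate values never overflow.
int64_t unsignedMax(int bits) { return (int64_t{1} << bits) - 1; }        // 2^n - 1
int64_t signedMin(int bits)   { return -(int64_t{1} << (bits - 1)); }     // -2^(n-1)
int64_t signedMax(int bits)   { return (int64_t{1} << (bits - 1)) - 1; }  // 2^(n-1) - 1

// In two's complement, the bit pattern 1111 1111 read as a signed byte is -1.
int8_t allOnesAsSigned() {
  return static_cast<int8_t>(0xFF);
}
```

For 8 bits this gives 0 to 255 unsigned and -128 to 127 signed. For 16 bits (an Uno’s int) it gives -32,768 to 32,767, matching the table.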

For integer values we have the following types with respect to their size. I will only comment on the types with notable features to help clear up what I feel might be confusing.

  • boolean

The intent of boolean is to hold true or false. In reality, boolean is simply a single-byte value, and true and false are constants that correspond to 1 and 0 respectively. In fact, any variable can be evaluated as true or false: false is represented by 0 (or a null pointer) and true is anything else. The idea behind boolean is that it clearly states the intended use of the data. Where this gets confusing is in how values are converted. If you store a value greater than 1 in a boolean and then print it, it will print as 1, because any nonzero value becomes true and true is 1. In current Arduino versions, boolean is just another name for the standard C++ bool, which you can use directly. Simply put, just use boolean (or bool) for its intended purpose: true and false values.
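Here is a quick sketch of that conversion. The typedef below is an assumption that mirrors how current Arduino cores define boolean; storeInBoolean is a hypothetical helper.

```cpp
// Arduino's boolean is an alias for the standard C++ bool
// (assumption: mirrors the typedef in current Arduino cores).
typedef bool boolean;

// Any nonzero value converts to true, and true is stored (and printed) as 1.
int storeInBoolean(int value) {
  boolean flag = value;           // 5 becomes true, 0 becomes false
  return static_cast<int>(flag);  // true -> 1, false -> 0
}
```

So storing 5 and reading it back gives 1: the original value is gone, only its truthiness survives.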

  • char / unsigned char / byte

Despite its name, ‘char’ (short for character) can be used to store numbers. The simplest way to remember this is that a char is really just a 1-byte number: a character is stored as its numeric code (its ASCII value), so a char can hold letters, digits, or plain numbers.

byte is specific to Arduino and is equivalent to an unsigned char, so, if you understood what I wrote above, it can only store non-negative values (0 to 255).
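A small sketch of both points, assuming the byte typedef below matches the one in the Arduino core headers. nextLetterCode and maxByte are hypothetical helpers.

```cpp
#include <cstdint>

// Arduino defines byte as an 8-bit unsigned type; this typedef mirrors that
// (assumption: matches the definition in the Arduino core headers).
typedef uint8_t byte;

// A char holds a small number; character literals are just their ASCII codes.
int nextLetterCode(char letter) {
  char next = letter + 1;  // 'A' (65) + 1 -> 'B' (66)
  return next;
}

// A byte cannot be negative: the largest value it holds is 255.
int maxByte() {
  byte b = 255;
  return b;
}
```

Doing arithmetic on a letter works precisely because the char is storing a number underneath.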

  • int / unsigned int
  • long / unsigned long

Floating Point

Floating point values can be stored in the following types.

  • float
  • double

Many people get confused by double and assume it is twice as large as a float (8 bytes instead of 4). This is likely because they either know C++ or someone who knows C++ has told them this. The problem is that this is not always the case: on Arduinos with ATmega-based chips, float and double are the same size (4 bytes), so there is no difference between the two. Technically this is allowed because C++ only requires a double to have at least as much precision as a float. Arduino boards with other chips (like the Due), as well as many Arduino-compatible boards, support larger doubles, typically 8 bytes.
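One way to see what a 4-byte double costs you is precision. A 4-byte IEEE 754 float has a 24-bit significand, so above 16,777,216 (2^24) it can no longer represent every whole number. This sketch (hypothetical function names; volatile just keeps the compiler from folding the math away) shows the difference on a machine where double is 8 bytes.

```cpp
// A 4-byte float has a 24-bit significand, so above 2^24 = 16,777,216
// it can no longer represent every whole number.
bool floatCanAddOne() {
  volatile float big = 16777216.0f;  // 2^24
  volatile float bigger = big + 1.0f;
  return bigger != big;  // false: 16,777,217 rounds back to 16,777,216
}

// The same test with double: true where double is 8 bytes (Due, PCs),
// but false on ATmega boards, where double is the same as float.
bool doubleCanAddOne() {
  volatile double big = 16777216.0;
  volatile double bigger = big + 1.0;
  return bigger != big;
}
```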

Non-numeric (alphanumeric)

There are far fewer options when it comes to data types that hold non-numeric data. For these you are looking at the following:

  • char

Char is unique in that it only holds one character. It can also hold numeric values but there is a difference in how each value should be assigned.

When storing numbers in a char type you would use char numericChar = 51;

When storing characters (those you would find on a keyboard) you would use char stringChar = '3';

This is no coincidence: if you refer to the ASCII table you’ll see that the character '3' has the decimal code 51, so numericChar and stringChar hold exactly the same value, and if you print them both they will print as a 3. Keep this in mind when printing, as it may cause some confusion.
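A minimal sketch of the same idea, plus a common follow-up: turning a digit character into the number it represents. digitValue is a hypothetical helper; it relies on the digits '0' through '9' having consecutive codes, which C++ guarantees.

```cpp
// '3' is ASCII code 51, so these two chars hold the same value.
char numericChar = 51;
char stringChar = '3';

// Convert a digit character to its numeric value by subtracting the
// code of '0' (48); the digits '0'..'9' have consecutive codes.
int digitValue(char digit) {
  return digit - '0';  // '3' (51) - '0' (48) -> 3
}
```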

  • string

Strings can be a source of confusion for beginners, mainly because a string is simply an array of characters. A string always ends with a terminating null character ('\0') that marks where the text stops, which is why the table above lists a string as one byte more than its number of characters. Ordinary arrays of other types have no terminator; their size is simply the size of the element type multiplied by the number of elements.
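Here is a sketch comparing the two. The variable and function names are illustrative only; sizeof reports the storage used, while strlen counts characters up to (but not including) the terminator.

```cpp
#include <cstddef>
#include <cstring>

char greeting[] = "Hi";         // stored as 'H', 'i', '\0': 3 bytes
int readings[4] = {0, 0, 0, 0};  // no terminator: 4 * sizeof(int) bytes

std::size_t greetingBytes()  { return sizeof(greeting); }       // 3
std::size_t greetingLength() { return std::strlen(greeting); }  // 2, stops at '\0'
std::size_t readingsBytes()  { return sizeof(readings); }       // 8 on an Uno, 16 on most PCs
```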

  • word

Word is a bit confusing. On ATmega-based boards it’s the same as an unsigned int: two bytes, holding 0 to 65,535 (on 32-bit boards like the Due it is four bytes). Because it is two bytes it could technically hold two characters, but I personally have not yet found a use for word, and using it in an alphanumeric sense can be a bit confusing. The name comes from the general computing term “word”, which refers to the amount of data a processor naturally works with; Windows, for example, also defines a 16-bit WORD type. It would be great to hear of someone’s experience with the word data type in the comments.
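One place a 16-bit unsigned value does come in handy is combining two bytes, for example a high and a low byte read from a sensor. Arduino provides word(h, l) for this, along with highByte() and lowByte() for the reverse. The sketch below reproduces the idea in plain C++ using hypothetical names rather than the Arduino functions themselves.

```cpp
#include <cstdint>

// Combine a high byte and a low byte into one 16-bit value, similar in spirit
// to Arduino's word(h, l). combineBytes is a hypothetical name.
uint16_t combineBytes(uint8_t high, uint8_t low) {
  return static_cast<uint16_t>((high << 8) | low);  // high byte goes in bits 15..8
}

// The reverse: split a 16-bit value back into its two bytes.
uint8_t highByteOf(uint16_t value) { return static_cast<uint8_t>(value >> 8); }
uint8_t lowByteOf(uint16_t value)  { return static_cast<uint8_t>(value & 0xFF); }
```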

Pointers, Void, Arrays and more

I’m not going to delve too much into detail here as most of these types require their own post entirely.

  • void

Void is not really a variable type; it’s used almost exclusively to indicate that a function has no return value (setup() and loop() are both void functions). You cannot create a variable of type void.

  • array

A data type that needs its own post: an array is a collection of values of the same type. Any variable type, and even objects, can be stored in an array. An array’s elements are stored in memory sequentially, one right after the other.
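The sequential layout is easy to confirm: the distance between two neighboring elements is exactly the size of one element. A small sketch with illustrative names:

```cpp
#include <cstddef>

// Array elements sit next to each other in memory, so element i+1 starts
// exactly sizeof(element) bytes after element i.
int samples[5] = {10, 20, 30, 40, 50};

std::ptrdiff_t bytesBetweenElements() {
  const char* first  = reinterpret_cast<const char*>(&samples[0]);
  const char* second = reinterpret_cast<const char*>(&samples[1]);
  return second - first;  // equals sizeof(int): 2 on an Uno, 4 on most PCs
}

// Number of elements = total size of the array / size of one element.
std::size_t sampleCount() {
  return sizeof(samples) / sizeof(samples[0]);  // 5
}
```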

  • pointer

Pointers are something I do not want to get into without a dedicated post. In the simplest terms, a pointer holds the address of a variable in memory; in other words, it “points” to that location. This is especially helpful for objects and arrays. Think of an array (a collection of values): a pointer tells you where in memory the first value is, and you simply increment it to reach the next location (the step size depends on the data type’s size) and retrieve the next value in the array. A string is an array of char values, and it is usually passed around as a pointer to its first character.
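Walking an array with a pointer looks like this. Both helpers are hypothetical; the second one walks a string until it hits the null terminator, which is essentially what strlen does.

```cpp
// Walk an array with a pointer: start at the first element and move the
// pointer forward one element at a time.
int sumWithPointer(const int* values, int count) {
  int total = 0;
  const int* p = values;  // address of the first element
  for (int i = 0; i < count; i++) {
    total += *p;  // read the value stored at that address
    p++;          // advance by sizeof(int) bytes, i.e. to the next element
  }
  return total;
}

// A string is passed as a pointer to its first char; stop at the '\0'.
int countChars(const char* text) {
  int n = 0;
  while (*text != '\0') {
    n++;
    text++;
  }
  return n;
}
```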

  • enum

Enums are essentially lists of named integer constants. Enum is one of those data types that is there to help make your code cleaner: it lets you take a list of options and group them under one name. For example, say you had a few options for colors, RED, GREEN and BLUE. Rather than create three separate constants where each color represents a different value, you can put them in an enum called COLOR. With a plain enum you then refer to them simply as RED, GREEN or BLUE; with a scoped enum (enum class) you write COLOR::RED, COLOR::GREEN or COLOR::BLUE, which makes it clearly apparent that these are colors. You can also assign specific values to each option.
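A short sketch of both kinds of enum. COLOR follows the example above; Speed, SLOW, FAST, and isWarmColor are hypothetical names for illustration.

```cpp
// Scoped enum: values are written with the type name, e.g. COLOR::RED.
// By default RED = 0, GREEN = 1, BLUE = 2.
enum class COLOR { RED, GREEN, BLUE };

// Plain (unscoped) enum with explicitly assigned values; SLOW and FAST
// can be used without a prefix.
enum Speed { SLOW = 10, FAST = 20 };

bool isWarmColor(COLOR c) {
  return c == COLOR::RED;
}
```

The scoped version is a bit more typing, but it keeps names like RED from clashing with other constants in a larger sketch.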

  • struct

Struct is short for structure and is another one of those things that can introduce confusion to beginners. The simplest way I can think of to explain structs is to compare them to arrays. An array can hold multiple values of the same type, but a struct can hold multiple values of varying types. To further solidify this idea, say you have a robot with a set of configurations: some might be char, others int, and so on, but you can put all of these configurations into one nice collection by creating a struct.
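Here is what that robot configuration might look like. The fields and values are hypothetical; the point is that one struct groups several types together.

```cpp
// A hypothetical robot configuration mixing several types in one struct.
struct RobotConfig {
  char name;            // single-letter ID
  int maxSpeed;         // whole number
  float wheelDiameter;  // needs a decimal
  bool lightsOn;        // true/false setting
};

RobotConfig makeDefaultConfig() {
  RobotConfig config;
  config.name = 'R';
  config.maxSpeed = 200;
  config.wheelDiameter = 6.5f;
  config.lightsOn = true;
  return config;
}
```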

Choosing a Data Type

Choosing a data type is generally not all that important for the everyday programmer, but it becomes very important when size or speed is a concern. In general, try to pick the smallest data type that will confidently contain the value you need to store.

For example, if you want to store a pin number, very few boards have more than 128 pins, so you’d likely store this value in a byte or char type. If you have a board with more than 128 pins but no more than 256, you could still confidently use a byte or unsigned char (remember, the byte range is the same as the unsigned char range, 0 to 255). Smaller data types can bring obvious speed improvements since they require fewer memory reads, but other factors, such as the processor’s native data size, can cancel out that advantage.

In the end, my suggestion is simply to try to always use the smallest data type necessary to confidently store the value. Worry about speed and storage size later when you realize it’s actually a problem.

Caveats

  • Overflow

The compiler will not necessarily yell at you if you try to store a value outside the bounds of the variable’s range. Similarly, if you keep adding to a number until you pass the end of its range, the number will roll over, or overflow. For unsigned types this behavior is guaranteed: in an unsigned char, 255 + 1 = 0, and assigning 256 to an unsigned char also gives you 0. For signed types such as int and long, overflow is technically undefined behavior in C++, so while it typically wraps around (an int at 32,767 becoming -32,768 on an Uno), you should not rely on it.
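The unsigned rollover can be demonstrated safely, since the standard defines it as arithmetic modulo 2^bits. The helper names are hypothetical; the signed case is deliberately left out because it is undefined behavior.

```cpp
#include <cstdint>

// Unsigned types wrap around: results are taken modulo 2^8 for a byte.
uint8_t addOneToByte(uint8_t value) {
  value++;  // 255 + 1 -> 0
  return value;
}

// Storing an out-of-range value keeps only the low 8 bits.
uint8_t storeInByte(int value) {
  return static_cast<uint8_t>(value);  // 256 -> 0, 300 -> 44 (300 - 256)
}
```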

  • Storage

Many Arduinos, such as the Uno, use an ATmega processor that doesn’t give you a whole lot of space: the Uno’s ATmega328P has 32 KB of flash for your program and only 2 KB of SRAM for variables. That is usually enough, but sometimes, between libraries and your own code, you reach the limit. At that point you would need to start evaluating variable sizes, but generally even that won’t be enough unless you have large arrays. Changing a single int to a char only saves you one byte, so the one-offs aren’t really going to help. If you have an array of 100 ints that can be changed to bytes, you save 100 bytes; still not overly significant, but hopefully you get the idea.
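The arithmetic behind that 100-byte saving, as a tiny sketch. It uses int16_t to stand in for an Uno’s 2-byte int (an assumption so the result is the same on any machine) and uint8_t for byte; the names are hypothetical.

```cpp
#include <cstdint>

// 100 readings stored as 2-byte integers vs 1-byte values.
int16_t readingsAsInt[100];   // 200 bytes, like int on an Uno
uint8_t readingsAsByte[100];  // 100 bytes, like Arduino's byte

// Bytes saved by switching the array type: 200 - 100 = 100.
unsigned bytesSaved() {
  return sizeof(readingsAsInt) - sizeof(readingsAsByte);
}
```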

  • Processing time read/write

Most classic Arduinos use an 8-bit microcontroller, so they handle 8 bits of data at a time. That means that, while it is not an exact correlation, you can figure it will take about twice as long to read or write two bytes as it would to read or write one. Again, I wouldn’t worry too much about this unless I knew it was a problem; in general you can use integers in Arduino with very little impact on performance compared to char or byte. Floating point variables, however, do impact performance quite drastically, so if you were to worry about one data type over another, just be sure not to use floating point unless you need that sort of precision.

  • Processing time for mathematical calculations

Performing calculations on larger data types requires more overhead than on smaller ones. In fact, the ATmega chip used on most Arduinos doesn’t even have a floating point unit, which means floating point math has to be handled in software, adding extra processing time.

  • Since Arduino is based on C/C++ you can actually still use core C/C++ data types, many of which are not discussed here.
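One set of core C/C++ types worth knowing is the fixed-width integers from <stdint.h> (or <cstdint> in C++). Unlike int, which is 2 bytes on an Uno but 4 on a Due, these have the same size on every board. A minimal sketch with illustrative variable names:

```cpp
#include <cstdint>

// Fixed-width types have the same size on every board,
// unlike int (2 bytes on an Uno, 4 bytes on a Due).
int8_t   smallSigned = -100;         // exactly 1 byte, -128 to 127
uint16_t sensorValue = 1023;         // exactly 2 bytes, 0 to 65,535
int32_t  bigCounter  = 100000;       // exactly 4 bytes, signed
uint32_t timestamp   = 4000000000UL; // exactly 4 bytes, unsigned
```

Arduino’s own byte is built on uint8_t, so mixing these names with the Arduino types is perfectly normal.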
