Part 3: Data types

Data type

In this section, we look at the notion of data types in some detail. This is an extremely important concept from many angles. At the basic level, it deals with the various basic data types such as integers, real numbers, etc. At the other end, the same notion takes us to classes and objects – the core of object-oriented programming.

In any problem solving, we need to deal with a number of entities. These span a wide range in terms of their complexity. Let us start with the simplest, the numbers. This is something that you associate easily with computers – manipulating numbers. We deal with very many numeric parameters: age, salary, area, height, diameter, interest-rate, various ratios, etc.

Firstly, what do we understand when we say something is a number? Think about this for a minute before you read further. When something is numeric, we expect it to have certain properties and we expect certain operations on it to work. To make this clear, let us take a special case of numbers – positive real numbers, i.e., numbers from 0 to infinity. Price of vegetables, quantity of oil, monthly salary, tax payable, etc. are examples of entities which are of this nature. So when we say age is such a real number, it means age can take values such as 19.4, 20.8, 100.3 and even 14036.5, but not ‘apple’, ‘c’, 14 + 3i, etc. So knowing the type of data gives us some idea of the values it can take. We can subtract A’s age from B’s age (to find how much older B is, say), we can find the average age, and so on. In other words, operations such as +, -, *, /, etc. are normally valid on such numeric data. Depending on the specific items, some of these operations may not make sense, but they are allowed. We do not, however, normally see a number as a pattern of characters: we do not ask if there are two consecutive ‘1’s in the age, or concatenate two ages. These latter operations are perfectly natural for text type data such as names, addresses, titles, etc. But we do not subtract one name from another or divide an address by a name.

Thus data types tell us something about the nature of the values such variables can take and the kind of operations we can perform on them. More than the type telling us such things, we are going to use the type to tell the computer that ‘age’ is a number. This will help the computer alert us if we, by mistake, do something odd like concatenation with such data. Programming languages which are strongly typed are very particular about such rules. In PHP, we do not tell the computer the data type explicitly. But when a variable has a numeric value, PHP will ensure that only appropriate operations are permitted.
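
The idea can be sketched in a few lines of Python, used here only for illustration. Like PHP, Python does not ask you to declare the type, but it keeps track of the type of each value. All names and values below are hypothetical.

```python
# Hypothetical values, used only for illustration.
age_a = 19.4
age_b = 20.8
name_a = "Ram"
name_b = "Kumar"

# Numeric operations are natural on numbers.
age_gap = age_b - age_a            # how much older B is than A
average_age = (age_a + age_b) / 2

# Concatenation is natural on text.
full_name = name_a + name_b


def try_concatenate(text, number):
    """Try to join text with a number; report the error if it is rejected."""
    try:
        return text + number
    except TypeError as err:
        return "rejected: " + type(err).__name__


# Mixing a name with an age by mistake is caught by the language.
result = try_concatenate(name_a, age_a)
```

Here the language itself alerts us to the odd operation, instead of silently producing a meaningless value.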

Almost all languages provide some set of data types including:

  • Integers: only whole numbers allowed. 1, 10, 100, but not 10.3, 14.6, etc. 100/3 = 33 and not 33.33…
  • Real numbers: usually called floating point numbers. This is a superset of integers, and hence mixing integers in a real number manipulation is permitted. 10.3 + 4 = 14.3.
  • Characters: a single letter / digit / punctuation mark / etc. c = ‘a’, c = ‘,’, c = ‘9’, etc. There is very little you can do with characters. Remember ‘9’ as a character is different from 9 as a number. In a strictly typed language ‘9’ + 3 is invalid, but 9 + 3 is 12!
  • Strings: a sequence of characters. A very important type for handling any text data like documents, names, messages, etc.
  • Boolean: very useful, but not an explicit type in many languages. ‘C’, for example, uses an integer to represent boolean. Boolean variables can have only two values: true and false. The way these are represented varies from language to language. ‘C’ takes zero as false and non-zero as true. But use symbolic names as far as possible to make the code readable. Thus for a boolean, !true = false, irrespective of how true and false are represented.
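
The basic types listed above can be tried out in a short Python sketch. Python is only one possible choice; note that it has no separate character type, so a single character is simply a string of length one. The variable and function names are made up for this example.

```python
# Integers: whole numbers only; // is integer division.
whole = 100 // 3              # 33, the fractional part is dropped

# Real (floating point) numbers: mixing in an integer is fine.
mixed = 10.3 + 4              # about 14.3

# Characters: '9' as a character is not the number 9.
c = "9"
digit_as_number = int(c) + 3  # convert explicitly, then add: 12


def char_plus_number_is_rejected(ch, n):
    """Return True if adding a character to a number raises a type error."""
    try:
        ch + n
    except TypeError:
        return True
    return False


rejected = char_plus_number_is_rejected(c, 3)

# Boolean: only two values, written with symbolic names.
passed = True
failed = not passed           # False, however the values are stored
```

The explicit `int(c)` conversion is the point to notice: the programmer states that the character should now be treated as a number.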

In most languages, these types are divided further for storage optimisation, etc. Thus you may have short integer and long integer, for example.

Be careful to use the most appropriate data type for each variable. It will save you a lot of pain when you debug the program or modify the code.

In addition to such basic types, today’s languages give you richer complex types also. For example, most programming languages provide a mechanism to handle a collection of homogeneous items, such as the exam marks of students in a class, names of various books in the library, dates of birth of your friends, etc. You don’t need to name them as mark1, mark2, mark3, etc. You can define mark as an ‘array’ – meaning a sequence. The ith element of the array is the mark of the ith student. Primitives are available to define arrays (usually of a given length), assign a value to the ith element, and access the value of the ith element. Usually the array elements are stored consecutively with uniform storage per element, enabling access to the ith element without having to wade through the first i-1 elements. The ith element will be at start-address of array + D*(i-1), where D is the space required for one array element. Thus if each element of the array (e.g. marks of students) takes 4 bytes, the first element will be at start-address itself (i.e., start-address + 0 * 4), the 10th element will be 36 bytes away from start-address, and so on. This efficient access through the series makes the array a very powerful and widely used data type.
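
The address arithmetic above can be checked with a small Python sketch. The function name and the start address of 1000 are assumptions made only for this example; real languages do this calculation for you behind the scenes.

```python
def element_address(start_address, element_size, i):
    """Address of the ith element (counting from 1) of an array
    whose elements are stored consecutively."""
    if i < 1:
        raise ValueError("i counts from 1")
    return start_address + element_size * (i - 1)


START = 1000   # hypothetical start address of the marks array
SIZE = 4       # bytes per element, as in the text

first = element_address(START, SIZE, 1)    # the start address itself
tenth = element_address(START, SIZE, 10)   # 36 bytes from the start

# In everyday code we just index the array. Python counts from 0,
# so the mark of the 3rd student is marks[2].
marks = [72, 65, 88, 91]
third = marks[3 - 1]
```

Because the address comes from one multiplication and one addition, reaching the 10th element costs no more than reaching the first.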

Some languages make the array more flexible or offer more flexible types of sequence structure. For example, in some languages you can pick any arbitrary sub-sequence from the array by notation like mark [s : d] (Python has a notation of this kind; PHP provides the array_slice function for the same purpose). This gives you another array containing marks from the sth to the dth element in the original array. Note that these operations are usually not very efficient, though convenient to program.
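
As a rough illustration, here is how such a sub-sequence could be picked in Python, whose slice notation is close to the one described above. The helper name and the sample marks are hypothetical.

```python
marks = [55, 62, 71, 48, 90, 67, 83]   # hypothetical marks of 7 students


def sub_sequence(items, s, d):
    """Return a new list holding the sth to dth elements, counting from 1."""
    # Python slices count from 0 and exclude the end index,
    # so the sth to dth elements are items[s - 1:d].
    return items[s - 1:d]


middle = sub_sequence(marks, 3, 5)     # marks of students 3, 4 and 5
```

The slice builds a new list by copying the selected elements, which is one reason such operations cost more than accessing a single element.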

Apart from aggregate data types like arrays, we can have other complex data types. A complex number has a real part and an imaginary part. A student is not just a mark, but has a name, age, address, etc. An address itself has parts such as city, state, pin code, etc. Many entities we deal with have a collection of properties. Won’t it be nice to define a student as an integrated structure with all these, rather than defining name, age, etc. as separate variables? Even worse, what if we had to deal with a set of students – we would need a name array, an age array, etc.

The notion of ‘class’ and ‘object’ in object-oriented languages provides support for this. They allow you to define complex entities by grouping together related properties. So, we can define a student class as

class student {
    string name;
    integer age;
    float marks;
}

Now when you define a variable v of type student, exactly as you defined ‘i’ as an integer variable, v will have 3 properties. So you can get / modify his name by accessing v.name.

v.age = 20

modifies his age.
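
A minimal runnable version of the same idea, written in Python with its standard dataclass helper, might look like this. The field values are made up for the example, and the class is only a sketch of the student type defined above.

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    age: int
    marks: float


# Define a variable v of type Student (hypothetical values).
v = Student(name="Ramkumar", age=19, marks=71.5)

current_name = v.name   # get a property
v.age = 20              # modify a property
```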

You can have a sequence of students by defining an array of students (exactly like an array of marks). In early languages like C, this formed the idea of ‘records’. Current object oriented languages extend this idea further to the notion of ‘class’. What is the difference?

Remember, we talked of data types as defining the possible values variables can take and the possible operations on them. With complex data types, we can define richer operations, and thus offer higher level primitives. Further note that when we write ‘a = a + 5;’ we don’t worry about how a and 5 are represented or how ‘+’ works. This provides a black box abstraction. One can use all these and have a much richer collection of data types.

For example, our ‘student’ type variables can have operations like ‘promote’, ‘add marks’, ‘print’, ‘passed’, etc. s.promote ( ) can set the student’s status to pass, or increment his current class. Without worrying about how status and class are stored, applications can just call ‘promote ( )’ – quite the way we used ‘+’. Similarly, s.print ( ) could produce a suitable visual display, say, “Ramkumar (44)”, using the name and age fields. This is also like getting visual values for numbers, which are very different from their internal representation as a binary digit sequence.
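
A possible sketch of such a class in Python is given below. The fields status and current_class are assumptions, since the text does not say how they are stored, and promote here does both things mentioned above. The display operation is named display rather than print only to avoid confusion with Python's built-in print function.

```python
class Student:
    def __init__(self, name, age, current_class):
        self.name = name
        self.age = age
        self.current_class = current_class   # hypothetical field: class number
        self.status = "studying"             # hypothetical field

    def promote(self):
        """Mark the student as passed and move to the next class."""
        self.status = "pass"
        self.current_class += 1

    def display(self):
        """Visual form of a student, built from the name and age fields."""
        return f"{self.name} ({self.age})"


s = Student("Ramkumar", 44, current_class=9)
s.promote()
shown = s.display()    # "Ramkumar (44)"
```

The calling code uses only promote and display; it never touches status or current_class directly.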

This abstraction provides another powerful strength. One could implement the class ‘student’ and operations like promote and print in any way they like and the code using them need not know. Again, going back to our numbers, our programs continue to say a = a + 1, irrespective of how numbers are stored or addition is implemented. This is the key idea of object oriented programming.
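
To make this concrete, here is a small Python sketch with two hypothetical implementations of a student that store marks in different ways. The class and function names are invented for the example; the point is only that the calling code is the same for both.

```python
class StudentWithList:
    """Keeps every mark in a list."""

    def __init__(self, name):
        self.name = name
        self._marks = []

    def add_marks(self, value):
        self._marks.append(value)

    def total(self):
        return sum(self._marks)


class StudentWithTotal:
    """Keeps only a running total."""

    def __init__(self, name):
        self.name = name
        self._total = 0

    def add_marks(self, value):
        self._total += value

    def total(self):
        return self._total


def record_exam(student):
    """Calling code: uses only add_marks and total."""
    student.add_marks(40)
    student.add_marks(35)
    return student.total()


a = record_exam(StudentWithList("A"))
b = record_exam(StudentWithTotal("B"))
```

record_exam gives the same answer with either class, because it depends on the operations and not on how the marks are stored.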

Thus the notion of data type and classes is a key concept in programming – offering tremendous power, and enabling us to write complex programs through a powerful series of abstractions. Note that just as we used strings and numbers to define student, we can build higher abstractions using these as base types. So you can have a ‘project’ class, which has an array of students and a guide:

class project {
    person guide;
    student members [4];
    string title;
}
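
One way to try this composition out is the Python sketch below. The person type is not defined in the text, so the Person class with a single name field is an assumption, as are the add_member helper and all sample values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str          # assumed field; the text does not define person


@dataclass
class Student:
    name: str
    age: int
    marks: float


@dataclass
class Project:
    guide: Person
    title: str
    members: List[Student] = field(default_factory=list)

    def add_member(self, student):
        """Add a student, keeping to the limit of 4 members."""
        if len(self.members) >= 4:
            raise ValueError("a project has at most 4 members")
        self.members.append(student)


project = Project(guide=Person("Dr. Rao"), title="Library catalogue")
project.add_member(Student("Ramkumar", 20, 71.5))
project.add_member(Student("Anita", 21, 83.0))
member_names = [m.name for m in project.members]
```

The project is built entirely from types defined earlier, which in turn are built from strings and numbers.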

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License