Basics of Programming: Variables and Assignment

We can now start our programming journey. Let us start with a simple problem from high school mathematics: find the roots of the equation $4x^2+2x+1 = 0$. You will recall that this is a quadratic equation (single variable, maximum power 2) and that there is a universal formula for the roots of a quadratic equation $ax^2+bx+c = 0$.

\begin{align} x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \tag{1} \end{align}

Here we have a = 4, b = 2, c = 1. Here is one version of a program to compute the roots of this equation; we will refine this in the next section.

a = 4;
b = 2;
c = 1;
d = sqrt(b*b - 4 * a * c);
r1 = (-b + d) / (2 * a);
r2 = (-b - d) / (2 * a);
print r1, r2;

While there will be some queries as to the details of how we have written this, you can certainly see the equation being captured here, along with the familiar process of finding the roots, as if you were solving the maths problem by hand. Before you raise objections, note that we have ignored the possibility that $b^2 < 4ac$, in which case d is not a real number. (The equation above is in fact such a case: $b^2 - 4ac = 4 - 16 = -12$.) For now, we will assume that the equations we get always have real roots. We will see how to handle the general case after we learn conditionals.
This little piece of code is essentially a complete computer program. You can put it in a file, say "root.php", and invoke the PHP interpreter as "php root.php" (actual PHP additionally requires a dollar sign before every variable name), and you will get ….put roots here….. on your screen as the output.
You can edit the file, change the values assigned to a, b and c, and run the program to solve different equations (provided they have real roots).
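For readers who want to run something right away, here is a minimal sketch of the same program in Python. The choice of Python is ours, not the text's, and the coefficients are also our own: they describe $x^2 - 3x + 2 = 0$, picked only because its discriminant is non-negative.

```python
import math

# Coefficients of x^2 - 3x + 2 = 0 (an assumed example with real roots).
a = 1
b = -3
c = 2

d = math.sqrt(b * b - 4 * a * c)   # square root of the discriminant
r1 = (-b + d) / (2 * a)            # root using +d
r2 = (-b - d) / (2 * a)            # root using -d
print(r1, r2)
```

Changing the three assignments at the top is all it takes to solve a different equation, as long as it has real roots.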
This program does illustrate many important aspects of computer programming. Let us discuss them one by one.

A computer program is a 'sequence' of instructions. The keyword here is 'sequence'. If there are two things to do, you need to tell the computer to do them in some specific order. Once you specify the order (e.g. a = 4; b = 2), the system will always follow that order. The instructions are executed from top to bottom, in the sequence given. When an instruction is completed, the computer moves to the instruction that follows it in the text. We will see how to alter this flow later on.
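A small sketch of why the order matters, written in Python (our choice of language for illustration; the values are arbitrary). The second line uses whatever value a holds at that moment, so changing a afterwards has no effect on b.

```python
a = 4
b = a * 2   # uses the value a has right now, which is 4
a = 10      # changing a afterwards does not go back and change b
print(a, b)
```

Swapping the second and third lines would give b a different value, even though the same three instructions are present.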

Programs make extensive use of variables, as we do when solving maths problems. a, b, c, d, r1 and r2 are variables in the above program. They provide a way for us to name things. If we had used '4' instead of 'a' in the program, modifying the program for another equation would have been difficult. Try this for the equation $4x^2+4x+4 = 0$ and then solve $8x^2+3x+5=0$. In programming, variables are also seen as memory references. This interpretation is needed to understand certain constructs that we will soon see. Thus, when the computer sees 'a' for the first time, it associates some space in memory with it. The details of this are language dependent and we refrain from delving deeper here. All languages provide a way to access the value of a variable (using the variable as a reference) and also to update that value. We saw both of these in the program above.

"print r1,r2" accesses the current values of the variables r1 and r2 in order to display them to the user on the screen. "a=4" puts the value '4' in the place marked for 'a'. So what happens with the following?
a = 4
a = 2
The first instruction puts the value '4' in the place for 'a'; the computer then executes the second instruction, which puts the value '2' in the same place. Since it is a memory area, the operation succeeds: you can write into a memory area any number of times. Any further reference to the variable 'a' will return 2, and not 4. Thus, every time you modify the value of a variable, its previous value is lost.
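The same two assignments can be tried out directly. This is a sketch in Python (our choice for illustration), with a print added after each step so the overwriting is visible.

```python
a = 4
print(a)   # the place for a currently holds 4
a = 2
print(a)   # the earlier value is gone; a now holds 2
```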

Take a look at a statement one will often see in programs:
a = a + 1
This kind of statement puzzles a novice programmer who reads '=' as mathematical equality. It is not! The '=' symbol in most languages says 'compute the right side expression, and put the value in the memory area indicated by the left side expression'. A statement that uses '=' in this way is called an "assignment statement". Let us spend a while on what this means, since this is a critical stage in understanding programming languages.

What is on the right side, when evaluated, should give us a value. The right side is generally an expression with this property. For now, we will focus on integer expressions; we will come back to other expressions later in the series. An arithmetic expression is something we are familiar with. So, we can have a+1, a*2, a*b, a+b*c, and so on. You can write as complex an expression as you want. Do pay attention to the operators used, and also to the writing style. In common usage, we write ab to mean a multiplied by b. But in a program, 'ab' can also be the name of a variable. The compiler will not permit 'a b' either. So, you have to make the operator explicit. '*' is commonly used for multiplication. In many languages, '**' indicates "to the power of", so x ** 4 means $x^4$. In an expression, you can use literal numbers like 1, 2, 55, etc. as well as variables. We have a lot more to discover about expressions, but that is for another time.
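The points about explicit operators can be checked with a short sketch. It is written in Python, which is our choice for illustration and happens to use '*' and '**' in the way described; the variable values are arbitrary.

```python
a = 3
b = 4
c = 5
ab = 100            # 'ab' is a separate variable, unrelated to a and b

product = a * b     # multiplication must be written explicitly
mixed = a + b * c   # '*' is applied before '+', as in ordinary arithmetic
x = 2
power = x ** 4      # x to the power of 4

print(product, ab, mixed, power)
```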

We said that the right side of '=' will be an expression; what about the left side? The left side must be something that can take a value. For now, that means a variable. Later we will see that some kinds of expressions are also allowed here. Thus, a+1 = 2 is invalid; a = 2-1 is valid.
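One way to see the difference between the two statements is to ask a language to check them. The sketch below uses Python (our choice for illustration) and its built-in compile function, which only checks whether the text is an acceptable statement and does not run it.

```python
statement = "a + 1 = 2"
try:
    # compile() checks the statement without executing it
    compile(statement, "<example>", "exec")
    accepted = True
except SyntaxError:
    accepted = False
print(accepted)   # the left side is not something that can take a value

a = 2 - 1         # valid: a variable on the left, an expression on the right
print(a)
```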

Coming back to our "a = a + 1": this therefore means 'increment the value of a'. How? Compute a+1 with the current value of a, and then replace the value of a with the result.
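A minimal sketch of the increment, in Python (our choice for illustration; the starting value 5 is arbitrary):

```python
a = 5
a = a + 1   # right side is computed first (5 + 1), then stored back in a
print(a)
a = a + 1   # the same statement again starts from the new value
print(a)
```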

Thus variables are associated with a memory space in the machine. We can inspect and retrieve the value there by using the variable in any expression, and we can modify the value by assigning a new one to the variable. Note that it is not possible to just remove an existing value; we can only substitute it with some other value.

Recall that a program is a sequence of instructions. We saw the implication of the word 'sequence'; now we look at the word 'instruction'. Every bit of information in a program is interpreted as something that the computer is expected to do. Usually, all constructs in a programming language are interpreted in this procedural sense: what will this construct accomplish? So you cannot simply tell the computer that a should always be greater than b. You can tell it that if a < b, then it should raise an alarm. Ensuring that a > b is the job of the programmer, who must check that this property is satisfied every time a or b is updated. You cannot tell the computer 'please make sure that a > b'. Things like this are called constraints. There are specialised languages which can accept and process such inputs, but these languages are beyond our scope here. Thus, for us, what we tell the computer must be something that the computer can carry out then and there.
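A rough sketch of what "checking after every update" looks like, in Python (our choice for illustration). It uses an if statement, a conditional construct that the text has not introduced yet, and the values are made up.

```python
a = 10
b = 3

a = a - 8             # an update: nothing stops a from dropping below b
alarm = a < b         # the check the programmer has to write explicitly
if alarm:
    print("alarm: a is less than b")
```

The computer performs the subtraction without complaint; the alarm is raised only because an instruction to check was written right after the update.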

The program also gives us a feel for some elements of syntax and layout. Usually, we write one instruction or statement per line. When we want to scan the program looking for bugs or specific steps, this is a useful guideline. It also reduces clutter. Most languages do not care about line breaks, spaces and indentation, so these do not usually affect compilation or execution of your program. But keep in mind that, far too often, your programs will be read or used by humans (including you, the original author) for debugging, checking or extension. So, it is important that programs are also easily readable by humans. Imagine your textbook with the entire book being a single paragraph, with no paragraph breaks, section or chapter breaks, headings, etc. The layout of a book is less about aesthetics than about helping human readers locate relevant information, providing visual cues regarding the flow and organisation of the book, and so on. Similar logic applies to computer programs too. Write them for humans to read, and computers to execute.

This includes judicious use of white space. Compare '(a+b*(2/(c-d)))' vs '(a + b * (2/( c-d )))' and '( a + b * ( 2 / ( c - d ) ) )'. All these are valid expressions and will produce the same result on your computer. But look how easily you can figure out the structure and complexity of the expression from each of the three ways of writing it. When you want to match this expression with something you have in your notes, or when you suspect there may be something wrong in the expression, this plays a major role. Use at least one space to separate significant components in an instruction. Establish some reasonable convention and follow it consistently. Similarly, indentation plays an important role when you have richer program constructs, as we'll see shortly.
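The claim that the three spellings are the same expression is easy to check. Here is a sketch in Python (our choice for illustration), with arbitrary values chosen so that c-d is not zero.

```python
a = 1
b = 2
c = 5
d = 3

v1 = (a+b*(2/(c-d)))
v2 = (a + b * (2/( c-d )))
v3 = ( a + b * ( 2 / ( c - d ) ) )

print(v1, v2, v3)   # the spacing differs, the computed value does not
```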

Given that space and newlines are largely ignored for most languages, how does the system delimit statements/instructions? Part of the answer is a convention that they adopt. ';' is often used as a statement delimiter or separator. Some languages also use newline as a statement separator.
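Python, used here purely as an illustration, happens to support both conventions: a newline normally ends a statement, and ';' can separate several statements written on one line.

```python
# One statement per line: the newline ends each statement.
a = 4
b = 2

# Two statements on one line, separated by ';'.
c = 1; d = a + b + c

print(a, b, c, d)
```

The one-per-line form is the one usually preferred, for the readability reasons discussed above.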

Another component of syntax is the alphabet available to write programs. Most languages restrict the characters you can put in a program to those you see on the keyboard. Thus you cannot enter mathematical symbols like $\surd$ in the program, and we use words like sqrt to denote them. Similarly, if you want $\epsilon$, $\theta$, etc. in your program, which we tend to use often when writing formulae, you can use words like epsilon or theta instead. This explains why we have written the quadratic roots formula as in the code segment.
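A short sketch of spelled-out names standing in for mathematical symbols, written in Python (our choice for illustration). The function math.sqrt plays the role of the radical sign; the names theta, epsilon and root_two, and the values given to them, are ours.

```python
import math

theta = math.pi / 4        # stands in for the symbol theta
epsilon = 1e-9             # stands in for a small epsilon
root_two = math.sqrt(2)    # the word sqrt replaces the radical sign

# sin(theta) should agree with 1 / sqrt(2) to within epsilon
difference = math.sin(theta) - 1 / root_two
print(abs(difference) < epsilon)
```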

That brings us to the last part of layout. We mostly used single-letter, largely meaningless names for variables in this program. This, in general, is a bad practice. Since the quadratic equation is usually written with a, b and c as the coefficients, and our code is small and directly about solving this kind of equation, we do not see much of a problem. But consider the two code segments below:

Code 1:
basic_pay = 150000;
HRA = basic_pay * 0.3;
allowance = basic_pay * 0.1;
income = basic_pay + HRA + allowance;

Code 2:
a = 150000;
b = a * 0.3;
c = a * 0.1;
d = a + b + c;

The first segment of code is almost self-explanatory, given the names of the variables used. Remember that to the computer, the two segments are equivalent. It does not matter what you name the variables. But to a human reader, the second piece will be hard to understand. If you ask someone to change the HRA from 30% to 15%, it is easy in the first segment, and very difficult in the second. So, use meaningful names for your variables, though it may take a little longer to type them in. The time you save later, when reading and changing the program, will be much more than the time you lose now.
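To make the point concrete, here is the first segment as a Python sketch (Python is our choice for illustration) with the HRA already changed to 15%. With meaningful names, the one line to edit is obvious.

```python
basic_pay = 150000
HRA = basic_pay * 0.15        # changed from 0.3; easy to locate by name
allowance = basic_pay * 0.1
income = basic_pay + HRA + allowance
print(income)
```

Making the same change in the second segment first requires working out which of a, b, c and d holds the HRA.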

A word about input and output, and then we end our discussion of this little program. Note the construct "print r1, r2". This is the construct we will use for now to get some values on the screen. You can use it anywhere in the program, not just at the end. So, you can check that the discriminant is computed correctly by printing its value soon after it is computed. We will not worry about making the output more attractive here; we will come back to this later. So, when you want to print out the value of a variable, use the print statement, as shown in the code. Similarly, if we want to ask the user for something while the program is running, we use "read x". After executing this statement, we would expect the variable x to have the value that the user entered. A lot of options and richer constructs are available for reading in data in most languages, but we will look at them later in the series. So, for example, we can write the code for computing income as follows: