Data and Data Types
Computers store all data as binary numbers.
Binary is a number system that only has 2 possible digits, 0 and 1 (as
opposed to our decimal number system that has 10 digits, 0 through
9). Every binary digit is called a bit, and every set of 8 bits is
called a byte. There are
standard ways that computers can use these sets of digits to represent
different kinds of data, which will be discussed below.
In almost any program, you need to be able to store
data, so you can keep a hold of pieces of information and be able to do
things with them at different points in the program. A variable
(- named memory location) allows us to store one piece of data.
Every variable has a data type
(- range of values and associated operations), which mandates what
kinds of things we can store in the variable, and what we can do with
it.
There are two kinds of data types in Java, primitive data types (- basic types
of data that a computer can store), and objects (- composite data types
consisting of one or more primitive data types, and including a set of
operations that can be performed on them, operations which are defined
in Java itself). The names of the primitive data types are all
lower case. There are 4 primitive data types that we will
use in Java, and they are int,
double, boolean, and char.
An int
variable can store an integer
(- positive or negative whole number, or zero). Ints
are stored in the computer as 4 bytes, which is 32 bits. The
range of values an int variable can
store goes from -2^31 to 2^31 - 1. You can do arithmetic operations
(addition, subtraction, multiplication, division, and modulus division)
and numerical comparisons (equality or lack thereof, less than, less
than or equal to, greater than, greater than or equal to) with
them. Division with integers can only have an integer as a result,
so if there would be a decimal portion to the answer, it is truncated
(thrown away), not rounded.
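A minimal sketch of the int behaviour described above. The class name IntBasics and its two helper methods are made up for illustration; Integer.MIN_VALUE and Integer.MAX_VALUE are constants from the standard Java library.

```java
public class IntBasics {
    // Integer division: the fractional part is thrown away, not rounded.
    static int quotient(int a, int b) {
        return a / b;
    }

    // Modulus division: the remainder left over after integer division.
    static int remainder(int a, int b) {
        return a % b;
    }

    public static void main(String[] args) {
        System.out.println(quotient(7, 2));    // 3, not 3.5 or 4
        System.out.println(quotient(-7, 2));   // -3: truncation moves toward zero
        System.out.println(remainder(7, 2));   // 1
        System.out.println(Integer.MIN_VALUE); // -2147483648, which is -2^31
        System.out.println(Integer.MAX_VALUE); // 2147483647, which is 2^31 - 1
    }
}
```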
A double
variable can store a floating point
number (- number with
a fractional portion). In the computer they're stored as 8 bytes
in a binary form of scientific notation, so they're more limited in their
precision (how many digits they can store accurately, roughly 15 to 17
significant decimal digits) than in their size. You can
do the same operations with them as with integers, including modulus
division, which Java also defines for doubles. Because of the way doubles are stored, equality
comparisons may not work as expected, and should be avoided.
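To illustrate why equality comparisons on doubles are risky, here is a small sketch. The class DoubleCompare, the method nearlyEqual, and the tolerance value are assumptions chosen for this example; comparing within a small tolerance is one common workaround, not the only one.

```java
public class DoubleCompare {
    // Treats two doubles as equal when they differ by no more than tolerance.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;   // not stored as exactly 0.3

        System.out.println(sum == 0.3);                      // false
        System.out.println(nearlyEqual(sum, 0.3, 0.000001)); // true
    }
}
```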
A boolean
variable can store one of only two values, true and false. You can only do
equality comparisons with these (though you usually don't) and logic
operations (AND, OR, and NOT).
A char
variable can store one character (such as a
symbol you could type with the keyboard). In Java they are stored as two
bytes and represented by the Unicode encoding (whose first 128 codes
match the older ASCII encoding). You can do comparisons
with these.
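A short sketch of char comparisons. The class CharCompare and the method isLowercaseLetter are hypothetical names; the example also uses the && operator and a type cast, both of which are explained later in these notes.

```java
public class CharCompare {
    // Characters compare by their numeric codes, so 'a' through 'z' are in order.
    static boolean isLowercaseLetter(char c) {
        return c >= 'a' && c <= 'z';
    }

    public static void main(String[] args) {
        char grade = 'b';
        System.out.println(grade == 'b');           // true
        System.out.println(grade < 'c');            // true: 'b' comes before 'c'
        System.out.println(isLowercaseLetter('Q')); // false
        System.out.println((int) 'A');              // 65, the numeric code for 'A'
    }
}
```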
To use a variable in a program, you have to ask the
computer to set aside a piece of memory for you, and give it a name to
attach to that memory location. The way you do this is by declaring a variable. A
variable declaration looks
like this:
datatype name;
where the data type is given first, and the name (sometimes
called an identifier) of the variable is given second. It's
possible to declare more than one variable of the same type at a time:
datatype name1, name2, name3;
After a variable has been declared, it doesn't have
a value yet. Sometimes it is necessary to start a variable off
with a certain value. This is called initializing a variable. A
variable declaration and initialization
can both be done at the same time. This will declare an integer
and initialize it to zero:
int foo = 0;
Any time later in a program that you want to change
the value of a variable, you have to do an assignment. An assignment uses
the equal sign (=) with something on each side of it. On the left
side is the variable that you are assigning to (changing).
Whatever is on the right side of the equal sign is evaluated (which may
require some operations to be performed), and whatever value the right
side evaluates to is stored in the variable on the left side. As
a
quick example, you could declare an integer, and then assign it the
value of zero in two steps:
int foo;
foo = 0;
A piece of data whose value is written directly in
the code of a program, rather than stored in a variable, is called a literal. Just like variables,
they have data types, and you can do most of the same things with
literals as you can with variables. In the assignment foo = 0, the number 0 is an
integer literal.
The word variable comes from the fact that the value
that one stores can be changed multiple times during the execution of a
program. There are some types of data that cannot be changed, and
these are called constants.
Literals are one type of constant, as their value is a part of the
code. The other type of constant is created the same way that a
variable is, but the word final
appears at the beginning of its declaration. These are called named constants. They can be
assigned a value once, but that value cannot be changed
afterward. The names of named constants are usually all capital
letters. Two legal ways to give a named constant a value are:
final double PI = 3.14;
and:
final double PI;
PI = 3.14;
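The following sketch puts declaration, initialization, assignment, and a named constant together in one place. The class CircleArea and the method area are invented for this example, and 3.14 is the same rough value of pi used above.

```java
public class CircleArea {
    static double area(double radius) {
        final double PI = 3.14;        // named constant: given a value once
        double result;                 // declared, but no value yet
        result = PI * radius * radius; // assignment: right side is evaluated first
        // PI = 3.0;                   // would not compile: PI cannot be changed
        return result;
    }

    public static void main(String[] args) {
        System.out.println(area(2.0)); // prints 12.56
    }
}
```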
Operators and Operations
Operations are performed using operators. Most operators are
binary operators, which means the operation is performed using the two
pieces of data that are on either side of it. There are a couple of
unary operators, where the operation is performed using data only on
one side of it. There are arithmetic
operators (that perform arithmetic), equality operators (that determine
whether or not two things are equal), relational
operators (that do numerical comparisons), and logical operators (that perform
logic operations).
Arithmetic operators are used with the numeric data
types int and double. For addition, subtraction, multiplication,
division, and modulus division, they are +, -, *, /, and %. The
first 4 are obvious; the last one (most often used with ints) gives
the remainder of the division as the result. Some examples:
The expression 1 + 1 evaluates to 2
The expression 3 - 1 evaluates to 2
The expression 1.0 + 1.0 evaluates to 2.0
The expression 3 * 4 evaluates to 12
The expression 4 / 2 evaluates to 2
The expression 5 / 2 evaluates to 2
The expression 5.0 / 2.0 evaluates to 2.5
The expression 5 % 2 evaluates to 1
The numbers 1, 2, 3, 4, and 5 above are integer
literals. The numbers 1.0, 2.0, and 5.0 are double
literals. The data type of the result of an arithmetic operation is always
the same as the data type of the pieces of data the operation was done
with (which is why the / operator is sometimes integer division, and
sometimes regular division). Two more examples:
The expression 1 + 1.0 evaluates to 2.0
The expression 5.0 / 2 evaluates to 2.5
In each of those examples, there was one int and one
double. They need to be the same, so Java has to automatically
convert one type so that it matches the other. It will convert
the int to a double, because doing so would not lose information
(whereas converting from a double to an int could lose information; the
decimal portion), and the result of the operation will be a
double. This automatic conversion is called type promotion.
If we want to do a temporary data type conversion
manually, we can do what's called a type
cast. We do this by putting the name of the data type we want
to convert (or cast) something to in front of it, in parentheses.
One common use of it is to divide two ints and get a double. Some
code to demonstrate this:
int three = 3;
int two = 2;
double answer;
answer = (double) three / two;
In the last line, the integer three is temporarily
converted to a double so that the division is done with doubles and
results in a double. The value of answer will be 1.5.
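One detail worth seeing in code is that a cast applies only to the value directly after it. The sketch below (class and method names are made up for illustration) contrasts casting one operand before the division with casting the result of an integer division.

```java
public class CastDemo {
    // The cast applies to total only; count is then promoted to double,
    // so the division is done with doubles.
    static double average(int total, int count) {
        return (double) total / count;
    }

    // Here the integer division happens first (inside the parentheses),
    // and only its already-truncated result is converted to a double.
    static double truncatedAverage(int total, int count) {
        return (double) (total / count);
    }

    public static void main(String[] args) {
        System.out.println(average(3, 2));          // 1.5
        System.out.println(truncatedAverage(3, 2)); // 1.0
    }
}
```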
One special type of assignment is an update (- assigning a value to a
variable that's based on its current value). Usually when you
update a variable you are either adding something to it, subtracting
something from it, multiplying it by something, or dividing it by
something. Here are some examples:
int number = 5;
int divisor = 2;
number = number + 1;
number = number / divisor;
The last two lines there are updates. In the
first one, the current value of number is 5. This is an
assignment, so the right side of the equal sign is evaluated
first. number (5) plus 1 is 6, so the value of the right side is
6. That is then assigned to number, so that the value of number
becomes 6 (so the number one was added to it). In the second one,
number (6) and divisor (2) are divided, resulting in 3, which is
assigned to number, so the value of number at the end of all of that is
3. For those kinds of updates, there are some special operators
that can save you some typing. They are +=, -=, *=, /=, and %=. The last two update
examples would look like this:
number += 1;
number /= divisor;
For adding and subtracting one from a number, there
are two more special operators, the increment operator (++) and the decrement operator (--). Using that, we could
add one to the variable number by simply doing this:
number++;
If you have a variable in a program that is only
changed by incrementing or decrementing it, it is a special kind of
variable we call a counter.
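As a quick trace of the shortcut operators, the hypothetical method below applies each one in turn. The comments show the value of number after each line when the method is called with 5.

```java
public class UpdateDemo {
    static int applyUpdates(int number) {
        number += 1;   // 5 becomes 6 (same as number = number + 1)
        number /= 2;   // 6 becomes 3
        number *= 3;   // 3 becomes 9
        number -= 4;   // 9 becomes 5
        number %= 5;   // 5 becomes 0 (remainder of 5 divided by 5)
        number++;      // 0 becomes 1
        return number;
    }

    public static void main(String[] args) {
        System.out.println(applyUpdates(5)); // 1
    }
}
```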
Non-numeric Data and Console Input/Output
Non-numeric data is stored using the char data type. An example of
this type of data is all of the text that is input from the keyboard or
output to the screen. A char variable can only hold one
character. Rather than forcing you to read input or print output
one character at a time and store every character in a separate
variable, Java provides the String
data type,
which is a type of object, not a primitive data type. Usually
object data types start with a capital letter (we saw that the
primitive data types did not). A string
is a collection of characters, and there are some operations that can
be performed on it, some of which we will see later. Though these
don't include arithmetic operations or numerical comparisons, the +
operator is defined, but it's not an arithmetic addition. The
String data type is the only object type for which operators are defined; all
other operations for objects are done a different way.
String fname, lname;
System.out.print("What is your first name? ");
fname = Stdin.readLine();
System.out.print("What is your last name? ");
lname = Stdin.readLine();
System.out.println("Hello " + fname + " " + lname + ".");
The + operator for the String data type is called
the concatenation operator; it
concatenates (joins together)
the two strings on either side of it. The output of the above
code would be:
What is your first name? John
What is your last name? Doe
Hello John Doe.
Another example:
int num1, num2, sum;
System.out.print("What is the first number? ");
num1 = Stdin.readInt();
System.out.print("What is the second number? ");
num2 = Stdin.readInt();
sum = num1 + num2;
System.out.println("The sum is " + sum);
The expression:
"The sum is " + sum
that we see uses the concatenation operator. The variable sum has
a data type of int, but as we saw with doubles and ints, there is a
type promotion that occurs so that the operator is adding
(concatenating in this case) two things of the same type. In this
case the int variable sum is promoted to a String. The resulting
expression, if the sum was 30, would have this value, "The sum is
30". Data of any data type in Java can (and will if added to a
String) be promoted to a String.
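Because + is evaluated from left to right, mixing concatenation and arithmetic in one expression can give a surprising result. This sketch (the class ConcatDemo and its methods are invented for illustration) shows the difference parentheses make.

```java
public class ConcatDemo {
    // "The sum is " + a happens first, producing a String,
    // so b is then concatenated as text rather than added.
    static String joined(int a, int b) {
        return "The sum is " + a + b;
    }

    // The parentheses force the arithmetic addition to happen first.
    static String summed(int a, int b) {
        return "The sum is " + (a + b);
    }

    public static void main(String[] args) {
        System.out.println(joined(10, 20)); // The sum is 1020
        System.out.println(summed(10, 20)); // The sum is 30
    }
}
```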
In the statement:
System.out.print("What is your first name? ");
we see a method call. A method (the name Java uses for subroutine) is a named block of code
(like the main method). We can execute a method by calling it,
which you
do by giving the method's name in the code. The name of the
method we are using here is System.out.print. All
method calls will have parentheses () after them, which may or may not
have anything in them. If anything goes in them, it is called an argument (also referred to as
a
parameter), which is a piece of
data
passed to a method. If a method has more than one argument, the
arguments will be separated by commas. In this particular method
call, there is only one argument, a string literal. String
literals will always be surrounded with quotes. That string is
shown to the user before they are asked for input, explaining to them what
they need to be entering. This type of string is called a prompt.
In this statement:
fname = Stdin.readLine();
the method named Stdin.readLine is being called. When method
calls need to be evaluated (such as when
they are used on the right side of an assignment statement like we have
here), they evaluate to that method's return
value, which is a piece of data passed from a method back to
whoever called it. Stdin.readLine returns a String,
so to be able to store that string in our program, we have to assign it
to a String variable. Not all methods have return values though;
the ones that do are called functions.
The ones that don't are called procedures.
A procedure call (method call where the method is a procedure) cannot
be evaluated, and therefore cannot be used on the right side of an
assignment statement. System.out.print is an example of a
procedure.
The input we get in the second example is supposed
to be a
number, and we need to be able to add the numbers together and get a
sum. We cannot do that with strings; we need to have the number
the user enters as an int. We saw another method call:
num1 = Stdin.readInt();
where Stdin.readInt is a function that will read the string typed by
the user, attempt to interpret it as a number, and return that number
as an int if it was successful. Reading a string and trying to
interpret it as some formatted data is called parsing.
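The Stdin class itself is not shown in these notes, so the sketch below does not claim to reproduce it. It uses Integer.parseInt from the standard library to show what parsing looks like, along with one function (it returns a value) and one procedure (it does not). The names ParseDemo, parseOrDefault, and report are hypothetical, and the try/catch construct used to handle a failed parse has not been covered here.

```java
public class ParseDemo {
    // A function: returns the int that text represents,
    // or fallback when text is not a whole number.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // A procedure: it prints something but has no return value.
    static void report(String text) {
        System.out.println(text + " -> " + parseOrDefault(text, -1));
    }

    public static void main(String[] args) {
        report("42");        // 42 -> 42
        report("forty-two"); // forty-two -> -1
    }
}
```

Since report has no return value, a line such as int x = report("42"); would not compile, which matches the rule about procedure calls above.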
Code Structure and Flow Control
Programs consist of instructions telling the
computer what to do and when. Each individual instruction is
called a statement. Most
regular statements (such as the declarations and assignments we've seen
so far) end with a semicolon. The statements in a program are
executed in the order they are written. The way we have the
computer make decisions and do different things depending on certain
conditions is with flow-control
statements which help the computer to decide when and when not
to execute certain sections of the code. Flow-control statements
include if statements and
loops
(sections of code that execute repeatedly). Flow-control
statements are also compound
statements because even though each flow-control statement
counts as one statement, they also include one or more regular
statements within them.
Both if
statements and loops include a condition
that determines what
happens. The code within an if
statement executes (once) if the
condition is true, and the code within a loop keeps executing
repeatedly as long
as the condition remains true after each repetition. A condition
is a boolean expression,
because just
like a boolean variable, it can only be either true or false. A
boolean variable itself could be a boolean expression, but we can also
use the equality and relational operators to create expressions that
evaluate to true or false. The equality and relational operators
are binary operators, and they are ==, !=, >, >=, <, and <= for "is equal to," "is
not equal to," (equality operators) "is greater than," "is greater than
or equal to," "is less than," and "is less than or equal to"
(relational operators) respectively.
In
Java, every source file defines a class,
which has braces {} around it. Every Java application has a class
that has a main method, which
is what is executed when the program is run. The main method also
has braces around it. Any section of code in a program that has
braces around it is called a block.
Some blocks may not have braces around them if they are only one
statement and are associated with a flow-control statement. Any
block of code can have another
block within it; this is called being nested.
The main method is nested within the class, and the blocks of code
associated with flow-control statements can be nested inside of
that. Those blocks can have yet more blocks of code nested inside
of them. Two of these situations that we will see later have
their own names: nested if
statements, and nested loops.
public class Foo {
    public static void main(String[] args) {
        int number = 1;
        if (number == 1)
            System.out.print("The number is one");
        else {
            System.out.print("The number was not one, but will be");
            number = 1;
        }
    }
}
In the above code, either the number is one
or it isn't (in this
case it is), so the expression number
== 1 is either true or
false. If
the expression evaluates to true, the if block is executed.
Since
the
if block only consists of
one statement, it is not required to have
braces around it. With this code, "The number is one" will be
printed to the console. If the expression number == 1 had been
false, the else block
would have been executed, printing "The number
was not one, but will be" to the console and assigning 1 to the
variable number. In
if statements, the else part is optional.
If we have more than two blocks of code, only one of
which we want to execute depending on certain conditions, we can do
something like this:
int number = -5;
if (number < 0)
    System.out.print("Negative");
else if (number > 0)
    System.out.print("Positive");
else
    System.out.print("Zero");
Depending on the value of the variable number, one
of three messages will be printed: "Negative," "Positive," or
"Zero." This type of flow-control statement is called an else-if ladder. Depending on
how many possibilities there are, you can have as many else if blocks
as you need.
If whether we want to execute a certain block of
code is dependent on more than one condition, we can join the
individual boolean expressions (conditions) using the logical
operators. They are &&, ||, and ! for AND, OR, and NOT
respectively. Joining two expressions with an && requires
both to be true for the overall expression to be true. Joining
two expressions with an || only requires one of the two individual
expressions to be true for the overall expression to be true. The
following expression:
number > 50 || number < 0
is true if the value of the number variable is greater than 50, or if
it's less than zero. If it's greater than 50, we know immediately
that the whole expression will be true. The computer knows this
also, and if the first expression is true, it won't even bother
evaluating the second one. This is called short-circuit evaluation.
Another example:
number <= 50 && number >= 0
If the number is 100, the first expression is false, so the whole
expression will be false and the number >= 0 expression also won't
get evaluated. So, the order in which individual expressions are written
when they are joined with logical operators is important.
Sometimes you'll have one expression that could be dangerous to
evaluate (could cause a program to crash in certain situations), so
by putting it second you can prevent it from being evaluated when it
shouldn't be.
Finally, putting a ! in front of a boolean
expression reverses the value of it (true becomes false or vice versa).
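Here is a small sketch of using short-circuit evaluation to protect a dangerous expression. The class and method names are assumptions made for this example; the dangerous part is the integer division, which would crash the program if count were zero.

```java
public class ShortCircuitDemo {
    // count != 0 is checked first; when it is false, the && short-circuits
    // and the division on the right is never evaluated.
    static boolean averageAbove(int total, int count, int threshold) {
        return count != 0 && total / count > threshold;
    }

    // True when number is greater than 50 or less than zero.
    static boolean outOfRange(int number) {
        return number > 50 || number < 0;
    }

    public static void main(String[] args) {
        System.out.println(averageAbove(10, 2, 2)); // true: 10 / 2 is 5
        System.out.println(averageAbove(10, 0, 2)); // false, and no crash
        System.out.println(outOfRange(75));         // true
        System.out.println(!outOfRange(75));        // false: ! reverses the value
    }
}
```

Writing the two halves of averageAbove in the opposite order would evaluate the division first and fail whenever count is zero.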
We've seen that an else-if ladder allows us to have
a set of blocks of code for which only one will be executed.
Sometimes it's not that simple though. Here is one example:
double slope;
int x1, x2, y1, y2, dy, dx;
System.out.print("x1? ");
x1 = Stdin.readInt();
System.out.print("y1? ");
y1 = Stdin.readInt();
System.out.print("x2? ");
x2 = Stdin.readInt();
System.out.print("y2? ");
y2 = Stdin.readInt();
if (x2 - x1 != 0) {
    dy = y2 - y1;
    dx = x2 - x1;
    slope = (double) dy / dx;
    if (slope > 0)
        System.out.println("Uphill line");
    else if (slope < 0)
        System.out.println("Downhill line");
    else
        System.out.println("Horizontal line");
    System.out.println("Slope = " + slope);
}
else
    System.out.println("Vertical line\nSlope = infinite");
As far as what
description of the line we wanted to print, there were 4
possibilities. For 3 of those possibilities, we wanted to
calculate the slope, but for the fourth we didn't, as that would result
in dividing by zero. Dividing an int by zero causes a run-time error (- error which will
crash a running program), and dividing a double by zero gives a result
that isn't a usable number. If we just made
one big else-if ladder with 4 possibilities, the slope calculation
would have to be repeated in 3 of the blocks of code. As we try
to avoid duplication of code whenever possible, the nested if was the
right structure to use here.
The block of code associated with a loop is called
the body of the loop. In
the following code, the body of the while loop contains two
statements:
int number = 0;
while (number < 3) {
    System.out.print("Not done yet");
    number++;
}
The above loop will print "Not done yet" to the
console 3 times. Since we know how many times the loop will
execute (or at least know that we could figure it out if we knew the
value of the variable number, which may not always be the case), that
loop is a count-controlled loop.
Count-controlled loops often have a counter, which makes
sure the loop runs the appropriate number of times, and they are usually
written using the for loop structure instead of the while loop
structure. The three things that go in the parentheses for a for
loop generally deal directly with that counter; initializing it (if
necessary), having a loop condition based on the counter, and
updating the counter (usually incrementing or decrementing it).
We call the three things the initialization
statement, the loop condition,
and the update statement to
reflect their purpose. Re-writing the last
example as a for loop, it
would look like this:
for (int number = 0; number < 3; number++)
    System.out.print("Not done yet");
Now the loop body only has one statement. That
example is the most basic example of a count-controlled loop, where a
counter is initialized at zero (it is also declared in the same place,
which is OK as long as the counter isn't needed outside of the loop),
the loop condition is if the number is less than 3 (which will make the
loop run 3 times), and the counter is incremented. One run
through a loop is called an iteration.
The initialization statement runs before the loop executes, the loop
condition is checked before each iteration, and the update statement
runs at the end of each iteration.
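The order in which the three parts of a for loop run can be made explicit by writing the same loop as a while loop. In this sketch (class and method names are made up for illustration), both methods add up the whole numbers from 1 through limit.

```java
public class LoopOrderDemo {
    static int sumWithFor(int limit) {
        int sum = 0;
        for (int i = 1; i <= limit; i++)
            sum += i;
        return sum;
    }

    // The same loop, with each part of the for header placed where it runs.
    static int sumWithWhile(int limit) {
        int sum = 0;
        int i = 1;             // initialization statement: once, before the loop
        while (i <= limit) {   // loop condition: checked before each iteration
            sum += i;
            i++;               // update statement: at the end of each iteration
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithFor(4));   // 10
        System.out.println(sumWithWhile(4)); // 10
    }
}
```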
Two more important flow-control statements dealing
with loops are break and continue. Putting just the
word break as a statement inside of a loop will make the loop terminate
immediately. The continue statement ends the current iteration of
the loop, and skips to the next iteration (checking the loop condition
to see if there will be another iteration). When used inside a
for loop, the update
statement will be executed at the end of any
iteration, even one that is ended by a continue statement.
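A minimal sketch of both statements in one loop. The method below (a hypothetical name, as is the class) adds up odd numbers and stops as soon as the running total reaches a target; the upper bound of 100 on the counter is an arbitrary choice for the example.

```java
public class BreakContinueDemo {
    // Adds the odd numbers 1, 3, 5, ... and stops once the total reaches target.
    static int oddSumReaching(int target) {
        int sum = 0;
        for (int i = 1; i < 100; i++) {
            if (i % 2 == 0)
                continue;   // skip even numbers; the update i++ still runs
            sum += i;
            if (sum >= target)
                break;      // leave the loop immediately
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(oddSumReaching(10)); // 16, which is 1 + 3 + 5 + 7
    }
}
```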
Here is an example of nested loops:
for (int i = 1; i < 4; i++) {
    for (int j = 1; j <= i; j++)
        System.out.print(j);
    System.out.println();
}
where the for loop
with j as its counter
(called the inner
loop) is nested inside of the for
loop with i as its
counter
(called the outer loop), and the resulting output is:
1
12
123
The inner loop prints every number in the range from
1 to the value of i, and the outer loop runs the inner loop with the
values of i going from 1 to 3, printing a newline at the end of each
iteration.
So, for
loops are used for count-controlled
loops. The other type of loop (where we use the while loop
structure) is called a conditional
loop. In those loops, the loop condition isn't based on
some sort of counter, because we don't know how many times the loop
will execute. Termination of the loop is based on some condition
changing at runtime, and we can't predict when that change will
occur. Often the condition will be based on user input.
Here is an example program which demonstrates a lot of important things:
public class Foo {
    public static void main(String[] args) {
        int number, sum = 0;
        System.out.print("Enter a number (0 to exit): ");
        number = Stdin.readInt();
        while (number != 0) {
            sum += number;
            System.out.print("Enter a number (0 to exit): ");
            number = Stdin.readInt();
        }
        System.out.println("The sum is " + sum);
        System.exit(0);
    }
}
First, the overall program demonstrates a
conditional loop. You can't predict how many times it will take
the user to enter zero. That's why a while loop is used.
Next, you see that inside the loop, the value of number is changed (by
an assignment of whatever the user enters, which can be different
each time) during every iteration through the loop.
This was also true in the other loop examples. This is critical,
because in each case, the loop condition was based on the value of
number. If that
value doesn't change inside the loop, and the
loop condition is true once, it will be true forever. The loop
will never stop executing, and will then be an infinite loop. A couple
of example infinite loops:
while (number < 100)
sum += number;
and:
while (number < 100);
{
sum += number;
number++; }
The first example is an infinite loop (as long as
number is less than 100 to begin with) because the value of number
doesn't change inside the loop. The second loop is infinite,
because right after the parentheses for the while condition, there is a
semicolon. A semicolon by itself is an empty statement, which explicitly
tells the computer to do nothing. Therefore, since that second
loop has
an empty body, it will do nothing, forever. The block of code
under it is just that, a block of code. It is not associated with
the loop. This is a common mistake that can also affect for loops
and if statements.
Examples:
for (int i = 0; i < 10; i++);
System.out.print(i);
where the loop isn't infinite, but it does nothing, 10 times.
Also, it will be a syntax error
(- error caused by violating the rules of the programming language),
because i is declared as
part of that loop, and no longer exists by
the time the print statement is reached. A syntax error is also a compile-time error (- error that
prevents code from being compiled, or translated into machine language).
if (number > 100);
System.out.print("Big number");
will always print "Big number," because if the number is greater than
100, it will do nothing (the semicolon), then it will always print "Big
number" regardless. This would be a logic error (- error that causes the
program to run successfully, but with unexpected results).
Back to the bigger example:
public class Foo {
    public static void main(String[] args) {
        int number, sum = 0;
        System.out.print("Enter a number (0 to exit): ");
        number = Stdin.readInt();
        while (number != 0) {
            sum += number;
            System.out.print("Enter a number (0 to exit): ");
            number = Stdin.readInt();
        }
        System.out.println("The sum is " + sum);
        System.exit(0);
    }
}
Due to the structure of this program, the input
needs to be taken before the loop executes, and then again at the end
of each iteration of the loop. This is a common structure for a
program, and the first request for input (the one before the loop) is
called a priming read.
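The same structure can be tried without the course's Stdin class. The sketch below is an assumption-laden stand-in: it uses java.util.Scanner from the standard library, and it reads from a fixed String instead of the keyboard so the result is predictable. The class and method names are invented, and the text being read is assumed to contain a zero somewhere.

```java
import java.util.Scanner;

public class PrimingReadDemo {
    // Adds up the whole numbers in typed, stopping at the first zero.
    static int sumUntilZero(String typed) {
        Scanner input = new Scanner(typed);
        int sum = 0;
        int number = input.nextInt();   // priming read, before the loop
        while (number != 0) {
            sum += number;
            number = input.nextInt();   // read again at the end of each iteration
        }
        return sum;
    }

    public static void main(String[] args) {
        // 4 + 7 + (-2) is 9; the 99 after the zero is never read.
        System.out.println(sumUntilZero("4 7 -2 0 99")); // 9
    }
}
```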
The loop condition is dependent on the value of
number, which comes from the input taken from the user. It looks
for the user to enter the number 0 to stop the loop. When a loop
looks for a special value like that so that it knows when to stop, that
value is called a sentinel.
In this case, the number zero is the sentinel. An alternative to
using a sentinel here would be to separate the asking of values and the
asking of whether to stop; we could ask the user if they want to
enter another number after every time we got a number from them.
If we did that, it might look like this:
public class Foo {
    public static void main(String[] args) {
        int number, again, sum = 0;
        do {
            System.out.print(
                "Enter a number: ");
            number = Stdin.readInt();
            sum += number;
            System.out.print(
                "Enter another number?\n(0 for no, 1 for yes): ");
            again = Stdin.readInt();
        } while (again != 0);
        System.out.println("The sum is " + sum);
        System.exit(0);
    }
}
Some other things we see here: the statements with
the System.out.print calls are broken up over two
lines. This is OK, because in Java syntax, whitespace (- spaces, tabs, and hard
returns) is insignificant. We do have lots of whitespace though;
every block of code is indented. We do this to make programs
easier to read, so it's immediately obvious where blocks of code begin
and end. Also, we use vertical whitespace (blank lines) to
separate unrelated groups of statements, which also improves code
readability.
The only whitespace that is significant is inside of
string literals (inside of quotation marks). We don't put any
tabs or hard returns directly into string literals though; we use escape sequences if we want to
insert tabs or newlines (hard returns) into a string. We see one
of these in the last call to System.out.print, where the escape
sequence is "\n". In the output that results, the prompt will
have "Enter another number?" and "(0 for no, 1 for yes)" on separate
lines. Escape sequences all start with the backslash, which is
sometimes called the escape character. The escape sequence for a
tab is "\t". Since the backslash starts an escape sequence, and a
quotation mark ends a string literal, if we want either of these
characters to be in a string literal, we need to escape them also. So a
backslash would be "\\" and a quotation mark would be "\"".
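All four escape sequences mentioned above can be seen together in one string literal. The class EscapeDemo, the method sample, and the text inside the string are made up for illustration.

```java
public class EscapeDemo {
    // One string literal using \t (tab), \" (quotation mark),
    // \n (newline), and \\ (backslash).
    static String sample() {
        return "Name:\t\"Foo\"\nPath:\tC:\\temp";
    }

    public static void main(String[] args) {
        // Prints two lines. The first is Name: then a tab then "Foo",
        // the second is Path: then a tab then C:\temp
        System.out.println(sample());
    }
}
```

Each escape sequence counts as a single character in the string, even though it takes two characters to type in the code.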
The other thing that is insignificant in Java code
(besides whitespace) is a comment
(- text that is ignored by a compiler). Comments are used to
explain sections of code to whoever is reading them. There are 3
types of comments in Java, single-line comments, multi-line comments,
and Javadoc comments. A single-line comment starts with two
slashes and goes to the end of the line. Here is an example:
if (number % 2 == 0) // if the number is even
    System.out.print("even");
A multi-line comment starts with a slash and an
asterisk (/*) and ends with an asterisk and a slash (*/). An
example:
/* This loop adds up all of the numbers from
   the starting value of number up through 99 */
while (number < 100) {
    sum += number;
    number++;
}
Javadoc comments start with a slash and two
asterisks (/**) and end with an asterisk and a slash (*/) and are used
by the javadoc program which will be discussed later.