Useful Free Resources

Monday, December 31, 2018

Where Is the Internet?

It is quite common for the Internet to be visually represented as a cloud, which is perhaps an apt way to think about the Internet given the importance of light and magnetic pulses to its operation. To many people using it, the Internet does seem to lack a concrete physical manifestation beyond our computer and cell phone screens.
But it is important to recognize that our global network of networks does not work using magical water vapor, but is implemented via millions of miles of copper wires and fiber optic cables, as well as via hundreds of thousands or even millions of server computers and probably an equal number of routers, switches, and other networked devices, along with many thousands of air conditioning units and specially constructed server rooms and buildings.
The big picture of all the networking hardware involved in making the Internet work is far beyond our scope here. We should, however, try to provide at least some sense of the hardware that is involved in making the web possible.

       From the Computer to the Local Provider

Andrew Blum, in his eye-opening book, Tubes: A Journey to the Center of the Internet, tells the reader that he decided to investigate the question “Where is the Internet?” when a hungry squirrel, gnawing on some outdoor cable wires, disrupted his home connection and thereby made him aware of the real-world texture of the Internet.
While you may not have experienced a similar squirrel problem, for many of us, our main experience of the hardware component of the Internet is that which we experience in our homes. While there are many configuration possibilities, Figure 1.18 does provide an approximate simplification of a typical home to local provider setup.
The broadband  modem (also called a cable modem or DSL modem) is a bridge between the network hardware outside the house (typically controlled by a phone or cable company) and the network hardware inside the house. These devices are often supplied by the ISP.
The wireless router is perhaps the most visible manifestation of the Internet in one’s home, in that it is a device we typically need to purchase and install.
Routers are in fact one of the most important and ubiquitous hardware devices that make the Internet work. At its simplest, a router is a hardware device that forwards data packets from one network to another network. 
  1. When the router receives a data packet, it examines the packet’s destination address and then forwards the packet along the best available path toward that destination.
  2. A router uses a routing table to help determine where a packet should be sent. It is a table that maps target addresses to the node (typically another router) to which the router can deliver the packet. 
In Figure 1.19, the different routing tables use next-hop routing, in which the router only knows the address of the next step of the path to the destination; it leaves it to the next step to continue routing the packet to the appropriate destination. The packet thus makes a variety of successive hops until it reaches its destination. There are a lot of details that have been left out of this particular illustration.
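To make next-hop routing concrete, here is a minimal Java sketch (the router names, addresses, and the map-based table are hypothetical illustrations, not production routing code) that models a routing table as a map from destination networks to next hops, with a default route as a fallback:

import java.util.HashMap;
import java.util.Map;

public class NextHopRouter {
    // Maps a destination network prefix to the next hop (typically another router).
    private final Map<String, String> routingTable = new HashMap<>();
    private final String defaultRoute;

    public NextHopRouter(String defaultRoute) {
        this.defaultRoute = defaultRoute;
    }

    public void addRoute(String destinationNetwork, String nextHop) {
        routingTable.put(destinationNetwork, nextHop);
    }

    // Returns the next hop for a packet; the router does not know the full
    // path, only the next step toward the destination.
    public String nextHopFor(String destinationNetwork) {
        return routingTable.getOrDefault(destinationNetwork, defaultRoute);
    }

    public static void main(String[] args) {
        NextHopRouter r = new NextHopRouter("isp-gateway");
        r.addRoute("192.168.1.0/24", "router-B");
        r.addRoute("10.0.0.0/8", "router-C");
        System.out.println(r.nextHopFor("10.0.0.0/8"));    // router-C
        System.out.println(r.nextHopFor("172.16.0.0/12")); // isp-gateway (default)
    }
}

Each router in the chain runs the same lookup, which is why a packet makes a series of hops rather than following a precomputed end-to-end path.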

Routers can also make use of other mechanisms to supplement or even replace routing tables; but those are all topics for a network architecture course.
Once we leave the confines of our own homes, the hardware of the Internet becomes much murkier.
In Figure 1.18, the various neighborhood broadband cables (which typically use copper, aluminum, or other metals) are aggregated and connected to fiber optic cable via fiber connection boxes.
Fiber optic cable (or simply optical fiber) is a glass-based wire that transmits light and has significantly greater bandwidth and speed in comparison to metal wires. 
In some cities (or large buildings), you may have fiber optic cable going directly into individual buildings; in such a case the fiber junction box will reside in the building.
These fiber optic cables eventually make their way to an ISP’s head-end, which is a facility that may contain a cable modem termination system (CMTS) or, in a DSL-based system, a digital subscriber line access multiplexer (DSLAM). This is a special type of very large router that connects and aggregates subscriber connections to the larger Internet. These different head-ends may connect directly to the wider Internet, or instead be connected to a master head-end, which provides the connection to the rest of the Internet.
         From the Local Provider to the Ocean’s Edge

Eventually your ISP has to pass on your requests for Internet packets to other networks.
  1. This intermediate step typically involves one or more regional network hubs.
  2. Your ISP may have a large national network with optical fiber connecting most of the main cities in the country. 
  3. Some countries have multiple national or regional networks, each with their own optical network. 
Canada, for instance, has three national networks that connect the major cities in the country and also connect to a couple of the major Internet exchange points in the United States, as well as several provincial networks that connect smaller cities within one or two provinces.
Alternatively, your smaller regional ISP may have transit arrangements with a larger national network (that is, they lease the use of part of their optical fiber network’s bandwidth).
A general principle in network design is that the fewer the router hops (and thus the more direct the path), the quicker the response.
Figure 1.20 illustrates some hypothetical connections between several different networks spread across four countries. As you can see, just like in the real world, the countries in the illustration
differ in their degree of internal and external interconnectedness.
  • The networks in Country A are all interconnected, but rely on Network A1 to connect them to the networks in Country B and C. 
  • Network B1 has many connections to other countries’ networks. 
  • The networks within Country C and D are not interconnected, and thus rely on connections to international networks in order to transfer information between the two domestic networks. For instance, even though the actual distance between a node in Network C1 and a node in Network C2 might be only a few miles, those packets might have to travel many hundreds or even thousands of miles through networks A1 and/or B1.
Clearly this is an inefficient system, but it is a reasonable approximation of the state of the Internet in the late 1990s (and in some regions of the world this is still the case), when almost all Internet traffic went through a few Network Access Points (NAPs), most of which were in the United States.
This type of network configuration began to change in the 2000s, as more and more networks began to interconnect with each other using an Internet exchange point (IX or IXP). These IXPs allow different ISPs to peer with one another (that is, interconnect) in a shared facility, thereby improving performance for each partner in the peer relationship.

Figure 1.21 illustrates how the configuration shown in Figure 1.20 changes with the use of IXPs.
  1. As you can see, IXPs provide a way for networks within a country to interconnect. Now networks in Countries C and D no longer need to make hops out of their country for domestic communications. 
  2. Notice as well that for each of the IXPs, there are connections not just with networks within their country, but also with other countries’ networks as well. 
  3. Multiple paths between IXPs provide a powerful way to handle outages and keep packets flowing. 
  4. Another key strength of IXPs is that they provide an easy way for networks to connect to many other networks at a single location.
As you can see in Figure 1.22, different networks connect not only to other networks within an IXP, but now large websites such as Microsoft and Facebook are also connecting to multiple other networks simultaneously as a way of improving the performance of their sites.
Real IXPs, such as those at Palo Alto (PAIX), Amsterdam (AMS-IX), Frankfurt (DE-CIX), and London (LINX), allow many hundreds of networks and companies to interconnect and have throughput of over 1,000 gigabits per second. The scale of peering in these IXPs is far beyond that shown in Figure 1.22 (which shows peering with only five others); companies within these IXPs use large routers from Cisco and Brocade that have hundreds of ports, allowing hundreds of simultaneous peering relationships.
In recent years, major web companies have joined the network companies in making use of IXPs.

As shown in Figure 1.23, this sometimes involves mirroring (duplicating) a site’s infrastructure (i.e., web and data servers) in a data center located near the IXP.
For instance, Equinix Ashburn IX in Ashburn, Virginia, is surrounded by several gigantic data centers just across the street from the IXP.

This concrete geography of the digital world reflects an arrangement that benefits both the networks and the web companies. The website gains incremental speed enhancements (by reducing the travel distance for its content) across all the networks it is peered with at the IXP, while the network gains improved performance for its customers when they visit the most popular websites.

                   Across the Oceans

Eventually, international Internet communication will need to travel underwater.
The amount of undersea fiber optic cable is quite staggering and is growing yearly.
As can be seen in Figure 1.24, over 250 undersea fiber optic cable systems operated by a variety of different companies span the globe.
For places not serviced by undersea cable (such as Antarctica, much of the Canadian Arctic islands, and other small islands throughout the world), Internet connectivity is provided by orbiting satellites. It should be noted that satellite links (which have much lower bandwidth than fiber optic cable) account for an exceptionally small percentage of overseas Internet communication.


Resources

  1. Fundamentals of Web Development by Randy Connolly, Ricardo Hoar
  2. How Does the Internet Work? - Stanford University
  3. How does the Internet work?
  4. Internet Basics - What is the Internet?

  5. Advanced Networking | Internet2

Wednesday, December 26, 2018

Fundamental Data Structures


                              Introduction

The modern digital computer was invented and intended as a device that should facilitate and speed up complicated and time-consuming computations. In the majority of applications its capability to store and access large amounts of information plays the dominant part and is considered to be its primary characteristic, and its ability to compute, i.e., to calculate, to perform arithmetic, has in many cases become almost irrelevant.

In all these cases, the large amount of information that is to be processed in some sense represents an abstraction of a part of reality (aka a model).
  • The information that is available to the computer consists of a selected set of data about the actual problem, namely that set that is considered relevant to the problem at hand, that set from which it is believed that the desired results can be derived. 
  • The data represent an abstraction of reality in the sense that certain properties and characteristics of the real objects are ignored because they are peripheral and irrelevant to the particular problem. 
  • An abstraction is thereby also a simplification of facts.
We may regard a personnel file of an employer as an example. Every employee is represented (abstracted) on this file by a set of data relevant either to the employer or to his accounting procedures.
  • This set may include some identification of the employee, for example, his or her name and salary. But it will most probably not include irrelevant data such as the hair color, weight, and height.
  • In solving a problem with or without a computer it is necessary to choose an abstraction of reality, i.e., to define a set of data that is to represent the real situation. This choice must be guided by the problem to be solved. 
  • Then follows a choice of representation of this information. This choice is guided by the tool that is to solve the problem, i.e., by the facilities offered by the computer. In most cases these two steps are not entirely separable.
  • The choice of representation of data is often a fairly difficult one, and it is not uniquely determined by the facilities available. It must always be taken in the light of the operations that are to be performed on the data. 
  • A good example is the representation of numbers, which are themselves abstractions of properties of objects to be characterized. If addition is the only (or at least the dominant) operation to be performed, then a good way to represent the number n is to write n strokes. The addition rule on this representation is indeed very obvious and simple. The Roman numerals are based on the same principle of simplicity, and the adding rules are similarly straightforward for small numbers. On the other hand, the representation by Arabic numerals requires rules that are far from obvious (for small numbers) and they must be memorized.
  • However, the situation is reversed when we consider either addition of large numbers or multiplication and division. The decomposition of these operations into simpler ones is much easier in the case of representation by Arabic numerals because of their systematic structuring principle that is based on positional weight of the digits.
  • It is generally known that computers use an internal representation based on binary digits (bits). This representation is unsuitable for human beings because of the usually large number of digits involved, but it is most suitable for electronic circuits because the two values 0 and 1 can be represented conveniently and reliably by the presence or absence of electric currents, electric charge, or magnetic fields.
From this example we can also see that the question of representation often transcends several levels of detail. Given the problem of representing, say, the position of an object,
  • The first decision may lead to the choice of a pair of real numbers in, say, either Cartesian or polar coordinates
  • The second decision may lead to a floating-point representation, where every real number x consists of a pair of integers denoting a fraction f and an exponent e to a certain base (such that x = f×2^e). 
  • The third decision, based on the knowledge that the data are to be stored in a computer, may lead to a binary, positional representation of integers, and 
  • the final decision could be to represent binary digits by the electric charge in a semiconductor storage device.
Evidently, the first decision in this chain is mainly influenced by the problem situation, and the later ones are progressively dependent on the tool and its technology. Thus, it can hardly be required
that a programmer decide on the number representation to be employed, or even on the storage device characteristics. These lower-level decisions can be left to the designers of computer equipment, who have the most information available on current technology with which to make a sensible choice that will be
acceptable for all (or almost all) applications where numbers play a role.

In this context, the significance of programming languages becomes apparent.
A programming language represents an abstract computer capable of interpreting the terms used in this language, which may embody a certain level of abstraction from the objects used by the actual machine. Thus, the programmer who uses such a higher-level language will be freed (and barred) from questions of number representation, if the number is an elementary object in the realm of this language.
The importance of using a language that offers a convenient set of basic abstractions common to most problems of data processing lies mainly in the area of reliability of the resulting programs. It is easier to design a program based on reasoning with familiar notions of numbers, sets, sequences, and repetitions
than on bits, storage units, and jumps.
  1. Of course, an actual computer represents all data, whether numbers, sets, or sequences, as a large mass of bits. But this is irrelevant to the programmer as long as he or she does not have to worry about the details of representation of the chosen abstractions, and as long as he or she can rest assured that the corresponding representation chosen by the computer (or compiler) is reasonable for the stated purposes.
  2. The closer the abstractions are to a given computer, the easier it is to make a representation choice for the engineer or implementor of the language, and the higher is the probability that a single choice will be suitable for all (or almost all) conceivable applications. 
  3. This fact sets definite limits on the degree of abstraction from a given real computer. For example, it would not make sense to include geometric objects as basic data items in a general-purpose language, since their proper representation will, because of its inherent complexity, be largely dependent on the operations to be applied to these objects. The nature and
    frequency of these operations will, however, not be known to the designer of a general-purpose language and its compiler, and any choice the designer makes may be inappropriate for some potential applications.
Here these deliberations determine the choice of notation for the description of algorithms and their data. Clearly, we wish to use familiar notions of mathematics, such as numbers, sets, sequences, and so on, rather than computer-dependent entities such as bitstrings.
But equally clearly we wish to use a notation for which efficient compilers are known to exist.
It is equally unwise to use a closely machine-oriented and machine-dependent language, as it is unhelpful to describe computer programs in an abstract notation that leaves problems of representation widely open.

The programming language Pascal had been designed in an attempt to find a compromise between these extremes, and the successor languages Modula-2 and Oberon are the result of decades of experience [1-3]. Oberon retains Pascal's basic concepts and incorporates some improvements and some extensions. It has been successfully implemented on several computers, and it has been shown that the notation is sufficiently close to real machines that the chosen features and their representations can be clearly explained. The language is also sufficiently close to other languages, and hence the lessons taught here may equally well be applied in their use.


                      The Concept of Data Type

In mathematics it is customary to classify variables according to certain important characteristics. Clear distinctions are made between real, complex, and logical variables or between variables representing individual values, or sets of values, or sets of sets, or between functions, functionals, sets of functions, and so on.
This notion of classification is equally if not more important in data processing.
We will adhere to the principle that every constant, variable, expression, or function is of a certain type. This type essentially characterizes the set of values to which a constant belongs, or which can be assumed by a variable or expression, or which can be generated by a function.
In mathematical texts the type of a variable is usually deducible from the typeface without consideration of context; this is not feasible in computer programs. Usually there is one typeface available on computer equipment (i.e., Latin letters).
The rule is therefore widely accepted that the associated type is made explicit in a declaration of the constant, variable, or function, and that this declaration textually precedes the application of that constant, variable, or function. 
This rule is particularly sensible if one considers the fact that a compiler has to make a choice of representation of the object within the store of a computer.
Evidently, the amount of storage allocated to a variable will have to be chosen according to the size of the range of values that the variable may assume. If this information is known to a compiler, so-called dynamic storage allocation can be avoided. This is very often the key to an efficient realization of an algorithm.

The primary characteristics of the concept of type, which is also embodied in the programming language Oberon, are the following [1-2]:
  1. A data type determines the set of values to which a constant belongs, or which may be assumed by a variable or an expression, or which may be generated by an operator or a function.
  2. The type of a value denoted by a constant, variable, or expression may be derived from its form or its declaration without the necessity of executing the computational process.
  3. Each operator or function expects arguments of a fixed type and yields a result of a fixed type. If an operator admits arguments of several types (e.g., + is used for addition of both integers and real numbers), then the type of the result can be determined from specific language rules.
As a consequence, a compiler may use this information on types to check the legality of various constructs.
For example, the mistaken assignment of a Boolean (logical) value to an arithmetic variable may be detected without executing the program.
This kind of redundancy in the program text is extremely useful as an aid in the development of programs, and it must be considered as the primary advantage of good high-level languages over machine code (or symbolic assembly code). 
Evidently, the data will ultimately be represented by a large number of binary digits, irrespective of whether or not the program had initially been conceived in a high-level language using the concept of type or in a typeless assembly code.
To the computer, the store is a homogeneous mass of bits without apparent structure. But it is exactly this abstract structure that alone enables human programmers to recognize meaning in the monotonous landscape of a computer store.
The theory presented here and the programming language Oberon specify certain methods of defining data types.
  1. In most cases new data types are defined in terms of previously defined data types. Values of such a type are usually conglomerates of component values of the previously defined constituent types, and they are said to be structured.
  2. If there is only one constituent type, that is, if all components are of the same constituent type, then it is known as the base type.
  3. The number of distinct values belonging to a type T is called its cardinality. The cardinality provides a measure for the amount of storage needed to represent a variable x of the type T, denoted by x: T.
  4. Since constituent types may again be structured, entire hierarchies of structures may be built up, but, obviously, the ultimate components of a structure are atomic. Therefore, it is necessary that a notation is provided to introduce such primitive, unstructured types as well. 
  5. A straightforward method is that of enumerating the values that are to constitute the type. For example, in a program concerned with plane geometric figures, we may introduce a primitive type called shape, whose values may be denoted by the identifiers rectangle, square, ellipse, circle. 
  6. But apart from such programmer-defined types, there will have to be some standard, predefined types. They usually include numbers and logical values. 
  7. If an ordering exists among the individual values, then the type is said to be ordered or scalar. In Oberon, all unstructured types are ordered; in the case of explicit enumeration, the values are assumed to be ordered by their enumeration sequence.
With this tool in hand, it is possible to define primitive types and to build conglomerates, structured types up to an arbitrary degree of nesting. In practice, it is not sufficient to have only one general method of combining constituent types into a structure. With due regard to practical problems of representation and use, a general-purpose programming language must offer several methods of structuring.
In a mathematical sense, they are equivalent; they differ in the operators available to select components of these structures.
The basic structuring methods presented here are the
  • array, 
  • the record, 
  • the set, and 
  • the sequence.
More complicated structures are not usually defined as static types, but are instead dynamically generated during the execution of the program, when they may vary in size and shape.
Such structures include
  • lists, 
  • rings, 
  • trees, and 
  • general, finite graphs.
Variables and data types are introduced in a program in order to be used for computation. To this end, a set of operators must be available.
For each standard data type a programming language offers a certain set of primitive, standard operators, and likewise with each structuring method a distinct operation and notation
for selecting a component.
The task of composition of operations is often considered the heart of the art of programming. However, it will become evident that the appropriate composition of data is equally fundamental and essential.
The most important basic operators are
  1. comparison and 
  2. assignment,
i.e., the test for equality (and for order in the case of ordered types), and the command to enforce equality.
The fundamental difference between these two operations is emphasized by the clear distinction in their denotation throughout this text.
Test for equality: x=y (an expression with value TRUE or FALSE)
Assignment to x: x := y (a statement making x equal to y)
These fundamental operators are defined for most data types, but it should be noted that their execution may involve a substantial amount of computational effort, if the data are large and highly structured.

For the standard primitive data types, we postulate not only the availability of assignment and comparison, but also a set of operators to create (compute) new values.
Thus we introduce the standard operations of arithmetic for numeric types and the elementary operators of propositional logic for logical values.

                     Primitive Data Types

A new, primitive type is definable by enumerating the distinct values belonging to it. Such a type is called an enumeration type. Its definition has the form

TYPE T = (c1, c2, ... , cn)
T is the new type identifier, and the ci are the new constant identifiers.

Examples

TYPE shape = (rectangle, square, ellipse, circle)
TYPE color = (red, yellow, green)
TYPE sex = (male, female)
TYPE weekday = (Monday, Tuesday, Wednesday, Thursday, Friday,
Saturday, Sunday)
TYPE currency = (franc, mark, pound, dollar, shilling, lira, guilder,
krone, ruble, cruzeiro, yen)
TYPE destination = (hell, purgatory, heaven)
TYPE vehicle = (train, bus, automobile, boat, airplane)
TYPE rank = (private, corporal, sergeant, lieutenant, captain, major,
colonel, general)
TYPE object = (constant, type, variable, procedure, module)
TYPE structure = (array, record, set, sequence)
TYPE condition = (manual, unloaded, parity, skew)

The definition of such types introduces not only a new type identifier, but at the same time the set of identifiers denoting the values of the new type.
These identifiers may then be used as constants throughout the program, and they enhance its understandability considerably. If, as an example, we introduce variables s, d, r, and b:
VAR s: sex
VAR d: weekday
VAR r: rank
VAR b: BOOLEAN
then the following assignment statements are possible:
s := male
d := Sunday
r := major
b := TRUE
Evidently, they are considerably more informative than their counterparts
s := 1 d := 7 r := 6 b := 2
which are based on the assumption that s, d, r, and b are defined as integers and that the constants are mapped onto the natural numbers in the order of their enumeration.
Furthermore, a compiler can check against the inconsistent use of operators. For example, given the declaration of s above, the statement
s := s+1 
would be meaningless.
If, however, we recall that enumerations are ordered, then it is sensible to introduce operators that generate the successor and predecessor of their argument. We therefore postulate the following standard operators, which assign to their argument its successor and predecessor respectively:
INC(x)
DEC(x)
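For comparison, Java's enum types play a similar role to Oberon's enumerations. A hedged sketch (the Weekday names and the inc helper are illustrative; note that it wraps around at the last value for simplicity, whereas Oberon's INC on the last value is not defined):

public class Weekdays {
    // Java analogue of TYPE weekday = (Monday, ..., Sunday).
    enum Weekday { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY }

    // Successor in enumeration order, analogous to Oberon's INC(x).
    static Weekday inc(Weekday d) {
        Weekday[] values = Weekday.values();
        return values[(d.ordinal() + 1) % values.length]; // wraps around for simplicity
    }

    public static void main(String[] args) {
        Weekday d = Weekday.SATURDAY;
        System.out.println(inc(d)); // SUNDAY
        // The compiler rejects inconsistent uses such as d + 1,
        // just as Oberon rejects s := s + 1 for an enumeration.
    }
}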


                  Standard Primitive Types

Standard primitive types are those types that are available on most computers as built-in features.
They include
  • the whole numbers, 
  • the logical truth values, and 
  • a set of printable characters.
  • On many computers fractional numbers are also incorporated, together with the standard arithmetic operations.
We denote these types by the identifiers

INTEGER, REAL, BOOLEAN, CHAR, SET


                            Integer types

The type INTEGER comprises a subset of the whole numbers whose size may vary among individual computer systems.
  1. If a computer uses n bits to represent an integer in two's complement notation, then the admissible values x must satisfy -2^(n-1) <= x < 2^(n-1). 
  2. It is assumed that all operations on data of this type are exact and correspond to the ordinary laws of arithmetic, and that the computation will be interrupted in the case of a result lying outside the representable subset. This event is called overflow.
  3. The standard operators are the four basic arithmetic operations of addition (+), subtraction (-), multiplication (*), and division (/, DIV).
  4. Whereas the slash denotes ordinary division resulting in a value of type REAL, the operator DIV denotes integer division resulting in a value of type INTEGER. If we define the quotient q = m DIV n and the remainder r = m MOD n, the following relations hold, assuming n > 0: q*n + r = m and 0 <= r < n
Examples:

 31 DIV 10 = 3           31 MOD 10 = 1
-31 DIV 10 = -4         -31 MOD 10 = 9

We know that dividing by 10^n can be achieved by merely shifting the decimal digits n places to the right and thereby ignoring the lost digits.
The same method applies, if numbers are represented in binary instead of decimal form.
If two's complement representation is used (as in practically all modern computers), then the shifts implement a division as defined by the above DIV operation. Moderately sophisticated compilers will therefore represent an operation of the form m DIV 2^n or m MOD 2^n by a fast shift (or mask) operation.
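In Java, for instance, these definitions of DIV and MOD correspond to Math.floorDiv and Math.floorMod, not to the truncating / and % operators. A small sketch:

public class DivMod {
    public static void main(String[] args) {
        // Oberon's DIV and MOD (floored division) vs. Java's truncating operators.
        System.out.println(Math.floorDiv(31, 10));  //  3
        System.out.println(Math.floorMod(31, 10));  //  1
        System.out.println(Math.floorDiv(-31, 10)); // -4
        System.out.println(Math.floorMod(-31, 10)); //  9 (note: -31 % 10 == -1)

        // For powers of two, an arithmetic right shift implements floored
        // division in two's complement, as described above: m DIV 2^n == m >> n.
        int m = -31;
        System.out.println(m >> 1); // -16 == Math.floorDiv(-31, 2)
        System.out.println(m & 1);  //  1  == Math.floorMod(-31, 2)
    }
}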

                                The type REAL

The type REAL denotes a subset of the real numbers.
Whereas arithmetic with operands of type INTEGER is assumed to yield exact results, arithmetic on values of type REAL is permitted to be inaccurate within the limits of round-off errors caused by computation on a finite number of digits. This is the principal reason for the explicit distinction between the types INTEGER and REAL, as it is made in most programming languages.
The standard operators are the four basic arithmetic operations of addition (+), subtraction (-), multiplication (*), and division (/).
It is an essence of data typing that different types are incompatible under assignment. An exception to this rule is made for assignment of integer values to real variables, because here the semantics are unambiguous. After all, integers form a subset of real numbers.
However, the inverse direction is not permissible: Assignment of a real value to an integer variable requires an operation such as truncation or rounding. 
The standard transfer function Entier(x) yields the integral part of
x.
Rounding of x is obtained by Entier(x + 0.5).
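As a point of comparison, a small sketch of the same transfer in Java, assuming Entier denotes the floor function (the largest integer less than or equal to x), as in Oberon:

public class EntierDemo {
    // Entier(x): the largest integer less than or equal to x.
    static long entier(double x) {
        return (long) Math.floor(x);
    }

    public static void main(String[] args) {
        System.out.println(entier(3.7));       //  3
        System.out.println(entier(-3.7));      // -4 (floor, not truncation)
        System.out.println(entier(2.2 + 0.5)); //  2 (rounding via Entier(x + 0.5))
        System.out.println(entier(2.7 + 0.5)); //  3
    }
}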

Many programming languages do not include an exponentiation operator. The following is an algorithm for the fast computation of y = x^n, where n is a non-negative integer.

y := 1.0;
i := n;
WHILE i > 0 DO (* x0^n = x^i * y *)
  IF ODD(i) THEN
    y := y*x
  END ;
  x := x*x;
  i := i DIV 2
END
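A direct Java transcription of this algorithm, with the loop invariant kept as a comment (a sketch; the Oberon version above is the form given here):

public class FastPower {
    // Computes y = x^n for a non-negative integer n in O(log n) multiplications.
    static double power(double x, int n) {
        double y = 1.0;
        int i = n;
        while (i > 0) {         // invariant: x0^n == x^i * y
            if ((i & 1) == 1) { // ODD(i)
                y = y * x;
            }
            x = x * x;
            i = i / 2;          // i DIV 2
        }
        return y;
    }

    public static void main(String[] args) {
        System.out.println(power(2.0, 10)); // 1024.0
        System.out.println(power(3.0, 0));  // 1.0
    }
}

The invariant x0^n = x^i * y holds before every iteration; when i reaches 0, y equals x0^n.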



[to be continued...]

Wednesday, December 19, 2018

Donald Knuth

When I was younger, Prof. Donald Knuth was (and still is) one of my idols. Here is an interesting article about a myth in our field. By the way, that blog is dedicated to him as well.

Thursday, December 13, 2018

Network Programming with Java: The basics

        What Is Network Programming?

A network is a group of two or more computers, or other types of electronic devices such as printers, that are linked together with the goal of sharing information.
Each device linked to a network is called a node. A computer that is linked to a network is called a host.
Network programming in Java involves writing Java programs that
facilitate the exchange of information between processes running on different computers on the network.

Java makes it easy to write network programs.
  • Sending a message to a process running on another computer is as simple as writing data to a local file system.
  • Similarly, receiving a message that was sent from a process running in another computer is as simple as reading data from a local file system.
Most of the programs we deal with involve reading and writing data over the network, and they are similar to file I/O. You have to learn about only a few new classes that facilitate the communication between two computers on a network.
You do not need to have advanced knowledge of networking technologies to understand or write Java network programs. Here we cover the high-level details of a few concepts that are involved in network communication.
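As a taste of how file-like this is, here is a minimal, hedged sketch of a TCP client (the host and port are hypothetical; it assumes an echo-style server is listening there):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class MiniClient {
    public static void main(String[] args) throws Exception {
        // Connect to a (hypothetical) server; port 7 is the classic echo service.
        try (Socket socket = new Socket("localhost", 7);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("Hello, network!");    // writing, much like writing to a file
            System.out.println(in.readLine()); // reading, much like reading from a file
        }
    }
}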

A network can be categorized based on different criteria.
Based on the geographical area that a network is spread over, it is categorized as a local area network (LAN), a metropolitan area network (MAN), or a wide area network (WAN).
When two or more networks are connected using routers (also known as gateways), it is called internetworking, and the resulting combined network is called an internetwork, or, in short, an internet.
The global internetwork, which encompasses all networks in the world connected together, is referred to as the Internet.

Based on the topology (the arrangement of nodes in a network), a network may be categorized as
  • star,
  • tree, 
  • ring, 
  • bus, 
  • hybrid, etc.
Based on the technology a network uses to transmit the data, it can be categorized as
  • Ethernet, 
  • LocalTalk, 
  • Fiber Distributed Data Interface (FDDI), 
  • Token Ring, 
  • Asynchronous Transfer Mode (ATM), etc.
For details about the different kinds of networks, refer to any standard textbook on networks and network technologies.

Communication between two processes on the same computer is simple and is achieved using interprocess communication (IPC) as defined by the operating system.
It is a much more involved task when two processes running on two different computers on an internet need to communicate. You need to consider many aspects of the communication before the two processes may start communicating. Some of the points that you need to consider are as follows:
  • The two computers may be using different technologies such as different operating systems, different hardware, etc.
  • They may be on two different networks that use different network technologies.
  • They may be separated by many other networks, which may be using different technologies. That is, two computers are not on two networks that are interconnected directly. You need to consider not just two networks, but all networks that the data from one computer must pass to reach another computer.
  • They may be a few miles apart or on other sides of the globe. How do you transmit the information efficiently without worrying about the distance between the two computers?
  • One computer may not understand the information sent by the other computer.
  • The information sent over a network may be duplicated, delayed, or lost. How should the receiver and the sender handle these abnormal situations?
Simply put,
  1. two computers on a network communicate using messages (sequences of 0s and 1s).
  2. There must be well-defined rules to handle the previously mentioned issues (and many more). The set of rules to handle a specific task is known as a protocol. Many types of tasks are involved in handling network communication. There is a protocol defined to handle each specific task. There is a stack of protocols (also called protocol suite) that are used together to handle a network communication.

               Network Protocol Suite

Modern networks are called packet switching networks because they transmit data in chunks called packets.
Each packet is transmitted independently of the other packets. This makes it possible for packets from the same computer to the same destination to travel along different routes. 
However, it may become a problem if a computer sends two packets to a remote computer and the second packet arrives before the first one.
For this reason, each packet also has a packet number along with its destination address. 
There are rules for reordering packets that arrive out of order at the destination computer. The following discussion attempts to explain some of the mechanisms that are used to handle packets in a network communication.

Figure 4-1 shows a layered protocol suite called the Internet Reference Model or TCP/IP Layering Model.
This is the most widely used protocol suite. Each layer in the model performs a well-defined task.
The main advantage of having a layered protocol model is that any layer can be changed without affecting others. A new protocol can be added to any layer without changing other layers.


Each layer knows about only the layer immediately above and below it. Each layer has two interfaces—one for the layer above it and one for the layer below it. 
For example, the transport layer has interfaces to the application layer and internet layer. That is, the transport layer knows how to communicate only with the application layer and the internet layer. It knows nothing about the network interface layer or the physical layer.

A user application such as a Java program uses the application layer to communicate to a remote application.
The user application has to specify the protocol that it wants to use to communicate with the remote application. 
A protocol in an application layer defines the rules for formatting messages and associating the meaning to the information contained in the messages such as the message type, describing
whether it is a request or a response, etc.
After the application layer formats the message, it hands over the message to the transport layer.
The examples of protocols in an application layer are
  • Hypertext Transfer Protocol (HTTP), 
  • File Transfer Protocol (FTP), 
  • Gopher, 
  • Telecommunication Network (Telnet), 
  • Simple Mail Transfer Protocol (SMTP), and 
  • Network News Transfer Protocol (NNTP).
The transport layer protocol handles the ways messages are transported from one application on one computer to another application on the remote computer. 
It controls
  • the data flow, 
  • error handling during data transmission, and 
  • connections between two applications.
For example,
  1. a user application may hand over a very large chunk of data to the transport layer to transmit to a remote application. 
  2. The remote computer may not be able to handle that large amount of data at once. It is the responsibility of the transport layer to pass a suitable amount of data at a time to the remote computer, so the remote application can handle the data according to its capacity. 
  3. The data passed to the remote computer over a network may be lost on its way due to various reasons. 
  4. It is the responsibility of the transport layer to re-transmit the lost data. Note that the application layer passes data to be transmitted to the transport layer only once. It is the transport layer (not the application layer) that keeps track of the delivered and the lost data during a transmission. 
  5. There may be multiple applications running, all of which use different protocols and exchange information with different remote applications. 
  6. It is the responsibility of the transport layer to hand over messages sent to a remote application correctly. For example, you may be browsing the Internet using the HTTP protocol from one remote web server and downloading a file using the FTP protocol from another FTP server. Your computer is receiving messages from two remote computers and they are meant for two different applications running on your computer—one web browser to receive HTTP data and one FTP application to receive FTP data.
  7.  It is the responsibility of the transport layer to pass the incoming data to the appropriate application.
You can see how different layers of the protocol suite play different roles in data transmission over the network. Depending on the transport layer protocol being used, the transport layer adds relevant information to the message and passes it to the next layer, which is the internet layer.
The examples of protocols used in the transport layer are
  • Transmission Control Protocol (TCP), 
  • User Datagram Protocol (UDP), and 
  • Stream Control Transmission Protocol (SCTP).
The internet layer accepts the messages from the transport layer and prepares a packet suitable for sending over the internet. It includes the Internet Protocol (IP).
The packet prepared by the IP is also known as an IP datagram.
It consists of a header and a data area, apart from other pieces of information.
The header contains
  • the sender’s IP address, 
  • destination IP address, 
  • time to live (TTL, which is an integer), 
  • a header checksum, and many other pieces of information specified in the protocol.
The IP prepares the message into datagrams, which are ready to be transmitted over the internet.
  1. The TTL in the IP datagram header specifies how long, in terms of the number of routers, an IP datagram can keep traveling before it needs to be discarded. Its size is one byte and its value could be between 1 and 255. When an IP datagram reaches a router in its route to the destination, the router decrements the TTL value by 1. If the decremented value is zero, the router discards the datagram and 
  2. sends an error message back to the sender using Internet Control Message Protocol (ICMP). 
  3. If the TTL value is still a positive number, the router forwards the datagram to the next router.
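The TTL rule described above is simple enough to sketch as a toy simulation (purely illustrative; this is not how real routers are implemented):

public class TtlDemo {
    // Simulates the TTL handling of a chain of routers for one datagram.
    public static void main(String[] args) {
        int ttl = 3; // initial TTL set by the sender
        String[] routers = {"R1", "R2", "R3", "R4"};
        for (String router : routers) {
            ttl--; // each router decrements the TTL by 1
            if (ttl == 0) {
                // A real router would discard the datagram and send an
                // ICMP "time exceeded" message back to the sender.
                System.out.println(router + ": TTL expired, datagram discarded");
                return;
            }
            System.out.println(router + ": forwarding datagram, TTL = " + ttl);
        }
    }
}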
The IP uses an address scheme, which assigns a unique address to each computer. The address is called an IP address. The internet layer hands over the IP datagram to the next layer, which is the network interface layer. The examples of protocols in an internet layer are
  1. Internet Protocol (IP), 
  2. Internet Control Message Protocol (ICMP), 
  3. Internet Group Management Protocol (IGMP), and 
  4. Internet Protocol Security (IPsec).
The network interface layer prepares a packet to be transmitted on the network. The packet is called a frame. The network interface layer sits just on top of the physical layer, which involves the hardware.

Note that the IP layer uses the IP address to identify the destination on a network.
An IP address is a virtual address, which is completely maintained in software. 
The hardware is unaware of the IP address and it does not know how to transmit a frame using an IP address. The hardware must be given the hardware address, also called Media Access Control (MAC) address, of the destination that it needs to transmit the frame to.
This layer resolves the destination hardware address from the IP address and places it in the frame header. It hands over the frame to the physical layer.
The examples of protocols in a network interface layer are
  1. Open Shortest Path First (OSPF), 
  2. Point-to-Point Protocol (PPP), 
  3. Point-to-Point Tunneling Protocol (PPTP), and
  4. Layer 2 Tunneling Protocol (L2TP).
The physical layer consists of the hardware. It is responsible for converting the bits of information into signals and transmitting the signal over the wire.

■Tip
Packet is a generic term that is used to mean an independent chunk of data in network programming.
Each layer of protocol also uses a specific term to mean the packet it deals with. For example,
  • a packet is called a segment in the TCP layer; 
  • it is called a datagram in the IP layer; 
  • it is called a frame in the network interface and physical layers. 
Each layer adds a header (sometimes also a trailer) to the packet it receives from the layer before it, while preparing the packet to be transmitted over the network.
Each layer performs the reverse action when it receives a packet from the layer below it. It removes the header from the packet; performs some
actions, if needed; and hands over the packet to the layer above it.
When a packet sent by an application reaches the remote computer, it has to pass through the same layer of protocols in the reverse order. Each layer will remove its header, perform some actions, and pass the packet to the layer immediately above it. Finally, the packet reaches the remote application in the
same format it started from the application on the sender’s computer.
The figure below shows the transmission of packets from the sender computer to the receiver computer. P1, P2, P3, and P4 are the same data in different packet formats. A protocol layer at the destination receives the same packet from the layer immediately below it that its peer layer had passed to the layer immediately below it on the sender's computer.
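To make the wrapping and unwrapping concrete, here is a toy Java sketch of the encapsulation idea (the header strings are hypothetical; real headers are binary structures defined by each protocol):

import java.util.ArrayDeque;
import java.util.Deque;

public class LayeringDemo {
    public static void main(String[] args) {
        String message = "GET /index.html";       // application-layer data
        String[] layers = {"TCP", "IP", "FRAME"}; // layers below the application

        // Sending side: each layer wraps the packet with its own header.
        Deque<String> headers = new ArrayDeque<>();
        String packet = message;
        for (String layer : layers) {
            headers.push(layer);
            packet = "[" + layer + "-header]" + packet;
            System.out.println("sent down: " + packet);
        }

        // Receiving side: each layer removes the header added by its peer
        // on the sender's computer, in reverse order.
        while (!headers.isEmpty()) {
            String layer = headers.pop();
            packet = packet.substring(("[" + layer + "-header]").length());
            System.out.println("passed up: " + packet);
        }
        // The application receives the message in its original form.
        System.out.println("received:  " + packet);
    }
}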

  
                  IP Addressing Scheme

IP uses a unique address, called an IP address, to route an IP datagram to the destination.
An IP address uniquely identifies a connection between a computer and a router. 
Normally, it is understood that an IP address identifies a computer. However, it should be emphasized that it identifies a connection between a computer and a router, not just a computer. A router is also assigned an IP address.
A computer can be connected to multiple networks using multiple routers, and each connection between the computer and a router will have a unique IP address. In such cases, the computer is assigned multiple IP addresses and is known as multi-homed.
Multi-homing increases the availability of the network connection to a computer. If one network connection fails, the computer can use other available network connections.

An IP address contains two parts
  1. a network identifier (prefix) and 
  2. a host identifier (suffix).
The prefix identifies a network on the Internet uniquely; the suffix identifies a host uniquely within that network. It is possible for two hosts to have IP addresses with the same suffix as long as they have a different prefix.

There are two versions of the Internet Protocol—IPv4 (or simply IP) and IPv6.
IPv6 is also known as Internet Protocol next generation (IPng).

Note that there is no IPv5. When IP was in its full swing of popularity, it was at version 4. Before IPng was assigned a version number 6, version 5 was already assigned to another protocol called Internet Stream Protocol (ST).
Both IPv4 and IPv6 use an IP address to identify a host on a network. However, the addressing schemes in the two versions differ significantly.

Since an IP address must be unique, its assignment is controlled by an organization called Internet Assigned Numbers Authority (IANA).
  1. IANA assigns a unique address to each network that belongs to an organization. 
  2. The organization uses the network address and a unique number to form a unique IP address for each host on the network. 
  3. IANA delegates IP address allocations to five Regional Internet Registry (RIR) organizations, which allocate IP addresses in specific regions as listed in Table 4-1. You can find more information on how to get a network address in your area from IANA at www.iana.org.


                 IPv4 Addressing Scheme

IPv4 (or simply IP) uses a 32-bit number to represent an IP address. An IP address contains two parts—a prefix and a suffix. The prefix identifies a network and the suffix identifies a host on the network, as shown in Figure.

It is not easy for humans to remember a 32-bit number in binary format. IPv4 allows you to work with an alternate form using four decimal numbers. Each decimal number is in the range from 0 to 255. The decimal number format of IPv4 is called dotted decimal format because a dot is used to separate two decimal numbers. Each decimal number represents the value contained in 8 bits of the 32-bit number. For example, an IPv4 address of
 11000000 10101000 00000001 11100111 
in the binary format can be represented as 192.168.1.231 in the dotted decimal format.
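A small sketch of this conversion in Java, using bit shifts to pull the four octets out of the 32-bit number (class and method names are illustrative):

public class DottedDecimal {
    // Formats a 32-bit IPv4 address as dotted decimal.
    static String toDotted(int address) {
        return ((address >>> 24) & 0xFF) + "." + ((address >>> 16) & 0xFF) + "."
             + ((address >>> 8) & 0xFF) + "." + (address & 0xFF);
    }

    // Parses dotted decimal back into a 32-bit address.
    static int fromDotted(String dotted) {
        String[] parts = dotted.split("\\.");
        int address = 0;
        for (String part : parts) {
            address = (address << 8) | Integer.parseInt(part);
        }
        return address;
    }

    public static void main(String[] args) {
        int ip = 0b11000000_10101000_00000001_11100111;
        System.out.println(toDotted(ip));                         // 192.168.1.231
        System.out.println(toDotted(fromDotted("192.168.1.231"))); // round trip
    }
}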


How do you know that 192.168.1 represents the prefix in the IPv4 address 192.168.1.231? A rule governs the values of the prefix and the suffix in an IPv4 address.
How does an IPv4 address divide its 32 bits between a prefix and a suffix? IPv4 address space is divided in five categories called network classes, named A, B, C, D, and E.
A class type defines how many bits of the 32 bits will be used to represent the network address part of an IP address.
The leading bit (or bits) in the prefix defines the class of the IP address. This is also known as a self-identifying or classful IP address because you can tell which class it belongs to by looking at the IP address.
The table lists the five network classes and their characteristics in IPv4. The leading bits in an IP address identify the class of the network.
For example,
  1. if an IP address looks like 0XXX, where XXX stands for the remaining 31 of the 32 bits, it belongs to a class A network. There can be only 128 (2^7) networks of class A type, and each network can have 16,777,214 (2^24 - 2) hosts. The number of hosts that a class A network can have is very big, and it is very unlikely that a network will have that many hosts.
  2. if an IP address looks like 110XXX, where XXX stands for the remaining 29 of the 32 bits, it belongs to a class C network. In a class C type of network, the maximum number of hosts that a network can have is limited to 254 (2^8 - 2), as the sketch below illustrates.
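A hedged sketch of classful identification in Java, looking only at the leading bits of the first octet (the standard class A-E bit patterns, with hypothetical class and method names):

public class NetworkClass {
    // Determines the classful network class (A-E) from the first octet.
    static char classOf(int firstOctet) {
        if ((firstOctet & 0b1000_0000) == 0)           return 'A'; // 0xxxxxxx
        if ((firstOctet & 0b1100_0000) == 0b1000_0000) return 'B'; // 10xxxxxx
        if ((firstOctet & 0b1110_0000) == 0b1100_0000) return 'C'; // 110xxxxx
        if ((firstOctet & 0b1111_0000) == 0b1110_0000) return 'D'; // 1110xxxx
        return 'E';                                                // 1111xxxx
    }

    public static void main(String[] args) {
        System.out.println(classOf(10));  // A (10.x.x.x)
        System.out.println(classOf(172)); // B (172.x.x.x)
        System.out.println(classOf(192)); // C (192.168.1.231 is class C)
    }
}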


[to be continued...]

Thursday, December 6, 2018

Object Orientation

If we want to model in an object-oriented style, we must first clarify what object orientation means.
The introduction of object orientation dates back to the 1960s, when the simulation language SIMULA was presented; it built on a paradigm for describing the world that was intended to be as natural to humans as possible.
The object-oriented approach corresponds to the way we look at the real world; we see it as a society of autonomous individuals, referred to as objects, which take a fixed place in this society and must thereby fulfill predefined obligations.
There is not only one single definition for object orientation. However, there is a general consensus about the properties that characterize object orientation. Naturally, objects play a central role in object-oriented approaches.
Viewed simply, objects are elements in a system whose data and operations are described. Objects interact and communicate with one another.

                              Classes

In many object-oriented approaches, it is possible to define classes that describe the attributes and the behavior of a set of objects (the instances of a class) abstractly and thus group common features of objects.

For example, people have
  • a name, 
  • an address, and a 
  • social security number (SSN).
Courses have
  • a unique identifier, 
  • a title, and 
  • a description.
Lecture halls have
  • a name as well as 
  • a location, etc.
A class also defines a set of permitted operations that can be applied to the instances of the class.

For example, you can reserve a lecture hall for a certain date, a student can register for an exam, etc.
In this way, classes describe the behavior of objects.

                            Objects

The instances of a class are referred to as its objects.
For example, lh1, the Lecture Hall 1 of the Vienna University of Technology, is a concrete instance of the class LectureHall.
In particular, an object is distinguished by the fact that it has its own identity, that is, different instances of a class can be uniquely identified. 
For example, the projector in Lecture Hall 1 is a different object from the projector in Lecture Hall 2, even if the devices are of the same type.
Here we have identical devices but not the same device. 
The situation for concrete values of data types is different: the number 1, which is a concrete value of the data type Integer, does not have a distinguishable identity.
An object always has a certain state. A state is expressed by the values of its attributes. 
For example, a lecture hall can have the state
  • occupied or 
  • free.
An object also displays behavior.
The behavior of an object is described by the set of its operations. Operations are triggered by sending a message.

                          Encapsulation

Encapsulation protects the internal state of an object against unauthorized access; access is permitted only via a well-defined interface.
Different levels of visibility of the interfaces help to define different access authorizations. 
Java, for example, has the explicit visibility markers 
  • public,
  • private, and 
  • protected,
which respectively permit access for
  • all,
  • only members of the same class, and 
  • only members of the same class, its subclasses, and classes in the same package.
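A minimal Java sketch of encapsulation, using a hypothetical LectureHall class in the spirit of the examples above:

public class LectureHall {
    // Internal state: not directly accessible from outside.
    private String name;
    private boolean occupied;

    public LectureHall(String name) {
        this.name = name;
        this.occupied = false;
    }

    // The public interface is the only way to observe or change the state.
    public boolean isOccupied() {
        return occupied;
    }

    public void reserve() {
        if (occupied) {
            throw new IllegalStateException(name + " is already occupied");
        }
        occupied = true;
    }
}

A client can call reserve() but cannot set occupied directly; the object stays in control of its own state.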

                             Messages

Objects communicate with one another through messages.
  1. A message to an object represents a request to execute an operation. 
  2. The object itself decides whether and how to execute this operation. 
  3. The operation is only executed if the sender is authorized to call the operation—this can be regulated in the form of visibilities (see Encapsulation)—and a suitable implementation is available.
In many object-oriented programming and modeling languages the concept of overloading is supported. This enables an operation to be defined differently for different types of parameters. 
For example, the operator + realizes different behavior depending on whether it is used to add up integers (e.g., 1 + 1= 2) or to concatenate character strings (e.g., “a” + “b” = “ab”).
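The built-in + behaves this way in Java; a short sketch of the same idea with an explicitly overloaded method (names are illustrative):

public class Overloading {
    // Same operation name, different parameter types, different behavior.
    static int combine(int a, int b) {
        return a + b;     // arithmetic addition
    }

    static String combine(String a, String b) {
        return a + b;     // string concatenation
    }

    public static void main(String[] args) {
        System.out.println(combine(1, 1));     // 2
        System.out.println(combine("a", "b")); // ab
    }
}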


                           Inheritance

The concept of inheritance is a mechanism for deriving new classes
from existing classes. A subclass derived from an existing class (= superclass) inherits all visible attributes and operations (specification and implementation) of the superclass.

A subclass can:
  • Define new attributes and/or operations
  • Override the implementation of inherited operations
  • Add its own code to inherited operations
Inheritance enables extensible classes and as a consequence, the creation of class hierarchies as the basis for object-oriented system development.
A class hierarchy consists of classes with similar properties,
for example, Person ← Employee ← Professor ← ... where
A ← B means that B is a subclass of A.

When used correctly, inheritance offers many advantages:
  1. reuse of program or model parts (thus avoiding redundancy and errors), 
  2. consistent definition of interfaces, 
  3. use as a modeling aid through a natural categorization of the occurring elements, and 
  4. support for incremental development, i.e., a step-by-step refinement of general concepts to specific concepts.

                           Polymorphism

In general terms, polymorphism is the ability to adopt different forms.
During the execution of a program, a polymorphic attribute can have references to objects from different classes. 
When this attribute is declared, a type (e.g., class Person) is assigned statically at compile time.
At runtime, this attribute can also be bound dynamically to a subtype (e.g., subclass Employee or subclass Student).

A polymorphic operation can be executed on objects from different
classes and have different semantics in each case. This scenario can be implemented in many ways:
  1. via parametric polymorphism, better known as genericity—here, type parameters are used; in Java, for example, concrete types are supplied to generic classes and operations as type arguments;
  2. via inclusion polymorphism—operations can be applied to classes and to their direct and indirect subclasses; 
  3. via overloading of operations; and 
  4. via coercion, that is, the conversion of types.
The first two methods above are known as universal polymorphism; the other two methods are referred to as ad hoc polymorphism.
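A compact Java sketch of inclusion polymorphism, using the Person ← Employee hierarchy mentioned above (the classes are hypothetical minimal stand-ins):

public class PolymorphismDemo {
    static class Person {
        String describe() { return "a person"; }
    }

    static class Employee extends Person {          // Person <- Employee
        @Override
        String describe() { return "an employee"; } // overrides the inherited operation
    }

    static class Student extends Person {
        @Override
        String describe() { return "a student"; }
    }

    public static void main(String[] args) {
        // The variable p is statically of type Person, but at runtime it
        // may be bound to instances of any subclass (dynamic binding).
        Person[] people = { new Person(), new Employee(), new Student() };
        for (Person p : people) {
            System.out.println(p.describe()); // a person / an employee / a student
        }
    }
}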



Wednesday, December 5, 2018

BlueJ environment and Objects First approach

BlueJ is a Java development environment developed and maintained by the Computing Education Research Group at the University of Kent in Canterbury, UK, explicitly as an environment for teaching introductory (first- and second-year) object-oriented programming.

It is better suited to introductory teaching than other environments for a variety of reasons:
  • The user interface is much simpler. Beginning students can typically use the BlueJ environment in a competent manner after 20 minutes of introduction. From then on, instruction can concentrate on the important concepts at hand—object orientation and Java—and no time needs to be wasted talking about environments, file systems, class paths, or DLL conflicts.
  • The environment supports important teaching tools not available in other environments. One of them is visualization of class structure. BlueJ automatically displays a UML-like diagram representing the classes and relationships in a project. Visualizing these important concepts is a great help to both teachers and students. It is hard to grasp the concept of an object when all you ever see on the screen is lines of code! The diagram notation is a simple subset of UML, again tailored to the needs of beginning students. This makes it easy to understand, but also allows migration to full UML in later courses.
  • One of the most important strengths of the BlueJ environment is the user’s ability to directly create objects of any class, and then to interact with their methods. This creates the opportunity for direct experimentation with objects, with little overhead in the environment. Students can almost “feel” what it means to create an object, call a method, pass a parameter, or receive a return value. They can try out a method immediately after it has been written, without the need to write test drivers. This facility is an invaluable aid in understanding the underlying concepts and language details.
  • BlueJ includes numerous other tools and characteristics that are specifically designed for learners of software development. Some are aimed at helping with understanding fundamental concepts (such as the scope highlighting in the editor), some are designed to introduce additional tools and techniques, such as integrated testing using JUnit, or teamwork using a version control system, such as Git, once the students are ready. Several of these features are unique to the BlueJ environment.
BlueJ is a full Java environment. It is not a cut-down, simplified version of Java for teaching. It runs on top of Oracle’s Java Development Kit, and makes use of the standard compiler and virtual machine. This ensures that it always conforms to the official and most up-to-date Java specification.

The authors of Objects First with Java™: A Practical Introduction Using BlueJ have many years of teaching experience with the BlueJ environment (and many more years without it before that). We both have experienced how the use of BlueJ has increased the involvement, understanding, and activity of students in our courses. One of the authors is also a developer of the BlueJ system.

                                 Real objects first

One of the reasons for choosing BlueJ was that it allows an approach where teachers truly deal with the important concepts first.

“Objects first” has been a battle cry for many textbook authors and teachers for some time.
Unfortunately, the Java language does not make this noble goal very easy. Numerous hurdles of syntax and detail have to be overcome before the first experience with a living object arises. The minimal Java program to create and call an object typically includes
  1. writing a class;
  2. writing a main method, including concepts such as static methods, parameters, and arrays in the signature;
  3. a statement to create the object (“new”);
  4. an assignment to a variable;
  5. the variable declaration, including variable type;
  6. a method call, using dot notation;
  7. possibly a parameter list.
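A minimal sketch showing all of these hurdles at once (the Greeter class is hypothetical and used only for illustration):

    class Greeter {
        void greet(String who) {                    // the method we want to call
            System.out.println("Hello, " + who);
        }
    }

    public class Main {                             // 1. writing a class
        public static void main(String[] args) {    // 2. static method, parameters, array
            Greeter greeter = new Greeter();        // 3.-5. new, assignment, declaration
            greeter.greet("world");                 // 6.-7. dot-notation call with a parameter
        }
    }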
As a result, textbooks typically either
  • have to work their way through this forbidding list, and only reach objects somewhere around Chapter 4; or
  • use a “Hello, world”-style program with a single static main method as the first example, thus not creating any objects at all.
With BlueJ, this is not a problem. A student can create an object and call its methods as the very first activity! Because users can create and interact with objects directly, concepts such as classes, objects, methods, and parameters can easily be discussed in a concrete manner before looking at the first line of Java syntax.

                                 An iterative approach

Another important aspect of a teaching book is that it follows an iterative style. In the computing education community, a well-known educational design pattern states that important concepts should be taught early and often (the “Early Bird” pattern, in J. Bergin, “Fourteen Pedagogical Patterns for Teaching Computer Science,” Proceedings of the Fifth European Conference on Pattern Languages of Programs (EuroPLoP 2000), Irsee, Germany, July 2000).
It is very tempting for textbook authors to try and say everything about a topic at the point where it is introduced. For example, it is common, when introducing types, to give a full list of built-in data types, or to discuss all available kinds of loop when introducing the concept of a loop.
These two approaches conflict: we cannot concentrate on discussing important concepts first, and at the same time provide complete coverage of all topics encountered. Our experience with
textbooks is that much of the detail is initially distracting, and has the effect of drowning out the important points, thus making them harder to grasp.

In this book we touch on all of the important topics several times, both within the same chapter and across different chapters. Concepts are usually introduced at a level of detail necessary for
understanding and applying the task at hand. They are revisited later in a different context, and understanding deepens as the reader continues through the chapters. This approach also helps to
deal with the frequent occurrence of mutual dependencies between concepts.
Some teachers may not be familiar with an iterative approach. Looking at the first few chapters, teachers used to a more sequential introduction may be surprised by the number of concepts touched on so early. It may seem like a steep learning curve.
It is important to understand that this is not the end of the story. Students are not expected to understand everything about these concepts immediately. Instead, these fundamental concepts are revisited again and again throughout the book; since students’ knowledge grows as they work their way forward, each revisit deepens their overall understanding.
We have tried this approach with students many times. It seems that students have fewer problems dealing with it than some long-time teachers. And remember: a steep learning curve is not
a problem as long as you ensure that your students can climb it!

[to be continued..]

Resources

 Objects First with Java™: A Practical Introduction Using BlueJ

Monday, December 3, 2018

Unit Testing With BlueJ

                      TESTING OBJECTS

The introduction of object orientation into introductory courses has significantly changed the character of testing.
While many of the techniques for testing procedural programs are
applicable in an object-oriented system, object orientation introduces some new testing problems.
One of these problems is that the overhead for the construction of test cases is higher in object-oriented languages because of the size and number of separate units that require testing.
Procedural programming tended to produce large monolithic applications with long function definitions.
Whilst it was hard to construct good tests for these, the infrastructure required to set up and run the tests was relatively
straightforward.
Object oriented programming tends to produce better separated units of code with smaller and more precise methods.
This means that testing can be more effective because methods tend to be more cohesive, but the sheer number of tests means that more testing infrastructure is needed.
Because of this, tools to help manage the testing process are now
more important.

                  TOOL SUPPORT FOR TESTING

We have identified three activities as being representative of the type of testing performed by students:
  1. Testing immediately after implementation is covered well for ad-hoc testing by BlueJ, using interactive method invocation. These tests, however, are transient and cannot easily be repeated.
  2. Testing after detecting a bug is adequately addressed by symbolic debuggers and tools that allow object inspection.
  3. Testing after fixing a bug (regression testing) is supported by testing frameworks such as JUnit. Professional tools such as Test Mentor also offer relevant functionality, but are too complex for use by students.
No existing tool supports all relevant forms of testing well. We introduce a design for a tool that supports all three testing activities in a manner appropriate for beginning students.
This tool is an extension of BlueJ, which already includes a symbolic debugger that adequately covers (2). Thus, our discussion concentrates on tools to support and integrate the testing activities (1) and (3).
The tool combines the quick and efficient BlueJ object interaction with the regression testing facilitated by JUnit, to provide an easy way for students to construct and run test cases.


                       TESTING IN BLUEJ

We first briefly describe BlueJ's testing facilities prior to JUnit framework.
  • The main display of BlueJ is a simplified UML diagram of the classes in the system. 
  • Each class is displayed as an icon with UML stereotypes to indicate different class types such as «abstract», or «interface».
  • Each of the classes displayed has an associated popup menu that displays all public constructors for the class, as well as all its public static methods. 
  • When a constructor is selected, the user is prompted to enter necessary parameters, and then a representation of the constructed object will be created on the object bench. 
  • Once an object is available on the object bench, any public method can be interactively invoked. Parameters may be entered and results can be examined.
  • Using the object interaction mechanism in BlueJ, a user can create the initial setup of a test phase by instantiating objects and placing them on the object bench.
  • Methods can then be tested immediately after the code has been written, by making a sequence of interactive method calls; no specific test harnesses or test drivers need to be written to execute the methods that have just been constructed.
However, this testing is ephemeral. Objects are automatically removed if any change is made to their class or if the project is closed. In particular, the potentially time-consuming set-up phase, where objects are constructed, must be manually repeated after each compilation.
Tests cannot easily be repeated to confirm behaviour later in the program's development. This acts as an impediment to using the tool to test methods.

                                    TESTING IN JUNIT

The JUnit testing framework has become a de facto standard for implementing unit tests in Java. Similar unit testing frameworks have now been released for many other languages.

JUnit is a simple framework that defines classes for some key testing abstractions. A programmer uses JUnit by extending these classes.
  1. The main class that programmers extend is TestCase. Within the new extended class (called the test case or test case class), the programmer defines test methods (all methods that begin with the prefix ‘test’).
  2. These test methods contain calls to the application’s methods and assertion statements on the results which will be checked when the test is run. 
  3. Any assertions that fail will be displayed. 
  4. The assertion statements available in the test case class are methods such as assertTrue(), assertSame() and assertEquals().
  5. The test methods in a test case often need access to a set of objects of known state. 
  6. A test fixture is a common set of test data and collaborating objects shared by many tests. They are normally implemented as instance variables in the TestCase class. Two special methods (the setUp() and tearDown() methods) are responsible
    for initializing the test fixture instance variables at appropriate
    times.
  7. The JUnit framework provides a variety of extensible interfaces for displaying results. One simple implementation is the TextRunner which displays the results in a text terminal. An alternative is the SwingRunner which displays the results using
    a GUI. 
  8. Whichever display class is chosen, the JUnit framework will display for each test method the status of the assertions, and,
    if any failed, provide a stack trace showing the expected values
    for the assertion that failed.
  9. Once test fixtures and test methods have been written in JUnit, the system provides an easy mechanism to perform regression testing. Tests can be run repeatedly, and any failing tests are highlighted.
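A minimal sketch of such a test case in the JUnit 3.x style described here (the class under test is simply java.lang.String, to keep the example self-contained):

    import junit.framework.TestCase;

    public class StringTest extends TestCase {
        private String fixture;           // a trivial test fixture

        protected void setUp() {          // initializes the fixture before each test
            fixture = "Hello";
        }

        public void testLowerCase() {     // test methods begin with the prefix 'test'
            assertEquals("hello", fixture.toLowerCase());
        }

        public void testNotNull() {
            assertNotNull(fixture);
        }
    }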

                   INTEGRATING JUNIT AND BLUEJ

Unit testing frameworks such as JUnit support a standard way to write tests but do not provide any tools that help in this task.
The BlueJ interaction mechanism is useful for ad-hoc testing, but
does not provide any recording facility, so tests must be redone by
hand, eliminating its usefulness for regression testing.
A combination of the two can merge the best of both worlds.
The result is not just the sum of the two mechanisms, but a qualitatively new testing tool, with new functionality emerging from the combination.
The design for the new testing functionality integrates BlueJ’s object interaction facility with JUnit’s ability to perform regression testing. It introduces explicit support for unit testing into BlueJ, based on the design and implementation of JUnit.
The new BlueJ version now recognises JUnit test classes as a special type of class. Associated with these classes, the following functionality has been incorporated into the user interface:
  • constructing a test case class;
  • running all the tests in a test case;
  • running an individual test method from a test case;
  • constructing a test fixture from objects on the object bench;
  • moving the test fixture from a test case onto the object bench; and
  • constructing new test methods by recording interactions with objects on the object bench.
In order to describe this new tool in more detail, we will walk through the process of testing a simple text-based adventure program. This assignment is a modified version of the zork project presented in [9]. The actual example used here is not especially relevant, and the ideas should become clear without the need to know details of the application being tested.

Our game contains five major classes:
  1. Game,
  2. GameAction,
  3. ActionFactory,
  4. Parser and 
  5. Room.
In the following example, we assume that we have implemented the
Parser class and wish to test it. The class contains a single method:

tokenizeAndLower(java.lang.String)
This method tokenises a string, lowercases the tokens and then returns the tokens in an array...
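The paper does not show the method body; a plausible implementation might look like this (an assumption, for illustration only):

    import java.util.StringTokenizer;

    public class Parser {
        public String[] tokenizeAndLower(String input) {
            StringTokenizer st = new StringTokenizer(input);
            String[] tokens = new String[st.countTokens()];
            for (int i = 0; st.hasMoreTokens(); i++) {
                tokens[i] = st.nextToken().toLowerCase();
            }
            return tokens;    // never null; an empty array for empty input
        }
    }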

                   Constructing a test class

As our first step we construct a unit test for the Parser by selecting the new function “Create Test Class” from the Parser class’ popup menu.
Unit tests are represented in the interface similarly to other classes. They appear attached to the class they were created for, are marked with the UML stereotype «unit test», and have a distinct colour.
The resulting unit test class will automatically be named ParserTest (Figure 1).

The visual attachment to the Parser class is automatically maintained: when the Parser class icon is moved, the ParserTest icon moves with it.

The generated unit test class is constructed using a template unit test file. The default template can be customized by the user or system administrator to fit in with any local requirements of coding style.

                   Creating test methods

The first test we wish to create is a test to see that the Parser class
is always returning a valid String array, no matter what the input.
  1. We start the construction of a new test case by selecting “Create Test Method...” from the unit test class’ popup menu (Figure 1).
  2. We are prompted for the name of the test and we enter the name “NotNull”.
  3. An “End Test” button appears at the bottom right corner of the object bench. All our interactions with BlueJ, from now until the “End Test” button is pressed, will be recorded as Java source in a testNotNull() method of the ParserTest class.
  4. We construct a new Parser object by selecting “new Parser()” from the Parser class’ menu. We can now execute methods of the Parser object and BlueJ will perform the operation. 
  5. We select the tokenizeAndLower(String) method and are presented with a dialog asking for parameters to the method call. As we are testing to make sure the method always gives us a non-null string array, we start with a boundary case such as the empty string "". As with normal BlueJ interactions, a result dialog is now displayed showing the returned object. However, because we are in test mode, the result dialog is extended to show an assertion panel that can be used to make assertions in the current test (Figure 2).

                        Asserting results

We want to assert that the result we received from the method is not null.
  1. To do this we check the “Assert that” checkbox. We can then select the type of assertion that we want from the drop down list of supported JUnit assertions. In our case, we select the “not null” assertion. 
  2. When we click on the “Ok” button, this assertion is added to the current test. 
  3. We repeat this process for some other cases we wish to test such as the strings "a" and "AA ab". After exhausting all the cases that we wish to check we click the “End Test” button in the bottom right corner of the object bench.
  4. The testNotNull() method has now been added to the ParserTest  class. After compiling the test, we are now ready to run it.
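The recorded method might look roughly like this (a sketch; the exact source BlueJ generates may differ):

    public void testNotNull() {
        Parser parser = new Parser();
        assertNotNull(parser.tokenizeAndLower(""));
        assertNotNull(parser.tokenizeAndLower("a"));
        assertNotNull(parser.tokenizeAndLower("AA ab"));
    }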


                             Executing tests

Whereas traditional IDEs only allow interaction with an application in a monolithic way (by executing the main method of the program), BlueJ allows interaction at the object, class, and method level.
Similarly, while the standard JUnit interface only allows the execution of all tests in a test class, its BlueJ version also allows execution of separate test methods.
The popup menu for a unit test class in BlueJ has a menu item for each test defined in the class and a “Run All” entry.
By selecting a test method from the popup menu, just a single test method is run. If a test is successful then a simple message indicating the success is displayed in the status bar at the bottom of the screen. If the test fails then the result is displayed in a dialog showing the failed assertion, similar to the dialog shown by the “Run All” menu. This allows quick execution of specific tests which is useful when editing the particular area of code that those tests target. The “Run All” function in the test class’s menu runs all the tests in the selected test case.
The results are displayed in a dialog indicating the list of test methods and success or failure of each.
As in other JUnit implementations, a large bar is displayed in green if all the test methods pass.
If there are any failures then the bar is displayed in red.
In a case where the test has failed, the bottom half of the dialog displays the results of the assertion, showing the line where the assertion failed and what the actual and expected results were (Figure 3).

BlueJ uses its own implementation of the standard JUnit SwingRunner user interface. The BlueJ version differs internally from the normal SwingRunner in that the interface code and the tests execute on different virtual machines.
The interface presented, however, remains the same.


                        Using test fixtures

In constructing tests for the Parser class we notice that there are some objects that we use in each test (a Parser object for example).
It would be useful to be able to share the effort of constructing the
objects between all the test methods.
JUnit’s fixtures provide a mechanism for doing this.
A design goal for the integration of BlueJ and JUnit was that BlueJ’s object interaction methods should be usable for the construction of test fixtures in the same way that the BlueJ interaction is being used to construct test methods.
Similarly, if JUnit test fixtures could be brought into BlueJ, then object interaction and test generation could be performed with existing JUnit tests.

                                  Creating a test fixture

To illustrate the construction of a test fixture we will construct some tests for the Room class.
  1. Our first step is to construct a test class for Room. We select “Create Test Class” from the Room class’ popup menu and the new RoomTest class is created. We would like to make a small set of connected rooms to test that rooms can be successfully navigated. Creating the rooms and linking them with exits requires a longish sequence of method calls, so we aim to automate this task in a test fixture.
  2. First, we construct several room objects on the object bench with various descriptions. We then use the method interaction  facility to set the exits of each room instance, passing in the other room objects as parameters. 
  3. We now want to use the Room objects on the object bench as a fixture in our RoomTest class. We select “Object Bench to Test Fixture” from the RoomTest class’ menu. The source code to construct the objects present on the object bench in their current state is saved into the RoomTest class’s setUp() method. The objects which have been saved now disappear from the object bench. They can be restored in two different ways as explained in the next section.
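The generated setUp() method might look roughly like this (a sketch; the room descriptions and the exit-setting method are assumptions, since the Room interface is not shown in the text):

    import junit.framework.TestCase;

    public class RoomTest extends TestCase {
        private Room hall;
        private Room kitchen;

        protected void setUp() {
            hall = new Room("a large hall");
            kitchen = new Room("a small kitchen");
            hall.setExit("east", kitchen);     // assumed linking method
            kitchen.setExit("west", hall);
        }
    }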

                               Restoring a  test  fixture

  1. A test fixture can be restored to the object bench by selecting “Test Fixture to Object Bench” from the test class’s menu. This will  execute the setUp() method and place the constructed objects onto the object bench. BlueJ users can interact with these objects on the object bench the same way they do with objects that have been constructed through normal interaction.
  2. The other method of restoring a test fixture to the object bench is by creating a test method. When a test class has a test fixture, the fixture’s objects are automatically placed on the object bench whenever the recording of a test method is started. Any method calls which are made on these objects are local to the test method being recorded and will not impact upon the state of the test fixture objects in other test methods.

                          DISCUSSION

The mechanisms discussed so far provide an easy, yet powerful way to do unit testing in BlueJ.
BlueJ's flexible interactive testing mechanisms can still be used, avoiding the need to write test drivers.
These test sequences can now be recorded and replayed, providing
an easy way to implement regression testing.
Thus, BlueJ can now write test drivers automatically through recorded interaction.
In addition to this functionality, the current design and implementation also supports other uses of the mechanism.

                                Test-driven development

The Extreme Programming community strongly advocates the use
of test-driven development (TDD).
In this style, test cases are written after class interfaces have been designed, but prior to their implementation.
In our integrated BlueJ/JUnit system, this style of working is still fully supported.
The recording facilities described above are added on to the standard JUnit functionality, but do not replace the traditional way
of writing test cases.
In other words: Test classes can still be written manually before implementing methods. They can then be executed later.
The code that is produced by the test recording facility is standard
Java source code. The test classes are normal classes that can be
edited, compiled and executed like other classes (with some added
functionality).
Thus, recorded test cases can be modified later by directly editing
the source code, and test cases can still be coded by hand.

                               Testing multiple classes

The standard way to create test classes has been described above:
selecting the “Create Test Class” function from a class’s menu creates an attached test class.
In addition to this, “free” test classes (not attached to a single class) can be created by writing a new class that extends TestCase.
BlueJ will recognise this correctly as a test case class. (Creation of
such classes is also supported through BlueJ’s “New Class” dialogue.)
The free test cases can then be used to hold tests that concern a
combination of other classes.
All test functionality is still available on these classes.
Test cases written in projects outside of BlueJ will also appear as
free test cases when opened in BlueJ.
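A sketch of such a free test class, exercising the Parser together with game commands (the assertions are illustrative assumptions):

    import junit.framework.TestCase;

    public class GameCommandTest extends TestCase {
        public void testParsedCommand() {
            Parser parser = new Parser();
            String[] command = parser.tokenizeAndLower("Go East");
            assertEquals(2, command.length);
            assertEquals("go", command[0]);
            assertEquals("east", command[1]);
        }
    }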

                                   Status

All functionality described in this paper has been implemented and is currently undergoing final testing. It is expected that, at the time of publication of this paper (2003), a BlueJ version including this mechanism will have been released.


                            SUMMARY

The unit testing extensions to BlueJ aim to improve the tool support available for testing in introductory teaching.
We have achieved this by integrating the JUnit testing framework into the BlueJ development environment in a manner that diminishes neither.
At its most basic, the unit testing extensions allow the recognition and execution of JUnit test classes.
We have extended this to also allow a JUnit test fixture to be moved onto the BlueJ object bench, and provided a method for converting objects on the BlueJ object bench into a JUnit test fixture.
We have also developed a method for helping in the construction of
unit test methods through the recording of object interactions on
the object bench.

Resources

Introducing Unit Testing With BlueJ 
by Michael Kölling, John Rosenberg, Andrew Patterson