Laws of Software Evolution Revisited

Where there is reproduction, there is evolution. Organisms have F1 and F2 generations, hardware has model 1 and model 2, and software has version 1 and version 2. Organisms evolve according to the laws of Darwin and Mendel. Hardware evolves according to Moore's law and nanotechnology. What laws does software evolve by?

Professor Lehman has been studying this question since the 1970s! This article introduces the eight laws he discovered through studying a great many closed-source programs. Closed-source software has a history going back to around 1950, while open source and outsourcing only emerged a decade or so ago. To apply the laws to those two, perhaps the eight laws need to be extended, or new laws proposed?

Law 1: Continuing Change

Software must be continually adapted, or else customer satisfaction will steadily decline.

There are two reasons, both related to feedback control system theory from control engineering, since the customer is clearly the one controlling the programmer:

  • The customer's ability to express themselves is limited: they need x, say y, and the programmer hears z
  • The customer's needs change over time

So software must evolve continuously if it does not want to be deleted from the customer's machine.

Law 2: Increasing Complexity

As software evolves, its complexity always increases unless effort is spent to reduce it.

This is simply a corollary of the second law of thermodynamics, which governs everything in the universe and states that entropy always increases. It means the program keeps getting fatter and its structure keeps getting uglier, and it needs to be refactored.

Law 3: Large Program Evolution

The factors involved in the evolution process (customer preferences and so on) follow a normal probability distribution.

Law 4: Invariant Work-Rate

Stability is an important property of a control system. To keep the rate of evolution stable, staffing must be stable over time.

Anyone who has read The Mythical Man-Month knows: adding people to an already late project makes it even later.

Law 5: Conservation of Familiarity

A new release succeeds only if the people involved (programmers, sales staff, users and so on) clearly understand how it differs from the previous release. The slope of the growth curve must therefore be conserved: change too fast and people cannot keep up.

Law 6: Continuing Growth

Features must keep being added to the software, or else customer satisfaction will steadily decline.

This looks like Law 1, but the two laws describe different, though related, phenomena. Law 1 relates to Heisenberg's uncertainty principle from quantum mechanics: the customer cannot know everything in advance, so they cannot state every requirement completely and precisely.

Law 6 is the opposite case: the customer knows perfectly well that they need 100 features, but because of budget, schedule, the programmers' skill and so on, they have to cut the list down to 60 so that version 1 can ship on time. Over time, they will ask for the missing features to be added in versions 2 and 3.

Law 7: Declining Quality

Software quality steadily declines unless the software is maintained and adapted to its actual operating conditions.

Over time everything else improves, so in relative terms whatever does not evolve is automatically regarded as low quality. For example, in the subsidy era half a kilo of meat a month counted as a high standard of living, but that same unchanged half kilo would be considered poverty today.

Law 8: Feedback System

To be able to correct and improve it, the software development process must be treated as a feedback control system.

Source: a community blog about IT
For more information: Laws of Software Evolution Revisited (1999)

Scheduling with Quartz

Batch solutions are ideal for processing that is time and/or state based:
  • Time-based: The business function executes on a recurring basis, running at pre-determined schedules.
  • State-based: The jobs will be run when the system reaches a specific state.
Batch processes are usually data-centric and are required to handle large volumes of data off-line without affecting your on-line systems. This nature of batch processing requires proper scheduling of jobs. Quartz is a full-featured, open source job scheduling system that can be integrated with, or used alongside, virtually any Java Enterprise or stand-alone application. The Quartz Scheduler includes many enterprise-class features, such as JTA transactions and clustering. The following is a list of available features:
  • Can run embedded within another free-standing application
  • Can be instantiated within an application server (or servlet container)
  • Can participate in XA transactions, via the use of JobStoreCMT
  • Can run as a stand-alone program (within its own Java Virtual Machine), to be used via RMI
  • Can be instantiated as a cluster of stand-alone programs (with load-balance and fail-over capabilities)
  • Support for fail-over
  • Support for load balancing
The following example demonstrates the use of the Quartz scheduler from a stand-alone application. Follow these steps to set up the example in Eclipse.
  1. Download the latest version of Quartz from OpenSymphony.
  2. Make sure you have the following in your class path (project-properties->java build path):
    • The quartz jar file (quartz-1.6.0.jar).
    • Commons logging (commons-logging-1.0.4.jar)
    • Commons Collections (commons-collections-3.1.jar)
    • Add any server runtime to your classpath in Eclipse. This is for including the Java Transaction API used by Quartz. Alternatively, you can include the JTA class files in your classpath as follows:
      1. Download the JTA classes zip file from the JTA download page.
      2. Extract the files in the zip file to a subdirectory of your project in Eclipse.
      3. Add the directory to your Java Build Path (project->properties), as a class directory.
  3. Implement a Quartz job: a Quartz job is the task that will run at the scheduled time.
    import java.util.Calendar;
    import org.quartz.*;

    public class SimpleJob implements Job {
        // Called by the scheduler each time the trigger fires.
        public void execute(JobExecutionContext ctx) throws JobExecutionException {
            System.out.println("Executing at: " + Calendar.getInstance().getTime()
                    + " triggered by: " + ctx.getTrigger().getName());
        }
    }

  4. The following piece of code can be used to run the job using a scheduler.
    import java.util.Date;
    import org.quartz.*;
    import org.quartz.impl.StdSchedulerFactory;

    public class QuartzTest {
        public static void main(String[] args) {
            try {
                // Get a scheduler instance.
                SchedulerFactory schedulerFactory = new StdSchedulerFactory();
                Scheduler scheduler = schedulerFactory.getScheduler();

                long ctime = System.currentTimeMillis();

                // Describe the job and create a trigger that starts now.
                JobDetail jobDetail = new JobDetail("Job Detail", "jGroup", SimpleJob.class);
                SimpleTrigger simpleTrigger = new SimpleTrigger("My Trigger", "tGroup");
                simpleTrigger.setStartTime(new Date(ctime));

                // Fire every 100 milliseconds, repeating 10 times after the first firing.
                simpleTrigger.setRepeatInterval(100);
                simpleTrigger.setRepeatCount(10);

                // Register the job and its trigger with the scheduler.
                scheduler.scheduleJob(jobDetail, simpleTrigger);

                // Start the scheduler; jobs fire from this point on.
                scheduler.start();
            } catch (SchedulerException ex) {
                ex.printStackTrace();
            }
        }
    }

    A trigger defines the schedule on which the job runs.
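
For recurring schedules that are more involved than a fixed repeat interval, Quartz also provides CronTrigger. The following is a minimal sketch under the same Quartz 1.6 API as above; the trigger and job names are arbitrary.

    import org.quartz.*;
    import org.quartz.impl.StdSchedulerFactory;

    public class CronQuartzTest {
        public static void main(String[] args) throws Exception {
            Scheduler scheduler = new StdSchedulerFactory().getScheduler();

            JobDetail jobDetail = new JobDetail("Cron Job Detail", "jGroup", SimpleJob.class);

            // Cron fields: seconds minutes hours day-of-month month day-of-week.
            // "0 0/5 * * * ?" fires every five minutes, on the minute.
            CronTrigger cronTrigger = new CronTrigger("My Cron Trigger", "tGroup", "0 0/5 * * * ?");

            scheduler.scheduleJob(jobDetail, cronTrigger);
            scheduler.start();
        }
    }
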
For more information on batch processing visit: "High volume transaction processing in J2EE"

Java EE 6 Highlights

The key features of Java EE 6 (Java Enterprise Edition version 6) are:

Modular Platform - Java EE 6 introduces profiles targeted at particular segments of users, such as web developers or mobile developers. Profiles allow you to select which Java EE 6 features to include, creating a smaller runtime with only the modules and extensions you need.

Extensibility - Scripting languages and extensions are now treated as “first class citizens” and can be easily integrated with the core platform. Third party libraries will be able to self-register.

Annotations across the Web APIs - No more manual editing of web.xml (yeah!). With Servlet 3.0, for instance, a servlet can be declared entirely through annotations, as the sketch below shows.
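
A minimal sketch of an annotation-declared servlet (the class name and URL pattern are made up for illustration):

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Registered purely by annotation: no <servlet> or <servlet-mapping>
    // entries are needed in web.xml.
    @WebServlet(urlPatterns = "/hello")
    public class HelloServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            resp.getWriter().println("Hello from Java EE 6");
        }
    }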

RESTful web services - Java EE 6 will support creating RESTful web services out of the box, via the JAX-RS API. A minimal resource class is sketched below.
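
A minimal JAX-RS resource sketch (the path and class name are made up for illustration):

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;

    // GET /greeting returns a plain-text response.
    @Path("/greeting")
    public class GreetingResource {
        @GET
        @Produces("text/plain")
        public String greet() {
            return "Hello, REST";
        }
    }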

For more information: Introduction to Java 6.0 New Features

Tomcat - Is this an Application Server?

Apache Tomcat is one of the most popular options for lightweight development scenarios, and in many cases meets the need for an application server, even though it is technically a Web server. Java EE extends the Java Platform, Standard Edition (Java SE) to support Web services, an enterprise component model, management APIs, and communication protocols for designing and implementing service-oriented architectures, distributed applications, and Web applications.

A compliant Java EE application server must support features such as an Enterprise JavaBeans (EJB) server and container; JNDI capabilities; a Java Message Service (JMS) framework; a Java Transaction API (JTA) framework; and J2EE Connector Architecture. Java EE servers usually support a hierarchical classloader architecture enabling such functionality as EJB loading/reloading, WAR loading/reloading, manifest-specified utilities, and so on.

Java EE defines containers for client applications, servlets, and EJB components. These containers provide structure and functionality that facilitate the deployment, persistence, and execution of supported components. The J2EE Connector Architecture enables a provider of an enterprise system to expose the system using a standard interface known as a resource adapter.

Using a Java EE server (application server) gives you the convenience of hosting a system in a pre-tested environment that offers all of the Java enterprise development services. In some cases, however, the Java EE server brings unnecessary overhead to an execution environment that only requires one or two of these services.

For instance, many Java-based Web applications are deployed to environments that only support the technologies found in a Web server/container, such as servlets, JSPs, and JDBC. In these scenarios you might choose to construct a system piecemeal, using sundry frameworks and providers.

Some developers would choose to use Tomcat in place of the Java EE application server given these environmental constraints.

Web applications vs. enterprise applications

For some, the confusion over Tomcat’s definition points to the deeper question of what differentiates an enterprise application from a Web application. Traditionally, a Java enterprise application is defined as a combination of the following components and technologies:

* EAR files
* Java Servlets
* JavaServer Pages or JavaServer Faces
* Enterprise JavaBeans (EJB)
* Java Authentication and Authorization Service (JAAS)
* J2EE Connector Architecture
* JavaBeans Activation Framework (JAF)
* JavaMail
* Java Message Service (JMS)
* Java Persistence API (JPA)
* Java Transaction API (JTA)
* The Java Management Extensions (JMX) API
* Java API for XML Processing (JAXP)
* The Java API for XML-based RPC (JAX-RPC)
* The Java Architecture for XML Binding (JAXB)
* The SOAP with Attachments API for Java (SAAJ)
* Java Database Connectivity (JDBC) framework

A Java Web application, meanwhile, is said to combine a subset of Java enterprise application components and technologies, namely:

* WAR files
* Java Servlets
* JavaServer Faces or JavaServer Pages
* Java Database Connectivity (JDBC) framework

In a typical Java EE Web application, an HTML client posts a request to a server where the request is handled by the Web container of the application server. The Web container invokes the servlet that is configured to handle the specific context of the request.

Once the servlet has received the initial request, some form of request dispatching ensues in order to perform the necessary business logic for completing the request. One or more business services or components are then invoked to perform business logic.

Most business services or components require access to some form of data storage or information system. Oftentimes an abstraction layer between the business service and the data store is provided in order to protect against future changes in the data store. DAOs (data access objects) are often employed as data abstraction components in this situation.
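
As a sketch of that abstraction (the domain object and method names here are hypothetical):

    import java.util.List;

    // Hypothetical domain object.
    class Customer {
        long id;
        String name;
    }

    // Hypothetical DAO interface: business services code against this
    // abstraction rather than against JDBC or the data store directly,
    // so the store can change without touching the business logic.
    interface CustomerDao {
        Customer findById(long id);
        List<Customer> findAll();
        void save(Customer customer);
    }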

When the DAO invocation step is complete, the response data is passed back up the chain of command, usually as one or more Java beans. The Java beans are then passed to some type of state machine and/or view manager in order to organize and format the markup response. When processing is complete for a given request, a formatted response is passed back to the HTML client.

Now, suppose we add a requirement to the application for asynchronous messaging between business service components. In a Java-based system, this would typically be handled using the Java Message Service (JMS), as shown in the figure below.

[Figure: asynchronous messaging between business service components via JMS]
Most Web servers do not offer JMS as a standard feature, but it is simple enough to add a JMS implementation to a Web server environment.
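
For reference, a minimal JMS send might look like the sketch below. It assumes a JMS provider on the classpath and a ConnectionFactory reachable via JNDI; the lookup names are hypothetical.

    import javax.jms.*;
    import javax.naming.InitialContext;

    // Hypothetical JMS producer: look up the factory and queue in JNDI,
    // then send a single text message.
    public class JmsSendSketch {
        public static void main(String[] args) throws Exception {
            InitialContext ctx = new InitialContext();
            ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // hypothetical name
            Destination queue = (Destination) ctx.lookup("jms/OrderQueue");                      // hypothetical name

            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                producer.send(session.createTextMessage("order-created"));
            } finally {
                connection.close();
            }
        }
    }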

The application scenario depicted in the figure above could be handled quite easily with just a Web server providing a servlet engine and a JSP engine.

Now we add the requirement for connectivity between business services and disparate enterprise information systems. Java EE offers the Java Connector Architecture as a common standard to meet this challenge, as shown in the figure below.

[Figure: business services connected to disparate enterprise information systems via the Java Connector Architecture]
The architecture is now approaching a complexity that is arguably better suited for a Java EE application server.

A Web server such as Tomcat could possibly be used in combination with other frameworks to meet the requirements, but system management and monitoring complications might make the server/framework mix impractical.

The figure below presents a moderately complex, Java-based, service-oriented architecture employing all of these technologies, along with communication between multiple WAR deployments, EJBs, and Web services.

[Figure: a moderately complex Java-based service-oriented architecture with multiple WAR deployments, EJBs, and Web services]
The architecture in the figure above has entered the realm of complexity that requires a tested, scalable, manageable Java EE enterprise application server. Once again, a development team with the proper skill level could use Tomcat for the Web tier and piece together technologies and frameworks to support the business and data tiers.

What I personally feel is that supporting this type of architecture using a Web server is impractical. But most of the tasks involved in a J2EE environment can be supported by the Apache Tomcat Web server!

Debug Tomcat HowTo

Step 1: Add these lines to [CATALINA_HOME]/bin/startup.bat:

SET JPDA_TRANSPORT=dt_socket
SET JPDA_ADDRESS=8000

and change this
call "%EXECUTABLE%" start %CMD_LINE_ARGS%
to this
call "%EXECUTABLE%" jpda start %CMD_LINE_ARGS%

Step 2: Start Tomcat
Step 3: In Eclipse, go to Run | Debug... | Click on 'Remote Java Application' | New | Type in a name for the configuration, select a project (keel-server) | Click Debug

Code search engines that you should know

Reusing code and frameworks (either public-domain or FOSS-licensed code) is pretty common, but searching for useful code online is not very easy. Regular search engines like Google or Yahoo are not designed for code search. There are now a few specialized code search engines that can fetch better results. Here are the top six code search engines:

1. Google Code Search
2. Krugle
3. Koders
4. Oreilly Code Search
5. CodeBase
6. CodeFetch

All the above engines allow you to search by language (like Java, C, etc.) and license (like GPL, MIT, etc.). Except for the O'Reilly code search, the rest of them search the internet. The O'Reilly code search covers the code from their books, currently over 123,000 individual examples composed of 2.6 million lines of code, all edited and ready to use. CodeFetch allows you to search all source code examples included in all books on all languages.

But make sure about the license terms and conditions before reusing any code.

Productivity for Software Estimators

1. Introduction

Software estimation, namely estimating software size, effort, cost, and schedule (duration), often causes animated discussion among the fraternity of software estimators. Normally it is senior project leaders and project managers who carry out this activity.

Software development consists of several disparate activities needing specialized knowledge: requirements gathering, analysis, and management; software design; coding; independent verification and validation; and rollout/deployment/installation and commissioning. Each of these activities is carried out by a differently skilled person, using different tools, and has a different complexity.

2. Productivity

Productivity is defined as the rate of output for given inputs. It is expressed as “so many units of output per day” or “so many units of output per hour”.

Productivity is also defined as the ratio of output to input.

For the context of this paper, Productivity is defined as the rate of producing some output using a set of inputs in a defined time unit.

3. Concerns with Software Size Estimation

The present scenario in the industry is that we have multiple measures, namely,

1. Function Points
2. Use Case Points
3. Object Points
4. Feature Points
5. Internet Points
6. Test Points
7. FPA Mark II
8. Lines of Code
9. Etc.

There is no accepted way of converting software size from one measure to another.

One odd aspect of these measures is that the size is adjusted (increased or decreased) by factors such as complexity. Yet a size is something that does not change: a pound of cheese does not alter whether the person weighing it is less or more experienced, or whether the scale is mechanical or electronic, right?

Likewise, the distance of one mile remains one mile whether a young person or an old man is walking it, and whether it is a freeway or a busy city street.

But the rate of achievement changes: an old man completes one mile more slowly than a young one, and you go faster on a freeway than on a busy street.

There is no agreement on how to count Lines of Code: logical statements or physical statements, and how to treat inline documentation.

These are some of the issues with size measurement.

4. Concerns with Productivity

The software development world is obsessed with giving one single, empirical, all-activities-encompassing figure for productivity.

Attempts have been made to give a single productivity figure, such as 10 person-hours per Function Point, but with a rider that it could vary from 2 to 135 depending on the product size and other factors.

Sometimes ranges are given, such as 15 to 30 hours per Use Case Point.

Sometimes empirical formulae are worked out from a set of factors, as in COCOMO. For reference, the basic COCOMO effort equation is sketched below.
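
A minimal sketch of the basic COCOMO effort formula; the coefficients shown are the published basic-COCOMO values for “organic” projects, and other modes use different values of a and b:

    // Basic COCOMO: effort in person-months as a power law of size in KLOC.
    // Organic-mode coefficients: a = 2.4, b = 1.05.
    public class BasicCocomo {
        static double effortPersonMonths(double kloc) {
            return 2.4 * Math.pow(kloc, 1.05);
        }

        public static void main(String[] args) {
            // A 32 KLOC product: 2.4 * 32^1.05, roughly 91 person-months.
            System.out.println(effortPersonMonths(32));
        }
    }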

Another aspect is that these productivity figures lump all activities (requirements analysis, design, review, testing, etc.) into one single measure. The skill requirements for these activities are different, the tools used are different, and the inputs and outputs are different. Lumping them all together under the head “Software Development” and giving one single figure of productivity can at best give a very rough estimate, never an accurate one.

5. The Productivity Path

We have the following activities in software development:

1. Pre-project activities
a. Feasibility study
b. Financial budgeting
c. Approvals – financial and technical
d. Project go-ahead decision
2. Project startup activities
a. Identifying project manager
b. Allocating project team
c. Setting up development environment
d. Project Planning
e. Setting up various protocols
f. Service level agreements and progress reporting formalities
g. Project related training
3. Software engineering activities
a. User requirements analysis
b. Software requirements analysis
c. Software design
d. Coding and unit testing
e. Testing – integration, functional, negative, system and acceptance
f. Preparing the build and documentation
4. Rollout activities
a. Installing the hardware and system software
b. Setting up database
c. Installing the application software
d. Pilot runs
e. User training
f. Parallel runs
g. Rollover
5. Project cleanup activities
a. Documenting good practices and bad practices
b. Project post mortem
c. Archiving records
d. Releasing resources
e. Releasing the project manager
f. Initiate software maintenance

Now, when we talk of industry thumb rules of productivity, we are not clear as to how many of the above activities are included in the productivity figure.

Interestingly, no one would like to stake their life on the productivity figure, the industry thumb rule, that is floating around!

Look at the nature of these activities:

1. Requirements analysis: understanding what the user needs, wants, and expects, and documenting it so that the software designers understand it and can design a system strictly in conformance with the stated requirements. There is a lot of dependence on external factors.
2. Software design: considering the alternatives of hardware, system software, and development platforms, arriving at the optimal one, designing an architecture that meets the stated requirements and fulfills expectations while remaining feasible with current technologies, and documenting the design in such a way that the programmers understand it and deliver a product that conforms to the user's original specifications. There are quite a few alternatives; this is a strategic activity, and errors here have strategic consequences.
3. Coding: developing software code that conforms to the design and is as failure-free as possible. It is so easy to leave bugs inside!
4. Code review: walking through code written by another programmer, deciphering the functionality, and trying to guess the possible errors.
5. Testing: trying to unearth all the defects that could be left in the software. It is an accepted fact that 100% testing is impossible!

Now with such variance in the nature of activities, it is obvious that the productivity of all these activities is not uniform. The pace of working differs for each of these activities.

These activities do not depend on the amount of software code produced but on other factors, such as:

1. Requirements analysis depends on the efficiency and clarity of the source of requirements, be it users or documentation
2. Design depends on the complexity of processing, the alternatives available, and the constraints within which the functionality is to be realized
3. Code review depends on the style of coding
4. Testing depends on how well the code is written: the more errors left in, the more time it takes to test and re-test
5. Coding itself depends on the quality of the design

Therefore, we need to have separate productivity figures for each of these activities.

Drawing a parallel from the manufacturing industry, the activities for punching a hole in a sheet are:
i. Machine setup
ii. Tool setup
iii. Load job
iv. Punch hole
v. Deburr hole
vi. Clean up
vii. Deliver the sheet for next operation

If multiple holes are punched, the per-hole time comes down, as the setup activities are one-time activities.

If we look at “coding a unit”, the activities could be:
i. Receive instructions
ii. Study the design document
iii. Code the unit
iv. Test & debug the unit for functionality
v. Test & debug the unit for unintended usage
vi. Delete trash code from the unit
vii. Regression test the unit
viii. Release it for next step

Similarly, we can come up with micro activities for each software development phase.

5.1 Empirical or study-based Productivity figures?

Each of these activities has a different rate of achievement. We have to establish standard times for each of these activities, and then, using Work Study techniques like Synthesis or Analytical Estimation, arrive at the overall time to complete the job.

Should we use time study techniques to arrive at individual productivity figures, or should we gather empirical data? To answer this query, we have to acknowledge that software development is neither totally mechanical nor totally creative in nature. Work Study acknowledges that it is not practical to time activities that have a creative component. A lot of work is being undertaken on “white-collar productivity”, and perhaps the future will provide some methods to “time” software development. For the present, empirical data seems to be the solution.

Where do we get the data for this? One way is Time Study, using Industrial Engineering techniques. The second way, easier as well as more reliable, is historical data from timesheets.

Most timesheet software available and in use in the industry is oriented towards payroll and billing rather than capturing data at the micro level where it could be used to derive productivity data. Most timesheets capture data at two or three levels (project is always the first level; the second and third can be module and component, component and activity, or a similar combination) in addition to date and time. The timesheet needs to capture five levels, namely project, module, component, development phase, and the task accomplished, in addition to date and time, for each employee. Data would then be available to establish productivity figures empirically, in a realistic manner. A sketch of such a record follows.
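
As a sketch, a timesheet record carrying the five capture levels described above might look like this (all field names are hypothetical):

    import java.util.Date;

    // Hypothetical timesheet record: one row per employee, per day, per task,
    // at the granularity needed to derive per-activity productivity figures.
    public class TimesheetEntry {
        String employeeId;
        Date workDate;
        double hoursSpent;

        // The five capture levels:
        String project;
        String module;
        String component;
        String developmentPhase; // e.g. design, coding, testing
        String task;             // the micro activity accomplished
    }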

The present focus is on macro productivity, a single figure for all activities of software development. This needs to change: we need to shift our focus from macro to micro, with productivity figures for each activity. The way to achieve this is to modify our timesheets.

The benefits of productivity at the micro level are:

i. Better predictability of software development
ii. Better-quality estimates for pricing assistance during the project acquisition/sanction stage
iii. More precise target setting while assigning work, which leads to better morale among the software developers
iv. More accurate cost estimation


6. Conclusion

The conclusions are that we need to shift focus from macro productivity to micro productivity, that empirical data gathering is the preferred way to arrive at productivity figures, and that improving the timesheet is the way forward for computing micro-level productivity figures.