Why do I document Code?



Hello and welcome to another session of why do I document code. Today's contestants are:

1. The Software Requirements Document, otherwise known as the SRD
This valuable little document tells the developer what to develop. It was started by Carnegie Mellon. It is used as a contract document between the developers and the customer. The customer starts the document by stating what they expect the program to do. Everyone knows that the customer always changes their mind; well, if you use the SRD, they are held to a legally binding contract that specifically states what to develop. You as a developer don't develop anything except what this document states. Therefore, if the customer changes their mind, you can either point back to the SRD or decide to charge them more money.

2. The Database Maintenance Manual, otherwise known as the DMM
This handy-dandy contestant describes every little feature of your application's database. It describes the tables, the columns, the attributes of the columns, the generated script of the entire database, the user logins, the ways to install and upgrade the database on another machine, the DTS packages and, last but not least, etc... This basic manual describes in detail every single part of the database. The reason for this is that if you had a total hardware meltdown and nothing works, you now have a copy of the database that can be recreated using the script that was generated and put inside the file.

3. The Software Design Document, also known as the SDD
This massive document describes all the methods, namespaces and functionality of the code. It also describes the developer's thoughts and opinions as to why they coded the application one way compared to another. When I say everything, I mean everything. This document has all the developer's thoughts and opinions from when they were designing and developing the code. Thank god most comments can be extracted via an XML parser. The XML-parsed comments can even be turned into a nice little help file just like MSDN.com. Where can you learn how to write one? Well, let me tell you. Our good friends (not really at all) at Bit Formation have made a great tutorial on how to write one.

4. The User Guide
The user guide, plain and simple, is the thing users use to get around the application. Every little thing that was EVER created by man has some sort of user guide attached to it. These are a no-brainer, but long and tedious to write, just like the other documents listed here today!

Now that you know our contestants, let's find out why you would do such a thing.

Alright, enough with the game show. I thought it would be a good starter. I completely agree that all these documents, though rather tedious and considered a time waster by developers, are a necessary part of life. Developers need to both COMMENT CODE and write documentation. That is the way it should be and should end up. Documents are there in case you as the developer get into some kind of horrific accident and are no longer able to continue on. They must find someone else to keep going. Sorry, but that's the way life is. You are writing the documentation in case you have to be replaced. I currently work on a 20-year-old application and I know for a fact that I will not be working on this same application for another 20 years. I just won't do it. It is too boring and mundane. I do know that some day they will hire another guy or girl who will have to continue my work, and when that day comes, the documents are there.

Things must be documented.

Most Common TCP Ports

Ports are basically divided into three ranges: the Common Ports, the Registered Ports, and Private Ports.

The Common Ports are those from 0 through 1023.
The Registered Ports are those from 1024 through 49151.
The Private Ports are those from 49152 through 65535.

Common Ports
The Common Ports are assigned by the IANA and on most systems can only be used by system (or root) processes or by programs executed by privileged users.
Ports are used in the TCP [RFC793] to name the ends of logical connections which carry long term conversations. For the purpose of providing services to unknown callers, a service contact port is defined. This list specifies the port used by the server process as its contact port.

Port Assignments for Common Ports:

Port UDP TCP Definition
7 x x echo
9 x x discard
11 x x systat
13 x x daytime
17 x x quote of the day
19 x character generator
20 x ftp - data
21 x ftp - control
23 x telnet
25 x smtp mail transfer
37 x x timeserver
39 x rlp resource location
42 x x nameserver
43 x nicname whois
53 x x domain name server
67 x bootps bootstrap protocol
68 x bootpc bootstrap protocol
69 x tftp trivial file transfer
70 x gopher
79 x finger
80 x http
88 x x kerberos
101 x hostname nic
102 x iso-tsap class 0
107 x rtelnet
109 x pop2
110 x pop3
111 x x sunrpc
113 x identification protocol
117 x uucp
119 x nntp
123 x ntp
135 x x epmap
137 x x netbios - name service
138 x netbios - dgm
139 x netbios - ssn
143 x imap
158 x pcmail - srv
161 x snmp
162 x snmptrap
170 x print - srv
179 x border gateway protocol
194 x irc internet relay chat
213 x ipx
389 x ldap
443 x x https (ssl)
445 x x microsoft - ds
464 x x kpasswd
500 x isakmp key exchange
512 x x remote execute
513 x x login / who
514 x x shell cmd / syslog
515 x printer spooler
517 x talk
518 x ntalk
520 x x router / efs
525 x timeserver
526 x tempo
530 x rpc
531 x conference chat
532 x netnews newsreader
533 x netwall
540 x uucp
543 x klogin
544 x kshell
550 x new - rwho
556 x remotefs
560 x rmonitor
561 x monitor
636 x ldaps over tls/ssl
666 x x doom id software
749 x x kerberos administration
750 x kerberos version iv
1109 x kpop
1167 x phone
1433 x x ms - sql - server
1434 x x ms - sql - monitor
1512 x x wins
1524 x ingreslock
1701 x l2tp
1723 x pptp point to point
1812 x radius authentication
1813 x radius accounting
2049 x nfs server
2053 x kerberos de - multiplexor
9535 x man remote server
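
As a quick illustration of how these contact ports are used in practice, here is a small Java sketch (not part of the original list) that simply tries to open a TCP connection to a handful of the common ports from the table above. The host and timeout values are arbitrary choices for the example:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {

    // A few of the common TCP contact ports from the table above.
    private static final int[] PORTS = { 21, 23, 25, 80, 110, 143, 443 };

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        for (int port : PORTS) {
            try (Socket socket = new Socket()) {
                // Fail quickly if nothing is listening on the port.
                socket.connect(new InetSocketAddress(host, port), 500);
                System.out.println(port + " open on " + host);
            } catch (IOException e) {
                System.out.println(port + " closed or filtered on " + host);
            }
        }
    }
}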

How JSF Fits for Web Applications

JSF is a strong choice of framework for web application development because of its support for a wide range of qualities:

Standard Java framework

Easy creation of UI

Capacity to handle complexities of UI management

Clean separation between presentation and logic

Shorter development cycle

An extensible architecture

Support for multiple client devices

Flexible rendering model

International language support

Robust tool support

The article explains all these points clearly. Reading it will make clear why JSF is a strong choice for developing web applications across a range of complexities.
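
To give a concrete flavour of the "clean separation between presentation and logic" point from the list above, here is a minimal backing-bean sketch. The class and property names are purely illustrative (not taken from the article); the page would reference the bean only through EL expressions such as #{greetingBean.name} and #{greetingBean.sayHello}, with the bean registered in faces-config.xml, so no markup or rendering code ever lives in the Java class.

// Hypothetical JSF backing bean - names are illustrative only.
public class GreetingBean {

    private String name;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    // Action method invoked from the page via #{greetingBean.sayHello};
    // it returns a navigation outcome and contains no presentation code.
    public String sayHello() {
        return "success";
    }
}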

Read the article at:
How JSF Fits For Web Applications?

7 Strategies for Unit Testing DAOs and other Database Code

I don't care what all the tutorials say, unit testing in the real world is tough. Well, let me rephrase that. Creating a unit test is easy. Creating a repeatably-passing unit test is not.

The problem is, just about all enterprise applications are dependent on an external database, and while a database itself isn't anathema to unit testing, the data inside it is. The reason is that, unavoidably, unit tests of an enterprise application must refer to specific data in order to test functionality, and this specific data is susceptible to change. What typically ends up happening is that earnest developers create unit tests that work at time t, and then at t+1 the underlying data has changed and subsequently a seemingly random assortment of unit tests fail - the ones that depended on the exact data from time t. Ouch. The inevitable fall-out is that developers must then either (a) investigate the failures and fix the unit tests, (b) delete the failing tests, or (c) live with something less than 100% passing unit tests. Not a good choice in the lot.

In my experiences and research, I've found about 7 different strategies for handling data in unit tests, and each one has different costs and constraints associated with it.

1) Mocks and Stubs

2) Development Database

3) Unit Test Database

4) Local Database

5) In-memory Database

6) Data-independent Tests

7) Non-durable Tests

And, yes, all but solution 1 would technically be considered integration tests, not unit tests. I will try to name them appropriately in this article, but if I slip up, you know what I mean.

Mocks and Stubs

Most TDD purists would probably argue for the importance of completely isolating the *unit* to be tested (or "system under test"), and to achieve this, any class that the unit depends on must be mocked, or stubbed, or otherwise faked. For unit testing DAOs, which of course depend on the database, this means that the database itself must be mocked/stubbed. Doing this has its advantages, namely performance and isolation, as mentioned, but comes at a significant cost to complexity and maintainability - writing mock JDBC layers and fake data to be processed by DAOs is not a trivial task.
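
To make that cost concrete, here is a rough sketch of what stubbing the JDBC layer for a single DAO method might look like. The CustomerDao class and its findNameById method are hypothetical, and I'm assuming a mocking library such as Mockito purely for illustration - the point stands with any mock framework or hand-rolled fakes:

import static org.mockito.Mockito.*;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import junit.framework.TestCase;

// Hypothetical DAO under test; takes a plain JDBC Connection for simplicity.
class CustomerDao {
    private final Connection con;
    CustomerDao(Connection con) { this.con = con; }

    String findNameById(long id) throws SQLException {
        PreparedStatement ps = con.prepareStatement("select name from customer where id = ?");
        ps.setLong(1, id);
        ResultSet rs = ps.executeQuery();
        return rs.next() ? rs.getString("name") : null;
    }
}

public class CustomerDaoMockTest extends TestCase {

    public void testFindNameById() throws Exception {
        // Stub the whole JDBC chain by hand - this is the non-trivial part.
        Connection con = mock(Connection.class);
        PreparedStatement ps = mock(PreparedStatement.class);
        ResultSet rs = mock(ResultSet.class);

        when(con.prepareStatement(anyString())).thenReturn(ps);
        when(ps.executeQuery()).thenReturn(rs);
        when(rs.next()).thenReturn(true, false);
        when(rs.getString("name")).thenReturn("Alice");

        assertEquals("Alice", new CustomerDao(con).findNameById(42L));
        // Note: the actual SQL string is never validated against a real schema.
    }
}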

Even if you did go ahead and create this testing infrastructure, the real meat of the DAO (the SQL, HQL, or mapping) still wouldn't even be exercised by the test, since it's hitting the fake layer and not the real database. If a unit test of a DAO won't tell you that your SQL/HQL isn't valid, then, really, what good is it?

Essentially, mocks and stubs are a tried and true solution for unit testing in general, but with regard to a database dependency, they don't seem to be a workable alternative.

Development Database

This is the path of least resistance. Rather than spend hours mocking out database layers and creating fake data, why not just hit an actual database, and exercise the real queries, mapping files, etc? And further, since the development database (i.e. the central database the team uses for testing code in-development) already exists and has realistic data, why not just use that? Good plan, right? Well, for creating unit tests that work *now*, yes, this is a great, low-cost solution. For creating repeatable unit tests, not so much.

The problem, of course, is that development databases are used by everyone, and so data is very volatile (proportional to the number of developers/users and rate of code change). You could attempt to control the volatility by using data population/clean-up scripts (e.g. DbUnit) that are run in the setUp() and tearDown() methods of your unit tests, but this solution too has cracks, as any client of the database manipulating data at the same time you run your test could easily muck things up. For instance, what if Joe in the next cube deletes the customer your unit test expects? Essentially, if a test fails, it could be because something is legitimately broken, or it could just mean that some user of the database unknowingly deleted the row your unit test was depending on. You just don't know.
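
For what it's worth, here is roughly what that setUp()/tearDown() approach looks like with DbUnit. The JDBC URL, credentials and dataset file name are made up for illustration, and this uses the older DbUnit 2.x-era API:

import java.io.FileInputStream;
import java.sql.DriverManager;
import junit.framework.TestCase;
import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;
import org.dbunit.operation.DatabaseOperation;

// Sketch of a DbUnit-managed test against the shared development database.
public class CustomerDaoDevDbTest extends TestCase {

    private IDatabaseConnection connection;
    private IDataSet dataSet;

    protected void setUp() throws Exception {
        connection = new DatabaseConnection(
                DriverManager.getConnection("jdbc:hsqldb:hsql://devdb/app", "app", "secret"));
        dataSet = new FlatXmlDataSet(new FileInputStream("customer-dataset.xml"));
        // Wipe and re-insert exactly the rows this test depends on.
        DatabaseOperation.CLEAN_INSERT.execute(connection, dataSet);
    }

    protected void tearDown() throws Exception {
        // Clean up after ourselves so other users of the shared database are (mostly) unaffected.
        DatabaseOperation.DELETE_ALL.execute(connection, dataSet);
        connection.close();
    }

    public void testFindByRegion() throws Exception {
        // Exercise the DAO against the known rows inserted in setUp().
        // Nothing stops Joe in the next cube from deleting them in the meantime, though.
    }
}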

Because of the lack of isolation in a development database, this is not a viable solution for creating repeatable, data-dependent unit tests.

Unit Test Database

A separate, central database created and managed specifically for the purpose of running unit tests would provide a greater degree of isolation for running unit tests than a development database, and therefore increases the chances that tests will run successfully at future points in time. Again, by using data population/clean-up scripts prior to and after running unit tests, the data conditions that the tests depend on can be assured. Further, by rolling back transactions at the completion of a test, and thereby not modifying the data, you can feel confident that your unit tests are not stepping on the toes of other unit tests running at the same time.

There are problems with this solution, however. First, if unit tests have data that is specifically inserted and deleted for *each* test (i.e. in the setUp and tearDown methods of the test) rather than one central set of data for all tests, then multiple unit tests run at the same time could cause conflicts. For instance, if test1 tests that a "findBy" method returns 5 records, but test2 inserts a new record in its setUp() that gets picked up by the "findBy", then test1 would fail. The solution, of course, is to use one central data load script for all unit tests, which typically isn't too onerous.
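
A minimal sketch of both ideas together - one central data load plus a rollback at the end of each test - might look like this. The connection details and the CustomerDao.findByRegion finder are illustrative only, and I'm assuming the DAO can be handed a live Connection:

import java.sql.Connection;
import java.sql.DriverManager;
import junit.framework.TestCase;

// Sketch of a rollback-on-completion test against a shared unit test database.
public class CustomerDaoRollbackTest extends TestCase {

    private Connection connection;

    protected void setUp() throws Exception {
        connection = DriverManager.getConnection("jdbc:hsqldb:hsql://unittestdb/app", "app", "secret");
        connection.setAutoCommit(false);   // keep everything this test does inside one open transaction
    }

    protected void tearDown() throws Exception {
        connection.rollback();             // leave the central data load exactly as we found it
        connection.close();
    }

    public void testFindByRegionReturnsFiveCustomers() throws Exception {
        // Relies on the one central data load script rather than per-test inserts,
        // and rolls back its own changes, so it won't step on other tests' toes.
        // findByRegion is a hypothetical finder on the DAO.
        assertEquals(5, new CustomerDao(connection).findByRegion("EAST").size());
    }
}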

The second problem is that in most enterprise environments, the database isn't owned by the application developers. Creating and maintaining a separate database just for unit testing means convincing (pleading with, cajoling, etc.) the DBAs...and in my experience, unless there's a very strong advocate of unit testing in the management ranks, it just ain't gonna happen.

The bottom line is that this is a good middle-of-the-road solution - it has the advantage of being easy to manage (from the application developer's perspective), but can still suffer from data volatility problems, which can reduce the repeatability of tests.

Local Database

Maintaining a separate database instance on each developer's local machine is the most optimal solution in terms of ensuring data durability, but the most costly in terms of investment in infrastructure/configuration and the most unrealistic in typical enterprise environments. On the up-side, with a local database, a developer can be extremely confident that the data that his unit test depends on will not change. Further, if one data-population script is checked in to source control and used by all developers, then developers can be sure that data on one developer's machine matches data on another - and one step further, a unit test that runs on one machine should run on another. Most excellent.

A few things are necessary from the application developer's perspective. First, of course, each developer needs an instance of the database on their machine - which entails that each developer take some time to set up and administer the database, and could also present some problems with licenses depending on your DBMS. Second, a DDL script that creates the structure of the database and DbUnit or SQL scripts that populate the base data. And third, these scripts need to be plugged into some Ant targets that execute the creation, population, and removal of the database. These are all very achievable tasks.
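
As a rough idea of what the "create the schema" step involves - whether you wire it into an Ant target or not - here is a plain-JDBC sketch that runs a checked-in DDL script against a local database. The file name and JDBC URL are invented for the example:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Rough stand-in for an Ant "create schema" target: run the checked-in DDL
// against a developer's local database.
public class LocalSchemaBuilder {

    public static void main(String[] args) throws Exception {
        String ddl = new String(Files.readAllBytes(Paths.get("db/schema.ddl")));

        try (Connection con = DriverManager.getConnection("jdbc:hsqldb:file:localdb/app", "sa", "");
             Statement stmt = con.createStatement()) {
            // Naive split on ';' is good enough for simple DDL scripts.
            for (String sql : ddl.split(";")) {
                if (sql.trim().length() > 0) {
                    stmt.execute(sql.trim());
                }
            }
        }
        // A second step would load the base data set, e.g. via DbUnit's CLEAN_INSERT.
    }
}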

Unfortunately, if it would be tough to convince a DBA of the "Unit Test Database" option, it'll be downright impossible to convince him to generate and maintain a DDL for creating the structure of your local database for the purpose of unit testing. Again, in my experience, DBAs are fairly protective of "their turf", and won't jump at the chance to change their processes and approaches to suit your "touchy-feely", agile practices.

Additionally, having each developer maintain their own instance of the database can provoke problems of data synchronization and general maintenance chaos. For example, when your CustomerDAOTest runs but mine does not, I'm forced to wonder, "do you have different data than me?", "did I configure my database correctly?", "do I need to check out the latest DDL or DbUnit scripts and re-build my database?"

All things being equal, if the DBAs are receptive and developers are competent enough to handle the additional complexity, this is the most optimal approach, in my opinion.

In-Memory Database

If you have the fortune of working with an OR mapping framework, using an in-memory database is a very attractive option: the schema can be generated from your mappings at the start of the test run, a known data set loaded, and the whole thing thrown away afterwards, with no shared infrastructure for tests to fight over. Unfortunately, in many enterprise environments, SQL, stored procedures, triggers, and the like aren't going anywhere, and those don't carry over to a lightweight in-memory engine.
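
If your persistence layer runs through Hibernate, for instance, pointing the tests at an in-memory HSQLDB is just a handful of property overrides - a minimal sketch, assuming your mappings are already declared in hibernate.cfg.xml:

import junit.framework.TestCase;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// Sketch: swap in an in-memory HSQLDB just for the test run.
public class InMemoryDaoTest extends TestCase {

    private SessionFactory sessionFactory;

    protected void setUp() {
        Configuration cfg = new Configuration().configure(); // picks up the normal mappings
        cfg.setProperty("hibernate.connection.driver_class", "org.hsqldb.jdbcDriver");
        cfg.setProperty("hibernate.connection.url", "jdbc:hsqldb:mem:testdb");
        cfg.setProperty("hibernate.connection.username", "sa");
        cfg.setProperty("hibernate.connection.password", "");
        cfg.setProperty("hibernate.dialect", "org.hibernate.dialect.HSQLDialect");
        cfg.setProperty("hibernate.hbm2ddl.auto", "create-drop"); // schema lives and dies with the tests
        sessionFactory = cfg.buildSessionFactory();
    }

    protected void tearDown() {
        sessionFactory.close();
    }

    public void testSaveAndLoadCustomer() {
        // ...insert known rows through the mappings, then exercise the HQL under test...
    }
}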

Data Independent Tests

If you find yourself in an environment where none of the approaches above work, but you still dream of a day with repeatable unit tests, then a very sub-optimal solution is to connect to the development database but merely lower the bar for your unit tests. For instance, instead of asserting that a "findBy" method returns exactly 5 records, just assert that the query does not bomb. Though this does very little to verify whether the actual functionality of the DAO or class still works, it at least tells you that the mapping is correct - i.e. the database hasn't changed underneath your feet. In some environments, this alone provides enough value to write unit tests.
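
Such a lowered-bar test might look like the sketch below; as before, the connection details and the findByRegion finder are made up for the example:

import java.sql.DriverManager;
import java.util.List;
import junit.framework.TestCase;

// Sketch of a data-independent "does the query still run?" test against the development database.
public class CustomerDaoSmokeTest extends TestCase {

    public void testFindByRegionStillExecutes() throws Exception {
        // Connection details are made up; findByRegion is a hypothetical finder.
        CustomerDao dao = new CustomerDao(
                DriverManager.getConnection("jdbc:hsqldb:hsql://devdb/app", "app", "secret"));
        List customers = dao.findByRegion("EAST");

        // No assertions about the contents - only that the SQL and mapping
        // still line up with whatever the schema looks like today.
        assertNotNull(customers);
    }
}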

Again, while this severely limits the coverage and power of the test, it still allows unit tests to be assembled into suites, run in a nightly build, and pass with a 100% success rate independent of whether the data has changed. That's something, right?

Non-Durable Tests

In some camps, I imagine it'd be blasphemous to suggest writing unit tests that are non-durable (i.e. may break down the road if data changes), but it is an option. Unit tests serve many purposes, one of which is to give confidence that a change (refactoring or enhancement) made to the system has not broken old functionality. To realize this benefit, indeed, it's probably necessary to have an assemblage of unit tests that run with 100% success - run the tests, make the change, run the tests, you're done.

There are, however, other, less widely touted benefits of unit tests, and these benefits can be achieved even without a 100% success rate. Firstly, unit tests allow developers to test isolated pieces of a system without having to go through the interface. Without a unit test, a developer who wants to test a query in a DAO literally must write the query, the DAO, and then every piece on top of it all the way up to the interface, compile and deploy the application, navigate to the part of the system that executes the query, and finally run the function. Whew. A unit test, checked into source control, even if it doesn't pass, at least gives a maintenance developer a driver for that piece of code so he can execute it from his IDE. This alone is helpful.

Secondly, unit tests, regardless of whether they pass, can also serve as semi-decent documentation of code - describing what parameters to pass in, how to set up pre-conditions, etc.

This is an extremely unsatisfying solution, but it could be argued that there is some value in it.

Conclusion

From my experiences and research, these seem like the prevalent approaches for unit testing database dependent code. I'm quite sure I'm missing options, or missing specific costs, benefits, or nuances of the options I have described. Let me know what you think!

The Search for a Good UML Tool

Over the past couple weeks, I've dug into a few of the more popular UML tools out there, hoping to find out which to use at either end of the complexity spectrum, the small startup or the big corporation. What I found is that such a quest isn't so straight-forward. There are entirely too many UML tools to choose from, and it's amazing (at least to me) that a little marketplace natural selection hasn't yet weeded out the good from the bad from the ugly. But such is the state of the UML world: chaotic...and as I found out, a bit disappointing. In the end, however, there are some good alternatives, but not surprisingly, one size does not fit all.

My Approach

With so many tools on the market, it's just not possible to look at each one. [1] Instead, you've got to define your requirements up front, and then take a reasonable sample and see if you can find one that satisfices. This was my approach...

Breaking it down, developers essentially need a UML tool for three things: (1) to design something from scratch, (2) to understand some existing piece of code, and (3) to maintain documentation for an existing system or component.



For each of these "use cases", different features are required from a UML tool. When designing from scratch (use case 1), I need the tool to help me easily create different perspectives of the system, share these diagrams with my colleagues, and perhaps forward engineer these diagrams into source code. When using a UML tool to understand some existing piece of code (use case 2), I need the tool to reverse engineer my code (without sucking the life from my computer), and then let me drill into the model to understand the relationships or structures that are germane to my task. And finally, when I'm maintaining documentation for an existing system or component (use case 3), I would like the UML tool to reflect any changes in the source code from the last time I've opened my model. For example, if I create and save a diagram at time T1, change the underlying source at T2, and then reopen my model at T3, I should see my changes from T2 in my diagram at T3.

By use case, here are the features I'm looking for:

1. design: intuitive interface, UML 2.0 support, export to image, forward engineering
2. understand: reverse engineer, not overly resource intensive, information hiding, dependency illumination
3. maintain: code-to-model synchronization, IDE integration


Most developers, in my experiences, only really need support for use cases 1 and 2 - they create diagrams to help them clarify or communicate design thoughts at the present time, and they capture these thoughts in some formal documentation (on a good day!). However, they typically aren't as concerned with maintaining their models. If something in the code has changed that affects the diagram, they'll just re-create (or re-reverse engineer) the diagram - they don't necessarily need the tool to keep in synch with the underlying code base. For my client, however, this wasn't going to work. They needed the code-to-model synchronization that comes with that 3rd use case. Updating documentation was a priority, and so the barriers needed to be as low as possible to do so.

So going off a few years of UML experience, a little internet research, and some conversations with friends, I narrowed my search to six tools: Poseidon, StarUML, Together for Eclipse, Omondo, Sun Java Studio, and MagicDraw. I put each of these tools through their paces, and here's what I found.

Poseidon

I fell in love with Poseidon as soon as I opened it. First of all, there's a free version that is very capable, which is great. The interface seemed very usable, there were a bevy of nice features, and the GUI effects were very cool. But the price you pay, it seems, is performance. Even with a relatively small project (a few hundred classes), it was pretty memory intensive and slow. In the age of instant gratification, waiting a few seconds to drag-and-drop a class on a class diagram is too much. Yes, I'm impatient. Overall though, I got a good feel from Poseidon. For a small college-type project, great. Not ready for prime time though, in my opinion.

StarUML

Another free UML tool, and this one was fast. Opening a project, dragging and dropping, rearranging - lightning quick. It even seemed to scale a little better to larger projects. The problem was, at least for me, that the interface was a little perplexing. I think this is because they are using Irrational's, errr....I mean Rational's terms and approach - which if you're not drinking their kool-aid is very counter-intuitive. For example, you can store your diagrams in one of five model types: the Use Case Model, Analysis Model, Design Model, yada-yada. This is just confusing. The graphic components are a bit cumbersome to use as well, and I found myself spending way too much time rearranging boxes and lines. All in all though, there are some nice things about it.

Together for Eclipse

I've used past versions of Together from Borland, and I have to admit, the tool is pretty bad-ass. With this version, they went the extra-mile and integrated with Eclipse, a very laudable effort. The problem is that they don't just give you a plug-in to hook in to your Eclipse, they give you their version of Eclipse instead. What the...? Well, I don't want to have to use two different Eclipses, and I don't want to be locked in to Together's Eclipse, so I guess I'm done here...

Omondo

There are two versions of Omondo available. The first is free, and pretty capable...until you realize that your model is stuck in your Eclipse. There's no image export, or even save available. Obviously this won't work, so I tried their Studio version. Bingo. This tool is very nice. First, it's plugged in to Eclipse, so round-trip engineering is all there (and so simple). The UI is intuitive, and I found some nice features for reverse-engineering (finding dependencies, associations, etc.). All that, and it's fast too. The only downside is that it adds annotations to your source - so for a legacy system where you don't want to be responsible for changing so much existing code, this might not work.

Sun Enterprise Java Studio

Admittedly, I did not dig quite as deep into Sun's product. I guess I'm settled on Eclipse (for good or bad), and I just couldn't see switching between two big-time IDEs. Overall though, and some friends confirmed this, it seems a bit buggy, slow, and confusing (especially the reverse engineering). There are some nice features...and it is free, but it wouldn't be my first pick.

MagicDraw

I found this to be a great all-around tool. It doesn't integrate into Eclipse and it didn't seem to solve the code-to-model synchronization problems from my 3rd use case, but all else was very solid. The UI features were quite helpful. For example, in a class diagram you can click on any class and automatically add to the diagram all related types. Awesome. It's also pretty quick even with a larger project, and not too expensive to boot. I found that MagicDraw recently won a reader's choice award from JDJ, and I can see why.

Summing Up


In the end, every tool that I looked at will handle use cases 1 and 2 - designing from scratch and understanding some existing code. If this is all you need, and your project is small and budget tight, Poseidon will do the trick. It's free, intuitive, usable, and has some "good-enough" features. If, however, you need that 3rd use case of maintaining your documentation (code-to-model synchronization), then the only two tools of the six that'll do are Omondo and Together. Of these two, Omondo is superior, if only because of the better integration with Eclipse. Finally, don't overlook MagicDraw if you need something more robust than Poseidon, but less so than Together or Omondo. The price is reasonable, and the tool is fast and very capable.

10 tools for Modern PHP Development

A simple list of tools for modern PHP development. There are alternatives to most of the tools, but I’ll list native PHP tools wherever possible.

1. PHPUnit

PHPUnit is a testing framework belonging to the xUnit family of testing frameworks. Use it to write and run automated tests.

2. Selenium RC

Selenium RC can be used in conjunction with PHPUnit to create and run automated tests within a web browser. It allows tests to be run on several modern browsers and is implemented in Java, making it available to different platforms.

3. PHP CodeSniffer

PHP CodeSniffer is a PHP code tokenizer that will analyse your code and report errors and warnings based on a set of coding standards.

4. Phing

Phing is a project build tool and is a PHP port of the popular Java program Ant. Phing can be used to automate builds, database migrations, deployment and configuration of code.

5. Xdebug

Xdebug is a multi-purpose tool, providing remote debugging, stack traces, function traces, profiling and code coverage analysis. Debug clients are available in many PHP IDEs, and there are even plugins so you can debug from everybody’s favourite editor, Vim.

6. PHPDocumentor

PHPDocumentor is an automated documentation tool that allows you to write specially formatted comments in your code, which can be brought together to create API documentation.

7. phpUnderControl

phpUnderControl is a patch for the popular Continuous Integration tool, CruiseControl. Together with the previous six tools, phpUnderControl gives you a great overview of the current state of your application/codebase.

8. Zend Framework

Frameworks facilitate the development of software, by allowing developers to focus on the business requirements of the software, rather than the repetitive and tedious elements of development, such as caching. There are plenty of frameworks to choose from, but I particularly like the Zend Framework.

9. Subversion

Subversion is a revision control system that has superseded CVS. If you’re writing software of any kind, you should be using version control software.

10. Jira

So I could have named one of many, but this is the one I’ve liked the most recently. Jira is a bug/issue tracking software package and can also help with project management in terms of goals and roadmaps. Most issue trackers link to version control repositories, such as Subversion. The only downside to Jira is that it costs money for non-open-source projects.

I’m pleased to say that, with a little bit of pushing and persuasion by myself, we are currently using all of these technologies with the exception of Jira; we have a bespoke issue tracker.

What do you think of the list? Anything I have missed? Any alternatives you prefer?