Tom's Blog
Weighing risks of storing files in a content management system
Published by Tom |
February 27, 2009 11:29 AM EST |
I was cleaning out a desk drawer last weekend and found a few old 3 1/2 inch floppy disks. The discovery made me realize that in order to read the data off those disks, I would have to pull the floppy drive from an old computer and install it in a functioning computer -- and hope the new computer had the appropriate data connector, or that I could find an adapter.
The discovery of the floppies, and the realization that the data couldn't easily be read, is parallel to a situation I face today as I debate moving my document and media files to a content management system. The risk today, just like with the floppy disks, is that I might end up storing important data in a format that later becomes unreadable.
I have been contemplating moving a bulk of my personal text files, and perhaps even multimedia files, to an Alfresco CMS running on my home server in order to access my documents conveniently from anywhere. I have been using Evernote to store text notes and web clips, and I looked at Google Docs as an option. Both are great services and dead-simple to manage, but these online services don't provide some of the conveniences of Alfresco.
Alfresco is a free, open source Java web application that is slowly becoming a Swiss army knife for managing content. One of Alfresco's compelling features is the wide variety of file-access protocols it offers to manipulate documents stored in its repository. Alfresco's documents can be accessed via its web client, Java APIs, and CMIS, sure. But more interesting for my current needs is that documents stored in Alfresco also can be read, written and deleted from other computers on the LAN using a CIFS/SMB shared drive, over the web using WebDAV, using NFS, and even FTP.
Because Alfresco can expose its managed content using so many industry-standard protocols, I thought storing my files inside Alfresco would make it easier for me to access my documents no matter where I was, without adding the need for a specialized client application or web connection. I could use a CIFS shared drive at home to access documents from any of my home computers. I could access the documents securely from work using WebDAV over SSL. And I could access documents from a friend's house by logging into my home server from a web browser using Alfresco's native web application. My documents would be stored at home, but also available "in the cloud." I could even use Alfresco's feature to emulate a SharePoint server to version and share my MS Word and other Office documents from Office applications.
Making my documents this accessible would be convenient and (on the geeky side of) cool. I do have a strong concern about whether I want to risk exposing my documents to Internet hackers. But the longer-term concern is whether I will someday want to access my files without using Alfresco. Will my content repository one day become the equivalent of a floppy disk?
The risk of storing data in a format that later becomes unreadable is not new, and the problem grows as more of our lives become digitized. I remember a few years ago hearing Grady Booch describe his work preserving seminal software for the Computer History Museum and his labor of love, the Handbook of Software Architecture. He mentioned that software that should be preserved for historical and educational reasons is sometimes stored in once-popular paper or magnetic formats that are difficult to read today. The Library of Congress has been concerned with what digital formats it should use in order to store its electronic archives.
Alfresco stores its content as regular files, which is good. However, those files are named using globally unique identifiers rather than the original file name. The stored documents are mixed with other files used by Alfresco for versioning and other purposes in a series of numbered subdirectories. Do I want to rely on Alfresco being the required middleman to give me the files I need? Using the digital media sustainability factors used by the Library of Congress to rate digital preservation, I would rate Alfresco's storage like this, with High meaning good for sustainability:
- Disclosure: High
The files are stored in your native format like ext3 or NTFS. Alfresco itself is open source, and it runs under Java, which can be run on nearly any modern operating system.
- Adoption: Low
Despite Alfresco being powerful and free, the file organization and metadata formats are unique to Alfresco.
- Transparency: High
Alfresco stores files as regular files, albeit buried within its own directory organization scheme, and the file metadata is stored in a relational database of your choosing.
- Self-documentation: Low
Alfresco separates file contents from its metadata using a proprietary storage scheme. Reuniting the two requires Alfresco.
- External dependencies: Medium-to-High
Retrieving file data with its metadata requires Java, a web application server, Alfresco, and the database used to store the metadata. Since I use the open source MySQL as my database, and all other dependencies are open source, the external dependencies can be easily assembled. But it would be a pain.
- Impact of patents: High
I think all the technology needed to retrieve the data is unencumbered by patents.
- Technical protection mechanisms: High
Alfresco's files are stored on the file system without alteration, so no translation or decryption is needed.
But, hmm, how to find that fire insurance policy while I'm at the local motel's shared lobby computer. Yes, it will be possible to find the file I need through search tools, or by opening the files one by one -- and in the case of binary image files, by changing the file extension from Alfresco's ".bin" to whatever format the file really contains so I can open it with the proper application. But getting my files out of the Alfresco repository, with the file name and directory structure with which the files are usable, will not be as easy as plugging the disk drive into someone else's computer and opening the file with a text editor. It will be in unexpected situations like this when I will have wished I had kept my files stored as regular files in regular directories and just used Samba.
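The extension-guessing chore described above could be partly automated. Here is a minimal sketch, assuming only that the ".bin" files hold the original bytes unaltered; the class and method names are hypothetical and not part of Alfresco. It checks the first few bytes of a file against a handful of well-known file signatures.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Alfresco: guesses an extension for a
// content-store ".bin" file by inspecting its leading "magic" bytes.
class BinFileSniffer {

    // Returns a likely extension, or "bin" when the signature is unknown.
    static String guessExtension(byte[] head) {
        if (startsWith(head, 0x25, 0x50, 0x44, 0x46)) return "pdf"; // "%PDF"
        if (startsWith(head, 0x89, 0x50, 0x4E, 0x47)) return "png"; // 0x89 "PNG"
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return "jpg";       // JPEG start marker
        if (startsWith(head, 0x47, 0x49, 0x46, 0x38)) return "gif"; // "GIF8"
        if (startsWith(head, 0x50, 0x4B, 0x03, 0x04)) return "zip"; // ZIP container
        return "bin";
    }

    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BinFileSniffer /path/to/some-file.bin
        Path file = Paths.get(args[0]);
        byte[] head = new byte[8];
        int read;
        try (InputStream in = Files.newInputStream(file)) {
            read = in.read(head);
        }
        byte[] seen = Arrays.copyOf(head, Math.max(read, 0));
        System.out.println(file + " looks like ." + guessExtension(seen));
    }
}
```

This only recovers a plausible file type. The original file name and folder still live in Alfresco's database, which is the real sustainability worry.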
That's where I am now, weighing the advantages of storing my files in a content management system versus the disadvantages and risks. I'm guessing many businesses go through this same struggle whenever they adopt a content management system for their documents. Once a company switches to a content management system, they must jump in with both feet and live with the benefits and problems of storing their documents inside an electronic vault controlled by a piece of non-standard software. At least with Alfresco, the process is reversible through its CIFS interface, and less scary because of its open source nature.
Maybe my solution will be to use Alfresco but to back up my content repository using the CIFS interface. That way, my backups are independent of Alfresco and I preserve the files with their original names and directory locations. I'd lose any extra Alfresco metadata stored with the files, any versioning, and any software triggers or rules associated with the files. But I'd still enjoy Alfresco's benefits on my live file system. If you have faced and solved a similar situation when using a content management system, your comments are welcome.
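As a rough sketch of that backup idea, assuming the CIFS share is already mounted as an ordinary directory on the machine doing the backup: the class name and both paths are placeholders of mine, and a tool like rsync would do the same job. The point is only that a plain recursive copy keeps the original names and folder layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical backup helper: mirrors a mounted share into a plain directory.
class ShareBackup {

    // Copies every directory and file under source into target, keeping
    // relative paths and file names. Returns the number of files copied.
    static int mirror(Path source, Path target) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(source)) {
            // Files.walk lists a directory before its contents.
            entries = walk.collect(Collectors.toList());
        }
        int copied = 0;
        for (Path entry : entries) {
            Path dest = target.resolve(source.relativize(entry).toString());
            if (Files.isDirectory(entry)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(entry, dest, StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: mount point of the share, args[1]: backup directory.
        int n = mirror(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + n + " files");
    }
}
```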
Finally got Tomboy working in Fedora 10
Published by Tom |
February 24, 2009 09:21 PM EST |
After installing Fedora 10 last month, I finally got the
Tomboy note-taking application working.
I began using Tomboy in Fedora 8,
and have several notes stored in Tomboy notebooks.
When Tomboy broke in Fedora 10,
I put it on my to-do list to figure out how to get it working.
I figured the fix would be as easy as re-installing Tomboy.
It wasn't.
Fedora 10 was released three months ago tomorrow. That's why I was surprised to find that reinstalling / upgrading to the latest Tomboy from the Fedora repository didn't fix the bug. Before I fixed the problem, trying to run Tomboy would give me an error like:
** (Tomboy:4816): WARNING **: The following assembly referenced from
/usr/lib/tomboy/Tomboy.exe could not be loaded:
Assembly: Mono.Addins (assemblyref_index=8)
Version: 0.3.0.0
Public Key: 0738eb9f132ed756
The assembly was not found in the Global Assembly Cache, a path listed in the
MONO_PATH environment variable, or in the location of the executing assembly
(/usr/lib/tomboy).
Until I saw the error,
I didn't even know Tomboy was a .NET application running under Mono.
I searched around for a solution to the problem
and found the bug has been reported three times
to Red Hat Bugzilla,
but it still has not been fixed in the package.
The solution,
fortunately,
was pretty simple,
and was mentioned by Austin Acton in a
bug comment.
The solution also was mentioned on this
blog post
by Mark Ito (I'm assuming that's his name from the subdomain).
The solution is to install mono-addins from the 'fedora' repository.
sudo yum install mono-addins
For such an easy fix,
you have to wonder why this five-month-old, high-severity bug is still open.
Tomboy comes as part of the standard Fedora 10 install.
It must not be as easy as making the tomboy package dependent on the mono-addins package.
Installing Sun Java JDK 6 Update 12 on Fedora 10
Published by Tom |
February 03, 2009 08:27 AM EST |
When I set out to install Sun's latest Java development kit on
my newly upgraded Fedora 10 development box,
I discovered the previous instructions I had used on Fedora 8 from the
Fedora FAQ
no longer cover installing the Sun JDK.
The instructions now refer only to OpenJDK using the java-1.6.0-openjdk package.
After a short search,
I found a newer installation technique,
but unfortunately had to tweak it because it didn't work with JDK 6u12.
The best instructions I found for installing the Sun JDK on Fedora were from Fedora developer Paul Howarth at www.city-fan.org/tips/SunJava6OnFedora. Paul's instructions and his modified jpackage Java 6 RPM package are fantastically helpful. He details how to custom-build Java installation RPMs by rebuilding his RPM with the Sun Microsystems Java 1.6 "bin" installer.
The only roadblock to success was that Paul built his RPM for Java 6 update 7. The RPM spec file doesn't work if you run it with Sun's latest (as of this writing) jdk-6u12-linux-i586.bin file. My first attempt to follow Paul's instructions got me this:
[tom@development Download]$ rpmbuild --rebuild java-1.6.0-sun-1.6.0.7-1.1.cf.nosrc.rpm
Installing java-1.6.0-sun-1.6.0.7-1.1.cf.nosrc.rpm
warning: InstallSourcePackage at: psm.c:246: Header V3 DSA signature: NOKEY, key ID b56a8bac
warning: user paul does not exist - using root
warning: group paul does not exist - using root
warning: user paul does not exist - using root
warning: group paul does not exist - using root
warning: user paul does not exist - using root
warning: group paul does not exist - using root
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.W96jt0
+ umask 022
+ cd /home/tom/rpmbuild/BUILD
+ LANG=C
+ export LANG
+ unset DISPLAY
+ rm -rf /home/tom/rpmbuild/BUILD/jdk1.6.0_07
+ export MORE=10000
+ MORE=10000
+ sh /home/tom/rpmbuild/SOURCES/jdk-6u7-linux-i586.bin
sh: /home/tom/rpmbuild/SOURCES/jdk-6u7-linux-i586.bin: No such file or directory
error: Bad exit status from /var/tmp/rpm-tmp.W96jt0 (%prep)
The warnings are harmless. But as you can see, during the "prep" stage, rpmbuild is expecting the bin file to be called jdk-6u7-linux-i586.bin instead of the bin file for update 12. I optimistically hoped I might be able to get around this snag by renaming the newer file to the older name:
[tom@development Download]$ mv ~/rpmbuild/SOURCES/jdk-6u12-linux-i586.bin ~/rpmbuild/SOURCES/jdk-6u7-linux-i586.bin
But that just got me one step farther:
[tom@development Download]$ rpmbuild --rebuild java-1.6.0-sun-1.6.0.7-1.1.cf.nosrc.rpm
Installing java-1.6.0-sun-1.6.0.7-1.1.cf.nosrc.rpm
warning: InstallSourcePackage at: psm.c:246: Header V3 DSA signature: NOKEY, key ID b56a8bac
warning: user paul does not exist - using root
warning: group paul does not exist - using root
warning: user paul does not exist - using root
warning: group paul does not exist - using root
warning: user paul does not exist - using root
warning: group paul does not exist - using root
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.1AYQKX
+ umask 022
+ cd /home/tom/rpmbuild/BUILD
+ LANG=C
+ export LANG
+ unset DISPLAY
+ rm -rf /home/tom/rpmbuild/BUILD/jdk1.6.0_07
+ export MORE=10000
+ MORE=10000
+ sh /home/tom/rpmbuild/SOURCES/jdk-6u7-linux-i586.bin
+ cd /home/tom/rpmbuild/BUILD
+ cd jdk1.6.0_07
/var/tmp/rpm-tmp.1AYQKX: line 33: cd: jdk1.6.0_07: No such file or directory
error: Bad exit status from /var/tmp/rpm-tmp.1AYQKX (%prep)
The rpmbuild was able to find and run Sun's (renamed) shell script, but then failed when it tried to switch to the non-existent jdk1.6.0_07 directory in BUILD.
To solve the problem, I had to edit the RPM "spec" file and make two small changes to account for the updated version. Then I continued with Paul's instructions, except using the modified spec file in place of directly using his RPM file. I got the idea of editing the spec file from a blog posting by Nick Lothian.
Here are my modifications to Paul's instructions:
- Follow Paul's instructions up to and including
running the rpmbuild command under the section "Build Java RPM Packages."
- Begin Detour:
After you get the error (shown above) that says
"jdk-6u7-linux-i586.bin: No such file or directory,"
you won't have the RPM files but you will have
an RPM spec file stored in ~/rpmbuild/SPECS,
called
java-1.6.0-sun.spec.
- Edit this ~/rpmbuild/SPECS/java-1.6.0-sun.spec file by:
Changing this line (line 37 in my spec file):
%define buildver 7
to say:
%define buildver 12
so the buildver is 12 instead of 7, and changing this line (line 45 in my spec file):
%define toplevel_dir jdk%{javaver}_0%{buildver}
to say:
%define toplevel_dir jdk%{javaver}_%{buildver}
That is, remove the "0" (zero) right before the %{buildver} variable. That second change stumped me at first because Paul apparently had to add a zero-padding in the directory name to get "07" when he was working with Update 7.
- Run rpmbuild again by using the spec file instead of the rpm file using this command:
[tom@development Download]$ rpmbuild -ba --rebuild ~/rpmbuild/SPECS/java-1.6.0-sun.spec
This command should succeed with building new RPMs for Sun's JDK.
- End Detour. Continue with Paul's instructions under "Remove Any Old Cruft."
[tom@development ~]$ java -version
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) Server VM (build 11.2-b01, mixed mode)
Success.
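To see why the hard-coded "0" in the toplevel_dir line stops working once the update number reaches two digits, here is a small illustration in Java. It is not part of Paul's spec file. The class and method names are made up, and the "1.6.0" value for javaver is my assumption based on the jdk1.6.0_07 directory name in the error output.

```java
// Illustration only: mimics how the spec file builds the JDK directory name.
class JdkDirName {

    // Mirrors the original spec line: jdk%{javaver}_0%{buildver}
    static String hardCodedZero(String javaVer, int buildVer) {
        return "jdk" + javaVer + "_0" + buildVer;
    }

    // Pads the update number to two digits instead of hard-coding a "0",
    // so the same rule covers single- and double-digit updates.
    static String zeroPadded(String javaVer, int buildVer) {
        return String.format("jdk%s_%02d", javaVer, buildVer);
    }

    public static void main(String[] args) {
        System.out.println(hardCodedZero("1.6.0", 7));  // jdk1.6.0_07
        System.out.println(hardCodedZero("1.6.0", 12)); // jdk1.6.0_012, which Sun's installer does not create
        System.out.println(zeroPadded("1.6.0", 7));     // jdk1.6.0_07
        System.out.println(zeroPadded("1.6.0", 12));    // jdk1.6.0_12
    }
}
```

Removing the "0" from the spec file, as I did, is the quick fix for update 12. It would break again if you went back to a single-digit update.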
The terribly misunderstood super()
Published by Tom |
January 13, 2009 12:31 PM EST |
For developers new to Java,
here's a tip that could make you look
more like a ninja coder than colleagues who have been writing Java for years:
learn how
super()
works within constructors.
I say this because I recently completed a yearlong project with 12 developers and found during the staffing process that about one third of the developers I interviewed, many of whom had been coding Java professionally for years, misunderstood fundamental concepts of Java object creation. Some of these smart developers would insist during the job interview that unless a Java constructor explicitly invokes
super(),
the parent constructor would never be called.
With such an important Java language feature being so terribly misunderstood,
I thought I'd dust off this blog with a reminder to those new to Java of how
super()
works in constructors.
A related topic would be how
this(...)
works,
but I'll leave that for another time.
Rule: You never, ever, have to call the no-argument
super().
Corollary: It is impossible to instantiate an object without at least one constructor being invoked in all parent classes.
With this rule in mind, here is code to illustrate.
public class A {
    public A() {
        System.out.println("A says hello");
    }
}
public class B extends A {
    public B() {
        System.out.println("Hello from B");
    }
}
If you instantiate class
B
like this:
B myB = new B();
the console will output:
A says hello
Hello from B
Since
B's
constructor didn't specify a different constructor in class
A
by using
super with an argument list,
the Java runtime invoked
A's
no-argument constructor by default.
No call to
super()
is needed from within
B's
constructor in order for the
A
parent class to be instantiated.
In fact,
there is absolutely, positively no way to create a
B
instance without one of
A's
constructors running first.
If I modify class B to add super():
public class B extends A {
    public B() {
        super();
        System.out.println("Hello from B");
    }
}
the output would be identical to the first version.
In fact,
the generated Java bytecode would be identical.
When there is no explicit call to
super()
as the first statement in a constructor,
the Java compiler implicitly adds a call to the no-argument
super()
to invoke the no-argument constructor of the parent class.
There is never a need to add an explicit call to a no-argument
super().
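The same rule holds all the way up the inheritance chain. Here is a small sketch, with class names invented for illustration, that records the order in which constructors run across three levels. Only one of the constructors calls super() explicitly, and the order is the same either way.

```java
import java.util.ArrayList;
import java.util.List;

class ConstructorOrderDemo {

    // Shared log so the order of constructor calls can be inspected.
    static final List<String> LOG = new ArrayList<>();

    static class Grandparent {
        Grandparent() {
            LOG.add("Grandparent");
        }
    }

    static class Parent extends Grandparent {
        Parent() {
            // No super() here: the compiler inserts the call for us.
            LOG.add("Parent");
        }
    }

    static class Child extends Parent {
        Child() {
            super(); // explicit, but changes nothing
            LOG.add("Child");
        }
    }

    public static void main(String[] args) {
        new Child();
        System.out.println(LOG); // [Grandparent, Parent, Child]
    }
}
```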
I think a lot of Java developers end up believing you need to call
super()
in order for the superclass's constructor to be called because so much Java code
out there includes extraneous calls to
super().
For instance,
here is a constructor taken verbatim from a
Hello World
J2ME coding example from
Research in Motion Ltd.,
the makers of the BlackBerry smart phones.
//create a new screen that extends MainScreen, which provides
//default standard behavior for BlackBerry applications
final class HelloWorldScreen extends MainScreen
{
    public HelloWorldScreen()
    {
        //invoke the MainScreen constructor
        super();
        //add a title to the screen
        LabelField title = new LabelField("HelloWorld Sample", LabelField.ELLIPSIS
                | LabelField.USE_ALL_WIDTH);
        setTitle(title);
        //add the text "Hello World!" to the screen
        add(new RichTextField("Hello World!"));
    }
}
If I were a developer just learning Java,
I would assume the call to
super()
is required in order to invoke the parent class's constructor.
Why else would the
HelloWorldScreen
developer code it, and add that comment
to explicitly point out the call to the parent class?
I
searched Krugle
for open source projects using calls to the no-argument
super()
and found 82,347 Java files,
including code from major projects like Eclipse and
GlassFish.
It seems many developers like explicitly invoking
super().
I can see one possible reason for doing so. Perhaps there are several constructors in the parent class and the developer wants to call out that he or she is using the no-argument version. The first danger I see with adding extra code that adds no behavior to a program is the risk of adding confusion. For instance, I recently ran across code with constructors that looked something like this:
public class SpecialClass extends RegularClass {
    private int x, y;
    public SpecialClass() {
        super();
    }
    public SpecialClass(int x) {
        super();
        this.x = x;
        this.y = 0;
    }
    public SpecialClass(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
The first two constructors explicitly called
super(),
but the third constructor didn't.
Was that a mistake?
Did the developer mean to add a call to a different superclass constructor, like
super(x, y),
but forgot?
If not,
why did he leave off the third call to super()?
Finding small inconsistencies in code like this wastes development time as the reader tracks down whether the inconsistency was a harmless oversight or an error that is now a bug. Code that doesn't do anything, without a documented reason for being there, seems far more hazardous to understanding than any value gained from writing extra code to "document" that you really meant the automatic behavior. Similar to seeing a class that extends
java.lang.Object,
I end up asking why the developer did that.
(If you use
super()
regularly for documentation purposes,
I would appreciate hearing your reasons.)
The second danger in adding extraneous calls to
super()
is that it seems to be teaching a lot of new Java developers
that
super()
is required in order for parent constructors to be invoked.
At least that certainly is my recent experience from interviewing
Java developers.
The only time
super
is required is when it takes a non-empty argument list
to invoke a constructor in the parent class that requires parameters.
For example,
here is my base class to represent a
knight from the movie
Monty Python and the Holy Grail.
Note that the constructor requires an argument.
public class EnglishKnight {
    private String whatISay;
    public EnglishKnight(String saying) {
        whatISay = saying;
    }
    @Override
    public String toString() {
        return whatISay;
    }
}
This following subclass (with an error)
is meant to be a certain kind of knight from the movie:
public class KnightWhoSaysNeep extends EnglishKnight {
    public KnightWhoSaysNeep() {
        // compile-time error here: the implicit super() finds no EnglishKnight() constructor.
    }
}
You probably see the compile-time error.
Since
KnightWhoSaysNeep
extends
EnglishKnight,
and since
EnglishKnight
does not contain a no-argument constructor,
the
KnightWhoSaysNeep
class must override the implicit (and illegal) call to
super()
in its constructor.
Here's the error from Eclipse:
Implicit super constructor EnglishKnight() is undefined.
Must explicitly invoke another constructor
To fix
KnightWhoSaysNeep
we need to call one of the valid constructors in the superclass.
In this case,
there is only one constructor in the superclass,
which takes a String as an argument.
public class KnightWhoSaysNeep extends EnglishKnight {
    public KnightWhoSaysNeep() {
        super("Neep!");
    }
}
The corrected
KnightWhoSaysNeep
class demonstrates the proper use of
super --
one that takes a parameter to override default behavior.
Koen Aers jBPM EclipseWorld sneak preview
Published by Tom |
November 08, 2007 09:05 AM EST |
Even though
Koen Aers
from JBoss had to be up early Thursday to give a
jBPM presentation
at EclipseWorld 2007,
he kindly stopped by our Northern Virginia Java Users Group
(NovaJUG)
meeting Wednesday night to
talk about business process management in general and JBoss's
jBPM platform specifically.
Koen Aers presents on jBPM at
the NovaJUG meeting Nov. 7.
[photo from my phone]
Business process engines can make applications easier to write, but they have received a bad reputation, Aers said. The reputation stems from the fact that most business process management systems are behemoths that take up half your hard disk and come with a steep learning curve, he said. JBPM is about a 500KB core library, not counting its Hibernate database persistence layer, and developers can learn and use only a small part of the whole platform.
BPM engines don't need to be complex. At their core, he said, business process engines boil down to the management of state: what state is each instance of a business process in at the moment, and what internal or human activities trigger a transition to a new state.
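To make the "management of state" idea concrete, here is a minimal sketch in plain Java. It does not use jBPM or its API. The expense-approval process, the state names, and the event names are all invented for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical process: one instance tracks the state of one expense report.
class ExpenseProcess {

    enum State { DRAFT, SUBMITTED, APPROVED, REJECTED }

    // Transition table: current state -> (event name -> next state).
    private static final Map<State, Map<String, State>> TRANSITIONS = new HashMap<>();
    static {
        allow(State.DRAFT, "submit", State.SUBMITTED);
        allow(State.SUBMITTED, "approve", State.APPROVED);
        allow(State.SUBMITTED, "reject", State.REJECTED);
        allow(State.REJECTED, "revise", State.DRAFT);
    }

    private static void allow(State from, String event, State to) {
        TRANSITIONS.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
    }

    private State current = State.DRAFT;

    State current() {
        return current;
    }

    // Fires an event: moves to the next state, or fails if the event
    // is not allowed in the current state.
    State signal(String event) {
        Map<String, State> allowed =
                TRANSITIONS.getOrDefault(current, Collections.emptyMap());
        State next = allowed.get(event);
        if (next == null) {
            throw new IllegalStateException(event + " not allowed in state " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        ExpenseProcess p = new ExpenseProcess();
        p.signal("submit");
        p.signal("approve");
        System.out.println(p.current()); // APPROVED
    }
}
```

A real engine adds what the sketch leaves out: persisting each instance's state, recording history, and running actions on each transition.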
Here are my notes from Koen Aers's jBPM presentation.
Why use a process language?
- Simplify an application by extracting the state-management logic.
- Improves communication: Process languages should support graphical modeling that maps to executable notation.
- Automatic persistence history can be used for business intelligence.
- A tool that allows an analyst to model workflows (business processes) and hand over results to a developer, who will add the details to make it executable.
- With modeling, the more expressive the modeling notation, the harder it is to make the model executable.
- Thus the choice of modeling notation is important.
Popular modeling notations:
BPMN: a pure modeling notation. No automatic translations to code.
BPEL: The purpose is to orchestrate web services and publish result as a new web service.
XPDL: A format for storing process models.
- A big repository that holds executable processes, persists the execution state of the processes, and records history of what happened during the process executions.
JPDL is an XML language defined by a schema. The language is extensible to support custom business processes. The language also supports defining Java actions that can be invoked at numerous points as the business process changes states.
Aers showed a demo of coding a business process using JBoss's visual process designer, an Eclipse plugin. The plugin lets you edit the jPDL both as XML and visually. Aers is a developer for the designer tool.
Before he started working on the jBPM designer tool, Aers said, he would code jPDL using straight XML with an editor that supported auto-completion from the XSD. The designer is primarily a marketing tool, he said, to support people's expectations of what a powerful BPMS must provide. "If you go to a presentation and you don't have a [graphical] designer, then you suck" in the customer's view, he said.


