Why your online presence is important

This November marks 7 years of active Internet usage for me. Considering that the Internet arose in the 1980s, I’ll admit I was a good few years late to the scene. That being said, I’m happy to note that I arrived just in time to see the Internet rise to the status of core infrastructure that it has in our society now. That, combined with the Web 2.0 movement, makes the Internet a very interesting place to be. In recent years the Internet has become an increasingly democratic medium: anyone with a stable Internet connection can become not just a consumer, but also a creator of all kinds of information.

It has only been over the past year or two that I’ve really started to take advantage of this inherently two-way aspect of the Internet. Having an online presence can be very useful nowadays, and I would argue that it won’t be long before having a website (or an ePortfolio) becomes as useful as a resume (and for some professions, even more useful). As a computer scientist/engineer and part-time web designer, I feel that having an online presence is very important.

What is an online presence?

So what exactly is this online presence I’m talking about? Everything that you put out on the Internet is a part of your online presence. Email, IMs, IRC conversations, blog posts and comments, social network activity: all of it, taken together, defines the unique ‘you’ online. For some people, like my parents, this presence is tiny, limited mostly to email and some IM conversations. But for most younger people (including myself) there’s a whole slew of important and often unrelated information out there that represents your online identity. It’s important to know where your personal information is and what it represents. Much of the web is searchable, and as the years go by, more and more forms of information are being indexed and made available at a moment’s notice to anyone who wants them. Just as you wouldn’t leave your wallet or personal banking information lying out on the street, it’s a good idea to keep track of where and why you’re sending your information.

Here’s a little experiment: type your full name into Google and see what the results are. If you have a relatively common name, chances are that many of the results won’t be about you. However, the ones that are about you are worth looking into. Do they represent things you’re proud of, or at least things that you wouldn’t mind others seeing? If not, you should try your best to have them removed. In some cases you can’t (mailing list emails), but in others you can (forum posts). While I’ve always been an advocate of disclosure and I can’t stand hypocrisy, it’s also your right to rectify past mistakes. And while you’re rectifying past mistakes, remember not to make new ones.

Your window on the world

It’s important to realize that your online presence is not just what the world sees of you, but also what you see of the world. Having multiple email addresses with multiple providers (perhaps with not-too-discrete names) might confuse the people you’re in contact with, but it will also make life harder for you. No matter what field you’re in, email probably takes up a considerable part of your time, and you can make things easier by consolidating your email into a single place. At the other extreme, you could end up with one gigantic inbox with everything mixed together. Learning to use filters and folders to sort your email will save you the time and mental effort of doing it all by hand.
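To make the filters-and-folders idea concrete, here is a minimal Scala sketch of how a mail filter typically works: each rule pairs a test with a destination folder, and the first matching rule wins. The `Message` fields, the rules, the addresses and the folder names are all hypothetical; real mail clients (and Gmail) expose this through their own filter settings rather than code.

```scala
// A hypothetical message: only the fields our rules look at.
case class Message(from: String, to: String, subject: String)

// A rule pairs a test with the folder a matching message should go to.
case class Rule(matches: Message => Boolean, folder: String)

// Route a message to the folder of the first matching rule,
// falling back to "Inbox" when no rule applies.
def route(msg: Message, rules: Seq[Rule]): String =
  rules.find(_.matches(msg)).map(_.folder).getOrElse("Inbox")

// Illustrative rules; the addresses and folder names are made up.
val rules = Seq(
  Rule(_.to.endsWith("@lists.example.org"), "Mailing lists"),
  Rule(_.from.endsWith("@college.example.edu"), "School"),
  Rule(_.subject.toLowerCase.contains("invoice"), "Receipts")
)

println(route(Message("prof@college.example.edu", "me@example.com", "Homework"), rules)) // School
println(route(Message("friend@example.net", "me@example.com", "Hi"), rules))             // Inbox
```

Because rules are checked in order, more specific rules should go first.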

Whenever you see duplication in your life (multiple email accounts, multiple blogs, multiple websites), there might very well be an easy way to cut it down, whether it’s by using an email client (or Gmail) to pull multiple mailboxes together or by using subdomains of a single domain instead of multiple separate websites.

Your online presence is a two-way street. In some ways it’s like your home: you live there and spend a lot of your time there, but you also have friends, family and other guests over. Taking care of your image on the web is just as important as keeping your house clean.

Looking ahead: new feed settings

Over the past few months this blog has seen a significant rise in readership. It’s also been a great way for me to record, think about and gather information about my activities. With that in mind, I’m certain that I’ll keep posting and devoting time to it over the next year. Though I currently pay for the ByteBaker.com domain name, I’m still on free hosting at WordPress.com. I’m planning to move to a paid host sometime early next year. This will give me more freedom to customize the blog and add more content, as well as to gather more statistics about what my readers like.

This blog has been through a number of transformations over the past two years, going from a not-very-regular, not-very-focused compilation of my occasional thoughts to a source of information that I think an increasing number of people find useful. Through all that time, my RSS feed link has stayed the same. But the link still carries the name of one of The ByteBaker’s predecessors, and that’s something I want to leave behind. I’m moving the feed to a new Feedburner feed:

http://feeds.feedburner.com/bytebaker

This feed will stay the same even when I move to paid hosting. I’ll be shutting down the older Feedburner feed at the end of the week. WordPress.com also publishes its own feed for the blog, which will get updates while I’m still on WordPress.com but will stop after I move. So I’d encourage everyone to move to the new feed as soon as possible.

Sorry for the inconvenience, but I hope that a continuing stream of free technology-related posts will eventually make up for it. Once again, thanks to all my readers for giving me a reason to keep writing.

Making Linux boot faster

I’ve been a committed Linux user for the past 3 years (though I’ve developed a liking for the Mac platform too), and I currently dual boot Vista and Arch Linux on my laptop. It’s a pretty rare occasion for me to actually boot into Windows, and when I do it’s mostly for a game of SimCity 4. But whenever I do, I really hate how much slower it is than my Linux install. What I hate even more is that even after I log in, my system is still unusable for a good 20 seconds while everything finishes loading in the background.

By comparison, my Linux install takes just 30 seconds to get to a usable desktop. Admittedly, what I boot into isn’t really a ‘desktop’ in the traditional sense of the word. I run StumpWM, a lightweight tiling window manager; I don’t have fancy 3D graphics, and I don’t even have a taskbar or desktop icons. That being said, I do boot into a fully usable system with an active wireless network connection. I was quite happy with the 30 second boot time until I came across an article about a group of researchers who managed to make a Linux system boot in 5 seconds. Needless to say, a 5 second boot time is really impressive, and it makes my boot time look extremely sluggish by comparison.

With Thanksgiving Break around the corner, I’m considering putting some time into building an experimental system to see just how fast a boot I can get. Chances are I’ll be using an older Dell machine, but I’m not sure about the specs yet. There are two main steps I’ll be taking to cut down on boot time:

1. Cutting down on unnecessary processes

A standard Linux boot often starts up a number of processes and tasks that aren’t really needed on a normal single-user system. For example, I don’t need a cron scheduler, nor do I need any kind of mail server to be started. Since I never print, I can get rid of CUPS too. I use a lightweight graphical login manager called SLiM, but if I wanted to shave off another second or two, I could boot straight into a logged-in desktop.
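On Arch Linux the daemons started at boot are listed in the `DAEMONS` array in `/etc/rc.conf`; as far as I know, prefixing a name with `!` disables it and `@` starts it in the background. The Scala sketch below rewrites such a line to disable services a single-user laptop can live without. The daemon names (`crond`, `postfix` standing in for “a mail server”, `cups`) are assumptions for illustration; in practice you would simply edit the file by hand.

```scala
// Daemons a single-user laptop can skip. These names are hypothetical
// stand-ins; check your own /etc/rc.conf for the real ones.
val unwanted = Set("crond", "postfix", "cups")

// Prefix every unwanted daemon in a DAEMONS=(...) line with "!",
// which (assumed here) tells Arch's rc scripts not to start it.
// Lines that are not a DAEMONS array are returned unchanged.
def disableDaemons(line: String, unwanted: Set[String]): String = {
  val DaemonsLine = """DAEMONS=\((.*)\)""".r
  line.trim match {
    case DaemonsLine(body) =>
      val entries = body.trim.split("\\s+").filter(_.nonEmpty).map { d =>
        // Compare bare names, ignoring any existing "!" or "@" prefix.
        val name = d.stripPrefix("!").stripPrefix("@")
        if (unwanted.contains(name)) "!" + name else d
      }
      entries.mkString("DAEMONS=(", " ", ")")
    case _ => line
  }
}

val before = "DAEMONS=(syslog-ng network crond @alsa cups)"
println(disableDaemons(before, unwanted))
// DAEMONS=(syslog-ng network !crond @alsa !cups)
```

Disabling rather than deleting entries keeps the original order visible, so a service is easy to turn back on later.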

2. Making changes to the kernel

Since I’ll know exactly what hardware I’ll be running, I can build a custom kernel with appropriate modules for the devices built right in. However, I’ll still need to start the Hardware Abstraction Layer and udev so that I can use hotplugged devices (USB drives and the like). Since this is an experimental machine, I probably could remove the need for hotplugging, but I think in the real world, that crosses the usability line in the wrong direction. 

Building things directly into the kernel has a twofold effect. Firstly, I no longer have to wait for the modules to be loaded separately later in the boot process. Secondly, it might allow me to get rid of the initial ramdisk, which is something I’m still trying to figure out. Arch Linux boots by having the kernel load a compressed disk image into RAM that contains the basic modules needed to get on with the boot: mostly hard drive modules. This then lets the kernel mount the root filesystem and continue booting. Removing the need for a ramdisk might give a significant time boost, and I will definitely be looking into it.
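The usual reasoning behind dropping the initial ramdisk is that if the disk controller and root filesystem drivers are compiled into the kernel (`=y` in the kernel’s `.config`) rather than built as modules (`=m`), the kernel can mount the root filesystem on its own. Here is a minimal Scala sketch that reads `.config` lines and lists which required options are not yet built in. The option names in `neededForRoot` are only examples; which ones actually matter depends on the machine’s hardware.

```scala
// How one kernel option is set in a .config file.
sealed trait Setting
case object BuiltIn extends Setting // CONFIG_FOO=y
case object Module extends Setting  // CONFIG_FOO=m
case object NotSet extends Setting  // "# CONFIG_FOO is not set", or absent

// Parse .config lines into a map from option name to its setting.
// Other lines (plain comments, string or numeric values) are ignored.
def parseConfig(lines: Seq[String]): Map[String, Setting] = {
  val Enabled = """(CONFIG_\w+)=([ym])""".r
  val Disabled = """# (CONFIG_\w+) is not set""".r
  val pairs = lines.map(_.trim).collect[(String, Setting)] {
    case Enabled(name, "y") => name -> BuiltIn
    case Enabled(name, _)   => name -> Module
    case Disabled(name)     => name -> NotSet
  }
  pairs.toMap
}

// Options that must be =y for the kernel to mount root without a ramdisk.
// These names are examples only; the right set depends on the hardware.
val neededForRoot = Seq("CONFIG_SATA_AHCI", "CONFIG_ATA_PIIX", "CONFIG_EXT3_FS")

// Required options that are currently modules or disabled.
def notBuiltIn(config: Map[String, Setting], required: Seq[String]): Seq[String] =
  required.filterNot(opt => config.getOrElse(opt, NotSet) == BuiltIn)

val sample = Seq(
  "CONFIG_SATA_AHCI=y",
  "CONFIG_ATA_PIIX=m",
  "# CONFIG_EXT3_FS is not set"
)
println(notBuiltIn(parseConfig(sample), neededForRoot))
// List(CONFIG_ATA_PIIX, CONFIG_EXT3_FS)
```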

I doubt I’ll be able to get the boot time under 10 seconds, especially since I won’t be making any source-level changes. I’ve been looking for information on whether a different filesystem (other than the popular ext3) would have a noticeable impact, but I haven’t found anything concrete yet. I might also look at some other kernel patches. Unlike the researchers mentioned above, I won’t be making any changes to X or implementing readahead (which lets the system look ahead and load the filesystem blocks that will be needed next). As always, I’ll be keeping a log of my activities, and I should have some interesting results.

Scala as an alternative to Java: Part 1

The end of the semester is drawing near, and that means my independent study in programming languages is also coming to an end. This course has taught me a lot. Before it I was only mildly interested in programming languages, but over the course of the semester I’ve been introduced to many new and innovative ideas, and I’ve become convinced that studying programming languages is something I want to continue.

At the same time, I’ve also become interested in languages that run on virtual machines and the benefits they offer. Over the past few years the Java virtual machine has grown in prominence, and there has been a corresponding rise in the number of languages that can run on the JVM. I’ve never been a big fan of Java, though I will admit that it does have some advantages. With about a month left until I have to go home, I think it’s time to look seriously into an alternative to Java on the JVM. Though there are a number of interesting candidates (Scala, Groovy, Fortress, JRuby, Clojure), the one I’m most interested in is Scala. I’m only just starting to use it, but here are some features of the language that I’m looking forward to using:

1. Uniform object and static type system

One of my biggest gripes about Java is that it isn’t purely object-oriented. Java is a mix of hierarchical class systems, primitive types (ints, floats) and reference types. Though this isn’t always a problem, it can be messy, and there are times when you would really like to subclass one of the primitive types, but you simply can’t. Java is statically typed, but again, not quite. Though type-casting can be quite handy sometimes, I can’t help feeling that it’s somehow ‘unclean’ and not very aesthetically pleasing. And it’s not very safe either. In Scala everything is an object. For example, integers are instances of scala.Int, so they have methods just like instances of any other class (although scala.Int itself is final and can’t be subclassed). Scala also has strong static typing, as well as type inference. I’m still not very sure about the effectiveness of static typing, but I’ve come to respect it and I’m interested to see how it works.
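A short Scala sketch of the uniform object model and type inference described above: arithmetic operators are method calls on `Int` objects, integers have ordinary methods, and the compiler infers types while still checking them statically. The value names are arbitrary.

```scala
// Arithmetic operators are ordinary methods on Int objects:
// 1 + 2 is shorthand for (1).+(2).
val sum = (1).+(2)

// Integers respond to method calls like any other object.
val larger = 5.max(3)
val text = 42.toString

// Type inference: the compiler works out that xs is a List[Int],
// and still rejects ill-typed code at compile time.
val xs = List(1, 2, 3)
val doubled: List[Int] = xs.map(_ * 2)

// Every value, including an Int, is an instance of Any.
val mixed: List[Any] = List(1, "two", xs)

println(sum)          // 3
println(larger)       // 5
println(doubled)      // List(2, 4, 6)
println(mixed.length) // 3
```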

2. First class functions, anonymous functions and singleton objects

Through my contact with Scheme, Lisp and ML, I’ve come to love functional programming. Some languages, like Python, do a good job of combining object-oriented and functional programming styles. I’m interested to see how I can adapt my functional programming habits to an object-oriented, strongly typed system like Scala. The ability to create singleton objects without the hassle of building a container class and instantiating it is something I’ve missed more than once in Java, and I think it will lead to more productive code. In return, I’ll have to give up Java’s static fields and methods, but I think that’s an acceptable price.
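Here is a small Scala sketch of the features in this section: a function stored in a value, a higher-order function, an anonymous function passed inline, and a singleton `object` that needs no class declaration or `new`. `IdGenerator` and the other names are made up for the example.

```scala
// A function is a value: it can be stored, passed and returned.
val square: Int => Int = x => x * x

// A higher-order function that applies f twice.
def twice(f: Int => Int, x: Int): Int = f(f(x))

// An anonymous function passed inline, with no name at all.
val evens = List(1, 2, 3, 4, 5, 6).filter(n => n % 2 == 0)

// A singleton object: declared once, no class to write, nothing to instantiate.
object IdGenerator {
  private var last = 0
  def next(): Int = {
    last += 1
    last
  }
}

println(twice(square, 3))   // 81
println(evens)              // List(2, 4, 6)
println(IdGenerator.next()) // 1
println(IdGenerator.next()) // 2
```

The `object` plays the role that a class full of `static` members would play in Java.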

3. Traits and Multiple Inheritance

Java doesn’t support multiple inheritance, and I partly agree with the reasons why. There are workarounds, but none of them have looked very appealing to me. I’ve never been in a position where I really needed multiple inheritance, but I can understand the situations where it would be helpful. Scala allows reusing code from multiple sources using traits, which are similar to Java’s interfaces but can contain code and not just declarations. There are some rules to follow, but it looks like something that could be a powerful tool.
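A minimal Scala sketch of traits used to reuse code from more than one source: `Greeter` provides a concrete `greet` method built on an abstract `name`, `Logger` brings its own state and methods, and `Student` mixes in both. All names are hypothetical.

```scala
// A trait can hold concrete code, not just abstract declarations.
trait Greeter {
  def name: String                          // abstract: supplied by the class
  def greet(): String = s"Hello, I'm $name" // shared implementation
}

// A second trait with its own state and behavior.
trait Logger {
  private var entries: List[String] = Nil
  def log(msg: String): Unit = { entries = entries :+ msg }
  def history: List[String] = entries
}

// One class reuses code from both traits at once.
class Student(val name: String) extends Greeter with Logger

val s = new Student("Ada")
s.log(s.greet())
println(s.history) // List(Hello, I'm Ada)
```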

4. Operator Overloading

Operator overloading is something that is sorely missed by a lot of Java programmers, and its absence contributes to Java’s verbosity. The concept is simple: if you have a class for real numbers and overloading is supported, you can redefine the ‘+’ operator to do real-number addition. But in Java, you would have to write a method, say add(), to handle addition. I’ve become a supporter of the idea that things which do the same thing should look the same (and conversely, different things should look different), and overloading is a step in the right direction.
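In Scala, ‘operators’ are ordinary methods with symbolic names, so the real-numbers example above can be sketched as follows. `Real` is a hypothetical class wrapping a `Double`; in Java the equivalent call would be written `a.add(b)`.

```scala
// A hypothetical real-number class. '+' and '*' are just methods
// whose names happen to be symbols.
case class Real(value: Double) {
  def +(that: Real): Real = Real(value + that.value)
  def *(that: Real): Real = Real(value * that.value)
}

val a = Real(1.5)
val b = Real(2.0)

println(a + b)     // Real(3.5), the same call as a.+(b)
println(a + b * b) // Real(5.5): '*' binds tighter than '+'
```

Scala assigns precedence by an operator’s first character, so user-defined `*` and `+` keep their familiar ordering.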

In the near future…

I’ll be exploring Scala further. Each of the above features is worth an entire post in itself, and I’m sure I’ll discover more along the way. I’m going to be adding to this series at regular intervals, so stay tuned.

Two weeks backing up to Amazon S3

It’s been two weeks since I started using Amazon’s Simple Storage Service (S3) together with Jungle Disk as a synced backup solution for all my files. So far it’s been working great. Both S3 and Jungle Disk are great tools, and they make a killer combination.

Amazon S3 lets users create ‘buckets’, which are storage units for all your files. The best metaphor is to think of them as individual drives, each with its own independent directory structure. In fact, Jungle Disk lets you mount them as network drives. I set up 3 buckets: one for my personal Subversion repository (which contains the files I use day to day), one for my photos and music, and one as a dump for large files like backups of software installers.

On my everyday Mac Mini, I have all three buckets mounted as network drives. The media bucket is set to automatically scan and back up all my music and photos every Monday morning. I chose Monday morning because that’s usually when I’m in class and my computer isn’t doing anything. My main Subversion repository lives on an older G4 that is currently in my college’s student server room. I only have the repository bucket mounted on that machine, and it automatically backs up every Sunday at midnight. The third bucket is mounted only on my Mini, and only as a network drive with no scheduled backup, since I only occasionally have large files that need to be dumped. Jungle Disk handles the automatic backups and makes sure that for large files, only the changed parts are actually uploaded (because S3 charges for data transfer as well as storage).
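I don’t know exactly how Jungle Disk decides which parts of a large file have changed. One common approach, sketched here in Scala, is to split the file into fixed-size blocks, hash each block, and re-upload only the blocks whose hash differs from the previous backup. The block size and the function names are assumptions for illustration.

```scala
import java.security.MessageDigest

// SHA-256 hash (as hex) of each fixed-size block of a file's contents.
def blockHashes(data: Array[Byte], blockSize: Int): Vector[String] =
  data.grouped(blockSize).map { block =>
    val digest = MessageDigest.getInstance("SHA-256").digest(block)
    digest.map(b => "%02x".format(b & 0xff)).mkString
  }.toVector

// Indices of blocks that are new or whose hash differs from the previous
// backup; only these blocks would need to be uploaded again.
def changedBlocks(previous: Vector[String], current: Vector[String]): Seq[Int] =
  current.indices.filter(i => i >= previous.length || previous(i) != current(i))

val blockSize = 4                       // tiny, just for the example
val original = Array.fill[Byte](10)(0)
val edited = original.clone()
edited(5) = 1                           // one byte changed, inside block 1

val changed = changedBlocks(blockHashes(original, blockSize), blockHashes(edited, blockSize))
println(changed)
```

The catch with fixed-size blocks is that inserting a byte near the start of a file shifts every later block, so they all look changed; tools like rsync use rolling checksums to cope with that.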

On the S3 side, things are mostly smooth. I haven’t been billed yet, but I just checked my usage activity. My data upload is recorded at just over 23GB, which is roughly the amount I’ve backed up. What I’m slightly confused about is that the total storage usage only amounts to about 6.8GB, and I’m being billed accordingly. I don’t have an explanation for why this is the case (perhaps the storage amount used for billing is the amount of storage I used on certain days of the month), and a quick Google search didn’t turn up anything similar. I’ll keep an eye on this and will notify Amazon at the end of the month if I think something unusual is going on.

Update: As JungleDave said on the first comment below, Amazon charges both on the basis of how much you’ve stored and for how long (which makes perfect sense). So by the end of the month, I should be getting a full bill. Thanks again to JungleDave.
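The arithmetic behind JungleDave’s explanation can be sketched in Scala: storage is billed as an average over the month (GB-months), so data uploaded partway through the month counts only for the days it was actually stored. One reading per day and the numbers below are simplifications for illustration, not my actual usage or Amazon’s exact metering.

```scala
// Storage is billed in GB-months: the average amount stored over the month.
// Simplification (assumed here): one storage reading per day.
def gbMonths(dailyStoredGb: Seq[Double], daysInMonth: Int): Double =
  dailyStoredGb.sum / daysInMonth

// Hypothetical: 30 GB uploaded on day 21 of a 30-day month, kept to month end.
val uploadedLate = Seq.fill(20)(0.0) ++ Seq.fill(10)(30.0)
println(gbMonths(uploadedLate, 30)) // 10.0

// The same 30 GB kept for the whole month.
val keptAllMonth = Seq.fill(30)(30.0)
println(gbMonths(keptAllMonth, 30)) // 30.0
```

That is why, partway through the first month, the billed storage figure can be far below the total amount uploaded.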

I’ve had a good experience so far. You know an automated backup system is doing its job if you don’t give it a second thought after you’ve set it up. This was certainly the case with me. I hadn’t thought about my backups until I got a reader’s comment asking about how it was going. If you’re looking for a simple and cheap remote backup solution, I would certainly recommend an Amazon S3 + JungleDisk combo. The initial $20 price of JungleDisk is certainly worth it.