Why learn algorithms and data structures ?
The question is – Why learn algorithms and data-structures ?
and here are some thoughts. Probably you’d find this question answered in the preface of any algorithms book but anyway, here’s my take.
1) To not be constrained by the programming language for data-structure
Without a good knowledge of various data structures, it is easy to be constrained to thinking for solutions in terms of the data structures directly provided by the language.
If you were a Java programmer, you’d probably be thinking on solutions just in terms of what is provided in java.lang.util package.
Well, there’s lots more. Take for instance a simple need: In a low end, basic phone, as the user types something, you’d like to present the user with “auto complete” feature. Thinking in terms of the data structures readily provided by the language /libraries doesn’t help much.
A simple trie would help here. For another example, to think of any networking website that we use-
How does Facebook suggest you friends, How does LinkedIn tell personX is a 2nd level contact ?
A graph data structure helps here.
Or when there’s a steady stream of numbers coming in – and you want to maintain the largest 10 numbers. (You don’t have the
luxury of storing all the numbers and sorting them) Max heap is the one to go to.
The list goes on..
2) Well, algorithms is “technology” too
Think about an air ticket reservation system. There are plenty of websites that search various airlines, for the dates required and presents with the results. A quick response back to the user, is what we’d like to have. Sure, for this we’d like to add all the required hardware.. Wherever there’s scope for pre-processing, (say a scheduled-job that pre-computes some for some common queries and stores it in database) we’d make use of it. Wherever there’s possibility of caching we’d make use of it. To search for various airlines, probably we’d be making web service calls.
We use various technologies throughout – “webservices / cron jobs / caching / good hardware / load balancing…” and these are all interesting – but not without having efficient algorithms (to find path / find shortest path / cheapest path etc / paths without break)
No amount of additional hardware is going to compensate for an inefficient algorithm. Super computers could take years to solve with an inefficient algorithm, what a PC might take just minutes with an efficient algorithm. (I’ll add some stats on this later)
3) You’ll come across some very interesting, non intuitive ways of solving problems
Given an array with n/2 distinct elements and n/2 copies of another element. Find the missing element.
Well, the normal solution we would all think of is to do a linear scan through the array and to find the missing element after reading through n/2+2 elements at the worst case. [ Worst case = all n/2 distinct numbers appear first in the array and then the number-which-repeats appears n/2 times]
Interestingly, this could be solved faster this way:
while(true) do i = getRandomNumberWithinRange(1 to n) j = getRandomNumberWithinRange(1 to n) if(i!=j) and a[i]=a[j] then return i;
And the probability that the algorithm does not quit in 10 iterations is 0.1074 !
“Randomization” turns out to be an amazing practical way of solving some section of problems.
And to give a more “real-world”ish application for “randomization” -The randomized version of quicksort is one of the most practical/ efficient sorting algorithms !
Get ready to be surprised by lots of cool, non-intuitive-yet-amazing ways of solving problems
4) Refresh a little bit on probability, logarithm and some fun math
It’s interesting to get back to some math, – specifically on probability, expectations, logarithm..
5) Learn some interesting paradigms
“Divide and Conquer” is quite a common/popular paradigm. But so are “dynamic programming”, “backtracking” etc..
It’s fun to be exposed to these..
For starters -
The basic fibonacci(n) algorithm/code which we’ve been exposed to while learning recursion is practically useless.
With ‘dynamic programming’ however, it could be optimized with just a few more lines of code.
6) It’s a skill
Certain things are just “knowledge” – that could be learnt-along-the-way when a need arises.
For example: There’s not really a need to know all the “MVC” frameworks out there. Knowing one is just enough. There’s a need to use some other MVC framework ? Well that could be learnt along the way..
But algorithms and data structures don’t fall into that category. It’s a skill, to deduce an algorithm for a problem / It’s a skill to analyse the complexity and tell which algorithm is the better one / It’s good to learn in advance. Acquiring a skill is interesting, right ?
7) Quadratic complexity when O(n) solution exists
Without some learning of algorithms, it’s quite possible that we come up with inefficient solutions when an efficient one already exists !
8) Understanding what’s behind the everyday tools..
It’s fun to know what’s behind the tools/framework that we use..
To give an example, for any DI framework, all the dependencies have to be resolved first.
BeanA depends on BeanB which in turn depends on BeanC ?
In what order should the resolution be ? BeanC, BeanB, BeanA - yes.
Obviously real-world dependencies are not so simple and ‘topological sort’ is right there for this situation.
With some learning, its fun to think and guess the algorithms that are being used in our everyday tools
- Well, that’s it.. from the top of my head.
Embracing assertive programming
I am not using this term “assertive programming” strictly on the basis of using “assert” keyword, but a little loosely, to fail-fast and expose bugs early.It is a great technique for programmers to shorten their debugging-time, but still I find that not many embrace “assertive programming”
Languages like Eiffel contain constructs for preconditions and postconditions and goes a step further in helping to program assertively, but at least we’ve got an “assert” keyword in Java.
Here’s a very good article on “failing fast“.
There’s a choice on the way an assertion is done.
Choice1: The assert keyword is handy, throws error on failing and could be switched off at production.
Choice2: The other way is to go towards an API (Spring-Assert, GoogleGuavaPreconditions) or a custom-class,that throws an Exception
My personal preference is to go with Choice2, just so I could catch the exception in a global exception handler and still have my application end gracefully. But regardless of the way, ‘asserting a condition that must be true’, is a powerful tool, not to be disregarded.
It is not always about ‘asserts‘ either. For example, when using transaction attributes, you could ensure that a transaction must be present using ‘mandatory’ rather than using a ‘required’ and stay away from a vague transaction attribute like ‘supports’
While working with read-only values, you could use something from the programming language to ensures it. Marking references as, ‘final’, better yet, using Immutable objects, would go towards it.
There’s a lot of good to be assertively programming, then why not embrace it ?
Good Reads:
Spring Webflow – Tips
Just a random list of tips – handy if you are using SWF
Passing a String in an evaluate expression:
To pass a string literal, enclose the string in single quotes ‘ ‘ as in
<evaluate expression="searchService.fetchSearchResult('hello') result="flowScope.searchResult" />
Set a boolean
Likewise to set a boolean value,
<set name="flowScope.isTicketAvailable" value=" 'true' "/>
Flush flow changes without restarting the server
While developing, do set, development=true as such and save yourself lots of time
<flow:flow-builder-services id="flowBuilderServices" view-factory-creator="viewFactoryCreator" development="true"/>
The RequestControlContext object
On the rare cases where you need a requestControlContext in your bean, you can get hold of it by passing ‘flowRequestContext’ from the flow.
<evaluate expression="someBean.someMethod( flowRequestContext)">
public void someMethod(RequestControlContext context){ //your operation
}
The start state
During development, if you wanted to skip ahead a few states, just define the start-state in the <flow> element.
Want to trace the flow?
Its pretty simple to add a couple of methods in your listener that extends FlowExecutionListenerAdapter
@Override
public void stateEntered(RequestContext context,
StateDefinition previousState, StateDefinition newState) {
logger.debug("Entered State:" + newState.getId());
}
@Override
public void transitionExecuting(RequestContext context, TransitionDefinition transition)
{
logger.debug("Executing transition:" + transition.getId());
}
and trace the flow.
A few things to avoid while writing unit tests
Long back I wrote a post on some good practices to follow while writing unit tests. I thought I’d add one on mistakes/practices to avoid while writing unit tests.
1) Reading a configuration file:
If for writing a unit test, you access a file, then strictly speaking, it is not a unit test.
For example,
class CarTest
{
Engine engine;
// your test for engine object
}
If you created the object under test by doing,
Engine engine = new Engine();
that’s neat. Whereas if DI was used to create the ‘engine’ object, then it wouldn’t have been a unit test. (DI involves reading an external file/ scanning for annotations )
Likewise, if for test data, you retrieved the data from a file, then its not really a unit test.
the main drawback being ?
Reading files takes time. A unit test has to be really fast. Something that’s executed very often.
2) Mocking the wrong objects:
Mocking frameworks are great. You could easily mock a ‘dao’ or a ‘service’ call and write a unit-test quickly. Mocking a ‘value-object’ like the one above is needless.
public void AccountBalanceTest
{
@Mock AccountInfo accountInfo;
@Test
public void debitShouldDecuctTheGivenAmount()
{
when(accountInfo.getCurrentBalance()).thenReturn(500);
Account account = new Account(accountInfo);
account.debit(100);
}
}
It turns out to be messy when a lot of such stub values are defined.
‘Value-Objects’ being easily constructible with a ‘new’ should rather be used like this:
@Test
public void debitShouldDecuctTheGivenAmount()
{
AccountInfo accountInfo = new AccountInfo();
accountInfo.setCurrentBalance(500);
Account account = new Account(accountInfo);
account.debit(100);
}
}
This reads a lot better, doesn’t it? And when there are a lot of test data to be prepared, there’s of course the very useful builder pattern http://nat.truemesh.com/archives/000714.html
3) Ignoring the feedback
A great thing about writing tests is that you get “feedback” about your code.
For example, if you started writing a test like this,
@Test
public void testValidateAddressShouldCheckForMandatoryPostCode()
{
Customer customer = new Customer();
Address address = new Address();
address.setPostCode("123456");
customer.setAddress(address);
Validator validator = new Validator();
validator.validateAddress(customer);
}
you should be seeing a “warning” here. You’re spending some effort constructing the object required for test,
and this “warning” should hint you that, the validateAddress() method should be refactored to take an address parameter and not a customer parameter.
It is easy to set up the object like above and get the test passing – but that’s not the point. The idea should be to take the feedback and refactor.
@Test
public void testValidateAddressShouldCheckForMandatoryPostCode()
{
Address address = new Address();
address.setPostCode("123456");
Validator validator = new Validator();
validator.validateAddress(address); // refactored method.
}
Yes, the above one was a simplified example, but hope the idea is clear.
Similarly, when a piece of code is hard to test, it is possible to overcome the pain by using some advanced mocking frameworks and finishing up the test. But “being hard to test” might be a feedback telling you to take efforts to refactor the code and make it testable/loosely-coupled.
Here’s a great post http://martinfowler.com/articles/modernMockingTools.html that explains it.
So to put it again, “the feedback should not be ignored”
Good TDD resources
http://misko.hevery.com/
http://www.growing-object-oriented-software.com/
http://vimeo.com/10569751
Contributing to open source projects
Recently I was voted as a committer at the lovely OpenEJB project ! Am I excited ? You bet !
For a long time I had wanted to contribute to Open Source projects that I love. Like everything takes a start somewhere, my start was at the OpenEJB project a few months back. Thanks to the friendly folks there who are great at mentoring, I’ve come to understand a few things about “contributing to open source projects” in general, which I thought I’d share.
So which project to contribute to ?
The answer is pretty simple. Whichever project/projects that interest you.
But the project looks daunting.
Yes it might be. But there’s no rush to understand everything all at once. There’s always a small piece of work, to start from. Building an understanding is a gradual process…
What are the first things?
Get in touch with the folks at the project.
Almost all projects have a mailing list. Subscribe to the mailing list and observe what’s going on.. But without being shy or hidden for long, pop out a “Hi” to the folks there, just introducing yourself. You could ask them to suggest simple pieces that you could start with.
Mental Todos versus Saying a Hi ?
We make mental todos like “Wow.. This project is so good. I must get involved and contribute..” but never really start at it. I think saying a “Hi” and getting in touch, works better than making mental todos.
Mailing lists are great and essential because members work from different timezones and there’s a need for asynchronous communication.
But that’s not the only way to get in touch. Most projects have an IRC chat,.. so you could chat up !
Contribute ? But I am not an expert…
You don’t have to be ! Whether you are a beginner or an expert in that technology is not a factor for contributing to an open source project.
Just ,
1) Your interest
2) Time you could afford
would be the factors.
Don’t have hours to spend ? No big deal…
It’s of course great if you could have a few hours everyday to spend at the project. But even if you don’t, there are things you could to with the limited time. Like
1) Answer a user’s question
2) Document something
3) Blog a “how-to”
4) Read up some project docs and get better.
It’s not all about patches and code !
A must read…
I hope to add more to this post later on, but check out the awesome ”contribution tips” OpenEJB doc. Excellent tips there !!
Fun with unix commands -(Bash & imdb-api )
I’ve been a Windows user for long. I had wanted to switch to Linux, learn a good bit of commands and shell scripting, but never really started until about a month back. So with a fresh install of Ubuntu, as I was trying with Seds and awks, I just thought I could put these newly learned commands into fun use. And I remembered how I wished for the 1000GB of movies (my brother had that in an external harddisk) sorted based on imdb ratings.
Geekish work starts:
There’s a great RESTful API put up at http://www.imdbapi.com/
So all that was required was:
- Scanning the filesystem for video files
- Strip away unwanted words from movie’s filename
- Encode the filenames (esp, spaces and apostrophes)
- Send request to imdbapi
Parse the result
So what I ended up with was some bash scripts run like this:
sh lsMovies.sh | ./encode.sh > mlist.txt sh askImdb.sh (that saves JSON response in a file mresponse.txt)
Not much need to parse it. Just replace every } with a newline character and save it as a CSV file.
Open the file with the seperator character set as : and voila ! Movie names, ratings, votes, decription etc go into different columns.
Ha, so can do all sorts of sorting
Here’s the code:
Scanning the filesystem for video files, trimming away needless words from filenames
#lsMovies.sh -Scan directory given as argument. Else scan current directory for videos. #!/bin/bash if [ -z $1]; then echo 'Directory parameter not provided. Searching in current dir' dir='.' else echo 'Searching in' $1 dir=$1 fi find $dir -iname *.avi | sed "s/.*\///g" | #remove any path info sed "s/\./ /g"| #replace . with space sed "s/\[.*//g; s/(.*//g; s/DVD.*//i; s/xvid.*//i; s/rip.*//i; s/.avi//g"
Then the encoding part
encode.sh
#!/bin/bash sed "s/ /%20/g" | sed "s/'/%27/g"
Sending request: askImdb.sh
echo "" > mresponse.txt #clear contents initially count=1 while IFS= read -r line do echo "Movie #"$count "requesting imdb for info about"$line curl http://www.imdbapi.com/?t="$line" >>mresponse.txt count=$(($count+1)) echo "info on" $count "movies retrieved" done
So to cheat without parsing, the insertion of newlines
cat mresponse.txt | sed "s/}/\n/g"
plus saving this as csv and opening with : as separator is all that's left !
I'm walking down the list watching the best rated movies ! Fun
[Update: Did a Java Swing application later, that scans the filesystem for .avi files, queries imdbapi.com for imdb info for each movie, parses the result using Jackson and creates a CSV report of it using OpenCsv. Try out https://github.com/stratwine/movie-insight. Download section has the executable JAR ]