I am currently beginning work on my final project for an Artificial Intelligence class (part of my B.Sc. in Computer Science). For this project we are required to choose an interesting problem in the field of Artificial Intelligence, one that expands on one or more topics from class, and solve it. We then write a report discussing our results and submit both the report and the code we've written.
Obviously we are not expected to match the state of the art on classic problems. Instead, we should either examine and solve (to a good degree) an uncommon problem (most people opting for this approach choose some simple computer or board game that has yet to be solved to death by the AI research community), or examine a more common problem in some novel way, perhaps suggesting a new and interesting heuristic or a modification to an existing algorithm. In the latter case we are not expected to outperform modern research results, only to offer some new perspective.
The subject my partner and I chose for the project is Sokoban, which puts us in the second group: it has not been researched to death, as only about two-thirds of the common test set can be solved by the best solver, but the state-of-the-art solvers for this problem seem too complicated for us to hope to get anywhere near them in a part-time, two-week project. We want to try to solve Sokoban problems using a search-problem approach.
Anyway, before starting to implement our Sokoban solver, I began to wonder which of the few languages we are familiar with (C, C++, Java and Python) is best suited to implementing a search-based solver that must explore a very large search space. Sokoban has a very deep search tree, with some problems requiring more than 300 moves to solve, and a very high branching factor (over 100 in some problems). Note that this branching factor arises when only box (stone) moves are considered, not player moves: in each state we may push any of the boxes in any of four directions.
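For concreteness, here is a minimal sketch (my own illustration, not part of the assignment) of what a push-based Sokoban state and successor function might look like in Python:

```python
def push_successors(state, walls):
    """Generate successors by pushing one box one cell.

    A state is (player, boxes): the player's cell plus a frozenset of
    box cells. frozenset keeps states hashable, so they can go into a
    visited-set. NOTE: a real solver must also check that the player
    can actually *reach* the pushing position; that inner reachability
    search is omitted here for brevity.
    """
    player, boxes = state
    result = []
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        for box in boxes:
            dest = (box[0] + dr, box[1] + dc)    # cell the box is pushed into
            stand = (box[0] - dr, box[1] - dc)   # cell the player pushes from
            if dest in walls or dest in boxes:
                continue
            if stand in walls or stand in boxes:
                continue
            new_boxes = (boxes - {box}) | {dest}
            result.append((box, frozenset(new_boxes)))  # player ends on the box's old cell
    return result
```

With only box pushes counted as moves, every box can contribute up to four successors per state, which is where the high branching factor comes from.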
The main reason I began to consider this issue is that in another AI course, dealing with applying AI techniques to product design, I wrote an automated room designer that searches the state space of all possible room designs (given a room size and a set of furniture) and returns the state with the highest score, as measured by some heuristics. That program was written in Java, and it ran out of memory on every run after searching only tens of thousands of nodes. I think the main reason is that I chose a very object-oriented approach: every search state was represented by an object, and every state, when reached by a searcher object, was wrapped in a search node (yet another object), so the program's memory quickly filled up with objects and ran out.
Now, I know part of the problem was using a memory-intensive algorithm (A*) and the way I chose to implement it, but I'm wondering whether using Java played a part too. This leads me to two questions:
1. Which programming approach, in general, is more suitable when implementing search problems and search algorithms? (Object-Oriented, functional or other)
2. Which programming language is more suitable when implementing search problems and search algorithms, Java, C, C++ or Python? (Other languages are also possible, but only if their syntax is very similar to one of the aforementioned languages)
Specifically, what features and properties of these languages can be used to implement a problem solver meant to search on a very large search space in a memory (and run time) efficient way?
5 Answers
My opinion:
Java: Far too verbose, with ugly syntax for mathematical code, and it uses too much memory and CPU.
C: Absolutely horrible if you are used to object-oriented, pointer-free languages.
C++: Powerful and fast, but too heavyweight and slow-compiling for experimenting with many different algorithms.
Python: The slowest of them all. Nice for prototypes, but not really suited to big computations.
If you are limited to languages that most people are familiar with, I would probably recommend C++.
If you want to branch out to a language that is common in the field of AI, and that I find relatively easy to implement a solver in, I would use Prolog. You can find examples of heuristic algorithms online; minimax and alpha-beta pruning are common, as I am sure you know.
Also, there is a language I have used linked with Prolog for game playing, called GDL (Game Description Language); see http://en.wikipedia.org/wiki/Game_Description_Language. It contains what you might call Prolog predicates in Lisp syntax.
I know a few people who implement memory-intensive algorithms like the ones you described, all in Java. You can make it work, but you have to resort to primitive collections and arrays.
That being said, I don't think it's very clever: to write efficient Java, you basically end up writing code that resembles C/C++ code. At that point you can also take the final step and write C/C++ directly, which gives you the opportunity to optimize a bit further (maybe gaining another factor of two in speed and memory).
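The "primitive collections and arrays" idea can be sketched generically (the field names here are illustrative, not from any particular solver): instead of allocating one object per search node, store each node's fields in flat parallel arrays indexed by node id, shown here with Python's stdlib `array` module.

```python
from array import array

class NodeArena:
    """Parallel primitive arrays: a node is just an index into flat
    typed arrays, avoiding per-object headers and pointer chasing
    (the same trick primitive collections give you in Java)."""

    def __init__(self):
        self.parent = array('l')   # parent node index (-1 for the root)
        self.g_cost = array('l')   # cost from the root
        self.state = array('q')    # state packed into one 64-bit int

    def add(self, parent, g_cost, packed_state):
        self.parent.append(parent)
        self.g_cost.append(g_cost)
        self.state.append(packed_state)
        return len(self.parent) - 1  # the new node's index

arena = NodeArena()
root = arena.add(-1, 0, 0)
child = arena.add(root, 1, 42)   # packed_state value is arbitrary here
```

The price is that you give up the convenience of per-node objects; the benefit is a fixed, small number of bytes per node.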
This brings me to your questions:
Functional programs often look very nice and seem a natural fit for algorithmic problems. The problem is that imperative algorithms are most often faster (and much more difficult to write without bugs). As for object-oriented, I count it as imperative; object hierarchies introduce a computational overhead you usually don't want, compared with what they buy you.
I think Python is comparable in conciseness to functional languages (it has first-class functions), but it's too slow for anything serious. Of the languages you proposed I'd probably pick C++, as Java is no nicer and possibly slower. One possibility would be something like Scala, which allows concise programming (comparable to Python) at speeds close or equal to Java's.
By using C++ structs you might save some space over Java's objects, but the root of the problem is your current algorithm implementation: it looks like your memory requirements scale quadratically.
Pseudo-calculation: suppose n = 10 000, so you need 100 000 000 objects, which translates into 1 GB of RAM. Suppose that by switching from Java to C++ (and trading known problems for unknown problems) you bring it down to 0.5 GB of RAM. Great, except that if you double n, your memory allocation quadruples: for n = 20 000 you'd need 4 GB of RAM in Java and 2 GB in C++. If instead you focus on a more memory-efficient algorithm implementation, you'll find that all of a sudden you're using MBs instead of GBs even with a large n. I've experienced this quite drastically by implementing just-in-time selection (trading a list-based architecture for an iterator-based one).
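The iterator-based trade can be sketched generically (the candidate stream and scoring here are made up for illustration): generate candidates lazily and keep only the few you actually need, instead of materializing the whole list first.

```python
import heapq

def all_candidates(n):
    # Hypothetical candidate stream: a generator yields one score at a
    # time (iterator-based), instead of first building an n-element
    # list (list-based). Memory stays O(1) in n.
    for i in range(n):
        yield (i * 37) % n   # some deterministic per-candidate score

def best_k(stream, k):
    """Just-in-time selection: keep only the k best candidates seen so
    far, O(k) memory even if the stream holds millions of items."""
    return heapq.nlargest(k, stream)
```

`heapq.nlargest` consumes the stream incrementally, so the full candidate set never exists in memory at once.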
Space demands are usually the hardest to manage with a best-first search (BFS), because you are in effect storing the entire explored search space, which is often exponential in size.
Space can be managed by throwing away parts of the search space that appear unpromising (low scores), which means either giving up completeness of the search (when "good enough" answers are OK) or re-searching if a discarded part needs to be revisited. The extreme version of this idea is depth-first search (DFS), which keeps only the currently most promising branch. A compromise between DFS and BFS is so-called beam search. These space-tradeoff schemes are independent of the programming language; you wire them into your search algorithm.
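A generic beam-search skeleton makes the compromise concrete (the successor, scoring, and goal functions here are placeholders, not Sokoban-specific):

```python
import heapq

def beam_search(start, successors, score, is_goal, width, max_depth):
    """Beam search: like best-first search, but each depth level keeps
    only the `width` lowest-scoring states, so memory is bounded per
    level instead of growing with the whole search space. Pruning a
    state that actually led to the goal makes the search incomplete."""
    frontier = [start]
    for _ in range(max_depth):
        candidates = []
        for state in frontier:
            for child in successors(state):
                if is_goal(child):
                    return child
                candidates.append(child)
        if not candidates:
            return None
        # Prune: keep only the `width` most promising candidates.
        frontier = heapq.nsmallest(width, candidates, key=score)
    return None
```

For example, searching the tree where each n branches to 2n and 2n+1, with score |13 - n|, a beam of width 2 reaches 13 from 1 in three levels.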
Once you realize that you can and must manage space effectively, you have to manage search time. In general the best algorithm always wins, and it often isn't raw search; but in the AI world you often don't know the best algorithm, hence the resort to brute-force search. For raw brute-force search you need the best heuristics you can get for the problem, and that isn't an algorithms problem; it is domain knowledge you have to acquire somehow. Most of AI is arguably about trying to capture and harness domain knowledge. Both the algorithm and the heuristic knowledge are language-independent.
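For Sokoban specifically, a common baseline heuristic (a standard choice in the literature, not something from this thread) is the sum of each box's Manhattan distance to its nearest goal:

```python
def manhattan_heuristic(boxes, goals):
    """Sum over boxes of the Manhattan distance to the nearest goal.
    A lower bound on the remaining pushes (each push moves one box one
    cell), though it ignores walls, deadlocks, and the fact that two
    boxes may not share one goal."""
    total = 0
    for bx, by in boxes:
        total += min(abs(bx - gx) + abs(by - gy) for gx, gy in goals)
    return total
```

Stronger Sokoban heuristics assign boxes to distinct goals (a matching problem) and detect deadlocked positions, which is exactly the kind of domain knowledge meant above.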
Finally, you can help manage search time by using a parallel language (although this buys you at best a constant factor of N speedup). For this, you need a language that can generate parallel computations cheaply, on demand, in great quantities. Many of our modern languages (Java, C#) have clumsy built-in "thread" constructs with clunky invocations. C and C++ don't have this built in, but you can use "fork" libraries; unfortunately these are even clumsier to use and often have limits (e.g., you can't have more parallel grains than the threads your OS will offer you, at most a few thousand, because of the big-stack assumption). A practical scheme in this case is to treat your threads as workers, each one expanding one unit of the frontier at a time.
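That worker scheme can be sketched with Python's stdlib thread pool (the expansion function is a placeholder): one task per frontier node, with each worker taking one unit of the frontier at a time.

```python
from concurrent.futures import ThreadPoolExecutor

def expand(state):
    # Placeholder expansion: a real solver would generate the legal
    # successor states of `state` here.
    return [2 * state, 2 * state + 1]

def parallel_expand(frontier, workers=4):
    """Expand every frontier node in parallel, one task per node;
    pool.map returns the per-node child lists in frontier order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        next_frontier = []
        for children in pool.map(expand, frontier):
            next_frontier.extend(children)
    return next_frontier
```

One caveat: in CPython the GIL prevents threads from speeding up CPU-bound expansion, so for a real speedup you would need multiprocessing or a compiled language; the sketch only shows the work-division structure.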
Another scheme is to have enough grains to match your problem structure well; you essentially use parallelism to manage the nodes of the search space, but for this you have to solve the big-stack problem or you run out of stack space. We do some search-based algorithms to support software-engineering tools in a proprietary language called PARLANSE, which makes forking children almost trivial. PARLANSE is Lisp-like; it has a (|| A B C) built-in parallelism operator for forking sub-computations A, B, C, etc., as well as partial-order parallelism and a general-purpose (spawn X) to fork sub-computation X. All of these operators are cheap, taking only a few hundred cycles to fork a parallel piece of work; this is possible because the operators are built into the language and the compiler understands them. You can see a depth-first search coded in PARLANSE; you have to look hard for the (|| operator, but it's there (hint: find the inner loop!). A really useful property of PARLANSE's parallelism is the ability to abort a computation and its children, which lets the "best answer" in the search space propagate its result and kill off the other search children.
IBM's X10 parallel supercomputing language has facilities to spawn dynamic children (the finish construct) that seem pretty promising.