Tuesday, January 25, 2011

Implementing the CAH

So how do we do this? How do we implement the Continual Aggregate Hub (see also comment on restful soa or domain driven design).

I recall how fast we produced business logic using Smalltalk and object oriented programming with good data structures and data operators. Since then I have not seen much to "Algorithms and data structures". What we all learned in basic IT courses at University, does not seem to be in much use. Did SQL or DTO's ruin it all? Did the Relational normalized data structure put so many constraints that programming was reduced to plain looping, some calculations and moving data to the GUI? Where is the good business model that we can run code on. We need good software craftsmanship.

The basis for good implementation is a good architecture. We need to make some decisions on architecture (and their implementations) for our processing and storage. I would really like to have the same structure in the storage architecture as in the processing layer (no mapping!). It means less maintenance and a more verbose code base. There are so many interesting cases, patterns, and alternative implementations that we are not sure where to start. So maybe you could help us out?

Our strategy for our target environment is about parallel programming and being able to scale out. I find this talk interesting; at least slides 69/35 and the focus on basic algebra properties. http://www.infoq.com/presentations/Thinking-Parallel-Programming I find support here for what we are thinking of; the wriggle room, sequence does not matter. The waiters in the CAH restaurant are Associative and Commutative handling orders. I also agree that programmers should not think about parallel programming, they should think "one-at-a-time". But in designing the system parallel processing should be modeled in and it should be part of the architecture.

It seems like Tuple Space is a right direction, but also here there are different implementations. But what other implementations is there that will be sound and solid enough for us? Several implementations are referenced at (http://blogs.sun.com/arango/entry/coordination_in_parallel_event_based), but which?

For the storage architecture there are also many alternatives. Hadoop with HBASE, or MarkLogic for instance. Or is Hadoop much more. If we can have all storage and processing at every node. How do we manage it? How much logic can be put into the Map-Reduce. What is practical to process before you merge the result?
I just cant let go of feeling that storage structured is within a well known and solid relational database. The real challenge is to think totally different as to how we should handle and process our data. (see document store for enterprise applications) Is it really needed to have a different data structure in the storage architecture? Maybe I am feeling like waking up from a bad dream.

In the CAH blogg we want to store the Aggregates as they are. I think we not need different data structure in processing architecture (layer) and the storage architecture.

(2013.10.30): It is now implemented:  big data advisory definite content

Creative Commons License
Implementing the CAH by Tormod Varhaugvik is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

No comments:

Post a Comment