Discussion: Massively parallel CMS?

April 11, 2013

I’ve been thinking a lot about this lately, and I realize this is probably a question that can only be answered by actually trying, but still, I think it’s an interesting discussion to be had.

To what extent would it be useful to make a web application, such as a CMS, massively parallel? (Or, for that matter, concurrent; see the comments.)

By that I mean, would it make sense to break the typical one-thread-per-request model? Could we break it up so that fetching from the database happens in different threads or processes than rendering the template? Could the rendering be split up into smaller bits, so that parts of the template could be rendered while the data is still being fetched from the database? Are there other things that could be split up and made parallel?
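
To make the question concrete, here is a minimal sketch of what "rendering parts of the template while the data is still being fetched" might look like with a thread pool. Everything in it is an assumption made for illustration: the `fetch_*` functions stand in for database queries (the sleep simulates I/O latency), and the `render_*` functions stand in for template fragments. It is not how any existing CMS does it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for database queries; sleep simulates I/O latency.
def fetch_title():
    time.sleep(0.05)
    return "Massively parallel CMS?"


def fetch_body():
    time.sleep(0.05)
    return "Could rendering overlap with fetching?"


# Hypothetical template fragments, one per piece of data.
def render_header(title):
    return "<h1>%s</h1>" % title


def render_article(body):
    return "<p>%s</p>" % body


def render_page(executor):
    # Start both queries at once instead of one after the other.
    title_future = executor.submit(fetch_title)
    body_future = executor.submit(fetch_body)
    # The header is rendered as soon as the title arrives, even if the
    # body query is still running in its worker thread.
    header = render_header(title_future.result())
    article = render_article(body_future.result())
    return header + article


with ThreadPoolExecutor(max_workers=2) as pool:
    page = render_page(pool)

print(page)
```

Whether this buys anything in practice is exactly the open question: the two queries overlap, but the request still cannot finish before the slowest one does.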

From → plone, python

15 Comments
  1. matthamilton

    Why bother?! I mean really, is there a point in parallelising a single task when many of those tasks happen in parallel anyway? What resource is being underused?

    The only time I think it could be useful is when you have many CPU cores and hardly any traffic. But what is the point there anyway?

    I guess for maybe some very specific use cases it could be good, but I don’t see how those tasks could then be generalised by the CMS (eg some heavy operation like plotting a large chart).

    -Matt

    • Each core/CPU could be more specialized and hence touch much less of the memory, making cache more efficient. Memory bandwidth can often be a bottleneck, so that could be a benefit under really heavy loads. Of course, that’s all theoretical. :-)

  2. What you’re saying makes a lot of sense on the frontend part: e.g. the Guardian has an architecture that parallelizes fetching content, images, ads, etc. and gives each task a deadline: if the content is there but the ads don’t make it within X milliseconds, the page is served without them.
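
The deadline idea described above can be sketched roughly as follows. This is an illustration only, assuming a modern Python with `asyncio`; the `fetch_content` and `fetch_ads` coroutines, the `with_deadline` helper and the timings are all hypothetical, and nothing here is taken from the Guardian's actual implementation.

```python
import asyncio


# Hypothetical backend calls; the sleeps simulate network latency.
async def fetch_content():
    await asyncio.sleep(0.01)
    return "<article>content</article>"


async def fetch_ads():
    await asyncio.sleep(1.0)  # deliberately slower than the deadline below
    return "<aside>ads</aside>"


async def with_deadline(coro, seconds, fallback=""):
    # Return the result if it arrives in time, otherwise the fallback.
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return fallback


async def build_page():
    # The content is required; the ads are optional and get a 50 ms deadline.
    content, ads = await asyncio.gather(
        fetch_content(),
        with_deadline(fetch_ads(), 0.05),
    )
    return content + ads


page = asyncio.run(build_page())
print(page)
```

With these made-up timings the ads miss their deadline, so the page is served with the content alone.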

  3. A little off-topic, but the way thepiratebay.org content is managed and distributed may count as the ultimate in parallelism :)

  4. The additional complexity wouldn’t be worth it. The vast majority of your time is actually going to be spent waiting for disk and network IO, which can often be “solved” more efficiently without touching threads. This may swing more on the side of concurrency than parallelism, but make sure your CMS supports async IO or cooperative multi-tasking (gevent, eventlet can monkey patch this automagically, Twisted can be used to build something more “purely/correctly” async).

    • Well, OK, you seem to say that a deeply concurrent system is worth it at least, and even if it isn’t parallel it’s halfway there. ;-)

  5. It seems to me that because of Amdahl’s law (http://en.m.wikipedia.org/wiki/Amdahl's_law), the actual return from such parallelization would only be justified if the implementation were deeply parallel. Moreover, to see any benefits, it would have to be run at a massively parallel scale.

    While it at least seems feasible, is it worth it? In the former case, the parallelization would impose strict limits on the programming techniques used for the implementation. In the latter, you’d probably have enough computing power that you no longer care so much about efficiency (e.g. you are successful enough to run it anyway).

    All in all, this is probably the reason no one has done it yet. It would be fun to see a proof of concept, though.
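
For reference, the Amdahl's law argument above can be put into numbers with a few lines of Python. The formula is the standard one; the parallel fractions and worker counts in the loop are made-up examples, not measurements of any CMS.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative values only: how much of a request is parallelizable (p)
# and how many workers it is spread over (n).
for p in (0.5, 0.95):
    for n in (4, 64):
        print("p=%.2f n=%d speedup=%.2f" % (p, n, amdahl_speedup(p, n)))
```

If only half of a request can run in parallel, the speedup stays below 2x no matter how many workers are added, which is why the implementation would have to be "deeply parallel" to pay off.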

  6. tadeas

    What about Twisted, or other async frameworks? Twisted has a thread pool for database connections and can use different threads for different tasks.

    • There are several frameworks that will help you do things like this, that’s not the question, and Twisted is one of them. The question, however, is if there is even any point in doing it. :-)

  7. People already do this, frequently. Large complicated pages are seldom composed in a single request on the server anymore. Instead complicated pages are usually delivered in an incomplete way, with script tags and other Javascript filling in the rest – all of which is done in a parallel way with a variety of requests to other backend services.

    Some server-side systems like SSIs (which can easily do subrequests) use similar techniques, but that seems to have fallen out of favor with the advent of client-side composition.

    • Absolutely. I was thinking of something orthogonal to client-side though, i.e., somehow handling each request in a more parallel or concurrent way. My example with template rendering and data fetching is probably irrelevant though, as the web is moving towards not having any rendering done on the server at all, but doing it completely on the client side.

  8. In CMS the C means content. The first job is to make the content available as a service. It scales easily. Then build the platform to interact with the content, adding the main functionalities: workflow, publication path, versioning, translation, platform sharing. These are platform-specific metadata. They can be split into facets, and each facet can be processed by a different web service. It scales less, but it scales.

  9. I hadn’t thought about even separating out things like workflow into a separate service from the CRUD service. That is an interesting idea, and with services that fine-grained, parallelism/concurrency within a request does indeed become quite pointless. I think that answers the question. :-)

  10. mike bayer

    I can hardly imagine these days delivering CMS content by hitting a database each time the page is rendered. In the vast majority of situations, the content should be entirely static, and if not, it would be behind a caching layer, if not something internal then using something like Cloudflare. Anything on the page that’s “dynamic”, like comments or whatnot, should be pulled in client side using an approach like that of Disqus – in 2013, there’s no longer any need to string together pages using server side technology for everything, client techniques can create heterogeneous compositions more effectively.

    • If you are caching away the request so it never reaches the CMS, then I don’t count it. ;-) Also, writes are a much harder problem than reads.
