Hierarchical cluster analysis

Published on: 19/02/2012

Went through the hierarchical cluster analysis of the existing data matrix again with Warnar. What we did was the following:

  1. Select, e.g., all the ‘aims’.
  2. Tick the dendrogram option.
  3. Let the procedure do its work.
  4. The result should be a horizontal dendrogram with the numbers of the cases on the left.
  5. Below distance 10 the clustering is worth interpreting and discussing.
  6. If I make a dendrogram of each dimension, I can compare them.
  7. Another way would be to recode the cases with the clusters they form (at a certain distance) and perform another hierarchical cluster analysis across all the dimensions.
  8. The goal of hierarchical cluster analysis is to arrive at, or generate, hypotheses that can be tested against new data.
  9. One hypothesis could be that the cases vary so much across the dimensions that no patterns are found within the dataset.
    1. I quickly tested that with one strong cluster w.r.t. the ‘aim’ (cluster of cases 45, 53, 19, 41, 44).
    2. In the clusters of ‘involved participating parties’ these cases were all divided over the different clusters below distance 10.
    3. I have to compare this with the comparisons I made between these two dimensions based on the two-step clustering (e.g. table 5).
  10. I tried to cluster the two dimensions together on the raw data, but that is no longer informative:
    1. The result becomes messy: most clusters are formed above distance 10.
    2. Some small, weak clusters form, but it is hard to say, based on the data matrix, what causes them.
  11. So it is better to recode the cases by the clusters they are in w.r.t. one dimension and then run the test again on more dimensions. Then I get close to what I did with the two-step clustering.
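
The procedure in steps 1 to 11 can be sketched roughly as follows. This is only an illustration under several assumptions that are not in my notes: it uses SciPy rather than the statistics package we actually worked in, Jaccard distance and average linkage are arbitrary choices, the dendrogram axis is assumed to be rescaled proportionally to 0-25 (so that "distance 10" has a meaning), and the two tiny 0/1 matrices are invented toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cut_labels(distances, cut=10.0, scale_max=25.0):
    """Average-linkage clustering, cut at `cut` on a 0..scale_max axis."""
    tree = linkage(distances, method="average")
    top = tree[:, 2].max()
    if top > 0:
        # Assumption: proportional rescaling so the last merge sits at scale_max.
        tree[:, 2] *= scale_max / top
    # One cluster label per case; cases merged below `cut` share a label.
    return fcluster(tree, t=cut, criterion="distance")


def cluster_dimension(binary_matrix, cut=10.0):
    """Cluster the cases (rows) on the 0/1 variables of one dimension."""
    data = np.asarray(binary_matrix, dtype=bool)
    return cut_labels(pdist(data, metric="jaccard"), cut)


def cluster_across_dimensions(dimensions, cut=10.0):
    """Recode each case by its cluster per dimension, then cluster again."""
    recoded = np.column_stack(
        [cluster_dimension(matrix, cut) for matrix in dimensions.values()]
    )
    # Hamming distance: share of dimensions in which two cases
    # ended up in different clusters.
    return recoded, cut_labels(pdist(recoded, metric="hamming"), cut)


# Invented toy data: six cases, two dimensions.
aims = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3
parties = [[1, 0, 1]] * 3 + [[0, 1, 0]] * 3
recoded, labels = cluster_across_dimensions({"aims": aims, "parties": parties})
```

Here `recoded` holds one column of cluster labels per dimension (the recoding of steps 7 and 11), and `labels` is the clustering across dimensions. Whether a cluster that is strong within one dimension falls apart in another (step 9) can then be read off by comparing the columns of `recoded`.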

What I have to do is:

  1. Re-read the comments on my draft article and then decide on these tests again.
  2. Re-read my article and look for weak spots and hunches.
  3. Run the tests as mentioned above, also try the other output formats (new variables, other formats of output).
  4. Make a little report on that for discussion with Liesbet.

Hierarchical Cluster Analysis continued

Published on: 29/11/2011
  1. Possibility to combine this with ‘multidimensional scaling’. In Dutch: ‘meerdimensionale schaal analyse’.
  2. Look for articles on hierarchical cluster analysis of dichotomous data matrices, or on loglinear models for dichotomous data matrices.
  3. Name that Frans Hubbard gave me: Pieter Kronenberg.

Feedback PhD-Lab commission

Published on: 29/10/2011

About two weeks ago I handed in my first draft article. I received feedback on two aspects: 1) on the content of the article and 2) on my achievements as a PhD-student up till now and, based on that, some advice for the future. The feedback came from Paul de Beer, Judith Metz and Liesbet van Zoonen:

Content of the article

  1. Judith described 3 parts in which my article could be divided:
    1. Finding the organizational model to describe a local memory website with (based on the literature review and fine-tuning of the model by looking at cases from the field).
    2. Mapping the field by comparing the model to a set of local memory websites.
    3. Analyzing the on-line activity and its relations to aspects of the model (passive participation = visitors and active participation = contributors).
  2. Both Judith and Paul: Parts 1-1 and 1-2 above (organizational model and mapping the field) are enough for an article. This gives the opportunity (demand?) to make both steps more transparent and thus more convincing. This becomes a descriptive article, which is a bit harder (less exciting) to ‘sell’.
  3. The results of part 1-2 make 1-3 possible, which could also be an independent article based on a question like: “How do organizational aspects or a combination thereof invite people to participate?”
  4. Paul (p. 3): shouldn’t you use the word dimension instead of category? Yes. (Lesson learned: An earlier reviewer didn’t understand the word dimension, but I shouldn’t have followed her comments so strictly.)
  5. Paul and Liesbet (p. 5): The research question should be clear much earlier in the article (introduction), at least roughly. Two remarks on the present question:
    1. “What characterizes …” is too broad; it could become more exciting when you put in something like clusters or patterns.
    2. “…these fifty three cases …” is too narrow, we would like to get the idea that the conclusions (might) apply to all local memory websites.
  6. Paul (p. 5): With respect to inter-rater reliability: how did the two researchers cope with their differences? Explain.
  7. Judith and Paul (table 2 p. 5): For the part where the model is fine-tuned based on the cases (1-1 above), the dispersion is of lesser importance. On the other hand, for describing the field (1-2 above) the dispersion is important, just as finding (snowballing) and selecting the cases is. For this I could use a set of Dutch cases first and then see whether a set of foreign cases leads to changes in the model. If not, we might say that the model is applicable internationally. If it does, though, then we might be able to claim that there are differences with respect to local memory websites in different nations.
  8. All reviewers (p. 8-9): put these results in tables and discuss only the most remarkable findings in the text.
  9. Paul (p. 11 and further): It is not clear why I use two-step clustering within the dimensions and not across them. The two-step clustering based on log-likelihood produces 2 clusters within the ‘poor model’ area of the results. I could tell the two-step procedure to make, for example, 5 clusters; then the quality of the model goes up. But with the hierarchical clustering algorithm it might be easier to decide for myself where to cut the dendrogram and then look at the clusters at that level.
  10. Both: Be careful with claiming that there are relations between characteristics, but also with claiming there is no relation.
  11. Both: report percentages together with absolute numbers, because the set of cases is small.
  12. Paul (p. 15) questioned whether I could suggest the existence of three types based on my methodology. I acknowledged that, but now that I look at my text again I doubt whether he was right:
    1. If A seems to be related to (co-occur with) B, and B with C, then it must be allowed to suggest that there seems to be a type characterized by A, B and C (?)
  13. Paul thought this piece was a bit disappointing:
    1. “However, it should be noted that a direct relation between involved parties and on-line activity was not identified. In other words: claims about the on-line activity related to the three types of initiatives cannot be made at present. This is one of the methodological consequences of the approach performed here. We applied strong clustering on the levels of all categories in order to make the number of variables manageable and reducing the complexity. At the same time this means a loss of detail about, for example, the involved parties in which libraries and universities were clustered into knowledge institutions.”
    2. When I look at this again, I don’t think there is anything wrong with the conclusion (in bold). The explanation, though, is not adequate and causes the disappointment. I should have said something like: “This is an interesting finding, because it implies that on-line engagement according to our data relies mostly on the affordances. The hypothesis that involved professionals have a negative influence on the on-line engagement has to be further tested.”
    3. Mental note: I did not cluster the cases into three types to check the relation between type and on-line engagement.
  14. The last thing we talked about was the content. I should make clear that I did not really look at the content, but only at some characteristics of how the content is framed. Like I say in the text, the actual content might be of (social) interest in general, but also for the on-line engagement specifically.


  1. The advice will be sent later by Paul, but it is something like “you have done a lot of work in your first year: continue your PhD-project!”. Some tips for me:
    1. There should be more focus in the article; this might mean that I should go for two texts, each with its own message.
    2. It is very good to have an empirical start like this at an early stage, but it has taken a lot of energy/time. The theoretical foundations (e.g. how website characteristics invite people to participate) are lacking up till now. Time to put that in.
    3. I should specialize more in quantitative approaches (in general, but specifically in hierarchical clustering techniques).

Possible crosstabs

Published on: 21/09/2011

Since I cannot make every possible crosstab of my data (50+ variables), and doing so would not be very useful anyway, I will have to make smart combinations. Here are some leads:

  1. Investments in affordances but low engagement –> relate an index of affordances to the number of contributions/comments.
  2. Also check other hypotheses.
  3. I executed a crosstab-all kind of command and went through the results to look for peculiar values (Cramér's V).
  4. What do we do with the fact that the crosstabs are not ‘pure’? For example, the involved startup participant ‘individual citizen’ crossed with ‘support by donations’ can be accompanied on ‘both sides’ by other participants or supporters.
  5. I do have asymmetric relations in my data; so should I have used Goodman and Kruskal's tau?
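
To make the difference between the two measures in points 3 and 5 concrete, here is a minimal sketch in plain Python. The function names are my own and the example table is invented, not taken from my data. Cramér's V is symmetric; Goodman and Kruskal's tau is directional, so it can differ depending on which variable is predicted from which.

```python
from math import sqrt


def margins(table):
    """Row totals, column totals and grand total of a table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)


def cramers_v(table):
    """Cramér's V from the uncorrected chi-square (no empty rows/columns)."""
    rows, cols, n = margins(table)
    chi2 = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))


def gk_tau(table):
    """Goodman and Kruskal's tau: predicting the column variable from the row variable."""
    rows, cols, n = margins(table)
    # Sum of p_ij^2 / p_i. over all cells, written with raw counts.
    within = sum(
        cell ** 2 / (rows[i] * n) for i, r in enumerate(table) for cell in r
    )
    # Sum of squared column proportions (prediction without the row variable).
    baseline = sum((c / n) ** 2 for c in cols)
    return (within - baseline) / (1 - baseline)


# Invented example: 2 row categories, 3 column categories.
table = [[10, 0, 0], [0, 5, 5]]
v = cramers_v(table)
tau_col_from_row = gk_tau(table)
tau_row_from_col = gk_tau([list(c) for c in zip(*table)])
```

In this invented table V equals 1, while tau is about 0.6 for predicting the column from the row and 1 the other way round. That is exactly the asymmetry that V hides. Note that for 2x2 tables tau is the same in both directions, so the distinction only matters for larger tables.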

Coding 14 new cases

Published on: 20/09/2011

Here I report on my experiences with coding 14 new cases. This is after having coded about 70 cases that included ones that were not exemplary. I threw 25 out which brought me back to 45 cases and then I used my list of reserves (21) from which I selected 14. I think it is important to document here what it is like to code the new ones after all the experiences up to now. I use this list now:

1. Involved, 2. Aims, 3. Methods, 4. Stories, 5. Affordances, 6. Evolution

  1. It seems to me now that some citizen initiatives do not have such strong aims (like “sense of place”), or have none at all (oranjeboompleinbuurt). So I added a code ‘missing’ to ‘aims’.
  2. I do not have a code for advanced social functions (like becoming friends or forming a group). Flickr has such functions. I added ‘making connections’, but I didn’t go through all the data to check for occurrences. (Memory of East has this too).
  3. I don’t have the affordance ‘map’ yet, nor ‘adding your own tags’.
  4. Some cases are part of an overall social project or an overall neighborhood website.
  5. Some cases set an informal tone, but get formal historical stories (columbus neighborhood stories).
  6. I have at least two regional websites in my cases.
  7. I believe by now that countries/regions build up local traditions w.r.t. online collections of local stories, with storytelling projects more or less inspiring each other.
  8. One of the cases (westpark) mentioned sharing as one of the purposes. So does the buurtwinkelproject.
  9. I am still surprised at how much time I lose understanding each case in order to be able to code it. Maybe I also get a little lost in the local stories.
  10. With startup I don’t make a distinction between company and individual local social entrepreneurs, although the latter is present about 7 times I think.
  11. In case of elementary schools I coded the youth as ‘citizen’, not as ‘professional’ (which I did with students who were learning how to be a reporter etc).
  12. The difference between formal/informal and everyday/ historical becomes clearer again:
    1. Informal/formal is how the text on the website talks about (or makes implications about) the style of the stories (e.g. “fascinating memories” is closer to informal than to formal).
    2. Everyday/historical is sometimes literally in the texts, but can also be paraphrased (e.g. “a flavour of life as it was then” is closer to everyday than to historical).
  13. For example the aims that Buurtwinkels had with their neighborhoodshop-project are not online, but in other documents. In other words: not all the aims will be made explicit on the websites of the cases I have found.
  14. The categories ‘guestbook’ and ‘who knows’ are often used for the same purpose. Or better: the comment function seems to entail these two.
  15. In one case other memory websites and their communities are participating partners. I have put them in ‘other’.
  16. Should I position my field analysis as a pilot study?
  17. I don’t do much with the age of the site, but some of them have been around since 2003 and are still active.
  18. W.r.t. the contributions: I think I should look at 2010 and 2011 to make a claim about the activity.
  19. News items are counted only when they are not automatically imported from elsewhere (e.g. Floresta).
  20. Note the discrepancies between what’s on the website and what is going on behind the scenes. For example buurtwinkels: “This site will be preserved by the Amsterdam Museum, even after the exhibitions have ended. Your stories will therefore not be lost and will hopefully also be read by many people in the future.” I know from the museum this is probably not going to happen.
  21. I am a bit worried about participation. In my context the word means involving citizens, but in all cases citizens are meant to be involved. Only a few cases mention letting certain isolated groups participate. Those are the ones I coded.

Rethinking storyline article

Published on: 16/09/2011

I seem to forget the main storyline and the branches of my article when I am working on the analysis of my data. Another attempt to refresh my memory:

  1. The literature that I have found makes many claims about the importance of cases like the ones I am mapping.
    1. E.g. claims about cultural citizenship, community development, … sustainability….
    2. (Interestingly enough aims do not say anything about cultural citizenship)
  2. These claims are connected to idiosyncratic descriptions of single researched cases.
  3. When we look at a larger set of cases we see that there is a coherent field.
  4. When we further look at the organizational aspects and how they relate to the literature we see that these claims are not always working out.
  5. So with this article we want to extend the knowledge about these cases in order to prevent (or draw attention to) design mistakes and wrong expectations.
  6. The next question becomes:
    1. Why do people become engaged?
      1. What is in it for them explicitly and implicitly?
      2. What is in these stories that invites further action?
      3. How are peripheral professionals experienced?

An article with this message and these questions offers the foundation to continue with a part of these questions with respect to two cases which are successful in terms of engagement or continuity.

To do week August 18-25

Published on: 18/08/2011

The data matrix is a reduction of the available data coded in MaxQDA; e.g. the instances of the categories are not always informative for our questions.

  1. Develop new data matrix.
  2. Adjust code-tree in MaxQDA to this matrix. (number of visitors).
  3. Check and – if necessary – recode each case.
  4. Export to raw data matrix.
  5. Manipulate the raw matrix to the workable one (point 1).
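
Steps 4 and 5 amount to reducing the coded segments to one 0/1 row per case. A minimal sketch, assuming the export can be read as a list of (case, code) records; the actual MaxQDA export format is not described here, and the function name, case names and codes below are purely illustrative:

```python
def build_matrix(coded_segments, codes):
    """Reduce (case, code) records to one 0/1 row per case.

    `codes` fixes the columns of the workable matrix; codes that occur
    in the raw records but are not listed there are dropped.
    """
    present = {}
    for case, code in coded_segments:
        present.setdefault(case, set()).add(code)
    return {
        case: [int(code in present[case]) for code in codes]
        for case in sorted(present)
    }


# Hypothetical raw records and column list.
raw = [
    ("case_a", "aim: sense of place"),
    ("case_a", "affordance: comments"),
    ("case_a", "affordance: comments"),  # repeated instances collapse to 1
    ("case_b", "affordance: map"),
    ("case_b", "memo"),                  # not a column, so it is dropped
]
columns = ["aim: sense of place", "affordance: comments", "affordance: map"]
matrix = build_matrix(raw, columns)
```

Keeping the column list separate from the raw records makes the reduction explicit: changing the data matrix (step 1) only means changing `columns`, not recoding the cases.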
