The Life & Mind Seminar Network

Some problems with information theory

Posted in General by Tom Froese on April 18, 2007

Hi all,

I’ve been thinking about the pros and cons of information theory as applied to cognitive science for a while now. The discussions during yesterday’s COGS talk made it clear to me yet again that I’m just failing to understand the significance of adopting this kind of approach. This is because I think that there are some severe practical, theoretical, and philosophical problems associated with an information theoretic approach to cognitive science.


First of all, a practical consideration: in order to calculate the amount of “information” transferred between the “environment” and an “agent”, it is necessary to have a model of both systems and of their manner of coupling. For this we need to know all the relevant state variables, all the states that these can take, the conditional dependencies between them, and the probability associated with each of the states. This poses severe practical problems for investigating any system that goes beyond a simple GOFAI toy world. How many different states can our environment have? Could we ever know them all? As Inman would say, we do not even have a model of the humble nematode worm!

Second, a theoretical consideration: even if we assume that we had all the relevant models so that we could start applying the methods of information theory, there is still the question of what further knowledge can be gained from doing so. Doesn’t this imply the additional assumption that we must already have sufficient understanding of the system’s structure and operation to obtain a model in the first place? In other words, we need to know how the system works before we can even begin to apply these methods. Accordingly, it would seem that all that information theory could provide is a theoretical redescription of a system which we already understand anyway.

Third, a philosophical consideration: can an information theoretic redescription of such a system provide us with any new knowledge about the way in which an autonomous agent operates? It seems that the answer is no, because information theory requires us to have access to relational knowledge that we only have in virtue of being external observers. Even if we could determine what amount of “information” is transferred into a particular variable of the agent from its environment, for example, this would tell us nothing about the internal operations of that agent.

Finally, even if someone wanted to claim that an agent acts on information theoretic principles, this would require it to have an internal model of itself, its environment, and the relationship between the two. Of course, this might sound OK to GOFAI practitioners, but it sounds absurd to those who favor a more embodied-embedded approach. And would such an undertaking even be possible in principle? For does it not end up in an infinite regress of internal models (since the existence of the internal model in the agent needs to be included in the internal model of the agent, and so on)?

These considerations seem to severely limit the viability of an information theoretic approach to cognitive science. Perhaps there is a niche for it in engineering GOFAI robots? However, judging from the excitement that information theory has generated in some areas, I’m probably just missing the point. Could someone please enlighten me?

Cheers,

Tom


14 Responses


  1. Simon McGregor said, on April 18, 2007 at 4:38 pm

    OK, lots to comment on here… I’m not sure what your idea of information theory is, but it doesn’t seem to match mine.

    Information theory can be seen as a set of mathematical tools for quantifying the degree to which different variables in a system can vary, and the degree to which their variation depends on one another. It is not about semantic information or philosophical intensionality. Although some people will claim otherwise, there is no particular philosophical commitment in the mathematics of information theory to regarding entropy (“uncertainty”) as a subjective property.

    The reason information theory is so exciting is that it provides a demonstrably “correct” way of quantifying uncertainty and reduction of uncertainty. Entropy is a more fundamental measure of variation and co-variation than ad-hoc measures like standard deviation or Pearson correlation.

    Provided that two variables can be measured under different circumstances, no model of their relationship is necessary to measure their sample entropy or conditional entropy, any more than a model is necessary to measure their standard deviation or Pearson correlation. It is true that some practical issues apply: information-theoretic measures require more samples to estimate accurately, and assumptions need to be made to apply them to continuous variables. Note that other statistical measures involve the same sorts of assumptions, but there they are implicit rather than explicit.
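    [Editorial sketch, not part of the original comment.] Simon’s two points can be made concrete with a few lines of code: plug-in estimates of entropy and mutual information are computed directly from paired samples, with no model of the relationship, and they detect nonlinear dependence that Pearson correlation misses entirely. The example data below (Y = X² on a symmetric range) is invented for illustration.

```python
import math
from collections import Counter

def entropy(samples):
    """Plug-in estimate of the Shannon entropy H(X) in bits, from discrete samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Y = X^2 on a symmetric range: the Pearson correlation is exactly zero,
# yet Y is fully determined by X -- mutual information picks this up.
xs = [-2, -1, 0, 1, 2] * 20
ys = [x * x for x in xs]
print(round(mutual_information(xs, ys), 3))  # about 1.522 bits despite zero linear correlation
```

    Note the estimator here is the naive plug-in one; as Simon says, it needs considerably more samples than second-order statistics to be accurate, and that cost is the explicit price of its generality.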

  2. nathanielvirgo said, on April 18, 2007 at 4:46 pm

    Hi Tom,

    Unfortunately I missed the COGS talk yesterday so I’m missing some of the background to this discussion. There are many ways in which information theory can be applied and some make more sense than others. I am skeptical of some of the studies out there but I want to point out some of the ways of applying information theory that do make sense.

    For me the main reason to apply information theory would indeed be to analyse a model that has been produced by some other means, perhaps by evolving a CTRNN. (The whole point of evolutionary robotics is that we don’t need to “already have sufficient understanding of the system’s structure and operation to obtain a model.”) Information theory is a tool that can be used to analyse a dynamical system, so why not use it? Perhaps it would produce insights that a classical analysis in terms of attractors might miss. Unfortunately I don’t think anyone has tried to do this — the closest I know of is Daniel Polani’s work with very simple finite state machine based agents in grid worlds.

    Your argument that an information theory analysis of a model cannot give us any new knowledge would apply to a theoretical analysis of any kind, including a Beer style dynamical systems kind of analysis.

    A second good reason to use information theory is to figure out the optimum performance in a given task. Of course, if a human or other agent is able to achieve this optimum it doesn’t imply that they’re using information theory to do it, but knowing that they are performing optimally would surely be useful.

    A third reason is simply to do statistics – information theory can provide better measures of correlation than classical statistics.

    In short, if someone says that their results imply that people must be using information theory to perform a task then I’m very skeptical, but if someone says it cannot be a useful tool for analysis then I think they’re being silly.

  3. tomfroese said, on April 19, 2007 at 10:24 am

    Thanks for your comments!

    While I agree that a Beer-style dynamical systems analysis also presupposes the existence of a dynamical model of the agent and its environment, the kind of knowledge that such an analysis generates presumably pertains to the actual operations of the system under study.

    An information theoretic analysis can also provide us with new knowledge, but it is important to note that it is of a fundamentally different kind. The outcome of such an analysis does not tell us anything about the underlying mechanisms that are responsible for the observed phenomenon. For example, while it can tell us that uncertainty has been reduced in the system, it cannot tell us how this happened in operational terms.

    Moreover, I think it would be a category mistake to include this kind of observation in an explanation of the agent’s behavior, because the knowledge of uncertainty only pertains to a relational domain of discourse generated by an external observer who can make the necessary correlations between the agent and its environment. The agent itself, when viewed as an autonomous system, does not have epistemic access to this domain.

    I think what makes me suspicious the most is that an information theory approach to cognitive science is implicitly committed to a representationalist epistemology. Rather than treating the coupling between an agent and its environment in terms of a viable fit, it is viewed in terms of optimality – something that is only possible if we assume the existence of an objective pre-given environment to which the agent stands in a correspondence relation! While taking such a perspective is certainly possible, I doubt whether it is useful in improving our understanding of the behavior of autonomous agents.

  4. Marieke said, on April 19, 2007 at 4:27 pm

    I agree with Simon and Nathaniel that a formalism is not born with an interpretation, even though, obviously, some interpretations seem more intuitive than others. But as a mathematical language, information theory is a priori nothing but analytically sound; how you apply it may or may not be dodgy.

    Some work in which an information theory-ish investigation of agent–environment interactions has been tried was presented at ECAL 2005 (http://www.cs.unimaas.nl/g.decroon/RLEL.pdf) by Guido de Croon et al. It is a discretised version of a Beer agent plus environment. I can kind of see the point of that.

  5. Nathaniel Virgo said, on April 25, 2007 at 4:19 pm

    Tom,

    It is not true that an information theory analysis cannot tell us anything about the underlying mechanisms that are responsible for observed phenomena. Anyone who disagrees should read pages 48–58 of Edwin Jaynes’ “Where do we Stand on Maximum Entropy?” (Anyone else with a strong interest in information theory should read it as well; it is a classic demonstration of the power of information theory as a statistical method in science.) Information theory does not just tell us that “uncertainty has been reduced within the system” (whatever that means). With an appropriate set of experiments, it tells us which parts of the system affect which other parts of the system, how strongly, and why.

    I agree, as I already said, that attributing such an analysis to the agent itself would be a rather silly category mistake, but the point that both Simon and I are trying to make is that applying information theory does not have to entail doing that. I know that some people do try to do that, and I agree that they are being silly, but you should say that your problem is with those people’s approach rather than with the mathematical formalism of information theory itself.

  6. tomfroese said, on April 26, 2007 at 12:58 pm

    Nathaniel,

    I’m sure that the mathematical formalism of information theory can be very useful in different contexts, but I’m wondering about its applicability to cognitive science. Can it help us to understand the operations of an autonomous agent?

    In the context of perception, for example, it seems to go hand-in-hand with the interpretation of the agent as an input/output system which processes information in such a way that some of its internal states tend to have an increased correlation to some environmental states. In this manner the agent is not treated as autonomous (in the sense of operational closure) and perception is viewed as essentially a process of representation.

    I therefore guess that GOFAI and cognitivism would be quite happy to make use of information theory to support their theories, but how does such an approach help embodied-embedded or enactive approaches to cognitive science?

  7. Nathaniel Virgo said, on April 29, 2007 at 6:00 pm

    “In the context of perception, for example, it seems to go hand-in-hand with the interpretation of the agent as an input/output system which processes information in such a way that some of its internal states tend to have an increased correlation to some environmental states.”

    No, I really don’t think it does. That’s the main point that I’ve been trying to get across. There absolutely, categorically, is no such assumption in the formalism of information theory, and it absolutely, categorically can be applied without making such an assumption.

    Information theory provides a powerful set of tools for measuring correlations, but you don’t have to make any assumptions about what those correlations mean. In fact, if you had an artificial agent that did not work in a representational way then information theory would enable you to prove that it didn’t, simply by showing that there wasn’t a correlation between its internal states and its environment.

    Even if such a correlation did turn out to exist (and to be honest in most interesting cases I think it will) it does not follow that this should be interpreted as a representation. Why should it? Nobody would say that lung cancer is a representation of smoking, though they are certainly correlated.

    In my opinion the “information” in an information theory analysis should not be seen as the agent’s information about anything. It’s far better to conceptualise it as our information about the agent and its environment. Hopefully this should make it a bit more clear why I don’t think the use of information theory implies any assumption about information being stored or processed by an agent.

    “but how does such an approach help embodied-embedded or enactive approaches to cognitive science?”

    By providing a powerful set of tools and language for analysing statistical data and dynamical models. I actually think that information theory could be a serious asset to the enactive approach, precisely because it doesn’t make any assumptions about meaning or representation.

  8. tomfroese said, on April 30, 2007 at 12:08 pm

    “In my opinion the ‘information’ in an information theory analysis should not be seen as the agent’s information about anything. It’s far better to conceptualise it as our information about the agent and its environment.”

    I agree, though this is not the way it has been presented in the talks that I’ve seen so far. But maybe it is just very easy to misunderstand. For example, when Simon says:

    “The reason information theory is so exciting is that it provides a demonstrably ‘correct’ way of quantifying uncertainty and reduction of uncertainty.”

    The immediate questions would be: uncertainty about what? And uncertainty for whom? To me at least, when seen in the context of cognitive science, it would seem intuitive to interpret the statement as referring to the agent’s uncertainty about its environment, not our uncertainty, and thereby entailing all the problems we’ve been discussing.

  9. Nathaniel Virgo said, on April 30, 2007 at 3:15 pm

    Yeah, there have been quite a few definite misapplications of information theory that do treat it as the agent’s information — I think this is a dead end as far as cognitive science goes, although it’s probably the way forward for GOFAI-style robotics.

    Information theory can also be difficult to interpret: Daniel Polani’s work doesn’t assume that the information is the agent’s information, but because of the way the formalism works you end up with statements like “the information flows in through the sensors to the agent’s internal state, and then it flows out through the motors,” which makes it sound like it’s the agent’s information. In fact it’s just our uncertainty: we start off with a random distribution of states in the environment, so we’re uncertain about the environment. Then the agent interacts with the environment using its sensors, so we become uncertain about the robot’s internal state; then it interacts using its motors, so we become uncertain about the state of the environment again. (This is, of course, partly because that particular agent is designed to work that way. It would not necessarily be as simple for a properly embodied CTRNN, but one could still do the same kind of analysis to find new and interesting things.)

    Doing this allows us to assess the system in a causal way. Using information theory allows us to say which aspects of the final state of the system were due to the agent’s actions, and which were due to other things. I think that this kind of formal causal reasoning will be an important development in the use of dynamical systems theory to analyse artificial agents.
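    [Editorial sketch, not part of the original comment.] One hedged illustration of this kind of attribution: conditional mutual information I(Z; A | B) measures how much the agent’s action A tells us about the outcome Z once the context B is accounted for, which is one simple way of asking which aspects of the final state were due to the agent’s actions. The XOR toy world and all variable names below are invented for illustration.

```python
import math
import random
from collections import Counter

def H(samples):
    """Plug-in Shannon entropy in bits, from discrete samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mi(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return H(xs) + H(ys) - H(list(zip(xs, ys)))

def cond_mi(zs, acts, ctx):
    """I(Z; A | B) = H(Z,B) + H(A,B) - H(Z,A,B) - H(B)."""
    return (H(list(zip(zs, ctx))) + H(list(zip(acts, ctx)))
            - H(list(zip(zs, acts, ctx))) - H(ctx))

# Toy world: the outcome is the XOR of the agent's action and the context.
random.seed(0)
ctx = [random.randint(0, 1) for _ in range(4000)]
acts = [random.randint(0, 1) for _ in range(4000)]
zs = [a ^ b for a, b in zip(acts, ctx)]

# Marginally the action looks irrelevant to the outcome, but conditioned
# on the context it fully determines it.
print(round(mi(zs, acts), 2))        # near 0 bits
print(round(cond_mi(zs, acts, ctx), 2))  # near 1 bit
```

    The contrast between the two numbers is the point: a bare correlation measure would call the action irrelevant, while the conditional measure reveals that, given the context, the outcome was entirely due to the agent’s action.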

  10. xabier said, on May 22, 2007 at 3:08 pm

    I hope it is not too late to join the discussion ;)

    I agree with Simon and Nathaniel on the usefulness of information theory, and I believe that enactive approaches to cognition have a lot to gain from causal-correlational measurements of behavioural mechanisms. Yet there is a need to avoid representationalist interpretations, and I think we would all agree that it would have been much better for the history of cognitive science if Shannon had chosen a word other than “information” to name his theory, given all the semantic load attached to it and the army of later representationalists exploiting part of it.

    One way in which information theory comes to the aid of cognitive (neuro)science is precisely in determining which of the potential observables of the nervous system are relevant for the production of behaviour. This problem is generally termed (unfortunately, again) the “coding problem”. But there is absolutely no need to buy the whole application together with its semantic load. If you look at it carefully, the problem has a dynamical and not-necessarily-representational nature, i.e.: which are the mechanisms that propagate dynamic variability within the nervous system? In other words, put more sympathetically to the enactive approach: which are the mechanisms that sustain sensorimotor correlations? To answer this question information theory is not only useful, but even necessary. When we model autonomous agents using CTRNNs we are assuming that some mechanism within a natural autonomous agent is performing similar sensorimotor transformations. And nobody truly believes that CTRNNs are accurate models of real neurons at all. So you are really saying nothing about the mechanisms using CTRNNs unless you are able to link them (as a modelling tool) with their target object: real brains. And in order to do so you need to understand how it is possible for brains to create sensorimotor correlations in a quasi-unconstrained way (as the universal approximation capacities of CTRNNs allow). And, in order to do that, you need to measure which (observable and manipulable) variables within real brains do, in fact, “instantiate” dynamical models. Hence the importance for cognitive neuroscience of determining whether activation rates or interspike intervals are doing the job. We need information theory to determine which variables can and do support correlations. And there is no need (nor even any help) in assuming a representationalist framework to find that out.
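    [Editorial sketch, not part of the original comment.] The rates-versus-intervals question can be made concrete with a deliberately crude toy (entirely synthetic data, not a model of any real recording): if a stimulus modulates spike counts but not a timing statistic, then the mutual information between the stimulus and each candidate observable tells us which one is actually carrying the sensorimotor correlation.

```python
import math
import random
from collections import Counter

def H(samples):
    """Plug-in Shannon entropy in bits, from discrete samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mi(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return H(xs) + H(ys) - H(list(zip(xs, ys)))

random.seed(1)
stimulus = [random.randint(0, 1) for _ in range(3000)]
# In this toy, the stimulus shifts the spike count upward...
counts = [2 * s + random.randint(0, 2) for s in stimulus]
# ...while the timing statistic is pure noise, unrelated to the stimulus.
timing = [random.randint(0, 3) for _ in stimulus]

print(round(mi(stimulus, counts), 2))  # clearly positive: counts carry the correlation
print(round(mi(stimulus, timing), 2))  # near zero: timing does not
```

    Nothing representational is assumed here: the measurement only says which observable co-varies with the stimulus, which is exactly the kind of model-to-mechanism link described above.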

    This is but one of the applications of information theory to cognitive science (and one with significant relevance for enactive approaches). Another will be shown by Anil Seth at the next seminar, and it addresses a similarly critical problem for enactive approaches: what makes a system an autonomous agent? This question cannot be solved solely on the basis of a system being dynamical (as opposed to a representational input-output device). Studying the causal organization of behaviour that makes it internally generated (rather than externally driven and thus heteronomous) is critical to solving this problem. And, once again, information theory comes with a nice set of tools for the job.

    yours,

    xabier

  11. tomfroese said, on June 28, 2007 at 7:57 am

    I was sent the following e-mail regarding this discussion. I’m sure it’s ok with Malcolm if I make his thoughts available here:

    Tom

    Today I came across your note of April 18th on the Life & Mind blog:
    https://lifeandmind.wordpress.com/2007/04/18/some-problems-with-information-theory/#more-82

    As you can see below, I have been researching and writing in this area for some time, and we share an interest in Buddhism.

    The problem you encountered arises time and again when ideas such as Information and Thermodynamics are taken simply as mathematical methods or phenomena, leaving intact the researcher’s previous background and education. To the contrary, I believe that both topics require the recognition that they are fundamental philosophical stances, to be taken whole, and as a fresh start. Since one of your respondents mentioned Jaynes, I will add that things go a lot more smoothly when the statistical philosophies of frequentism and Bayesianism are distinguished and segregated.

    Information and Thermodynamics are two sides of the same cosmic coin, best approached from the beginning through Eastern philosophy. To approach them through the baggage of Greek and European thought is to waste much time in comparing and contrasting this and that with appropriate references to Descartes or Searle or whomever. This makes for a fine career in academic blather, but it does not further the investigation.

    Sincerely,

    Malcolm Dean
    Los Angeles CA
    malcolmdean-at-gmail.com

    Recent Lectures/Publications:
    “Hitchhiker’s Guide to (Cognitive) Thermodynamics,” UCLA BRI, May 21, 2007
    “Theology of Information,” EOR, Hawaii, January 5, 2007
    “Outline of Cognitive Thermodynamics,” SCTPLS, Denver, August 5, 2005
    “Cognitive Thermodynamics in Culture & Religion,” SSSR, Kansas City, Oct 22, 2004
    “General Theory of Cognitive Systems,” UCLA BRI, May 13, 2004
    http://www.com.washington.edu/rccs/bookinfo.asp?ReviewID=288&BookID=232
    Los Angeles Times, Op-Ed, April 8, 2004
    DesktopLinux.com, O’Reilly.com
    http://www.oreillynet.com/pub/au/228
    Former News Editor, Maximum Linux, XML Journal
    Former Principal Editor, Academic Computing, UCLA

  12. Dr Mahesh Chand Jain said, on October 5, 2008 at 10:53 am

    In nature any and every event is associated with information generation, at the least about the event itself. This information is stored using natural systems based on some natural code and language, such that the representation of stored information exactly corresponds to reality, and this phenomenon is the beginning of cognition and also the point of origin and evolution of information and instruction sciences in nature. Without such a phenomenon it is difficult to conceive even of the origin and evolution of the physical universe, let alone the origin and evolution of life.
    Newton was once questioned about the regularity of planetary motion, and his reply was that it is divine. I anticipate this might as well be the result of natural instruction and information systems.

  13. Alex said, on November 3, 2010 at 7:39 pm

    We must be very careful (as the great Shannon stated) when we speak about the domain of application of information theory. Its main application domain is telecommunication (it was developed for communication purposes). Even in information science there are areas where information theory is not the basis (information retrieval and data extraction). In particular, the intersection between information theory and cognitive science is vague. I would say that information processing has much stronger common points with cognitive science than information theory does. Also, many so-called inter-areas, like cognitive radio, are not likely to be long-standing.

  14. Nathaniel Virgo said, on November 4, 2010 at 12:39 pm

    Alex,

    I agree that we must be careful when speaking about the domain of application of information theory. However, I disagree that its main application is in communication theory. Many people do think this, but to my mind this is a historical accident that stems from the fact that Shannon happened to be a communications engineer. Because of this, the theory was first developed around formal objects such as information sources, channels and receivers. For this reason, people often look for analogues of these concepts in other application domains.

    However, information theory can be seen in a much more general way than this. I know of no better explanation than Edwin Jaynes’ long article “Where do we Stand on Maximum Entropy”, which I linked to earlier in the thread. On this view, information theory is part of Bayesian probability theory, which in turn is best thought of as a uniquely correct extension of logic to deal with reasoning under uncertainty. Information sources, channels, and receivers are then to be thought of as objects from the application domain of communication theory, and when one applies the general theory of probabilistic inference to these types of object one obtains the domain-specific theory derived by Shannon.

    But like I said, one still has to be careful. One must always be able to answer the questions “whose information is it?” and “what is it information about?”. These questions are rarely asked in applications of information theory to cognitive science and neuroscience, which leads to a conflation between information belonging to the subject and information belonging to the experimenter. (This was, broadly speaking, the subject of Mike Beaton’s talk on Tuesday, for those who were there.)

