It’s Time For Some Philosophical Investigations

Ludwig Wittgenstein’s Philosophical Investigations is a nice piece of work where he attempts to explain his views on language and the consequences of this view on various subjects like logic, semantics, cognition, and psychology.  I’ve mentioned some of his views very briefly in a couple of earlier posts, but I wanted to delve into his work in a little more depth here and comment on what strikes me as most interesting.  Lately, I’ve been looking back at some of the books I’ve read from various philosophers and have been wanting to revisit them so I can explore them in more detail and share how they connect to some of my own thoughts.  Alright…let’s begin.

Language, Meaning, & Their Probabilistic Attributes

He opens his Philosophical Investigations with a quote from St. Augustine’s Confessions that describes how a person learns a language.  St. Augustine believed that this process involved simply learning the names of objects (for example, by someone else pointing to the objects that are so named) and then stringing them together into sentences, and Wittgenstein points out that this is true to some trivial degree but it overlooks a much more fundamental relationship between language and the world.  For Wittgenstein, the meaning of words can not simply be attached to an object like a name can.  The meaning of a word or concept has much more of a fuzzy boundary as it depends on a breadth of context or associations with other concepts.  He analogizes this plurality in the meanings of words with the relationship between members of a family.  While there may be some resemblance between different uses of a word or concept, we aren’t able to formulate a strict definition to fully describe this resemblance.

One problem then, especially within philosophy, is that many people assume that the meaning of a word or concept is fixed with sharp boundaries (just like the fixed structure of words themselves).  Wittgenstein wants to dispel people of this false notion (much as Nietzsche tried to do before him) so that they can stop misusing language, as he believed that this misuse was the cause of many (if not all) of the major problems that had cropped up in philosophy over the centuries, particularly in metaphysics.  Since meaning is actually somewhat fluid and can’t be accounted for by any fixed structure, Wittgenstein thinks that any meaning that we can attach to these words is ultimately going to be determined by how those words are used.  Since he ties meaning with use, and since this use is something occurring in our social forms of life, it has an inextricably external character.  Thus, the only way to determine if someone else has a particular understanding of a word or concept is through their behavior, in response to or in association with the use of the word(s) in question.  This is especially important in the case of ambiguous sentences, which Wittgenstein explores to some degree.

Probabilistic Shared Understanding

Some of what Wittgenstein is trying to point out here are what I like to refer to as the inherently probabilistic attributes of language.  And it seems to me to be probabilistic for a few different reasons, beyond what Wittgenstein seems to address.  First, there is no guarantee that what one person means by a word or concept exactly matches the meaning from another person’s point of view, but at the very least there is almost always going to be some amount of semantic overlap (and possibly 100% in some cases) between the two individual’s intended meanings, and so there is going to be some probability that the speaker and the listener do in fact share a complete understanding.  It seems reasonable to argue that simpler concepts will have a higher probability of complete semantic overlap whereas more complex concepts are more likely to leave a gap in that shared understanding.  And I think this is true even if we can’t actually calculate what any of these probabilities are.

Now my use of the word meaning here differs from Wittgenstein’s because I am referring to something that is not exclusively shared by all parties involved and I am pointing to something that is internal (a subjective understanding of a word) rather than external (the shared use of a word).  But I think this move is necessary if we are to capture all of the attributes that people explicitly or implicitly refer to with a concept like meaning.  It seems better to compromise with Wittgenstein’s thinking and refer to the meaning of a word as a form of understanding that is intimately connected with its use, but which involves elements that are not exclusively external.

We can justify this version of meaning through an example.  If I help teach you how to ride a bike and explain that this activity is called biking or to bike, then we can use Wittgenstein’s conception of meaning and it will likely account for our shared understanding, and so I have no qualms about that, and I’d agree with Wittgenstein that this is perhaps the most important function of language.  But it may be the case that you have an understanding that biking is an activity that can only happen on Tuesdays, because that happened to be the day that I helped teach you how to ride a bike.  Though I never intended for you to understand biking in this way, there was no immediate way for me to infer that you had this misunderstanding on the day I was teaching you this.  I could only learn of this fact if you explicitly explained to me your understanding of the term with enough detail, or if I had asked you additional questions like whether or not you’d like to bike on a Wednesday (for example), with you answering “I don’t know how to do that as that doesn’t make any sense to me.”  Wittgenstein doesn’t account for this gap in understanding in his conception of meaning and I think this makes for a far less useful conception.

Now I think that Wittgenstein is still right in the sense that the only way to determine someone else’s understanding (or lack thereof) of a word is through their behavior, but due to chance as well as where our attention is directed at any given moment, we may never see the right kinds of behavior to rule out any number of possible misunderstandings, and so we’re apt to just assume that these misunderstandings don’t exist because language hides them to varying degrees.  But they can and in some cases do exist, and this is why I prefer a conception of meaning that takes these misunderstandings into account.  So I think it’s useful to see language as probabilistic in the sense that there is some probability of a complete shared understanding underlying the use of a word, and thus there is a converse probability of some degree of misunderstanding.

Language & Meaning as Probabilistic Associations Between Causal Relations

A second way of language being probabilistic is due to the fact that the unique meanings associated with any particular use of a word or concept as understood by an individual are derived from probabilistic associations between various inferred causal relations.  And I believe that this is the underlying cause of most of the problems that Wittgenstein was trying to address in this book.  He may not have been thinking about the problem in this way, but it can account for the fuzzy boundary problem associated with the task of trying to define the meaning of words since this probabilistic structure underlies our experiences, our understanding of the world, and our use of language such that it can’t be represented by static, sharp definitions.  When a person is learning a concept, like redness, they experience a number of observations and infer what is common to all of those experiences, and then they extract a particular subset of what is common and associate it with a word like red or redness (as opposed to another commonality like objectness, or roundness, or what-have-you).  But in most cases, separate experiences of redness are going to be different instantiations of redness with different hues, textures, shapes, etc., which means that redness gets associated with a large range of different qualia.

If you come across a new qualia that seems to more closely match previous experiences associated with redness rather than orangeness (for example), I would argue that this is because the brain has assigned a higher probability to that qualia being an instance of redness as opposed to, say, orangeness.  And the brain may very well test a hypothesis of the qualia matching the concept of redness versus the concept of orangeness, and depending on your previous experiences of both, and the present context of the experience, your brain may assign a higher probability of orangeness instead.  Perhaps if a red-orange colored object is mixed in with a bunch of unambiguously orange-colored objects, it will be perceived as a shade of orange (to match it with the rest of the set), but if the case were reversed and it were mixed in with a bunch of unambiguously red-colored objects, it will be perceived as a shade of red instead.

Since our perception of the world depends on context, then the meanings we assign to words or concepts also depends on context, but not only in the sense of choosing a different use of a word (like Wittgenstein argues) in some language game or other, but also by perceiving the same incoming sensory information as conceptually different qualia (like in the aforementioned case of a red-orange colored object).  In that case, we weren’t intentionally using red or orange in a different way but rather were assigning one word or the other to the exact same sensory information (with respect to the red-orange object) which depended on what else was happening in the scene that surrounded that subset of sensory information.  To me, this highlights how meaning can be fluid in multiple ways, some of that fluidity stemming from our conscious intentions and some of it from unintentional forces at play involving our prior expectations within some context which directly modify our perceived experience.

This can also be seen through Wittgenstein’s example of what he calls a duckrabbit, an ambiguous image that can be perceived as a duck or a rabbit.  I’ve taken the liberty of inserting this image here along with a copy of it which has been rotated in order to more strongly invoke the perception of a rabbit.  The first image no doubt looks more like a duck and the second image, more like a rabbit.

Now Wittgenstein says that when one is looking at the duckrabbit and sees a rabbit, they aren’t interpreting the picture as a rabbit but are simply reporting what they see.  But in the case where a person sees a duck first and then later sees a rabbit, Wittgenstein isn’t sure what to make of this.  However, he claims to be sure that whatever it is, it can’t be the case that the external world stays the same while an internal cognitive change takes place.  Wittgenstein was incorrect on this point because the external world doesn’t change (in any relevant sense) despite our seeing the duck or seeing the rabbit.  Furthermore, he never demonstrates why two different perceptions would require a change in the external world.  The fact of the matter is, you can stare at this static picture and ask yourself to see a duck or to see a rabbit and it will affect your perception accordingly.  This is partially accomplished by you mentally rotating the image in your imagination and seeing if that changes how well it matches one conception or the other, and since it matches both conceptions to a high degree, you can easily perceive it one way or the other.  Your brain is simply processing competing hypotheses to account for the incoming sensory information, and the top-down predictions of rabbitness or duckness (which you’ve acquired over past experiences) actually changes the way you perceive it with no change required in the external world (despite Wittgenstein’s assertion to the contrary).

To give yet another illustration of the probabilistic nature of language, just imagine the head of a bald man and ask yourself, if you were to add one hair at a time to this bald man’s head, at what point does he lose the property of baldness?  If hairs were slowly added at random, and you could simply say “Stop!  Now he’s no longer bald!” at some particular time, there’s no doubt in my mind that if this procedure were repeated (even if the hairs were added non-randomly), you would say “Stop!  Now he’s no longer bald!” at a different point in this transition.  Similarly if you were looking at a light that was changing color from red to orange, and were asked to say when the color has changed to orange, you would pick a point in the transition that is within some margin of error but it wouldn’t be an exact, repeatable point in the transition.  We could do this thought experiment with all sorts of concepts that are attached to words, like cat and dog and, for example, use a computer graphic program to seamlessly morph a picture of a cat into a picture of a dog and ask at what point did the cat “turn into” a dog?  It’s going to be based on a probability of coincident features that you detect which can vary over time.  Here’s a series of pictures showing a chimpanzee morphing into Bill Clinton to better illustrate this point:

At what point do we stop seeing a picture of a chimpanzee and start seeing a picture of something else?  When do we first see Bill Clinton?  What if I expanded this series of 15 images into a series of 1000 images so that this transition happened even more gradually?  It would be highly unlikely to pick the exact same point in the transition two times in a row if the images weren’t numbered or arranged in a grid.  We can analogize this phenomenon with an ongoing problem in science, known as the species problem.  This problem can be described as the inherent difficulty of defining exactly what a species is, which is necessary if one wants to determine if and when one species evolves into another.  This problem occurs because the evolutionary processes giving rise to new species are relatively slow and continuous whereas sorting those organisms into sharply defined categories involves the elimination of that generational continuity and replacing it with discrete steps.

And we can see this effect in the series of images above, where each picture could represent some large number of generations in an evolutionary timeline, where each picture/organism looks roughly like the “parent” or “child” of the picture/organism that is adjacent to it.  Despite this continuity, if we look at the first picture and the last one, they look like pictures of distinct species.  So if we want to categorize the first and last picture as distinct species, then we create a problem when trying to account for every picture/organism that lies in between that transition.  Similarly words take on an appearance of strict categorization (of meaning) when in actuality, any underlying meaning attached is probabilistic and dynamic.  And as Wittgenstein pointed out, this makes it more appropriate to consider meaning as use so that the probabilistic and dynamic attributes of meaning aren’t lost.

Now you may think you can get around this problem of fluidity or fuzzy boundaries with concepts that are simpler and more abstract, like the concept of a particular quantity (say, a quantity of four objects) or other concepts in mathematics.  But in order to learn these concepts in the first place, like quantity, and then associate particular instances of it with a word, like four, one had to be presented with a number of experiences and infer what was common to all of those experiences (as was the case with redness mentioned earlier).  And this inference (I would argue) involves a probabilistic process as well, it’s just that the resulting probability of our inferring particular experiences as an instance of four objects is incredibly high and therefore repeatable and relatively unambiguous.  Therefore that kind of inference is likely to be sustained no matter what the context, and it is likely to be shared by two individuals with 100% semantic overlap (i.e. it’s almost certain that what I mean by four is exactly what you mean by four even though this is almost certainly not the case for a concept like love or consciousness).  This makes mathematical concepts qualitatively different from other concepts (especially those that are more complex or that more closely map on to reality), but it doesn’t negate their having a probabilistic attribute or foundation.

Looking at the Big Picture

Though this discussion of language and meaning is not an exhaustive analysis of Wittgenstein’s Philosophical Investigations, it represents an analysis of the main theme present throughout.  His main point was to shed light on the disparity between how we often think of language and how we actually use it.  When we stray away from the way it is actually used in our everyday lives, in one form of social life or other, and instead misuse it such as in philosophy, this creates all sorts of problems and unwarranted conclusions.  He also wants his readers to realize that the ultimate goal of philosophy should not be to try and make metaphysical theories and deep explanations underlying everyday phenomena, since these are often born out of unwarranted generalizations and other assumptions stemming from how our grammar is structured.  Instead we ought to subdue these temptations to generalize and subdue our temptations to be dogmatic and instead use philosophy as a kind of therapeutic tool to keep our abstract thinking in check and to better understand ourselves and the world we live in.

Although I disagree with some of Wittgenstein’s claims about cognition (in terms of how intimately it is connected to the external world) and take some issue with his arguably less useful conception of meaning, he makes a lot of sense overall.  Wittgenstein was clearly hitting upon a real difference between the way actual causal relations in our experience are structured and how those relations are represented in language.  Personally, I think that work within philosophy is moving in the right direction if the contributions made therein lead us to make more successful predictions about the causal structure of the world.  And I believe this to be so even if this progress includes generalizations that may not be exactly right.  As long as we can structure our thinking to make more successful predictions, then we’re moving forward as far as I’m concerned.  In any case, I liked the book overall and thought that the interlocutory style gave the reader a nice break from the typical form seen in philosophical argumentation.  I highly recommend it!

Predictive Processing: Unlocking the Mysteries of Mind & Body (Part II)

In the first post of this series I introduced some of the basic concepts involved in the Predictive Processing (PP) theory of perception and action.  I briefly tied together the notions of belief, desire, emotion, and action from within a PP lens.  In this post, I’d like to discuss the relationship between language and ontology through the same framework.  I’ll also start talking about PP in an evolutionary context as well, though I’ll have more to say about that in future posts in this series.

Active (Bayesian) Inference as a Source for Ontology

One of the main themes within PP is the idea of active (Bayesian) inference whereby we physically interact with the world, sampling it and modifying it in order to reduce our level of uncertainty in our predictions about the causes of the brain’s inputs.  Within an evolutionary context, we can see why this form of embodied cognition is an ideal schema for an information processing system to employ in order to maximize chances of survival in our highly interactive world.

In order to reduce the amount of sensory information that has to be processed at any given time, it is far more economical for the brain to only worry about the prediction error that flows upward through the neural system, rather than processing all incoming sensory data from scratch.  If the brain is employing a set of predictions that can “explain away” most of the incoming sensory data, then the downward flow of predictions can encounter an upward flow of sensory information (effectively cancelling each other out) and the only thing that remains to propagate upward through the system and do any “cognitive work” (i.e. the only thing that needs to be processed) on the predictive models flowing downward is the remaining prediction error (prediction error = predictions of sensory input minus the actual sensory input).  This is similar to data compression strategies for video files (for example) that only worry about the information that changes over time (pixels that change brightness/color) and then simply compress the information that remains constant (pixels that do not change from frame-to-frame).

The ultimate goal for this strategy within an evolutionary context is to allow the organism to understand its environment in the most salient ways for the pragmatic purposes of accomplishing goals relating to survival.  But once humans began to develop culture and evolve culturally, the predictive strategy gained a new kind of evolutionary breathing space, being able to predict increasingly complex causal relations and developing technology along the way.  All of these inferred causal relations appear to me to be the very source of our ontology, as each hierarchically structured prediction and its ability to become associated with others provides an ideal platform for differentiating between any number of spatio-temporal conceptions and their categorical or logical organization.

An active Bayesian inference system is also ideal to explain our intellectual thirst, human curiosity, and interest in novel experiences (to some degree), because we learn more about the world (and ourselves) by interacting with it in new ways.  In doing so, we are provided with a constant means of fueling and altering our ontology.

Language & Ontology

Language is an important component as well and it fits well within a PP framework as it serves to further link perception and action together in a very important way, allowing us to make new kinds of predictions about the world that wouldn’t have been possible without it.   A tool like language makes a lot of sense from an evolutionary perspective as well since better predictions about the world result in a higher chance of survival.

When we use language by speaking or writing it, we are performing an action which is instantiated by the desire to do so (see previous post about “desire” within a PP framework).  When we interpret language by listening to it or by reading, we are performing a perceptual task which is again simply another set of predictions (in this case, pertaining to the specific causes leading to our sensory inputs).  If we were simply sending and receiving non-lingual nonsense, then the same basic predictive principles underlying perception and action would still apply, but something new emerges when we send and receive actual language (which contains information).  With language, we begin to associate certain sounds and visual information with some kind of meaning or meaningful information.  Once we can do this, we can effectively share our thoughts with one another, or at least many aspects of our thoughts with one another.  This provides for an enormous evolutionary advantage as now we can communicate almost anything we want to one another, store it in external forms of memory (books, computers, etc.), and further analyze or manipulate the information for various purposes (accounting, inventory, science, mathematics, etc.).

By being able to predict certain causal outcomes through the use of language, we are effectively using the lower level predictions associated with perceiving and emitting language to satisfy higher level predictions related to more complex goals including those that extend far into the future.  Since the information that is sent and received amounts to testing or modifying our predictions of the world, we are effectively using language to share and modulate one brain’s set of predictions with that of another brain.  One important aspect of this process is that this information is inherently probabilistic which is why language often trips people up with ambiguities, nuances, multiple meanings behind words and other attributes of language that often lead to misunderstanding.  Wittgenstein is one of the more prominent philosophers who caught onto this property of language and its consequence on philosophical problems and how we see the world structured.  I think a lot of the problems Wittgenstein elaborated on with respect to language can be better accounted for by looking at language as dealing with probabilistic ontological/causal relations that serve some pragmatic purpose, with the meaning of any word or phrase as being best described by its use rather than some clear-cut definition.

This probabilistic attribute of language in terms of the meanings of words having fuzzy boundaries also tracks very well with the ontology that a brain currently has access to.  Our ontology, or what kinds of things we think exist in the world, are often categorized in various ways with some of the more concrete entities given names such as: “animals”, “plants”, “rocks”, “cats”, “cups”, “cars”, “cities”, etc.  But if I morph a wooden chair (say, by chipping away at parts of it with a chisel), eventually it will no longer be recognizable as a chair, and it may begin to look more like a table than a chair or like nothing other than an oddly shaped chunk of wood.  During this process, it may be difficult to point to the exact moment that it stopped being a chair and instead became a table or something else, and this would make sense if what we know to be a chair or table or what-have-you is nothing more than a probabilistic high-level prediction about certain causal relations.  If my brain perceives an object that produces too high of a prediction error based on the predictive model of what a “chair” is, then it will try another model (such as the predictive model pertaining to a “table”), potentially leading to models that are less and less specific until it is satisfied with recognizing the object as merely a “chunk of wood”.

From a PP lens, we can consider lower level predictions pertaining to more basic causes of sensory input (bright/dark regions, lines, colors, curves, edges, etc.) to form some basic ontological building blocks and when they are assembled into higher level predictions, the amount of integrated information increases.  This information integration process leads to condensed probabilities about increasingly complex causal relations, and this ends up reducing the dimensionality of the cause-effect space of the predicted phenomenon (where a set of separate cause-effect repertoires are combined into a smaller number of them).

You can see the advantage here by considering what the brain might do if it’s looking at a black cat sitting on a brown chair.  What if the brain were to look at this scene as merely a set of pixels on the retina that change over time, where there’s no expectations of any subset of pixels to change in ways that differ from any other subset?  This wouldn’t be very useful in predicting how the visual scene will change over time.  What if instead, the brain differentiates one subset of pixels (that correspond to what we call a cat) from all the rest of the pixels, and it does this in part by predicting proximity relations between neighboring pixels in the subset (so if some black pixels move from the right to the left visual field, then some number of neighboring black pixels are predicted to move with it)?

This latter method treats the subset of black-colored pixels as a separate object (as opposed to treating the entire visual scene as a single object), and doing this kind of differentiation in more and more complex ways leads to a well-defined object or concept, or a large number of them.  Associating sounds like “meow” with this subset of black-colored pixels, is just one example of yet another set of properties or predictions that further defines this perceived object as distinct from the rest of the perceptual scene.  Associating this object with a visual or auditory label such as “cat” finally links this ontological object with language.  As long as we agree on what is generally meant by the word “cat” (which we determine through its use), then we can share and modify the predictive models associated with such an object or concept, as we can do with any other successful instance of linguistic communication.

Language, Context, and Linguistic Relativism

However, it should be noted that as we get to more complex causal relations (more complex concepts/objects), we can no longer give these concepts a simple one word label and expect to communicate information about them nearly as easily as we could for the concept of a “cat”.  Think about concepts like “love” or “patriotism” or “transcendence” and realize how there’s many different ways that we use those terms and how they can mean all sorts of different things and so our meaning behind those words will be heavily conveyed to others by the context that they are used in.  And context in a PP framework could be described as simply the (expected) conjunction of multiple predictive models (multiple sets of causal relations) such as the conjunction of the predictive models pertaining to the concepts of food, pizza, and a desirable taste, and the word “love” which would imply a particular use of the word “love” as used in the phrase “I love pizza”.  This use of the word “love” is different than one which involves the conjunction of predictive models pertaining to the concepts of intimacy, sex, infatuation, and care, implied in a phrase like “I love my wife”.  In any case, conjunctions of predictive models can get complicated and this carries over to our stretching our language to its very limits.

Since we are immersed in language and since it is integral in our day-to-day lives, we also end up being conditioned to think linguistically in a number of ways.  For example, we often think with an interior monologue (e.g. “I am hungry and I want pizza for lunch”) even when we don’t plan on communicating this information to anyone else, so it’s not as if we’re simply rehearsing what we need to say before we say it.  I tend to think however that this linguistic thinking (thinking in our native language) is more or less a result of the fact that the causal relations that we think about have become so strongly associated with certain linguistic labels and propositions, that we sort of automatically think of the causal relations alongside the labels that we “hear” in our head.  This seems to be true even if the causal relations could be thought of without any linguistic labels, in principle at least.  We’ve simply learned to associate them so strongly to one another that in most cases separating the two is just not possible.

On the flip side, this tendency of language to associate itself with our thoughts, also puts certain barriers or restrictions on our thoughts.  If we are always preparing to share our thoughts through language, then we’re going to become somewhat entrained to think in ways that can be most easily expressed in a linguistic form.  So although language may simply be along for the ride with many non-linguistic aspects of thought, our tendency to use it may also structure our thinking and reasoning in large ways.  This would account for why people raised in different cultures with different languages see the world in different ways based on the structure of their language.  While linguistic determinism seems to have been ruled out (the strong version of the Sapir-Whorf hypothesis), there is still strong evidence to support linguistic relativism (the weak version of the Sapir-Whorf hypothesis), whereby one’s language effects their ontology and view of how the world is structured.

If language is so heavily used day-to-day then this phenomenon makes sense as viewed through a PP lens since we’re going to end up putting a high weight on the predictions that link ontology with language since these predictions have been demonstrated to us to be useful most of the time.  Minimal prediction error means that our Bayesian evidence is further supported and the higher the weight carried by these predictions, the more these predictions will restrict our overall thinking, including how our ontology is structured.

Moving on…

I think that these are but a few of the interesting relationships between language and ontology and how a PP framework helps to put it all together nicely, and I just haven’t seen this kind of explanatory power and parsimony in any other kind of conceptual framework about how the brain functions.  This bodes well for the framework and it’s becoming less and less surprising to see it being further supported over time with studies in neuroscience, cognition, psychology, and also those pertaining to pathologies of the brain, perceptual illusions, etc.  In the next post in this series, I’m going to talk about knowledge and how it can be seen through the lens of PP.