In the first post in this series I re-introduced the basic concepts underlying the Predictive Processing (PP) theory of perception, in particular how the brain uses a form of active Bayesian inference to form predictive models that effectively account for perception as well as action. I also mentioned how the folk psychological concepts of beliefs, desires and emotions can fit within the PP framework. In the second post in this series I expanded the domain of PP a bit to show how it relates to language and ontology, and how we perceive the world to be structured as discrete objects, objects with “fuzzy boundaries”, and other more complex concepts, all stemming from particular predicted causal relations.
It’s important to note that the PP framework differs from classical computational frameworks for brain function in a very big way: the processing and learning steps are no longer considered separate stages, but rather operate at the same time (learning is effectively occurring all the time). Furthermore, classical computational frameworks treat the brain more or less like a computer, which I think is very misguided; the PP framework offers a much better alternative that is far more creative, economical, efficient, parsimonious and pragmatic. The PP framework holds the brain to be a probabilistic system rather than the deterministic computational system assumed in classical computationalist views. It also puts a far greater emphasis on the brain’s use of feedback loops, whereas traditional computational approaches tend to treat the brain as primarily a feed-forward information processing system.
Rather than being a generic information processing system that passively waits for new incoming data from the outside world, the brain stays one step ahead of the game by forming predictions about the incoming sensory data through a very active and creative learning process, built up from past experiences through a form of active Bayesian inference. And rather than relying on some kind of serial processing scheme, this involves primarily parallel neuronal processing, resulting in predictions that have a deeply embedded hierarchical structure and relationship with one another. In this post, I’d like to explore how traditional and scientific notions of knowledge can fit within a PP framework as well.
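Before getting into that, it might help to make the prediction-and-error picture a little more concrete. Here is a toy sketch, entirely my own illustration with made-up numbers (not code taken from the PP literature), of a single “predictive unit” that keeps a running prediction of a sensory signal and corrects it in a feedback loop using precision-weighted prediction error:

```python
def run_predictive_loop(signal, prediction=0.0, prior_precision=1.0, sensory_precision=4.0):
    """Track a signal by repeatedly correcting a prediction with its own error."""
    # Precision weighting (the "gain" on the prediction error): how far one unit
    # of error is allowed to move the prediction, given how much the senses are
    # trusted relative to the prior (these values are made up for illustration).
    gain = sensory_precision / (sensory_precision + prior_precision)
    trajectory = []
    for observation in signal:
        error = observation - prediction   # prediction error ("surprise")
        prediction += gain * error         # feedback: the error revises the prediction
        trajectory.append(round(prediction, 3))
    return trajectory


# A "sensory signal" that jumps from 0 to 1 halfway through: the prediction lags
# for a step or two and then settles on the new value.
print(run_predictive_loop([0, 0, 0, 1, 1, 1, 1]))
# [0.0, 0.0, 0.0, 0.8, 0.96, 0.992, 0.998]
```

Stack many such units in a hierarchy, with each level predicting the activity (and errors) of the level below it, and you get the basic flavor of the parallel, hierarchical scheme described above.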
Knowledge as a Subset of Predicted Causal Relations
It has already been mentioned that, viewed through a PP lens, our ontology can be seen as basically composed of different sets of highly differentiated, probabilistic causal relations. This allows us to discriminate one object or concept from another, as they are each “composed of”, or understood by the brain as, causes that can be “explained away” by different sets of predictions. Beliefs are just another word for predictions, with higher-level beliefs (including those that pertain to very specific contexts) consisting of a conjunction of a number of different lower-level predictions. When we come to “know” something, then, it really is just a matter of inferring some causal relation, or set of causal relations, about a particular aspect of our experience.
Knowledge is often described by philosophers using various forms of the Platonic definition: Knowledge = Justified True Belief. I’ve talked about this to some degree in previous posts (here, here), and I came up with what I think is a much better working definition for knowledge:
Knowledge consists of recognized patterns of causality that are stored in memory for later recall and use, that positively and consistently correlate with reality, and for which that correlation has been validated by empirical evidence (i.e. successful predictions made and/or goals accomplished through the use of said recalled patterns).
We should see right off the bat how this particular view of knowledge fits right into the PP framework: knowledge consists of predictions of causal relations, and the brain is in the very business of making predictions of causal relations. Notice my little caveat, however, that these causal relations should positively and consistently correlate with “reality” and be supported by empirical evidence. This is because I wanted to distinguish all predicted causal relations (including those that stem from hallucinations or unreliable inferences) from the subset of predicted causal relations that we have a relatively high degree of certainty in.
In other words, if we are in the game of trying to establish a more reliable epistemology, we want to distinguish between all beliefs and the subset of beliefs that have a very high likelihood of being true. This distinction, however, is only useful for organizing our thoughts and claims based on our level of confidence in their truth status. And for all beliefs, regardless of the level of certainty, the “empirical evidence” requirement in my definition given above is still going to be met in some sense, because the incoming sensory data is the empirical evidence (the causes) that supports the brain’s predictions (of those causes).
Objective or Scientific Knowledge
Within a domain like science, however, where we want to increase the reliability or objectivity of our predictions pertaining to any number of inferred causal relations in the world, we need to take this same internalized strategy of modifying the confidence levels of our predictions and extend it outward: comparing our predictions to those of others (third-party verification), and testing them with externally accessible instrumentation according to some set of conventions or standards (including the scientific method).
Knowledge, then, is largely dependent on, or related to, our confidence levels in our various beliefs. And our confidence level in the truth status of a belief is just another way of saying how probable such a belief is to explain away a certain set of causal relations, which is equivalent to our brain’s Bayesian prior probabilities (our “priors”) that characterize any particular set of predictions. This means that I would need a lot of strong sensory evidence to overcome a belief with high Bayesian priors, not least because it is likely to be associated with a large number of other beliefs. This association between different predictions seems to me to be a crucial component not only of knowledge generally (or ontology, as mentioned in the last post), but also of our reasoning processes. In the next post of this series, I’m going to expand a bit on reasoning and how I view it through a PP framework.
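As a quick numerical aside on that point about strong priors (the numbers here are made up purely for illustration): a single application of Bayes’ rule shows how the same piece of disconfirming evidence barely dents a belief held with high priors, while substantially revising a weakly held one.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a belief after one piece of evidence (Bayes' rule)."""
    numerator = likelihood_if_true * prior
    return numerator / (numerator + likelihood_if_false * (1 - prior))


# The same piece of disconfirming evidence (4x more likely if the belief is false
# than if it is true), applied to a weakly held belief vs. a strongly held one:
print(round(bayes_update(0.60, 0.2, 0.8), 3))  # weak prior of 0.60 drops to 0.273
print(round(bayes_update(0.99, 0.2, 0.8), 3))  # strong prior of 0.99 only drops to 0.961
```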
Good stuff Lage. I too have become very interested in this framework and have for some time now been sitting on an unfinished draft post relating it to reasoning. I will be very interested to see how you approach the topic.
For this post, I wanted to see if you think we’re in agreement on the interpretation of beliefs and priors under PP. I have come to interpret the relation in terms of prediction error (or surprise, as some like to call it). Specifically, I want to say that our credence in a belief (i.e., the level of confidence we have for a given belief, or more formally, our Bayesian prior) is effectively the amount of prediction error that would be present if our experience were to run counter to the prediction corresponding with the belief. This then means that the subjective perception of credence is akin to running that belief through a mental simulation where it fails to hold true, and then measuring and reporting the prediction error. What do you think?
Hi Travis,
Thanks for commenting!
I don’t think this is quite right. I would say that the Bayesian prior is the confidence in the belief, and the higher this value is, the more prediction error is needed to overcome it. If the source of the prediction error is trusted (which is going to depend on the context we infer), then the precision weighting of the prediction error (some call this the gain on prediction error) is increased. This would mean that even small amounts of prediction error have a much larger chance of overcoming the Bayesian prior of the belief than in cases where the precision weighting values (synaptic gain) on the prediction error are much lower.
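Just to illustrate what I mean by the gain (a toy sketch with made-up precision values, not a claim about how the brain actually parameterizes this): the very same prediction error moves the estimate a lot when the source is trusted, and barely at all when it isn’t.

```python
def shift_from_error(error, prior_precision, error_precision):
    """How far a fixed prediction error moves the estimate under a given gain."""
    gain = error_precision / (error_precision + prior_precision)  # precision weighting
    return round(gain * error, 3)


# The same prediction error, with a trusted source (high gain) vs. a distrusted one:
print(shift_from_error(error=1.0, prior_precision=10.0, error_precision=20.0))  # 0.667
print(shift_from_error(error=1.0, prior_precision=10.0, error_precision=0.5))   # 0.048
```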
Prediction error is certainly related to the accuracy of one’s beliefs and has the possible effect of overcoming our Bayesian priors (flipping our confidence in the belief upside down), but I don’t think that the prediction error is something that is reportable so much as the Bayesian priors are. If the priors change over time in light of new information or reasoning, then I would think that could be reported, but knowing what the prediction error was (before your priors changed) isn’t likely going to be possible because the gain on the prediction error is unknown and both factors determine how the priors change. I may be misunderstanding your question, so I hope my answer makes sense. If not, we can discuss more to clarify things if it seems like I’m not understanding a point you’re trying to make.
To give a little more info in my response: it is true that one could perhaps establish a Bayesian equivalence between the priors and the amount of prediction error needed to overcome those priors, if one takes precision weighting into account. However, the terminology and translation to subjective experience can be tricky here. On the one hand, if your prior ends up being reduced (due to prediction error) to, let’s say, 50%, then the belief is as likely to be true as false, which seems equivalent to having no confidence in it either way. On the other hand, if one were to flip their priors from 99% to 1% (or, less realistically, from 100% to 0%) based on prediction error, then it’s effectively like saying that you have confidence that the belief is false (not merely a lack of confidence in it being true). Which is why, conceptually, these terms are all related, even in the way you suggest to some degree, but translating the result into a subjective claim of certainty in a belief requires considering the resulting certainty in the negation of the belief, and other factors that are implicit in the folk psychological concept of belief.
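Just to put a rough number on how drastic that kind of flip would be (toy arithmetic only, using the odds form of Bayes’ rule):

```python
def required_likelihood_ratio(prior, target_posterior):
    """Evidence strength (odds form of Bayes' rule) needed to move a prior
    all the way to a target posterior."""
    prior_odds = prior / (1 - prior)
    target_odds = target_posterior / (1 - target_posterior)
    return target_odds / prior_odds


# Flipping a 99% belief down to 1%: the evidence would have to be ~9800x more
# likely under the belief being false than under it being true.
print(required_likelihood_ratio(0.99, 0.01))  # ~0.000102 (i.e. about 1/9801)
```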
Yes, I was including precision weighting in the measure of prediction error. I agree that it isn’t clear how to translate error into subjective Bayesian reports, but it seems that if you assess each belief in isolation then some of the difficulties you noted go away. Rather than viewing the before and after beliefs as A and ¬A, view them as A and B (even though B is conceptually understood as ¬A). In this sense, there is then a confidence in A that is in proportion to the weighted error for its non-occurrence, and after the update there is a confidence in B that is in proportion to the weighted error for its non-occurrence. The fact that B is conceptually equivalent to ¬A doesn’t have to come into play.
I still don’t think that’s quite right but I do think there is some value in your describing the belief A being replaced by belief B rather than belief A being replaced by belief ¬A. One issue here is that I don’t think B is conceptually equivalent to ¬A as you say since you need to be able to differentiate between not believing something to be true (for example never having the belief or no longer having the belief that “apples are red”) and believing something to be false or more specifically believing a negation (for example, believing that “apples are not red”). It may be possible that a person loses the belief that “apples are red” (belief A), but that this is not replaced with a belief that “apples are not red” (¬A). Instead it may be replaced with the belief “apples are green” (belief B) which may logically imply that they are also “not red” but that’s an extra belief (as I see it anyway) that requires further reasoning. In this case that further reasoning would involve cognitive rules involved in our ontology that prevent more than one element of a certain category of properties from applying to some object at the same time. For example, if one has inferred that apples can only be one color, then that belief combined with the new belief B would likely lead one to form a belief ¬A, since it’s no longer inferred as possible for “apples are green” and “apples are red” to be true at the same time. See what I mean?
And I certainly agree that this is reasonable, once you add the precision weighting factor as a part of this criterion.
Two more things to add to my last comment. First, it should be noted that the way I’ve been talking about beliefs is highly oversimplified for the purposes of an easier discussion (a belief like “apples are red” contains a lot more information than is conveyed by such a simple phrase). Second, whether we believe something to be true, have no belief about it, or believe it to be false has a lot to do with what kinds of logical restrictions are entailed in the belief. We may be able to better describe a belief like “apples are red” as “all apples are red” or “apples are red and only red”, and any other combination of not/and/or relations that may come into play. But one can conceivably have a belief like “apples are red” without it entailing that “apples are red and only red”. It may be that the belief that “apples are red” remains supported as a viable belief even after encountering an apple that has some green color on it as well. It all depends on how discriminatory, restricted, and specific the belief really is. But if somebody is talking about a general belief that’s more of a heuristic, they may still have a high confidence in it, even though, if they were to think about it with more specifics in mind, they’d realize that it’s not actually true in every case.
I think I’m generally in agreement. Aside from weighted error, is there any other description that you would postulate for the PP accounting of the subjective experience of credence?
Well, as I said before, I’m not sure that I’d describe subjective credence with weighted error (but rather with Bayesian priors instead), though I concede that you could perhaps find some kind of Bayesian equivalence in order to quantify it on some relative scale.
Aside from priors, the only other description within a PP framework may come from the factors constituting those priors. I’m writing about this in my most recent post (a draft for part 4 in this PP series): for example, the number of associations between any concept/belief and any other concepts/beliefs would serve as a means of increasing the priors. This is because the evidence needed to reduce the probability of, or overcome, a belief should increase if that belief overlaps with other beliefs. I think that the hierarchical predictive structure (and its hierarchical neural implementation) has a lot to do with this. If I have a belief (to re-use my recent example) that “apples are red”, the priors for this belief should increase if I have a huge number of other beliefs that contain this belief implicitly or overlap with it in certain ways (such as the beliefs “I like red apples”, or “I remember eating a red apple 5 days ago”, or “my favorite painting contains a bowl of fruit with red apples”, or “I heard that Isaac Newton’s Laws of Gravitation were inspired by a red apple falling to the ground”, etc.). Any number of beliefs could contain some or all of the conceptual structure of other beliefs, and each of these beliefs, experiences, etc., should lead to higher priors for any overlapping conceptual structure found therein.
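One toy way to picture this, which is purely my own speculative formalization and not something drawn from the PP literature, is to treat each overlapping belief as nudging the log-odds of the shared concept upward, so that more overlap pushes the prior closer to certainty and more counter-evidence is needed to overturn it:

```python
import math


def prior_with_associations(base_prior, association_strengths):
    """Toy model: each overlapping belief nudges the log-odds of the shared
    concept upward, so more overlap means a higher prior (and therefore more
    counter-evidence needed to overturn it). The strengths are made-up numbers."""
    log_odds = math.log(base_prior / (1 - base_prior)) + sum(association_strengths)
    return round(1 / (1 + math.exp(-log_odds)), 3)


# "Apples are red" in isolation vs. embedded in several overlapping beliefs/memories:
print(prior_with_associations(0.7, []))                    # 0.7
print(prior_with_associations(0.7, [0.5, 0.5, 0.5, 0.5]))  # 0.945
```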
That makes sense and is largely how I would also describe the underlying structure of credence, but the question is then how that translates to a process which yields the subjective experience in the PP framework (yeah, hard problem). My thought is essentially that those reinforced and overlapping beliefs are part of the scaffolding of the generative model from which the predictions flow, such that the weighted error for a mismatched prediction (even if simulated) is a consequence of those factors you describe. If this is not the predictive process by which we experience those priors, what other process might enable our experience of credence?