Draft

A Question-answering Model of LLM Capabilities

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

This note treats chatbots as lowering the cost of access to existing public knowledge. A chatbot is, in essence, a database of answers to questions that have already been answered in the public domain (and hence appear in the LLM's training data). This implies that people will consult a chatbot when they encounter a question for which (i) they do not know the answer, and (ii) they expect that someone else does know the answer and has documented it in the public domain.

The model expresses a very common view of LLMs, but as far as I am aware it has not previously been stated this explicitly. LLM chatbots clearly do many things that would not normally be described as answering questions (drawing pictures, drafting text, editing text, writing code); however, the majority of chatbot queries appear to be requests for information (). The core predictions can be expressed assuming a discrete set of questions, but I also derive predictions from a fuller model which allows the user and the chatbot to interpolate among the answers to different related questions.

This model gives a variety of predictions about when chatbots will be used:

The model also gives us basic predictions about the equilibrium effect of chatbots:

We can compare this to a few existing models of AI:

We also describe a model in which each question is a vector, based on an earlier model of LLMs written for a different purpose, .

  Suppose each question $\bm{q}$ is a vector of binary characteristics, and the true answer is a scalar, $a$, determined by a set of unobserved weights, $\bm{w}$, with $a=\bm{q}'\bm{w}$. The human guesses the answer to the new question by interpolating among previously-seen questions and answers ($(\bm{q}^i,a^i)_{i=1,\ldots,n}$). They can also consult a chatbot which answers questions in the same way, but with a different set of previously-seen questions (i.e. the chatbot's training data). We can then give a crisp closed-form expression for the expected benefit of consulting a chatbot, based on the relationship between the question $\bm{q}$, the chatbot's experience $\bm{Q}_1$, and the human's experience $\bm{Q}_2$. We expect a chatbot to be consulted when you encounter a question that has components which fall outside the space spanned by your knowledge set (the questions for which you know the answer), but which fall inside the space spanned by the chatbot's knowledge set.

We can make some conjectures about adoption by occupation and by task:


Chatbots are more likely to be used for domains with lower dimensionality, as this reduces the cost of specifying the question.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Here we define the simplest model: given some question, you will consult the chatbot if and only if (1) you do not know the answer to the question, and (2) the chatbot does.

Suppose you are confronted by a question (\(q\in\mathcal{Q}\)), and you have to supply an answer, \(\hat{a}\in\mathbb{R}\).

The chatbot’s set of previously observed questions is \(\bm{Q}_1\subseteq \mathcal{Q}\), and the user’s set is \(\bm{Q}_2\subseteq \mathcal{Q}\), with the composition of both sets public knowledge (i.e. you know whether the chatbot knows the answer to each question, without knowing what that answer is).

It is clear that the user will consult the chatbot if and only if \(q\in \bm{Q}_1\) and \(q\notin \bm{Q}_2\). In the vector model below, this discrete rule is replaced by a continuous analogue: you benefit from consulting the chatbot when the components of the question that lie outside your own experience overlap with the subspace spanned by the chatbot’s experience.
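The discrete rule can be sketched in a few lines of Python (an illustration, not part of the model; questions are represented as arbitrary hashable identifiers and each agent's experience as a set):

```python
def consult_chatbot(q, Q1, Q2):
    """Consult iff the chatbot has seen q (q in Q1) and the user has not (q not in Q2)."""
    return q in Q1 and q not in Q2

# The user consults only on questions in the chatbot's experience but not their own.
chatbot_knows = {"q_a", "q_b"}
user_knows = {"q_b", "q_c"}
decisions = {q: consult_chatbot(q, chatbot_knows, user_knows)
             for q in ("q_a", "q_b", "q_c")}
```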

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The state of the world is defined by a vector of \(p\) unobserved parameters, \(\bm{w} \in \mathbb{R}^p\). A question is a vector of \(p\) binary features, \(\bm{q} \in \{-1, 1\}^p\). The true answer to a question \(\bm{q}\) is a scalar \(a\) determined by the linear relationship: \[a = \bm{q}'\bm{w} = \sum_{k=1}^p q_k w_k\]

There is a set of agents, indexed by \(i \in \mathcal{I}\). Each agent \(i\) possesses an information set \(\mathcal{D}_i\), which consists of \(n_i\) questions they have previously encountered, along with their true answers. We can represent this information as a pair \((\bm{Q}_i, \bm{a}_i)\):

All agents share a common prior belief about the state of the world, assuming the weights \(\bm{w}\) are drawn from a multivariate Gaussian distribution: \[\bm{w} \sim N(\bm{0}, \Sigma)\] where \(\Sigma\) is a \(p \times p\) positive-semidefinite covariance matrix. A common assumption we will use is an isotropic prior, where \(\Sigma = \sigma^2 \bm{I}_p\) for some scalar \(\sigma^2 > 0\). This implies that, a priori, the weights are uncorrelated and have equal variance.

Given their information set \(\mathcal{D}_i\), agent \(i\) forms a posterior belief about \(\bm{w}\). When a new question \(\bm{q}_{\text{new}}\) arises, the agent uses their posterior distribution to form an estimate of the answer, \(\hat{a}_{\text{new}} = \bm{q}_{\text{new}}' \mathbb{E}[\bm{w} \mid \mathcal{D}_i]\).
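As a numerical sketch (illustrative values, not from the text): under an isotropic prior with noiseless observations, the posterior mean is the minimum-norm solution of \(\bm{Q}_i\bm{w} = \bm{a}_i\), namely \(\mathbb{E}[\bm{w}\mid\mathcal{D}_i] = \bm{Q}_i'(\bm{Q}_i\bm{Q}_i')^{-1}\bm{a}_i\):

```python
import numpy as np

# Illustrative setup: p = 6 features, an agent who has seen n = 3 questions.
w = np.arange(1.0, 7.0)                   # true (unobserved) weights
Q = np.array([[1,  1,  1,  1, 1,  1],     # past questions, one per row
              [1, -1,  1, -1, 1, -1],
              [1,  1, -1, -1, 1,  1]], dtype=float)
a = Q @ w                                 # noiseless observed answers

# Posterior mean under an isotropic prior: E[w | D] = Q' (Q Q')^{-1} a,
# the minimum-norm vector consistent with everything the agent has seen.
w_hat = Q.T @ np.linalg.solve(Q @ Q.T, a)

# Estimate for a new question: a_hat = q_new' E[w | D].
q_new = np.array([1, -1, 1, -1, 1, -1], dtype=float)
a_hat = q_new @ w_hat
```

Because this `q_new` lies in the span of the agent's past questions, the estimate is exact; for questions with components outside that span, the posterior mean shrinks those components toward the prior mean of zero.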

Throughout the analysis below we make two simplifying assumptions. First, observations are noiseless: when an agent has seen a question before, they observe its exact true answer, so that \(\bm{a}_i = \bm{Q}_i\bm{w}\). Second, the matrices \(\bm{Q}_i\Sigma\bm{Q}_i^{\top}\) (and, under an isotropic prior, \(\bm{Q}_i\bm{Q}_i^{\top}\)) are invertible, so that the posterior expressions are well defined. Both assumptions can be relaxed (for example by allowing noisy answers), at the cost of slightly more involved algebra but with the same basic geometry: what matters is how a new question projects onto the subspaces spanned by past questions.
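The projection geometry can be made concrete. Assuming \(\bm{P}_i = \bm{Q}_i'(\bm{Q}_i\bm{Q}_i')^{-1}\bm{Q}_i\) denotes the orthogonal projection onto the span of agent \(i\)'s past questions (the helper name below is ours), a new question splits into a part the agent's experience covers and a part it does not:

```python
import numpy as np

def projection(Q):
    """Orthogonal projection onto the row space of Q (assumes Q Q' is invertible)."""
    return Q.T @ np.linalg.solve(Q @ Q.T, Q)

# Illustrative experience: p = 4 features, two past questions.
Q = np.array([[1.0,  1.0, 1.0,  1.0],
              [1.0, -1.0, 1.0, -1.0]])
P = projection(Q)

q = np.array([1.0, 1.0, 1.0, -1.0])    # a new question
inside = P @ q                          # components covered by past experience
outside = (np.eye(4) - P) @ q           # components the agent has never seen
```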

The intuition behind Proposition 5 is straightforward:

More precisely:

Put informally: consultation becomes valuable when there is overlap between agent 2’s knowledge gaps and agent 1’s strengths.

In the context of the ChatGPT model, this suggests that an AI assistant is most valuable for questions where:

We can characterize the conditions for the three possible actions:

Suppose there is some small cost \(\varepsilon\) of delegating a question to the chatbot, and some slightly larger cost \(\delta>\varepsilon\) of consulting the chatbot first and then giving a human-adjusted answer. Then we can characterize the conditions for the three regions:

\[S_{1\to 2} = \bm{q}'(\bm{I}-\bm{P}_2)\bm{P}_1^{\top}\bm{q} \quad\text{(chatbot has info the human lacks)}\] \[S_{2\to 1} = \bm{q}'(\bm{I}-\bm{P}_1)\bm{P}_2^{\top}\bm{q} \quad\text{(human has info the chatbot lacks)}\]
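These overlap statistics can be computed directly (an illustrative example with made-up experience sets; note that the projection matrices are symmetric, so \(\bm{P}_i^{\top}=\bm{P}_i\)):

```python
import numpy as np

def projection(Q):
    """Orthogonal projection onto the row space of Q (assumes Q Q' is invertible)."""
    return Q.T @ np.linalg.solve(Q @ Q.T, Q)

p = 4
Q1 = np.array([[1.0,  1.0, 1.0,  1.0]])   # chatbot's past questions
Q2 = np.array([[1.0, -1.0, 1.0, -1.0]])   # human's past questions
P1, P2 = projection(Q1), projection(Q2)

q = np.ones(p)                            # the new question
I = np.eye(p)
S_12 = q @ (I - P2) @ P1 @ q   # chatbot has info the human lacks
S_21 = q @ (I - P1) @ P2 @ q   # human has info the chatbot lacks
```

Here \(\bm{q}\) lies entirely inside the chatbot's experience and is orthogonal to the human's, so \(S_{1\to 2}\) is large and \(S_{2\to 1}\) is zero: the consult-the-chatbot region.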

\end{document}