Is personal information in a generative AI output ‘collected’?

BY RICHARD - 15 August 2024

Transcript

The issue

So here’s a quick thought for the day. Let’s say you’re thinking about using a generative AI tool to process data you include in a prompt, and that data contains some personal information. You’ve asked the AI tool to provide some type of output. Is it accurate to say that personal information contained in the tool’s output – the response to your prompt – is ‘collected’ by you? This is an important question because, if there is a collection, issues may arise under the collection-related information privacy principles, or IPPs, in the Privacy Act.

The question arises (or has arisen in my mind) because of something the Office of the Privacy Commissioner has said in its September 2023 guidance on Artificial Intelligence and the IPPs. When talking about collection issues under the IPPs, the guidance says this:

“Any time you seek out or obtain information, you are collecting it.”

(That sentence is not actually 100% correct, but we can ignore that for now.)

The guidance goes on to say:

“In relation to AI tools this could include [among other things]… using an AI tool to write a letter to a person or about a person based on prompts you give it.”

So, the guidance is saying that could amount to a “collection” of personal information.

Not clear there would be a collection

To my mind, it’s by no means clear that this would amount to a collection of personal information. As usual, much will depend on context, but I think there could be many situations in which an AI tool provides an output that summarises or reorganises personal information in a prompt – including by setting it out in a draft letter the tool creates in response to your prompt – without any collection of personal information being involved. In many situations, you’ll simply be transferring personal information to the AI tool by including it in your prompt, and the AI tool will be returning it to you in some shape or form, but without doing anything that could reasonably be considered a ‘collection’ for Privacy Act purposes.

Analogues

We can also consider this issue by reference to some factual analogues. If I use a macro in a cloud-based version of Microsoft Excel to manipulate tabular data that contains personal information, I am not collecting any personal information. I’m just processing what I already have, in software that has been available for decades. Similarly, if I use an online contract automation tool to produce a draft contract, and I include the names and addresses of individual party representatives in my form inputs, which are then inserted into the relevant places in the draft contract together with a description of their roles, again I am not collecting any personal information. I’m just processing what I already have. And so too, in my view, when I get an AI tool to process personal information provided in my prompt – for example, to insert it into a draft letter the tool creates for me – I am not collecting personal information. Again, I’m just using the tool to process what I already have.

So that’s my initial reaction.

Privacy Act’s definition of ‘collect’

We do need to ask, though, whether what I’m saying is consistent with the Privacy Act’s definition of the word ‘collect’. I think it is. This is what the Act says, and I’m quoting here:

“collect, in relation to personal information, means to take any step to seek or obtain the personal information, but does not include receipt of unsolicited information”

In my view, in the kinds of situations I described earlier, including the example of an AI tool providing a draft letter based on prompts I might provide, I am not taking steps to seek or obtain personal information. I already have it. I am merely using the information I already have to generate a particular output.

Perhaps OPC has something else in mind

Now, I acknowledge that the OPC guidance says that using an AI tool to write a letter to a person or about a person based on prompts you give it “could” amount to a collection of personal information, rather than that it “would”. And I also acknowledge that OPC may have in mind a scenario that’s different from my take on what the guidance says. For example, you might enter a set of factual information about a person into a prompt and then ask the AI tool to draw conclusions about that person based on its pre-existing knowledge. In that scenario, you would be adding to the corpus of information you hold about the person, and so you might well be ‘collecting’ personal information.

What the guidance intended remains unclear

The fact remains, though, that precisely what the guidance intended to say on this point – in relation to activities like an AI tool’s drafting of letters based on prompts you give it – is not clear. And I note here that, in addition to the letter-drafting example we’ve been discussing, the guidance gives a separate example, right after it, of asking an AI system a question that generates information about a person. In other words, it contrasts something like drafting a letter based on prompts you provide with the situation where you ask the AI tool to generate information about a person.

As I said earlier, this is an important issue, because if there were a ‘collection’ in the kind of situation I’m looking at here – getting an AI tool to write a letter based on personal information you already have and include in a prompt – then issues would or could arise under various of the collection-related IPPs, and that in turn could affect your privacy impact assessment. You would need to think about IPP1, IPP2, IPP3A (if the Privacy Amendment Bill before Parliament is enacted), and potentially IPP4.

So I respectfully suggest that this is an issue on which it would be helpful to have greater clarity from OPC. Either that, or we need to be careful not to read statements like the ones I read out earlier as meaning there would necessarily be a collection. In many instances, in my view, there would not. I think we need to distinguish clearly between situations where an AI tool is only processing information we provide in a prompt to produce or refine a draft output, without generating any new personal information, and situations where the AI tool is being asked to build upon or supplement the personal information the user already holds and provides in the prompt.

Now, I don’t mean to suggest that this distinction will always be clear cut. Sometimes it will be, but sometimes it may not be. If it’s not clear cut, the user of the AI tool may wish to err on the side of caution and assume there will be a collection.

Parting reflection

Now I’m just going to stand back from the detail of what I’ve been discussing to make the point that this is an example of the nuanced questions that can arise when we assess various uses of AI tools against the Privacy Act’s information privacy principles. We are going to have to grapple with questions like these, and I think we need to appreciate that they will not always be straightforward, and they will not always be clear cut. I do think we need more guidance on them, and the more that lawyers and privacy professionals can comment on them, the better.
