heygrc
GDPR data minimisation in code

Collecting more than the purpose needs.

Data minimisation, Art. 5(1)(c) GDPR, requires personal data to be adequate, relevant, and limited to what is necessary for the purpose it is processed for. In code it breaks whenever a change starts capturing, logging, or sending more personal data than the feature requires, which is easy to do because sending the whole object is usually less work than picking the fields you need.

How it shows up in a diff

The shapes the same control failure takes.

Minimisation rarely breaks on purpose. It breaks because the convenient thing carries more than the necessary thing. The recurring shapes:

  • A payload widens to the whole object

    An event, request, or message is changed to send a full user or customer object where only an id and a field or two were needed.

  • A new field captures more than needed

    A form, model, or import starts collecting personal data the feature has no use for, just because it was available.

  • A log line carries personal data

    A debugging log records a whole request or record, putting personal data into a log store that did not need it.

  • A third party receives more than necessary

    An analytics, support, or marketing integration starts receiving fields it does not need, widening who holds the data.

  • Free text is sent unredacted

    A field that may contain personal data (a note, a message) is forwarded or stored without trimming it to what the purpose requires.
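Two of the shapes above, the log line and the unredacted free text, can be guarded with small helpers. The following is a minimal sketch, not a definitive implementation: `pickFields`, `maskEmails`, and the sample `request` record are hypothetical names, and the email pattern is a deliberately crude illustration rather than a complete detector of personal data.

```typescript
// Hypothetical helpers; names and the email pattern are illustrative assumptions.

// Copy only the named keys from a record, so a new field on the record
// is not logged unless someone adds it to the list on purpose.
function pickFields<T extends object, K extends keyof T>(
  record: T,
  keys: readonly K[],
): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) {
    out[key] = record[key];
  }
  return out;
}

// Crude pattern for things that look like email addresses (assumption:
// good enough for a sketch, not for production redaction).
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replace anything that looks like an email address before the text is
// forwarded or stored.
function maskEmails(text: string): string {
  return text.replace(EMAIL_PATTERN, "[email removed]");
}

// Sample record, made up for illustration.
const request = {
  requestId: "req_123",
  route: "/signup",
  email: "ada@example.com",
  note: "Call ada@example.com after 5pm",
};

// Log the two fields debugging needs instead of the whole request.
const logEntry = pickFields(request, ["requestId", "route"] as const);

// Trim the free-text note before it leaves the service.
const safeNote = maskEmails(request.note);
```

The allowlist is the point of the design: a denylist of known sensitive fields silently passes any field added later, while an allowlist has to be widened deliberately, in a diff a reviewer can see.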

Worked example

An analytics event that ships the whole user.

A product wants to track sign-ups in its analytics tool. The quickest call sends the entire user object as event properties, including email, name, and address, when the analytics tool only needs an id and a plan.

analytics/track.ts  +1 -1

    analytics.track("signup", {
  -   userId: user.id, plan: user.plan,
  +   ...user, // send everything, filter later
    })
heygrc · GDPR Art. 5(1)(c)

Spreading the whole user object sends email, name, and address to the analytics provider. That is more personal data than tracking a sign-up needs, and it puts that data in the hands of a third party. Art. 5(1)(c) (data minimisation) expects a change to carry only the personal data the purpose needs. Send the id and the specific fields the analytics tool uses, not the whole record.
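One way to act on that comment is to build the event properties field by field behind a narrow type. This is a sketch under stated assumptions: the `User` fields, the `SignupEventProps` type, and the `AnalyticsClient` interface are hypothetical and do not describe the API of any particular analytics SDK.

```typescript
// Hypothetical shapes, assumed for illustration only.
interface User {
  id: string;
  plan: string;
  email: string;
  name: string;
  address: string;
}

// The only properties the sign-up event is meant to carry.
interface SignupEventProps {
  userId: string;
  plan: string;
}

// Stand-in for an analytics SDK (assumption, not a real library interface).
interface AnalyticsClient {
  track(event: string, props: SignupEventProps): void;
}

// Build the properties explicitly. A new field on User does not reach
// the provider unless it is added here on purpose.
function signupEventProps(user: User): SignupEventProps {
  return { userId: user.id, plan: user.plan };
}

function trackSignup(analytics: AnalyticsClient, user: User): void {
  analytics.track("signup", signupEventProps(user));
}

// Usage with a recording stub in place of a real client.
const sent: Array<{ event: string; props: SignupEventProps }> = [];
const recordingClient: AnalyticsClient = {
  track: (event, props) => {
    sent.push({ event, props });
  },
};

trackSignup(recordingClient, {
  id: "u_1",
  plan: "pro",
  email: "ada@example.com",
  name: "Ada",
  address: "1 Example Street",
});
```

The narrow type helps, but it is not a guarantee on its own: TypeScript's excess property check does not apply to properties that arrive through a spread, so `{ userId: user.id, ...user }` would still type-check. The explicit builder is what keeps the extra fields out.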

What an auditor does with this

Minimisation is checked at every flow.

A data-protection review looks at what personal data each processing activity collects and sends, and compares it to the stated purpose. Anything beyond what is necessary is a finding. New data flows to third parties get particular attention, because they widen who holds the data and can raise transfer questions too. These flows are created in code, one integration or event at a time, which is where the minimisation decision is actually made.
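The comparison a reviewer makes, payload against stated purpose, can also be written down as a check that runs in tests. A minimal sketch, assuming a hand-maintained registry: the destination names, the field lists, and `findExcessFields` are all hypothetical, and in practice the lists would come from your own documentation of each processing activity.

```typescript
// Hypothetical registry: destinations and field lists are made up for
// illustration.
const ALLOWED_FIELDS: Record<string, readonly string[]> = {
  analytics: ["userId", "plan"],
  support: ["userId", "email"],
};

// Return the payload keys that the destination's allowlist does not cover.
// An unknown destination has an empty allowlist, so every key is reported.
function findExcessFields(
  destination: string,
  payload: Record<string, unknown>,
): string[] {
  const allowedList: readonly string[] = ALLOWED_FIELDS[destination] ?? [];
  const allowed = new Set<string>(allowedList);
  return Object.keys(payload).filter((key) => !allowed.has(key));
}

// A widened payload: email is not on the analytics allowlist.
const excess = findExcessFields("analytics", {
  userId: "u_1",
  plan: "pro",
  email: "ada@example.com",
});
```

Treating an unregistered destination as "nothing allowed" is a deliberate choice in this sketch: a new integration then fails the check until someone states which fields it needs.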

What this is, and is not

A review, not legal advice.

heygrc flags changes that touch data minimisation and cites the article so the fix happens in the pull request. It does not decide your lawful basis or maintain your records of processing. It catches the moment a change starts carrying more personal data than it needs, at the diff. heygrc is in early access.