Generative AI – powerful surveillance and profiling tool

Generative AI tools can act as powerful aggregation and profiling systems, turning many small, seemingly harmless inputs into detailed, identifiable user profiles.


Several national data protection authorities and EU bodies have published guidance on generative AI (privacy, security, governance and workplace use). Examples include the European Data Protection Board (EDPB), the Information Commissioner’s Office (ICO, UK), the CNIL (France), the IMY (Sweden) and various other national DPAs and supervisory authorities, which have issued discussion papers or practical advice on model training, anonymisation, and workplace use.

This guidance indicates that, as generative AI tools become increasingly integrated into employees’ day-to-day workflows, their use should be assessed not only in terms of efficiency and speed, but also from the perspectives of personal data protection, information security, trade secret protection, decision quality, and corporate governance.

Shadow AI: the growing risk of uncontrolled use within organizations

One of the key concepts highlighted in the guidelines is “Shadow AI” – the use of generative AI tools by employees in business processes without the organization’s knowledge or control. Such uncontrolled use may give rise to risks, particularly in relation to accountability, the protection of trade secrets, and information security, including personal data.

Imagine an HR lead, Marta, who shared a note across two Slack channels: “Marketing brainstorm tool – free prompts for campaign ideas. Try it and drop one favorite prompt.” The link led to a clean, simple page offering categorized prompt packs and a cheerful chat widget that asked for the team’s industry and biggest challenge “so the prompts fit better.” Excited junior staff pasted internal campaign briefs and topline metrics into the chat to see tailored prompts. Within weeks, the marketing calendar filled with ideas that matched the company’s next-quarter push almost uncannily.

The page logged every submitted prompt and attached file, then funneled that data into an analytics dashboard its operators used to seed competitor strategy sessions. The fallout was immediate: a preemptive rival campaign that undercut a planned product launch, and an internal investigation that revealed multiple employees had been using other “free” tools from unknown vendors.

Even a quick data cleaner can be a threat to confidential information

Another story – Daniel, a junior analyst, was swamped converting a messy export of supplier data into a clean CSV for a quarterly budget review. A teammate shared a link in the team chat: “Quick CSV cleaner – paste your data and get a download!” The page looked professional and offered a live-preview area labelled “Paste raw text here.” Under pressure and running late, Daniel pasted a block that included names, contract values, vendor contact emails and a supplier bank account reference – information he assumed was harmless and intended only for internal use.

The service returned a perfectly formatted CSV within seconds. Daniel downloaded it and uploaded the file to the shared drive used by the finance team. Over the next week a recruiting firm sent an email referencing one supplier’s unusually high payments; a vendor called to ask why a payment reference had been exposed.

The quick fix had leaked trade-sensitive contract values and personal contact details into a service outside company control, creating compliance risk and reputational exposure. Variations of this scenario play out in companies every day. Useful tools can be traps – never paste secrets into unknown services (or even known ones), especially when short on time. The lesson is that companies should focus not only on whether AI is being used, but also on which tools are used, by whom, for what purposes, and what categories of data are involved.

Risks relating to personal data, trade secrets and sensitive information

Uncontrolled use of generative AI tools may give rise to significant risks not only in relation to personal data, but also with respect to trade secrets, intellectual property rights and other sensitive corporate information. Sharing materials such as source code, product designs, business strategies, internal correspondence, human resources data and customer files with external AI tools may weaken organizational control over such information.

Published guidelines endorse anonymisation and pseudonymisation where feasible: employees should not put identifiable data into prompts at all. Some use cases (though not typically at the individual-employee level) do need identifiable data to work correctly – e.g., personalized customer support, HR case handling, fraud detection, or medical or legal advice tied to an individual. In all other cases, anonymisation and pseudonymisation should strip direct identifiers while keeping the context needed for accurate and relevant results.
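To make this concrete, here is a minimal Python sketch of prompt-side pseudonymisation. It assumes simple regex detection of e-mail addresses and phone numbers; the function name and patterns are illustrative only, and a production setup would add NER-based detection for names, addresses and other identifiers:

```python
import re

# Illustrative patterns for two obvious identifier types. Real deployments
# would use a dedicated PII-detection library, not just regexes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def pseudonymise(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace e-mail addresses and phone numbers with placeholder tokens,
    returning the cleaned prompt plus a token-to-value mapping that stays
    inside the organization."""
    mapping: dict[str, str] = {}

    def substitute(pattern: re.Pattern, label: str, text: str) -> str:
        def repl(match: re.Match) -> str:
            token = f"<{label}_{len(mapping) + 1}>"
            mapping[token] = match.group(0)
            return token
        return pattern.sub(repl, text)

    cleaned = substitute(EMAIL_RE, "EMAIL", prompt)
    cleaned = substitute(PHONE_RE, "PHONE", cleaned)
    return cleaned, mapping

cleaned, mapping = pseudonymise(
    "Ask the vendor (billing@supplier.example, +48 600 123 456) about the Q3 invoice."
)
print(cleaned)  # identifiers replaced by placeholder tokens
print(mapping)  # kept internally; never sent to the external service
```

The mapping stays inside the organization, so results can be re-linked to the real identifiers internally without the external service ever seeing them.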

Employees should be guided to include in prompts or uploads only the data strictly necessary for the task. A good approach is to implement prompt-level controls (e.g., blocking fields) and role-based restrictions so staff cannot submit unnecessary personal or proprietary information, and to explicitly forbid entering sensitive categories (special category personal data, trade secrets, source code, unreleased designs, customer databases) into third-party models unless exceptional controls are in place.
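As a rough illustration of such a prompt-level gate, the hypothetical sketch below checks outgoing prompts against forbidden-category patterns, with role-based exemptions; the categories, patterns and roles are assumptions for demonstration, not any vendor’s actual API:

```python
import re
from dataclasses import dataclass

# Assumed forbidden categories; real rules would be far more extensive.
FORBIDDEN_PATTERNS = {
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
    "source_code": re.compile(r"\b(def|class|import)\s|#include"),
}

# Role-based exemptions: e.g. engineers may submit code to the vetted tool.
ROLE_EXEMPTIONS = {"engineering": {"source_code"}}

@dataclass
class GateResult:
    allowed: bool
    violations: list[str]

def check_prompt(prompt: str, role: str) -> GateResult:
    """Block the prompt if it contains a forbidden category the role may not send."""
    exempt = ROLE_EXEMPTIONS.get(role, set())
    violations = [
        name for name, pattern in FORBIDDEN_PATTERNS.items()
        if name not in exempt and pattern.search(prompt)
    ]
    return GateResult(allowed=not violations, violations=violations)

print(check_prompt("Pay supplier to PL61109010140000071219812874", role="finance"))
# GateResult(allowed=False, violations=['iban'])
```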

Generative AI – powerful surveillance tool

When misused, generative AI can act like a powerful surveillance or exfiltration tool. Employees pasting confidential files or prompts with personal identifiers into external models can leak trade secrets and personal data. Models trained or fine‑tuned on collected inputs can unintentionally retain or expose sensitive information.

Malicious prompts can coax models to reveal training data or infer sensitive attributes from innocuous inputs.
Integrated AI agents could continuously analyze communications or documents, creating pervasive profiling without proper notice or legal basis.

Imagine employees who used attractive free prompt tools to solve work tasks quickly, while those pages harvested sensitive strategic and personal data that competitors or third parties later exploited. Multiple employees across companies pasted different sensitive data sets (medical visits, medicines, groceries, travel) into the same popular prompt tool under a shared identifier, enabling the service to stitch those inputs into a single, identifiable profile. Not realistic? Isn’t that exactly what most of us are doing in ChatGPT or Claude? That aggregation created a privacy and security breach with serious re-identification, compliance, and competitive risks – and it raises the question of how much we can trust ChatGPT or Claude.

When widely used without controls, generative AI tools can unintentionally act as powerful aggregation and profiling systems, turning many small, seemingly harmless inputs into detailed, identifiable user profiles. Organizations should treat prompt-entry points as potential data-collection vectors and enforce strict policies, vetted internal tools, and monitoring to prevent cross-source re-identification and misuse.
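One way to operationalize that monitoring – sketched here under assumed names and an arbitrary threshold – is to track which categories of data each external tool has accumulated per user identifier and raise an alert before the inputs add up to an identifiable profile:

```python
from collections import defaultdict

# Assumption: three distinct data categories is the alert threshold;
# tune per organization and per tool risk rating.
AGGREGATION_ALERT_THRESHOLD = 3

# Categories seen so far, keyed by (tool, user identifier).
_seen: defaultdict[tuple[str, str], set[str]] = defaultdict(set)

def record_submission(tool: str, user: str, categories: set[str]) -> None:
    """Log a prompt submission and warn on cross-category aggregation."""
    key = (tool, user)
    _seen[key] |= categories
    if len(_seen[key]) >= AGGREGATION_ALERT_THRESHOLD:
        print(f"ALERT: {tool} now holds {sorted(_seen[key])} for {user} - profiling risk")

record_submission("free-prompt-tool", "marta", {"health"})
record_submission("free-prompt-tool", "marta", {"travel"})
record_submission("free-prompt-tool", "marta", {"finance"})  # triggers the alert
```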

Use private/on-premises models or enterprise offerings that contractually prohibit reuse of uploaded data. It is the only healthy solution.