We used 10 top LLMs to estimate domestic worker job displacement by AI Humanoids in the US, UK & Germany

How will AI humanoids affect house helps and domestic worker jobs?

It's probably not the typical question a research firm would ask, but we put it to 10 LLMs, and the responses were fascinating, with most LLMs acknowledging that AI will affect the jobs of domestic workers. Some LLMs went to the trouble of combing through hundreds of sources (one even clocked 180 sources) and unearthed data which, in our wildest dreams, we wouldn't have thought of considering. Other LLMs were a lot more conservative and restrained, relying mostly on official sources. But what separated the best responses from the less interesting ones was the 'thought process'.

LLMs whose responses include observable reasoning, or "thinking", appeared to provide more convincing estimates. Then there were some AIs whose responses were somewhat lazy, or at best perfunctory. At least one LLM produced a hallucination. Even without knowing much about domestic workers, most keen-eyed folks would probably have identified the hallucination as an outlier - a clear anomaly. Yet this LLM didn't seem to notice it, and had to be nudged into correcting its mistake. Such are the limitations of these models.

Meta's Llama 3.1

Meta Llama 3.1 was the fastest and most concise, producing a response in under 30 seconds. It quoted data from the US Bureau of Labor Statistics (BLS), the UK's Office for National Statistics (ONS) and the German Federal Statistical Office (Destatis). But it was also the least speculative and appeared to rely less on computing its own estimates, and more on averages (e.g. 'assuming a midpoint of 1.65 million'). Llama made at least one basic error, which it corrected when quizzed. Its further responses to requests for clarification were clear and easy to understand.

Claude 3.7 Sonnet by Anthropic

Claude was also fast and broke down the house help job roles into several occupations. It also used figures from the UK's Office for National Statistics (ONS) and the German Federal Statistical Office (Destatis) but arrived at different results, probably owing to a prompt limitation. It's notable that among the largest LLMs, Claude was the least transparent in terms of listing sources ('Informal/undocumented domestic workers are estimated to comprise 16-22% of the total domestic workforce in the US'). But it provided details of the source of the data once asked. It was also great at giving context regarding the accuracy of the data. Claude also noticed dual categorization ('Some workers may be counted in multiple categories...a person who provides both cleaning and childcare'), and was one of only a few LLMs to state that some workers serviced multiple households. Claude also flagged regional variations ('Regional variations are substantial, with higher concentrations in states like California, New York, Texas, and Florida').

OpenAI's GPT-4 Turbo

GPT-4 initially failed to reference public data that's available from the BLS, ONS or Destatis. It also seemed overly focused on "migrant domestic workers", a description which does not apply to all domestic workers. But it correctly identified women as the majority of domestic workers. To obtain the data we sought, it took a follow-up prompt pointing it to the three data sources above before it recalculated its estimates. GPT-4 had the second largest total estimate at 7 million jobs (before being pointed to the data sources, it initially estimated the total as 5.5 million). Interestingly, it arrived at the 7 million figure partly using the same data which other LLMs had used, even though they had each declared individual estimates of around (or just under) 4 million jobs. But GPT-4 was a lot better at understanding the demographics ('significant number of Filipino domestic workers in the UK'). Its response about the percentage of German households who employ a domestic worker was in line with official figures (9%). And it seemed to be making a more concerted effort to help the user understand the overall picture, beyond the data that was being requested ('this informal status often excludes them from labour protections and makes accurate data collection difficult'). GPT-4 was the only LLM to make policy recommendations in its response ('strategies to support displaced workers, including retraining programs, legal protections, and pathways to formal employment to mitigate the socioeconomic impact of such technological advancements').

xAI's Grok 3 (beta)

Grok 3 was phenomenal in responding to our query.

It's probably one of the best at handling this sort of prompt. It spent a fair bit of time analysing the questions (the "DeepSearch") and went through 180 web pages, after an initial search comprising 90 pages. It flagged up the problem of under-counting early on, and was the first to use and reference data from the Trades Union Congress (TUC) and the Economic Policy Institute (EPI). But it was in the detailed analysis where the gems were to be found. A whole section titled Methodology & Challenges broke down the difficulty in computing the estimates that we were looking for. Grok was also the first to highlight data from the National Domestic Workers Alliance (NDWA) and the American Community Survey (ACS). In addition to UK figures from the ONS, Grok also cited data from the Business Register and Employment Survey (BRES) ('shows 108,000 employees in "private Households with Employed Persons" in 2023'), which makes it some of the most recent data across all estimates by all the LLMs. Regarding Germany, Grok highlighted data from the University of Kassel, and qualified those findings with similar data from the German Trade Union Federation (DGB). The other LLMs largely ignored our instructions relating to language and cultural considerations, but Grok picked up on them, stating: 'The user noted that Spanish and German terms for domestic workers (e.g., "Hausangestellte" in German, "empleada doméstica" in Spanish)'. It also acknowledged the restrictions and exclusions we specified. Grok then provided a brief critique of its findings, referencing data from the International Labour Organisation (ILO) to fact-check the UK and US figures, and to explain the mismatch in the figures from Germany.

DeepSeek-R1

A lot has been written about DeepSeek-R1 in recent weeks. And many people agree that the thoroughness with which it tackles questions is a breath of fresh air. Recently, we've found DeepSeek to be a bit slow ("Server busy, try again later"). But when it's accessible, it's still very efficient. It spent about a minute analysing the question ("DeepThink"), and one part of that process in particular deserves reproducing here: 'For high-income countries, maybe 3-5% of workforce? US workforce is about 160 million, so 3% would be 4.8M, but that seems high. Probably domestic workers are around 1-2% in developed countries. US: 1.5% of 160M is 2.4M. UK: 1% of 33M workforce is 330K. Germany: 1% of 45M is 450K. Total around 3.18 million. But this is a rough estimate.' Not only did its estimates correlate with and corroborate the figures found by the other LLMs, but it also highlighted certain unique insights which the other LLMs did not pick up on, such as 'Migrant workers from Eastern Europe might be part of the informal sector', part-time roles in Germany (Putzkräfte), and how some 'Part-time workers (e.g., 2-hour weekly cleaners) may not self-identify as domestic workers'. DeepSeek also correctly pointed out that 'adoption [of humanoid robots] will depend on cost, cultural acceptance, and task complexity'. Its estimate of between 2.75 million and 3.2 million is broadly similar to the estimates of most of the other LLMs, although it is less than half of GPT-4's 7 million estimate.
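The back-of-the-envelope arithmetic in DeepSeek's reasoning is easy to reproduce. The short Python sketch below uses only the workforce sizes and percentage shares quoted above; the variable and function names are our own, chosen purely for illustration, and this is a rough check rather than a description of how DeepSeek computes anything internally.

```python
# Workforce sizes and domestic-worker shares exactly as quoted in
# DeepSeek's "DeepThink" reasoning. All names here are illustrative.
WORKFORCE = {"US": 160_000_000, "UK": 33_000_000, "Germany": 45_000_000}
SHARE = {"US": 0.015, "UK": 0.01, "Germany": 0.01}

# GPT-4's total estimate, used only for comparison.
GPT4_TOTAL = 7_000_000


def estimate_domestic_workers(workforce, share):
    """Multiply each country's workforce by its assumed share."""
    return {country: round(workforce[country] * share[country])
            for country in workforce}


estimates = estimate_domestic_workers(WORKFORCE, SHARE)
total = sum(estimates.values())

for country, value in estimates.items():
    print(f"{country}: {value:,}")
print(f"Total: {total:,}")  # Total: 3,180,000
print(f"Share of GPT-4's estimate: {total / GPT4_TOTAL:.0%}")
```

Changing a single share (for example, moving the US from 1.5% to 2%) shows how sensitive the total is to the percentage assumption, which is why we pressed DeepSeek on where its 1-2% range came from.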

One downside was an unconvincing response when questioned about its percentage estimates ('The 1–2% range reflects structural constraints in developed economies, but it’s a conservative estimate. For your robotics impact analysis, focus on task-based displacement (e.g., cleaning vs. childcare) rather than total worker numbers, as adoption will vary by chore complexity and cost.').

What does that even mean?

Google Gemini 2.0 Pro

Google Gemini was a bit disappointing, and not because it couldn't find the data we were looking for. Instead, it seemed to oversimplify or redact its responses to certain prompts, which may be great for certain kinds of conversations, but is altogether unhelpful when one is looking for data-rich information from multiple sources.

Gemini started with a breakdown of how it would approach the task, and included facts which the others didn't touch upon ('There is a large informal economy, so this number will be much higher.'). Gemini placed the estimate at 'between 4 and 10 Million people' without showing any breakdown of the data. This approach is probably suitable for someone who just wants a number. But it does nothing for those to whom transparency and methodology are important. There were too many mentions of 'into the millions', 'in the millions', 'in the hundreds of thousands', 'low millions' or 'several millions' without a numerical estimate, which is a bit shoddy.

It took several other follow-up prompts before Gemini produced a clear breakdown of the figures.

It listed the impact of humanoids and recommended further research that should include in-depth social media analysis, surveys targeting households that employ domestic workers, studies on automation's impact on the service sector, and investigation into the growth of the home robotics industry.

Mistral's Le Chat

Mistral's Le Chat boldly began its response with Python code.

It then went into the process of computing the estimate, considering various factors like under-counting, informal employment, global estimates, future growth, social media and online footprint. Some of these considerations were specified in the prompt, but not all of them.

What followed was an estimate of the US figure. Just the US figure, which, although it correlated with the data from the other LLMs, wasn't everything we were looking for.

When asked about the UK and Germany, more Python code followed...

But then Le Chat made a glaring error, stating incorrectly that there are 10,627 domestic workers in the UK - a figure it said it reached by 'using the number of potential victims referred as a proxy'. This figure is most likely from the 2019 report of the National Referral Mechanism (NRM), in which 10,627 potential victims of modern slavery were referred to the NRM in 2019. Le Chat recognised that there was an error in its computation, in particular in its UK figure, but did nothing to rectify it.
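An anomaly of this size is the kind of thing a simple plausibility check can catch. The sketch below compares three UK figures quoted in this article (Le Chat's 10,627, the 108,000 BRES figure cited by Grok, and the 330K from DeepSeek's reasoning) and flags any value that sits more than ten times away from the median. Note that the three figures do not measure exactly the same thing, and both the tenfold threshold and the function name are our own assumptions for illustration, not a standard test.

```python
from statistics import median

# UK figures quoted in this article. They are not like-for-like
# measures, so treat this as a rough sanity check only.
uk_figures = {
    "Le Chat": 10_627,    # NRM referrals used as a proxy
    "Grok": 108_000,      # BRES private-household employees, 2023
    "DeepSeek": 330_000,  # 1% of a 33M workforce
}


def flag_outliers(figures, max_ratio=10.0):
    """Return the names whose value is more than `max_ratio` times
    above or below the median of all the values (hypothetical rule)."""
    mid = median(figures.values())
    return [name for name, value in figures.items()
            if max(value, mid) / min(value, mid) > max_ratio]


print(flag_outliers(uk_figures))
```

With these inputs only Le Chat's figure is flagged, and only narrowly, so a check like this is a prompt for a closer look rather than proof of an error.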

Further, it didn't compute the total estimate at the first prompt (even though our prompt specifically mentioned that we needed a total estimate). So it took another follow-up prompt (a third one) to get Le Chat to compute the total, which it placed at 5,469,596 (about 5.47 million).

So it took several prompts to extract estimates that correlate with the findings of the other models.

Comment

A lot to digest here, and a lot more in our brief report on this topic, including responses from several other LLMs (e.g. Qwen 2.5).

The number of domestic workers who are likely to lose their jobs because of humanoid robots will depend on a number of factors. Key considerations will include the cost of the humanoids relative to worker wages, technological capabilities, cultural acceptance, personal preferences, economic conditions and regulatory frameworks. Further, with the current cost of living challenges in most parts of the world, it may be a while before large numbers of people who regularly use the services of, say, a cleaner can afford to switch to a humanoid. So job losses will likely occur gradually rather than immediately, with certain routine tasks being automated before others. This pattern may create a transitional period where domestic workers oversee, instruct or even complement robotic labor in many households before complete automation becomes feasible. It's more likely that a hybrid picture will emerge.

Want to see more?

For full details of our findings, including the prompts used, the full responses and the follow-up questions for clarification, contact us via our contact page or by sending an email to info@sanrixa.co.uk.