New Metric Measures AI Performance Against Human Tasks


The rapid advancement of artificial intelligence (AI) has created a pressing need for more accurate ways to evaluate its capabilities against human performance. In response, researchers have developed the Task Completion Time Horizon (TCTH), a new metric that expresses a model's capability as the length, in human working time, of the tasks it can complete with a 50% success rate.


This innovative approach offers a more realistic and contextualized assessment of AI performance, addressing the limitations of traditional benchmarks that focus on isolated skills rather than long-duration, complex tasks.


How the TCTH Metric Works


The study, titled "Measuring AI Ability to Complete Long Tasks," analyzed 170 real-world tasks across fields such as programming, cybersecurity, and machine learning. These tasks, many of which require hours of human effort, provide a clear empirical benchmark for evaluating AI capabilities.


The TCTH methodology rests on a simple but powerful principle: each task is labeled with the time a person needs to complete it, and a model's "horizon" is the human task duration at which its success rate falls to 50%. For instance, if an AI model solves about half of the tasks that take a human 60 minutes, its task completion horizon is one hour.
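To make the 50% threshold concrete, here is a minimal Python sketch of one way such a horizon could be estimated. It fits a logistic curve of success against the logarithm of human completion time and reads off the duration where the curve crosses 50%. The article does not spell out the fitting procedure, so the function name estimate_horizon, the (minutes, successes, trials) record layout, and the sample numbers are all illustrative assumptions, not the study's code or data.

```python
import math


def estimate_horizon(records, iterations=25):
    """Estimate a 50% time horizon in human minutes.

    records: list of (human_minutes, successes, trials) tuples.
    This layout and the logistic-on-log-time model are assumptions
    made for illustration, not the study's actual pipeline.
    """
    xs = [math.log2(minutes) for minutes, _, _ in records]
    total_trials = sum(n for _, _, n in records)
    # Center log-durations on their trial-weighted mean for stability.
    mean_x = sum(x * n for x, (_, _, n) in zip(xs, records)) / total_trials

    a, b = 0.0, 0.0  # intercept and slope on the centered log2(minutes)
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, (_, s, n) in zip(xs, records):
            z = x - mean_x
            p = 1.0 / (1.0 + math.exp(-(a + b * z)))
            # Gradient of the binomial log-likelihood.
            g_a += s - n * p
            g_b += (s - n * p) * z
            # Entries of the negative Hessian.
            w = n * p * (1.0 - p)
            h_aa += w
            h_ab += w * z
            h_bb += w * z * z
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (negative Hessian) * delta = gradient.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det

    if b >= 0:
        raise ValueError("success does not decline with task length")
    # Success probability is 0.5 where a + b * z = 0, i.e. z = -a / b.
    return 2 ** (mean_x - a / b)


if __name__ == "__main__":
    # Made-up results: success falls from 90% to 10% as tasks get longer.
    sample = [(15, 9, 10), (30, 7, 10), (60, 5, 10), (120, 3, 10), (240, 1, 10)]
    print(f"Estimated 50% horizon: {estimate_horizon(sample):.1f} minutes")
```

Working on the logarithm of task length treats a jump from 15 to 30 minutes the same as a jump from 2 to 4 hours, which suits a metric that is tracked in doublings.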


This approach not only enables comparisons between different AI models but also tracks their progress over time based on human reference points.


Key Findings from the Study


Researchers observed a clear trend: AI performance declines as task duration increases. For example, GPT-2 failed to complete any task that required more than one minute of human effort. In contrast, the Claude 3.7 Sonnet model, launched in 2025, reached a horizon of about 59 minutes, meaning it succeeded roughly half the time on tasks that take humans that long to finish.


One of the most striking findings of the study is that the AI task completion horizon has doubled about every seven months since 2019. Moreover, the rate of progress accelerated in 2024, with horizons doubling every three months. A steady doubling time means capabilities are growing exponentially, and a shorter doubling time means the growth is faster still.
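The arithmetic behind a doubling time can be sketched in a few lines of Python. The function names project_horizon and months_to_reach are hypothetical, and the outputs are simple extrapolations of the reported rates, not predictions made by the study.

```python
import math


def project_horizon(current_minutes, months_ahead, doubling_months):
    """Extrapolate a horizon forward assuming a constant doubling time."""
    return current_minutes * 2 ** (months_ahead / doubling_months)


def months_to_reach(current_minutes, target_minutes, doubling_months):
    """Months needed to grow from the current horizon to a target horizon."""
    return doubling_months * math.log2(target_minutes / current_minutes)


if __name__ == "__main__":
    start = 59  # minutes, the horizon reported for Claude 3.7 Sonnet
    workday = 8 * 60  # minutes, an illustrative target
    for doubling in (7, 3):  # the two doubling times cited in the article
        months = months_to_reach(start, workday, doubling)
        print(f"Doubling every {doubling} months: "
              f"{months:.1f} months to reach an 8-hour horizon")
```

Because the growth is multiplicative, the gap between the two doubling times compounds: the same target is reached in less than half the time at the faster rate.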


Implications and the Future of AI Evaluation


The introduction of the TCTH represents a major breakthrough in AI performance assessment. Unlike traditional benchmarks, which often become outdated or fail to measure real-world capabilities, the TCTH provides a more comprehensive perspective on AI efficiency in long-duration tasks.


This metric has significant implications for industries such as education, cybersecurity, and software development, where AI’s ability to handle extended tasks is crucial. Additionally, it allows researchers and developers to determine which models are approaching or surpassing human-level performance in specific domains.


As AI continues to evolve, metrics like the TCTH will be essential for understanding its progress, limitations, and potential applications in the real world.




Source: Infobae
