AI News and Blog Articles
Curated updates from the most trusted sources in artificial intelligence. Stay ahead without the noise.
Top AI News
Hand-picked stories worth reading right now
30 articles found

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark
NVIDIA reports that its Blackwell Ultra-based systems lead the first round of AgentPerf, an agentic AI infrastructure benchmark from Artificial Analysis. In the published results, the NVIDIA GB300 NVL72 platform ran up to 20x more agents per megawatt than older NVIDIA systems. The post explains why agentic AI is a fundamentally different workload than single chat completions and how AgentPerf measures real-world agentic performance using coding agent trajectories.
Introducing the Open Knowledge Format
Google Cloud introduced the Open Knowledge Format (OKF), an open specification that turns the emerging LLM-wiki pattern into a portable, vendor-neutral standard. OKF v0.1 represents knowledge as a directory of markdown files with YAML frontmatter and a small set of shared conventions. The goal is to let knowledge written by one producer be consumed by different AI agents without translation, addressing the fragmented context landscape inside most organizations.
The consequences of relying on AI for accurate news
A new open-access MIT Media Lab study found that people who leaned on AI chatbots to fact-check news grew worse at spotting misinformation on their own once the AI was removed. Across four weeks, 67 participants were 21 percent more accurate while assisted, but their unassisted accuracy on fresh news items fell 15 percentage points by week four. Researchers call this the AI dependency paradox, and they compare it to how GPS has dulled our natural sense of direction. About a quarter of participants thought they were improving even as their real performance dropped.
Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech
ServiceNow AI researchers built a benchmark to test how well speech recognition systems transcribe code-switched speech, the everyday habit of bilingual people who swap languages mid-sentence. They ran seven frontier ASR systems across 918 synthetic utterances covering four language pairs. ElevenLabs Scribe V2 produced the best transcription accuracy, while OpenAI Whisper Large V3 Turbo finished last and often translated the speech instead of transcribing it.
The best Docusign alternatives in 2026
More than 20 Docusign alternatives compete in 2026, ranging from free signing apps like SignWell to enterprise tools like Adobe Acrobat Sign and open-source platforms such as DocuSeal. Because all of these tools produce legally binding signatures under US and EU law, the right pick comes down to your signing volume, budget, and the software your team already uses. Many tools offer free tiers covering three to five documents a month, while pay-as-you-go options charge per document instead of a flat subscription.

New Server Hopes to Break Through AI's "Memory Wall"
Memory is arguably the most serious constraint on modern large language models (LLMs). According to one influential paper, LLM token generation is an inherently memory-bound task, meaning the rate at which models output text is limited by how quickly data can be read in from memory. The severity of this bottleneck grows with model size. This creates a "memory wall" that holds back LLM inference performance. AI hardware startup Majestic Labs is taking a direct and comprehensive approach to solving this problem. It's developing a new AI server, Prometheus, with up to 128 terabytes of memory.
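To make the memory-bound claim concrete, here is a rough back-of-envelope sketch. It assumes every model weight is read from memory once per generated token and ignores the KV cache, batching, and compute time, so it gives only a ceiling on single-stream decode speed. The function name and all numbers (model sizes, 2 bytes per parameter, 1,000 GB/s of bandwidth) are hypothetical illustrations, not figures from the article or from Majestic Labs.

```python
def max_decode_tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_per_s):
    """Ceiling on tokens/s when reading the weights is the only cost.

    Assumption: each generated token requires streaming all weights
    from memory exactly once (no KV cache traffic, no batching).
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = bandwidth_gb_per_s * 1e9
    return bandwidth_bytes_per_s / model_bytes


if __name__ == "__main__":
    # Hypothetical model sizes at 2 bytes per parameter and 1,000 GB/s.
    for size in (7, 70, 700):
        rate = max_decode_tokens_per_second(size, 2, 1000)
        print(f"{size}B parameters: at most {rate:.1f} tokens/s")
```

Under these assumptions the ceiling falls in direct proportion to model size, which is one simple way to see why the bottleneck grows more severe as models get larger.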
Evolving Dataflow to process massive datasets for machine learning
Google created MapReduce more than 20 years ago to solve the scaling problems in data processing that the then young company was running into. The AI era that we are in now demands efficient, large-scale data processing for everything from training frontier models like Gemini by Google DeepMind to powering fully autonomous vehicles like Waymo. Many aspects of machine learning, including data ingestion, transformation, and feature extraction, rely heavily on processing massive datasets. To meet this astronomical scale required by efforts across Google, we evolved our data platform, Flume, the s

AI Rings on Fingers Can Interpret Sign Language
Electronic rings wirelessly connected to an AI system are capable of translating multiple sign languages into text, a new study finds. "I believe this is an important step toward making sign language translation systems more practical, lightweight, and usable in real-world environments," says Ki Jun Yu, an associate professor of electrical and electronic engineering at Yonsei University in Seoul, Korea. More than 300 different sign languages are used worldwide, and many research projects are developing translation devices for communicating with people who do not know a sign language. However,

Identifying Interactions at Scale for LLMs
Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process more transparent to model builders and impacted humans, a step toward safer and more trustworthy AI. To gain a comprehensive understanding, we can analyze these systems through different lenses: feature attribution, which isolates the specific input features driving a prediction (Lundberg & Lee, 2017; Ribeiro et al., 2022); data attribution, which links model b

Main Character Energy: 2025 trend recap
At MAI, we just dropped our plan to build human-centered superintelligence. We haven't solved nuclear fusion or revolutionized medical treatment yet, but Copilot's mission this year was to make YOU the main character. How'd we do? 👉👈 Stats curated by Sophia Chen (MAI technical staff): To dive deeper into how people use Copilot across time, check out our MAI blog post and paper. The post Main Character Energy: 2025 trend recap appeared first on Microsoft Copilot Blog.
Deepening our collaboration with the U.S. Department of Energy
OpenAI and the U.S. Department of Energy have signed a memorandum of understanding to deepen collaboration on AI and advanced computing in support of scientific discovery. The agreement builds on ongoing work with national laboratories and helps establish a framework for applying AI to high-impact research across the DOE ecosystem.
Advancing science and math with GPT-5.2
GPT-5.2 is OpenAI's strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post shows how those gains translate into real research progress, including solving an open theoretical problem and generating reliable mathematical proofs.

What exactly does word2vec learn?
What exactly does word2vec learn, and how? Answering this question amounts to understanding representation learning in a minimal yet interesting language modeling task. Despite the fact that word2vec is a well-known precursor to modern language models, for many years, researchers lacked a quantitative and predictive theory describing its learning process. In our new paper, we finally provide such a theory. We prove that there are realistic, practical regimes in which the learning problem reduces to unweighted least-squares matrix factorization. We solve the gradient flow dynamics in closed for
GPT-4
We've created GPT-4, the latest milestone in OpenAI's effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
Forecasting potential misuses of language models for disinformation campaigns and how to reduce risk
OpenAI researchers collaborated with Georgetown University's Center for Security and Emerging Technology and the Stanford Internet Observatory to investigate how large language models might be misused for disinformation purposes. The collaboration included an October 2021 workshop bringing together 30 disinformation researchers, machine learning experts, and policy analysts, and culminated in a co-authored report building on more than a year of research. This report outlines the threats that language models pose to the information environment if used to augment disinformation campaigns and int
DALL·E 2: Extending creativity
As part of our DALL·E 2 research preview, more than 3,000 artists from more than 118 countries have incorporated DALL·E into their creative workflows. The artists in our early access group have helped us discover new uses for DALL·E and have served as key voices as we've made decisions about DALL·E's features.
Learning to play Minecraft with Video PreTraining
We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play, while using only a small amount of labeled contractor data. With fine-tuning, our model can learn to craft diamond tools, a task that usually takes proficient humans over 20 minutes (24,000 actions). Our model uses the native human interface of keypresses and mouse movements, making it quite general, and represents a step towards general computer-using agents.
Deep double descent
We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time. This effect is often avoided through careful regularization. While this behavior appears to be fairly universal, we don't yet fully understand why it happens, and view further study of this phenomenon as an important research direction.
GPT-2: 6-month follow-up
We're releasing the 774 million parameter GPT-2 language model after the release of our small 124M model in February, staged release of our medium 355M model in May, and subsequent research with partners and the AI community into the model's potential for misuse and societal benefit. We're also releasing an open-source legal agreement to make it easier for organizations to initiate model-sharing partnerships with each other, and are publishing a technical report about our experience in coordinating with the wider AI research community on publication norms.
Why responsible AI development needs cooperation on safety
We've written a policy research paper identifying four strategies that can be used today to improve the likelihood of long-term industry cooperation on safety norms in AI: communicating risks and benefits, technical collaboration, increased transparency, and incentivizing standards. Our analysis shows that industry cooperation on safety will be instrumental in ensuring that AI systems are safe and beneficial, but competitive pressures could lead to a collective action problem, potentially causing AI companies to under-invest in safety. We hope these strategies will encourage greater cooperatio
Implicit generation and generalization methods for energy-based models
We've made progress towards stable and scalable training of energy-based models (EBMs), resulting in better sample quality and generalization ability than existing models. Generation in EBMs spends more compute to continually refine its answers, and doing so can generate samples competitive with GANs at low temperatures while also having the mode coverage guarantees of likelihood-based models. We hope these findings stimulate further research into this promising class of models.
AI safety needs social scientists
We've written a paper arguing that long-term AI safety research needs social scientists to ensure AI alignment algorithms succeed when actual humans are involved. Properly aligning advanced AI systems with human values requires resolving many uncertainties related to the psychology of human rationality, emotion, and biases. The aim of this paper is to spark further collaboration between machine learning and social science researchers, and we plan to hire social scientists to work on this full time at OpenAI.
Better language models and their implications
We've trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization, all without task-specific training.
Learning complex goals with iterated amplification
We're proposing an AI safety technique called iterated amplification that lets us specify complicated behaviors and goals that are beyond human scale, by demonstrating how to decompose a task into simpler sub-tasks, rather than by providing labeled data or a reward function. Although this idea is in its very early stages and we have only completed experiments on simple toy algorithmic domains, we've decided to present it in its preliminary state because we think it could prove to be a scalable approach to AI safety.
Improving language understanding with unsupervised learning
We've obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we're also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training. These results provide a convincing example that pairing supervised learning methods with unsupervised pre-training works very well; this is an idea that many have explored in the past, and we hope our result motivates further research into applying this idea on larger and more diverse datasets.
Evolved Policy Gradients
We're releasing an experimental metalearning approach called Evolved Policy Gradients, a method that evolves the loss function of learning agents, which can enable fast training on novel tasks. Agents trained with EPG can succeed at basic tasks at test time that were outside their training regime, like learning to navigate to an object on a different side of the room from where it was placed during training.
Ingredients for robotics research
We're releasing eight simulated robotics environments and a Baselines implementation of Hindsight Experience Replay, all developed for our research over the past year. We've used these environments to train models which work on physical robots. We're also releasing a set of requests for robotics research.
Preparing for malicious uses of AI
We've co-authored a paper that forecasts how malicious actors could misuse AI technology, and potential ways we can prevent and mitigate these threats. This paper is the outcome of almost a year of sustained work with our colleagues at the Future of Humanity Institute, the Centre for the Study of Existential Risk, the Center for a New American Security, the Electronic Frontier Foundation, and others.
Interpretable machine learning through teaching
We've designed a method that encourages AIs to teach each other with examples that also make sense to humans. Our approach automatically selects the most informative examples to teach a concept (for instance, the best images to describe the concept of dogs), and experimentally we found our approach to be effective at teaching both AIs
More on Dota 2
Our Dota 2 result shows that self-play can catapult the performance of machine learning systems from far below human level to superhuman, given sufficient compute. In the span of a month, our system went from barely matching a high-ranked player to beating the top pros and has continued to improve since then. Supervised deep learning systems can only be as good as their training datasets, but in self-play systems, the available data improves automatically as the agent gets better.