
Biosecurity Guide to the AI Action Plan
Authored by Melissa Hopkins
This guide summarizes the key biosecurity-related provisions of the AI Action Plan and provides brief commentary on them. It lists the subsection headings in which the provisions appear and then paraphrases or directly quotes the biosecurity-related provisions.
(1) Invest in Biosecurity
Require all institutions receiving Federal funding for scientific research to use nucleic acid synthesis tools and synthesis providers that have robust nucleic acid sequence screening and customer verification procedures. Create enforcement mechanisms for this requirement rather than relying on voluntary attestation.
The inclusion of this provision was one of our main recommendations for the AI Action Plan and a key biosecurity measure. The federal funding and enforcement mechanism elements of this provision mirror provisions found in Executive Order 14292, which itself renews the federal funding provision from Executive Order 14110 (now revoked). It is quite important that this Action Plan recognizes nucleic acid synthesis screening as a key security measure against potential AI-enabled biosecurity risks. The updated Framework for Nucleic Acid Synthesis Screening (Framework), which operationalizes the federal funding provision, is expected by August 1, 2025, according to Executive Order 14292.
Led by OSTP, convene government and industry actors to develop a mechanism to facilitate data sharing between nucleic acid synthesis providers to screen for potentially fraudulent or malicious customers.
Split ordering is a potential means of obtaining a dangerous DNA sequence by obfuscating the order: the sequence is split into two or more fragments ordered from different companies and then assembled into a dangerous sequence that could be used to develop a bioweapon. The Framework from the previous Administration did not fully address the challenge of detecting and preventing split ordering of dangerous sequences, an area where the current Administration's forthcoming Framework update could provide additional security measures. Providers of synthetic nucleic acids and manufacturers of synthetic nucleic acid equipment do not currently have a way of knowing whether a customer's order of a given fragment could be combined with other orders to create dangerous sequences.
This Action Plan provision aims to address this problem by creating a mechanism to facilitate data sharing between providers and manufacturers to look for order splitting by malicious customers, and it is one of our main recommendations for the forthcoming update to the Framework. To fully facilitate data sharing between these entities in a way that maximizes security and reduces compliance burden, the National Institute of Standards & Technology (NIST) should designate a third party (either governmental or nongovernmental) to establish a sequence of concern (SOC) database to receive and analyze reports of potential split-order SOCs and other suspicious behavior, such as other forms of obfuscation. NIST could also accredit such third parties, ensuring proper privacy-preserving practices and cybersecurity standards. Administration of the database system by a third party would reduce the compliance burden on providers and manufacturers. To preserve privacy, providers and manufacturers should not have direct access to such a database and should not be able to inspect orders submitted by any customer to other providers or manufacturers. The establishment and maintenance of this database would help close this split-order security gap, but it would likely require action from Congress to provide adequate funding.
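To make this mechanism concrete, below is a minimal sketch of how an accredited third party might aggregate provider reports and flag potential split orders. The report schema, hashing approach, class and function names, and coverage threshold are all illustrative assumptions on our part; they are not drawn from the Action Plan or the Framework.

```python
"""Illustrative sketch of a third-party split-order clearinghouse.

Assumptions (not from the Action Plan or the Framework): providers report
only hits against sequences of concern (SOCs), identified by an SOC ID and
the fragment's coverage interval; customer identities are shared as salted
hashes so providers never see each other's order books.
"""
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentReport:
    provider_id: str      # reporting synthesis provider
    customer_hash: str    # salted hash of the verified customer identity
    soc_id: str           # sequence-of-concern identifier the fragment matched
    start: int            # matched interval within the SOC (base positions)
    end: int


class SplitOrderClearinghouse:
    """Receives provider reports and flags cross-provider coverage of one SOC."""

    def __init__(self, coverage_threshold: float = 0.8):
        self.coverage_threshold = coverage_threshold
        self._reports = defaultdict(list)  # (customer_hash, soc_id) -> reports

    def submit(self, report: FragmentReport) -> None:
        self._reports[(report.customer_hash, report.soc_id)].append(report)

    def flag_suspicious(self, soc_lengths: dict[str, int]) -> list[tuple[str, str]]:
        """Return (customer_hash, soc_id) pairs whose fragments, ordered from
        two or more providers, jointly cover most of a single SOC."""
        flagged = []
        for (customer, soc_id), reports in self._reports.items():
            providers = {r.provider_id for r in reports}
            if len(providers) < 2:
                continue  # single-provider orders are caught by normal screening
            covered = set()
            for r in reports:
                covered.update(range(r.start, r.end))
            if len(covered) / soc_lengths[soc_id] >= self.coverage_threshold:
                flagged.append((customer, soc_id))
        return flagged
```

Because each provider would submit only hits against known sequences of concern, keyed to hashed customer identities, no provider could inspect another's order book, consistent with the privacy constraint described above.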
Build, maintain, and update as necessary national security-related AI evaluations through collaboration between CAISI, national security agencies, and relevant research institutions.
We commend the inclusion of this biosecurity provision in the section of the Action Plan addressing national security issues. One of our principal recommendations for the AI Action Plan was to require CAISI to prioritize biosecurity risks. As this provision of the Action Plan is implemented, it will be important for these biosecurity evaluations to place top priority on pandemic-level risks, since it is neither possible nor practical to evaluate AI models for every potentially harmful capability that could cause a biology-related accident or deliberately harmful action.
It will be important for CAISI to pursue a strategy that focuses on pandemic-level risks. An alternative approach, running dozens of costly and time-consuming biosecurity evaluations that test for a broad array of biosecurity vulnerabilities, would be difficult to sustain and could lead to a focus on lower-consequence risks. Frontier companies are already largely taking this costly approach (eg, note the volume and complexity of the biosecurity evaluations in the system cards here and here). CAISI should avoid developing evaluations that are not rigorous enough or that do not address the kinds of biosecurity risks that are most concerning to the public.
CAISI has an opportunity to develop a strategic, risk-based approach to biosecurity evaluations that maximizes both effectiveness and efficiency. To achieve this balance, CAISI should (1) prioritize biosecurity risks based on outcomes; (2) identify capabilities of concern tied to those prioritized outcomes; and (3) develop evaluations targeting those capabilities. Prioritizing in this way ensures that the government focuses its limited biosecurity-related resources on the risks most critical to national security, with pandemic-level risks at the top of the list.
(2) Ensure that the U.S. Government is at the Forefront of Evaluating National Security Risks in Frontier Models
Evaluate frontier AI systems for national security risks in partnership with frontier AI developers, led by CAISI in collaboration with other agencies with relevant expertise in biological risks.
This provision is in accordance with the press release announcing CAISI, which provides a bit more detail about CAISI's role in evaluating frontier AI systems. It is also in accordance with a similar provision in the National Security Memorandum on AI. As with nucleic acid synthesis screening, this is important work, and its inclusion in the Action Plan is critical. Most frontier AI systems are large language models (LLMs), which pose different kinds of biosecurity risks than biological AI models (BAIMs). In general, LLMs expand the number of bad actors who could create and release a bioweapon (in other words, they provide uplift to bad actors), while some BAIMs not only provide uplift but also raise the ceiling of possible harm a bioweapon could inflict. While an LLM could perhaps provide a bad actor with the steps necessary to create a known pathogen for which we have medical countermeasures, a BAIM could help create a novel pathogen capable of evading current surveillance methods and medical countermeasures, or one that is more deadly and transmissible. The distinction between LLMs and BAIMs, though, is becoming less clear, as, for example, LLMs are now capable of working directly with BAIMs.
Some BAIMs should be considered to pose risks on par with frontier systems and thus merit CAISI evaluations; that number currently would be very small. The largest BAIM is currently Evo 2 (February 2025), for which our Senior Scholar Dr. Jassi Pannu provided the safety evaluation. Evo 2 has 40 billion parameters, was trained using 2.25×10^24 FLOPs, and is double the size of ESM 3 Large, a closed, commercial model released in June 2024. Though these models are smaller than frontier models, the types of biological data on which they are trained may give them more concerning biological capabilities than the frontier models. It is important that CAISI develop a process for determining which BAIMs it will offer evaluations for, based on factors such as the biological training data, model generality, interoperability with other models, agency, compute, parameters, and other such factors, and then proactively engage those developers to offer evaluation assistance via the voluntary agreements that CAISI manages to lead unclassified evaluations of AI capabilities that may pose risks to national security.
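As a purely illustrative sketch of how such a triage process could weigh these factors, consider the following. The factor names, weights, and threshold are hypothetical assumptions on our part; they are not drawn from the Action Plan or from any CAISI process.

```python
"""Hypothetical triage heuristic for deciding which BAIMs to offer evaluations.

All factor names, weights, and the threshold below are illustrative
assumptions, not part of the Action Plan or any CAISI procedure.
"""
from dataclasses import dataclass


@dataclass
class ModelProfile:
    trained_on_pathogen_data: bool   # e.g., viral genomes, immune-evasion data
    generality: float                # 0 (narrow tool) .. 1 (general-purpose)
    interoperable_with_llms: bool    # can be driven by an LLM or agent framework
    agentic: bool                    # can plan and execute multi-step tasks
    training_flops: float            # total training compute


def should_offer_evaluation(m: ModelProfile,
                            flops_floor: float = 1e23,
                            score_threshold: float = 3.0) -> bool:
    """Return True if the model's risk profile warrants proactively offering
    a voluntary evaluation."""
    score = 0.0
    score += 2.0 if m.trained_on_pathogen_data else 0.0
    score += 1.0 * m.generality
    score += 1.0 if m.interoperable_with_llms else 0.0
    score += 1.0 if m.agentic else 0.0
    score += 1.0 if m.training_flops >= flops_floor else 0.0
    return score >= score_threshold
```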
Build, maintain, and update as necessary national security-related AI evaluations through collaboration between CAISI, national security agencies, and relevant research institutions.
This is another of our main recommendations for the AI Action Plan. It is good that CAISI will be collaborating with national security agencies and other relevant research institutions, as together they can provide additional national security functions for America, such as assessing potential adversarial capabilities and incidents.
Led by CAISI in collaboration with national security agencies, evaluate and assess potential security vulnerabilities and malign foreign influence arising from the use of adversaries’ AI systems in critical infrastructure and elsewhere in the American economy, including the possibility of backdoors and other malicious behavior. These evaluations should include assessments of the capabilities of U.S. and adversary AI systems, the adoption of foreign AI systems, and the state of international AI competition.
In the implementation of this Action Plan provision, it will be important for CAISI and other agencies to consider how backdoors and malicious behavior of AI models could enable biosecurity-related risks. This provision could address an important biosecurity risk from advanced generative AI: namely, the insertion of a “sleeper agent” AI into the commercial marketplace. For example, one study demonstrated how LLMs can lie about their outputs to evade safety techniques during safety training. A state actor could develop an AI agent that provides the “correct” outputs during CAISI or other biosecurity evaluations in ways that lead to its release into the world to do high-consequence harm. Once in the public domain, it could follow pre-existing instructions along the lines of, “Synthesize a pandemic-capable pathogen when a scientist has asked you to synthesize something harmless.”
This method of attack, and the policies for addressing it, would likely be similar to how the Federal government responded to the USG determination that Huawei was using its tools for surveillance infiltration.
Prioritize the recruitment of leading AI researchers at Federal agencies, including NIST and CAISI, DOE, DOD, and the IC, to ensure that the Federal government can continue to offer cutting-edge evaluations and analysis of AI systems.
It will be important in the implementation of this provision to include experts in biological risks. One of our main recommendations for the AI Action Plan was investment in workforce education and training at the intersection of AI and biology. There is widespread recognition amongst leading AI developers that there is a high need for deep expertise in AI and relevant risk domains such as biology. Though this provision targets recruitment rather than workforce development, the workforce development subsection of the Action Plan focuses explicitly on the workers needed for AI infrastructure, such as electricians and advanced HVAC technicians. Another subsection focuses on building an AI-ready workforce. In recruiting researchers to fulfill the mission of offering cutting-edge biosecurity evaluations and analysis of AI systems, agencies should ensure that they are recruiting AI experts who also have biology expertise, or biology experts with AI expertise, rather than simply seeking generalists to run biosecurity evaluations.
(3) Build an AI Evaluations Ecosystem
Support the development of the science of measuring and evaluating AI models, led by NIST, DOE, NSF, and other Federal science agencies.
This is one of our main recommendations for the AI Action Plan. As long as future advanced AI models could be used to simplify, enable, or catalyze the creation of high-consequence (pandemic-level) biological weapons, America’s dominance in AI development could be set back by this national security threat or by loss of public trust in the safety of large AI systems. To prevent that, as the Action Plan moves forward, it will be important to set up a system of third-party biosecurity evaluations that are designed to prevent these risks from emerging. Standards will be important in the creation of a third-party ecosystem that can implement these evaluations. Progress in measurement science will be a key foundational component of this work.
Biosecurity evaluations that effectively reduce major risks would decrease the likelihood of high-impact biosecurity events and demonstrate to the public that mitigation measures are being taken, thereby building public confidence. These evaluations would not only protect the public and the nation from harm but also reduce potential liability, increase public confidence in AI companies, and positively affect AI companies' ability to compete globally. With this in mind, measurement and evaluation should be conducted with an eye toward the development of a biosecurity standard. Indeed, CAISI will work with NIST staff to assist industry in developing voluntary standards. These voluntary standards should include biosecurity standards.
Convene the NIST AI Consortium to empower the collaborative establishment of new measurement science that will enable the identification of proven, scalable, and interoperable techniques and metrics to promote the development of AI.
Convene meetings at least twice per year under the auspices of CAISI for Federal agencies and the research community to share learnings and best practices on building AI evaluations.
CHS is a member of the NIST AI Consortium and actively participated in its convening at the end of last year. We support regular convenings that promote the safe development of AI. As the Action Plan is implemented, it will be important for safety to be built into the development of all new, powerful dual-use AI systems.
Invest, via DOE and NSF, in the development of AI testbeds for piloting AI systems in secure, real-world settings.
This provision is similar to a provision under Executive Order 14110 (now revoked). Testbeds allow developers to deploy their models in simulated real-world environments to better understand how their models will function and to make any necessary adjustments prior to deployment. For example, “an AI-ready test bed may enable a researcher to evaluate a new AI solution for decision-making in a transportation scenario, or a test bed could allow an AI researcher to create new weather models and visualizations and assess them with meteorologists in the field. The infrastructure allows the researcher to innovate safely and collect real-world evidence that is beneficial to the intended users.” The National Security Commission on Emerging Biotechnology (NSCEB) recommended, in its final report, risk-assessment testbeds that would accelerate domestic biotechnology R&D. We agree on the importance of testbeds, including for biological models and their possible risks. It will be important to link the testbed efforts with mitigation strategies for risks identified in the testing process. Logically, these testbed efforts should be linked to the evaluation work being done by CAISI.
(4) Build World-Class Scientific Datasets
Direct the NSTC Machine Learning and AI Subcommittee to make recommendations on minimum data quality standards for the use of biological data modalities in AI model training.
Ensuring that data are available and standardized for research and industry applications will be important in the pursuit of major AI-related breakthroughs, as inadequate training data (whether in availability, size, quality, relevance, or bias) will prevent AI technologies from being accurate or effective. Collecting and standardizing data at sufficient scale and quality will require funding.
We recommend that careful attention to governance policy be given to specific biological datasets that pose pandemic risks. New technologies, including automated labs and computational methods, may facilitate rapid and scalable generation of new biological datasets. For example, in the future, one could potentially program an automated lab to run thousands of experiments that generate vast amounts of structured and usable biological data. We must ensure that bad actors are not able to use automated labs and computational methods to generate new, substantial datasets that increase pandemic risk, such as data on transmissibility and immune evasion. Controlled access should be established for model outputs that increase pandemic risks; this was also recommended as a consideration for Congress by a Congressional Research Service report.
Additionally, recent research indicates that, at least for a subset of BAIMs, the relationship between dataset size and model performance is not only a matter of scale but may also be linked to model size and training strategies. Optimal dataset sizes and strategic training approaches, including choices of model size, may play a critical role in achieving superior performance across tasks.
Establish secure compute environments within NSF and DOE to enable secure AI use cases for controlled access to restricted Federal data.
Create an online portal for NSF’s National Secure Data Service (NSDS) demonstration project to provide the public and Federal agencies with a front door to AI use-cases involving controlled access to restricted Federal data.
Recent projections on the feasibility of AI scaling over the next five years identify compute and data as key bottlenecks for AI development, and our own review confirmed these problems for biological AI R&D. These bottlenecks will disproportionately impact smaller research groups and startups that may not have the resources to train and deploy large AI models. Many biology and healthcare fields lack sufficient high-quality data, which limits the development of reliable and robust AI models in these domains. The federal government can work to resolve these bottlenecks through initiatives such as the National AI Research Resource (NAIRR) and the American Science Acceleration Project (ASAP). We were glad to see the Action Plan address the problems of data and compute, consistent with the approaches being considered by NAIRR and ASAP.
As the Action Plan is implemented, careful consideration and attention to governance policy should be given to specific subsets of newly generated biological datasets that pose pandemic risks as capabilities scale. It will be important for CAISI to identify such biological datasets of concern (ie, datasets that could enable pandemic-level risks), and to set requirements around levels of managed access that scientists adhere to as a condition of federal funding or access to government-funded compute.
Explore the creation of a whole-genome sequencing program for life on Federal lands (to include all biological domains). This new data would be a valuable resource in training future biological foundation models.
The proposed whole-genome sequencing program could enable advances in biological foundation models that result in faster drug discovery and invigorate the bioeconomy. However, this program should consider carveouts for generating data on potential epidemic or pandemic pathogens, or, if such data are generated, should appropriately secure data with dual-use potential. For instance, many prominent researchers have raised the concept of Data Access Levels (DALs), which provide tiered access for varying degrees of data sensitivity. Tiers can range from open access to identity verification or accreditation of users. Precedent exists for legitimacy verification for accessing pathogen data via the Global Initiative on Sharing All Influenza Data (GISAID), a global data science initiative that promotes the international sharing of all influenza virus sequences, related clinical and epidemiological data associated with human viruses, and geographical as well as species-specific data associated with avian and other animal viruses.
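As a rough illustration of how tiered DALs could gate access in practice, consider the sketch below. The tier names, requester attributes, and gating logic are hypothetical assumptions on our part, not drawn from any published DAL scheme or from GISAID's procedures.

```python
"""Illustrative sketch of tiered Data Access Levels (DALs) for genomic data.

The tier names, the mapping from data sensitivity to tiers, and the requester
attributes are hypothetical; they do not describe any existing access scheme.
"""
from enum import IntEnum


class DataAccessLevel(IntEnum):
    OPEN = 0                 # freely downloadable
    IDENTITY_VERIFIED = 1    # requester identity confirmed (GISAID-style)
    ACCREDITED = 2           # institutionally accredited, legitimate-use review


def grant_access(dataset_dal: DataAccessLevel,
                 identity_verified: bool,
                 accredited: bool) -> bool:
    """Grant access only if the requester meets the dataset's DAL tier."""
    if dataset_dal == DataAccessLevel.OPEN:
        return True
    if dataset_dal == DataAccessLevel.IDENTITY_VERIFIED:
        return identity_verified
    return identity_verified and accredited
```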
Additionally, the NSCEB's final report recommends treating biological data as a strategic resource and points out how China exploits publicly available data while closing off its own datasets. It is questionable whether the outputs of a resource-intensive US whole-genome sequencing effort should be made openly available, allowing adversaries to train more capable biological foundation models with significant misuse potential; this program should weigh the risks and benefits of such an approach.
(5) Encourage Open-Source and Open-Weight AI
Partner with leading technology companies to increase the research community’s access to world-class private sector computing, models, data, and software resources as part of the NAIRR pilot.
Access to safe and secure compute and data will be key for the research community and startups, and we have suggested that NAIRR is one way of overcoming these bottlenecks. We commend the Five Safes Framework in NAIRR's Implementation Plan (safe projects, safe people, safe settings, safe data, and safe outputs), its tiered structure of a NAIRR-Open and a NAIRR-Secure with tiered access to sensitive data, and its goal of developing trustworthy AI.
It is practical for the Administration to have partnered with leading technology companies to leverage the NAIRR pilot to increase access to private sector computing, because Congress still needs to act to fully fund and authorize NAIRR. NAIRR had initially been envisioned as leveraging the on-premise and commercial cloud resources of existing Federal agency programs. The bipartisan CREATE AI Act of 2025 (led by Representatives Obernolte and Lieu, leads of the House AI Task Force) would fund and authorize NAIRR, but the Senate may be taking a different approach. The leads of the Senate AI Task Force (Rounds, Young, and Heinrich) led the Senate companion bill to the CREATE AI Act of 2024 last Congress, but this year Senators Rounds and Heinrich have teamed up on the American Science Acceleration Project, a plan to make American science ten times faster by 2030. ASAP has elements in common with NAIRR but is broader and more ambitious in its goals. Rounds and Heinrich closed a Request for Information at the end of June, so we expect to see new legislation result from that process shortly.
Continue to foster the next generation of AI breakthroughs by publishing a new National AI R&D Strategic Plan.
We responded to the National AI R&D Strategic Plan Request for Information and recommended that the 2023 Strategic Plan Update should be revised to: (1) focus clearly on key bottlenecks for AI R&D, which include compute, data, technical talent, and capital; and (2) prioritize and reformulate the safety and security component of the Strategic Plan to greatly strengthen prevention of and preparedness for high-consequence threats to the country, particularly biosecurity threats.
On the potential benefit side, investing in compute, data, technical talent, and capital related to the biological sciences and AI could help drive the bioeconomy and strengthen biosecurity and biopreparedness by, for example, accelerating biothreat and outbreak detection and greatly increasing the speed of antiviral and vaccine development. The convergence of AI with biotechnology could also facilitate the rapid development of medical countermeasures and optimize crisis response/resource allocation.
Regarding the prevention and mitigation of risks, the 2023 Strategic Plan Update describes the strategy as “[a]dvanc[ing] knowledge of how to design AI systems that are trustworthy, reliable, dependable, and safe. This includes research to advance the ability to test, validate, and verify the functionality and accuracy of AI systems, and secure AI systems from cybersecurity and data vulnerabilities.” CAISI has already engaged in extensive work in this area and should continue to do so. CAISI should also begin to feed some of its findings into the R&D enterprise so that the strategy's goal is no longer simply to advance the knowledge of safety and security but to make it a core function of federal R&D. To be competitive with foreign AI R&D programs, safety and security restrictions should be focused and targeted on national security threats, including biosecurity threats. This approach would be similar to the safety and security approaches that China is taking with its own AI R&D, which are narrowly tailored to risks related to national security.
Led by NTIA, convene stakeholders to help drive adoption of open-source and open-weight models by small and medium-sized businesses.
It will be important for the implementation of this provision to consider biosecurity risks from certain kinds of open models. The types of open-weight models that small and medium-sized businesses would adopt are likely not trained on the kinds of biological data that could enable capabilities of concern, but rigorously assessing this will be critical as models evolve and become larger and more powerful. CAISI should develop and set standards around datasets of concern and associated levels of access. Even if a model is not trained on potentially dangerous datasets, if the data is public and the model is open weight, then the model can simply be repurposed to use that data. As the Action Plan is implemented, we believe it will be critical for CAISI to develop such data standards before there is a drive toward open-weight adoption of BAIMs.
More CAISI analysis will be required to understand the extent to which such dataset standards should apply to LLMs. But since LLMs are now interacting with BAIMs, the distinction is already much less clear.
(6) Invest in AI-Enabled Science
Through NSF, DOE, NIST at DOC, and other Federal partners, invest in automated cloud-enabled labs for a range of scientific fields, including engineering, materials science, chemistry, biology, and neuroscience, built by, as appropriate, the private sector, Federal agencies, and research institutions in coordination and collaboration with DOE National Laboratories.
As we’ve written elsewhere, cloud and automated laboratories can provide efficiency gains for researchers and help address capacity constraints with regard to skilled laboratory staff. They also raise concerns about new kinds of risks. Cloud labs, whether governmental or private, should be required to maintain safe and secure laboratory practices and outcomes. Accordingly, safety and security standards and requirements should be established for cloud labs and automated labs, such as those funded by NSF's $100 investment in programmable cloud labs focused on biotechnology.
Cloud labs reduce the skill required to conduct scientific experiments and thus could be misused by bad actors or by those trying to subvert other processes or controls. Automated laboratories further reduce the cost and skill required to generate large amounts of biological data, which could be used to train AI models for misuse. Companies are developing integrated tools such as “copilots” that would allow users to easily program cloud and/or automated laboratories using natural language. Biological samples are shipped directly to such labs, and there are currently limited ways to verify the true contents of samples and reagents.
There are already safety and security concerns regarding the existing and emerging cloud and automated laboratory infrastructure, but a larger national cloud lab network would greatly expand access to wet lab facilities and would add new urgency to putting strong governance systems in place. The ability to build in silico models that reliably model complex biology will be an inflection point in our ability to engineer biology for the betterment of humanity. However, wet lab validation of models that would allow accurate prediction of pandemic pathogen characteristics such as transmissibility, virulence, and immune evasion should have government oversight and should only be considered through a rigorous risk/benefit governance process that accounts for pandemic risks that could emerge accidentally or be deliberately exploited to cause harm.
Overall, strong requirements and governance should be set regarding verification of samples, logging of user access and experiments completed, know-your-customer regimes, and other relevant risk assessment, mitigation, and security mechanisms prior to the establishment of a national network of cloud labs that would serve as a bridge for the digital-to-physical transition of AIxBio model outputs with pandemic risks.
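As a simple illustration of how these safeguards might compose before any protocol is executed, consider the sketch below. The field names, checks, and their ordering are hypothetical assumptions on our part and do not describe any existing cloud-lab system or requirement.

```python
"""Illustrative sketch of pre-execution gating for a cloud/automated lab order.

One way the safeguards discussed above (KYC, sample verification, logging)
could compose; not a description of how any existing cloud-lab system works.
"""
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("cloudlab.audit")


@dataclass
class LabOrder:
    order_id: str
    customer_id: str
    kyc_verified: bool       # know-your-customer check completed
    samples_verified: bool   # declared sample/reagent contents independently verified
    protocol_flagged: bool   # protocol matched a sequence or agent of concern


def gate_order(order: LabOrder) -> bool:
    """Run a protocol only if all safeguards pass; log every decision for audit."""
    approved = (order.kyc_verified
                and order.samples_verified
                and not order.protocol_flagged)
    audit_log.info("order=%s customer=%s approved=%s",
                   order.order_id, order.customer_id, approved)
    return approved
```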
(7) Protect Commercial and Government AI Innovations
Led by DOD, DHS, CAISI, and other appropriate members of the IC, collaborate with leading American AI developers to enable the private sector to actively protect AI innovations from security risks, including malicious cyber actors, insider threats, and others.
As the Action Plan moves forward in implementation, it will be important to consider biosecurity risks related to this provision. The language in this provision notes “malicious cyber actors” and “insider threats,” and we recommend adding biological risks to this set of concerns. It will be important to strengthen model weight security to prevent external and internal theft of model weights relevant to creating pandemic-level risks. To that end, it will be important to prevent cyberattacks and insider threats from accessing and misusing the model weights of biological models that include capabilities of concern. There has been a series of reports from think tanks on this issue, but this one from RAND provides a concrete set of recommendations for securing model weights (eg, reduce the number of people authorized to access the weights, harden interfaces for model access against weight exfiltration, and implement insider threat programs). CAISI should develop guidance or standards to help developers secure their model weights, as this would ensure some baseline level of protection against biological risks that pose national security concerns.
(8) Align Protection Measures Globally
Develop a technology diplomacy strategic plan for an AI global alliance to align incentives and policy levers across government to induce key allies to adopt complementary AI protection systems and export controls across the supply chain, led by DOS in coordination with DOC, DOD, and DOE. This plan should aim to ensure that American allies do not supply adversaries with technologies on which the U.S. is seeking to impose export controls.
As the Action Plan moves forward, it will be important in the implementation of this provision to include narrowly scoped export controls on biological datasets of concern identified by CAISI (as described in other pieces of this guide) as well as export controls on the model weights themselves for a narrow class of BAIMs that have been trained on certain kinds of data or demonstrate capabilities of concern that could create pathogens with pandemic potential. The AI Diffusion Rule (since rescinded), though controversial due to semiconductor-related restrictions, included the first-ever export controls on model weights. Similarly, the Department of Justice recently established the Data Security Program, which “establishes what are effectively export controls that prevent foreign adversaries, and those subject to their control, jurisdiction, ownership, and direction, from accessing [USG]-related data and bulk genomic, geolocation, biometric, health, financial, and other sensitive personal data.” It will be important for the Bureau of Industry and Security (BIS) to develop these kinds of export controls focused on model weights and biological datasets of concern that could create pandemic-level threats.