
Modern Scientific Computing Enables Health Missions

Written by Ezekiel Maier

[Image: colleagues reviewing an X-ray scan of a human body]


4 Keys to Transforming Scientific Computing

Novel, intuitive, and collaborative approaches to data management and analysis will be necessary to meet these challenges. We offer the following next-generation scientific computing considerations to better equip stakeholders to expedite data collection, respond effectively to data challenges, and facilitate data analysis:

1. Utilize Privacy-Preserving Data Mining Techniques and Synthetic Data to Protect Sensitive Data While Enabling Advanced Analytic Model Development.

The collection and use of sensitive data—including personally identifiable information and protected health information such as genomics, clinical data, and other real-world data—is increasing dramatically across the scientific community. Example uses of sensitive data in scientific computing include analysis of post-market surveillance adverse event reports, development of advanced analytic clinical decision support models, and genetic biomarker research. This new paradigm in scientific computing relies on data access and collaborative analysis of real-world data rather than the traditional isolationist model, which makes innovative approaches to privacy-preserving data mining and synthetic data generation crucial. Below we describe four notable approaches to privacy-preserving data mining and synthetic data generation. Considerations for selecting the appropriate method include privacy guarantees, data realism, and data access requirements.

  • Federated learning is a privacy-preserving data mining technique that makes it possible for data to be available in a collaborative, accessible environment while remaining secure on its original server. Federated learning works by training a machine learning algorithm on multiple local datasets held in local nodes without explicitly exchanging data samples (see the sketch after this list).
  • Differential privacy is a formal privacy guarantee: a differentially private algorithm's output does not change in a way that is traceable to any single individual joining or leaving the dataset.
  • Model-to-data approaches also address data privacy concerns by releasing synthetic data for artificial intelligence and machine learning (AI/ML) model development while withholding sensitive data in a secure, private computational environment for model evaluation.
  • Synthetic data generation, based on algorithms such as generative adversarial networks, can produce realistic alternatives to sensitive real-world data.
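
To make the federated learning idea concrete, here is a minimal sketch of federated averaging in Python: each node trains on its own private data, and only model weights—never raw records—are shared with a coordinator. The node data, model, and hyperparameters are illustrative assumptions, not any specific framework's API.

```python
# Minimal federated averaging (FedAvg) sketch. Each node fits a linear
# model on its own data; only weights are exchanged and averaged.
# All data, model choices, and hyperparameters are illustrative.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=20):
    """Run a few epochs of gradient descent on one node's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_average(node_datasets, dim, rounds=10):
    """Broadcast global weights, collect local updates, average them."""
    global_w = np.zeros(dim)
    for _ in range(rounds):
        updates = [local_update(global_w, X, y) for X, y in node_datasets]
        sizes = np.array([len(y) for _, y in node_datasets], dtype=float)
        # Weight each node's contribution by its sample count.
        global_w = np.average(updates, axis=0, weights=sizes)
    return global_w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# Three hypothetical "hospitals", each holding data that never leaves the node.
nodes = []
for n in (50, 80, 30):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    nodes.append((X, y))

print(federated_average(nodes, dim=2))  # approaches [2.0, -1.0]
```

Production frameworks layer secure aggregation and differentially private noise on top of this basic loop, so even the exchanged weights reveal little about any individual record.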

2. Leverage Hybrid and Cloud Computing to Improve Data Sharing, Accelerate Computing, and Reduce Development Burden.

As the computational infrastructure demands of data management and analytics have increased and as workforces, in general, have become more geographically dispersed, it is critical for scientific computing to provide computing environments that are scalable and available on-demand. A hybrid on-premises and cloud computing approach enables scientific computing organizations to leverage their already-owned infrastructure, such as on-premises high-performance computing clusters, while still receiving the benefits of cloud computing. For example, bursting excess computational demand to a public cloud can accelerate scientific discovery by reducing the time spent waiting for available computational resources. Public cloud environments also offer services (e.g., assistance with automating parts of the ML and AI model development process), models (e.g., for natural language processing, demand forecasting, and equipment monitoring), and specialized hardware (e.g., graphics processing units) that can accelerate AI/ML adoption, development, and operationalization.
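
As one illustration of the bursting pattern described above, the sketch below keeps jobs on an on-premises cluster until its capacity is reached and routes the overflow to a public cloud. The capacity figure, Job fields, and dispatch helpers are hypothetical, for illustration only, not a real scheduler API.

```python
# Illustrative "cloud bursting" policy: run on owned infrastructure while
# capacity allows, send overflow to a public cloud. All numbers and
# helpers below are assumptions for the sketch.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cores: int

ON_PREM_CORES = 128  # assumed size of the local HPC cluster

def run_on_prem(job):
    print(f"{job.name}: on-prem ({job.cores} cores)")

def run_in_cloud(job):
    print(f"{job.name}: cloud burst ({job.cores} cores)")

def dispatch(jobs):
    used = 0
    for job in jobs:
        if used + job.cores <= ON_PREM_CORES:
            used += job.cores
            run_on_prem(job)   # keep the job on already-owned hardware
        else:
            run_in_cloud(job)  # burst excess demand to a public cloud

dispatch([Job("alignment", 64), Job("variant-calling", 48), Job("qc-report", 32)])
```

A real deployment would make this decision in the scheduler, weighing queue wait time, data egress cost, and data sensitivity rather than core counts alone.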

3. Boost Workforce Knowledge Management, Training, and User Adoption Programs to Increase Organizational Resilience to a Competitive Scientific Computing Labor Market.

Knowledge management, training, and user-friendliness are crucial for promoting the adoption of novel technology. Within the scope of scientific computing, ease of use can drastically impact long-term rates of innovation by democratizing access to powerful toolsets and freeing those with specialized training and experience to focus on the most challenging problems. Computational scientists are in high demand, making it critical for organizations to effectively onboard staff and to retain knowledge that may otherwise be lost to attrition. The management and maintenance of technical institutional knowledge—for example, the standard operating procedures for computing infrastructure and laboratory processes—doesn’t happen by itself; it must be incentivized. Retrieval of documented knowledge is improved through cognitive search capabilities that combine indexing, automated curation, and artificial intelligence to provide personalized, relevant results and reduce information overload.

Knowledge management must also enable reproducibility in order to increase efficiency in large working groups. Computational research practices, including the use of version control systems, documentation of code using markup languages, and provision of execution instructions, increase scientific computing reproducibility. Furthermore, workflow languages describe analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high-performance computing (HPC) environments.

Training programs have shown marked success in motivating usage of cutting-edge technology. The most effective training programs emphasize addressing user needs, using clear communication, and providing an engaging experience. Oftentimes, incorporating data science training with access to HPC infrastructure is an enticing proposition for experts unfamiliar with the intricacies of technical scientific computing. Coaching and paired programming are attractive pathways to improving technical skillsets, as they provide a human component to learning.
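
The reproducibility practices above can be as simple as seeding randomness, parameterizing inputs on the command line, and documenting execution instructions alongside the code. The file names and arguments in this Python sketch are illustrative assumptions.

```python
# Sketch of basic reproducibility practices: a fixed random seed,
# command-line parameters instead of hard-coded paths, and execution
# instructions kept with the code. File names are illustrative.
"""Usage:
    python summarize.py --input counts.csv --seed 42

Requires: Python 3.9+, numpy (pin exact versions in requirements.txt).
"""
import argparse
import numpy as np

def main():
    parser = argparse.ArgumentParser(description="Reproducible summary step")
    parser.add_argument("--input", required=True, help="input CSV of counts")
    parser.add_argument("--seed", type=int, default=42,
                        help="random seed, recorded so runs can be repeated")
    args = parser.parse_args()

    rng = np.random.default_rng(args.seed)  # seeded RNG: same run, same result
    data = np.loadtxt(args.input, delimiter=",")
    # Bootstrap a confidence interval for the mean, reproducibly.
    boots = [rng.choice(data, size=len(data)).mean() for _ in range(1000)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    print(f"mean={data.mean():.3f} 95% CI=({lo:.3f}, {hi:.3f})")

if __name__ == "__main__":
    main()
```

Checked into version control with pinned dependencies, a script like this lets a new team member rerun an analysis and get byte-identical results, which is the practical test of reproducibility.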

4. Utilize Tech Scouting to Enable Informed Strategic Planning and Technology Adoption While Improving Staff Retention.

As demonstrated through innovations in data generation, computing infrastructure, and advanced analytics, scientific computing resources are constantly evolving and improving. Technology scouting is the process of capturing emerging trends and new applications from existing technologies, products, and services across domestic and international public and private sectors. Tech scouting not only facilitates the adoption of such technologies in a timely, streamlined manner but also helps organizations develop effective strategic plans for integrating or otherwise responding to current and developing technologies. In addition, exposure to emerging technology trends can increase scientific computing talent retention by keeping staff engaged and excited.

Conclusion

Traditional approaches to advancing health and biomedical research are ill-equipped to handle challenges around computational infrastructure, analytics, and the rapidly growing volume, velocity, and variety of sensitive biomedical data generation and collection. Effective responses to these challenges must leverage next-generation scientific computing breakthroughs in data privacy, cloud computing, knowledge management, training and adoption, and technology scouting. The ability to take advantage of such breakthroughs is critical to ongoing efforts to discover novel health biomarkers, protect public health through pathogen surveillance, ensure the continuing safety and efficacy of therapeutics, and enhance medical workforce skills through modeling and simulation-based training.