Convolutional neural networks depend on deep network architectures to extract accurate information for image super-resolution. However, the information these networks obtain cannot fully express the predicted high-quality images for complex scenes. A dynamic network for image super-resolution (DSRNet) is presented, which contains a residual enhancement block, a wide enhancement block, a feature refinement block, and a construction block. The residual enhancement block uses a residual architecture to facilitate hierarchical features for image super-resolution. To improve robustness for complex scenes, the wide enhancement block implements a dynamic architecture that learns more robust information, enhancing the applicability of the resulting super-resolution model to varying scenes. To prevent interference between components in the wide enhancement block, the refinement block uses a stacked architecture to accurately learn the obtained features. A residual learning operation is also embedded in the refinement block to mitigate the long-term dependency problem. Finally, the construction block reconstructs the high-quality images. The designed heterogeneous architecture not only captures richer structural information but is also lightweight, making it suitable for mobile digital devices. Experimental results show that our method is competitive in terms of performance, recovery time, and complexity for image super-resolution. The code of DSRNet is available at https://github.com/hellloxiaotian/DSRNet.
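As an illustration of the residual learning idea the abstract mentions (not the authors' code; the kernel, names, and values below are invented for the sketch), a residual block outputs its input plus a learned transformation of it:

```python
import numpy as np

def residual_block(x, transform):
    """Residual learning: y = x + F(x), which eases gradient flow
    through deep networks by letting blocks learn only a correction."""
    return x + transform(x)

# Toy stand-in for a learned transformation: a fixed 1D smoothing kernel.
kernel = np.array([0.25, 0.5, 0.25])
f = lambda x: np.convolve(x, kernel, mode="same")

x = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
y = residual_block(x, f)
print(y)  # [2.  4.  5.5 4.  2. ]
```

If `transform` collapses to zero, the block reduces to the identity, which is why residual connections help very deep models.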
In this paper, we summarize recent progress in deep learning-based acoustic models and the motivation and insights behind the surveyed techniques. We first discuss models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) that can effectively exploit variable-length contextual information, and their various combinations with other models. We then describe models that are optimized end-to-end and emphasize feature representations learned jointly with the rest of the system, the connectionist temporal classification (CTC) criterion, and the attention-based sequence-to-sequence translation model. We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. We also cover modeling techniques that lead to more efficient decoding and discuss possible future directions in acoustic model research.
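One ingredient of the CTC criterion mentioned above is its many-to-one mapping from frame-level paths to label sequences: repeated symbols are merged, then blanks are dropped. A minimal sketch of that collapse rule (illustrative only, not from the surveyed systems):

```python
import itertools

def ctc_collapse(path, blank="-"):
    """CTC path-to-label mapping: merge runs of repeated symbols,
    then remove the blank symbol."""
    merged = [s for s, _ in itertools.groupby(path)]
    return "".join(s for s in merged if s != blank)

print(ctc_collapse("hh-e-ll-lo-"))  # "hello"
```

Note that a blank between two identical symbols is what allows repeated letters ("ll") to survive the merge step.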
BACKGROUND: Artificial intelligence in colonoscopy is an emerging field, and its application may help colonoscopists improve inspection quality and reduce the rate of missed polyps and adenomas. Several deep learning-based computer-assisted detection (CADe) techniques were established from small single-center datasets, and unrepresentative learning materials might confine their application and generalization in wide practice. Although CADes have been reported to identify polyps in colonoscopic images and videos in real time, their diagnostic performance deserves to be further validated in clinical practice.
AIM: To train and test a CADe based on multicenter high-quality images of polyps and preliminarily validate it in clinical colonoscopies.
METHODS: With high-quality screening and labeling from 55 qualified colonoscopists, a dataset consisting of over 71000 images from 20 centers was used to train and test a deep learning-based CADe. In addition, the real-time diagnostic performance of the CADe was tested frame by frame in 47 unaltered full-length videos that contained 86 histologically confirmed polyps. Finally, we conducted a self-controlled observational study at Changhai Hospital to validate the diagnostic performance of the CADe in real-world colonoscopy, with the main outcome measure of polyps per colonoscopy.
RESULTS: The CADe identified polyps in the test dataset with 95.0% sensitivity and 99.1% specificity. For colonoscopy videos, all 86 polyps were detected, with 92.2% sensitivity and 93.6% specificity in frame-by-frame analysis. In the prospective validation, the sensitivity of the CADe in identifying polyps was 98.4% (185/188). Folds, reflections of light, and fecal fluid were the main causes of false positives in both the test dataset and clinical colonoscopies. Colonoscopists detected more polyps (0.90 vs 0.82, P<0.001) and adenomas (0.32 vs 0.30, P=0.045) with the aid of the CADe, particularly polyps <5 mm and flat polyps (0.65 vs 0.57, P<0.001; 0.74 vs 0.67, P=0.001, respectively). However, high efficacy is not realized in colonoscopies with inadequate bowel preparation and withdrawal time (P=0.32; P=0.16, respectively).
CONCLUSION: The CADe is feasible in the clinical setting and might help endoscopists detect more polyps and adenomas; further confirmation is warranted.
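The sensitivity and specificity figures reported above follow from standard confusion-matrix definitions; a small sketch (the counts below other than 185/188 are made up for illustration):

```python
def sensitivity(tp, fn):
    """True positive rate: detected lesions / all real lesions."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: correct negatives / all real negatives."""
    return tn / (tn + fp)

# Prospective validation above: 185 of 188 polyps detected (3 missed).
print(round(100 * sensitivity(185, 3), 1))  # 98.4

# Hypothetical negatives: 9910 correct rejections, 90 false alarms.
print(specificity(9910, 90))  # 0.991
```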
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed in the last two decades for attacking this problem. We focus our discussion on the speech separation problem given its central role in the cocktail party environment, and describe the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF), and generative models; the conventional multi-channel techniques such as beamforming and multi-channel blind source separation; and the newly developed deep learning-based techniques, such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting information in the microphone array, the acoustic training set, and the language itself, using more powerful models and better optimization objectives and techniques, will be the approach to solving the cocktail party problem.
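The permutation invariant training (PIT) idea named above resolves the label-permutation ambiguity in separation: the loss is computed for every assignment of estimated sources to reference speakers, and training uses the best one. A minimal sketch under invented toy signals:

```python
import itertools
import numpy as np

def pit_mse(estimates, references):
    """PIT loss: minimum mean-squared error over all permutations
    that assign estimated sources to reference sources."""
    n = len(references)
    return min(
        np.mean([np.mean((estimates[p[i]] - references[i]) ** 2)
                 for i in range(n)])
        for p in itertools.permutations(range(n))
    )

refs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
ests = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]  # same sources, swapped order
print(pit_mse(ests, refs))  # 0.0: the swapped assignment is matched
```

Without the minimum over permutations, the swapped ordering above would be penalized even though the separation itself is perfect.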
Organoid models have provided a powerful platform for mechanistic investigations into fundamental biological processes involved in the development and function of organs. Despite the potential for image-based phenotypic quantification of organoids, their complex 3D structure and the time-consuming and labor-intensive nature of immunofluorescent staining present significant challenges. In this work, we developed a virtual painting system, PhaseFIT (phase-fluorescent image transformation), utilizing customized and morphologically rich 2.5D intestinal organoids, which generates virtual fluorescent images for phenotypic quantification from accessible and low-cost organoid phase images. The system is driven by a novel segmentation-informed deep generative model that specializes in segmenting overlapping and adjacent objects. The model enables an annotation-free digital transformation from phase-contrast to multi-channel fluorescent images. The virtual painting results for nuclei, secretory cell markers, and stem cells demonstrate that PhaseFIT outperforms existing deep learning-based stain transformation models by generating fine-grained visual content. We further validated the efficiency and accuracy of PhaseFIT in quantifying the impacts of three compounds on crypt formation, cell population, and cell stemness. PhaseFIT is the first deep learning-enabled virtual painting system focused on live organoids, enabling large-scale, informative, and efficient organoid phenotypic quantification, and it would enable the use of organoids in high-throughput drug screening applications.
We present SinGRAV, an attempt to learn a generative radiance volume from multi-view observations of a single natural scene, in stark contrast to existing category-level 3D generative models that learn from images of many object-centric scenes. Inspired by SinGAN, we also learn the internal distribution of the input scene, which necessitates our key designs with respect to the scene representation and network architecture. Unlike popular multi-layer perceptron (MLP)-based architectures, we employ convolutional generators and discriminators, which inherently possess a spatial locality bias, to operate over voxelized volumes and learn the internal distribution over a plethora of overlapping regions. However, localizing the adversarial generators and discriminators over confined areas with limited receptive fields easily leads to highly implausible geometric structures in space. Our remedy is to use a spatial inductive bias and joint discrimination on geometric clues in the form of 2D depth maps. This strategy effectively improves spatial arrangement while incurring negligible additional computational cost. Experimental results demonstrate the ability of SinGRAV to generate plausible and diverse variations from a single scene, the merits of SinGRAV over state-of-the-art generative neural scene models, and the versatility of SinGRAV across a variety of applications. Code and data will be released to facilitate further research.
The emergence of 3D Gaussian splatting (3DGS) has greatly accelerated rendering in novel view synthesis. Unlike neural implicit representations such as neural radiance fields (NeRFs), which represent a 3D scene with position- and viewpoint-conditioned neural networks, 3D Gaussian splatting models the scene with a set of Gaussian ellipsoids, so that efficient rendering can be accomplished by rasterizing the ellipsoids into images. Beyond fast rendering, the explicit representation of 3D Gaussian splatting also facilitates downstream tasks like dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid changes and growing number of works in this field, we present a literature review of recent 3D Gaussian splatting methods, which can be roughly classified by functionality into 3D reconstruction, 3D editing, and other downstream applications. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian splatting are also covered to aid understanding of this technique. This survey aims to help beginners quickly get started in this field, to provide experienced researchers with a comprehensive overview, and to stimulate future development of the 3D Gaussian splatting representation.
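The rendering formulation mentioned above composites depth-sorted splats with front-to-back alpha blending, C = Σᵢ cᵢ αᵢ Πⱼ<ᵢ (1 − αⱼ). A single-pixel, grayscale sketch of that accumulation (toy values, not a splatting implementation):

```python
def composite(colors, alphas):
    """Front-to-back alpha blending over depth-sorted splats:
    each splat contributes its color weighted by its opacity and
    the transmittance remaining after nearer splats."""
    out, transmittance = 0.0, 1.0
    for color, alpha in zip(colors, alphas):
        out += color * alpha * transmittance
        transmittance *= 1.0 - alpha
    return out

# A half-opaque white splat in front of a fully opaque gray one:
print(composite([1.0, 0.5], [0.5, 1.0]))  # 0.5 + 0.5*0.5 = 0.75
```

A fully opaque splat drives the transmittance to zero, so anything behind it contributes nothing, which is what makes early termination possible in real rasterizers.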
Neural fields can efficiently encode three-dimensional (3D) scenes, providing a bridge between two-dimensional (2D) images and virtual reality. The method has become a trendsetter in bringing the metaverse to life. It initially captured attention in macroscopic biology, as demonstrated by computed tomography and magnetic resonance imaging, which provide a 3D field of view for diagnostic biological images. It has also opened up new research opportunities in microscopic imaging, such as achieving clearer de novo protein structure reconstructions. Introducing this method to the field of biology is particularly significant, as it is refining the approach to studying biological images. However, many biologists have yet to fully appreciate the distinctive value of neural fields in transforming 2D images into 3D perspectives. This article discusses the application of neural fields to both microscopic and macroscopic biological images and their practical uses in biomedicine, highlighting the broad prospects of neural fields in the future biological metaverse. We stand at the threshold of an exciting new era, in which advancements in neural field technology herald the dawn of exploring the mysteries of life in innovative ways.
Large language models (LLMs) have made unprecedented progress, demonstrating human-like language proficiency and an extraordinary ability to encode complex knowledge. The emergence of high-level cognitive capabilities in LLMs, such as in-context learning and complex reasoning, suggests a path toward the realization of artificial general intelligence (AGI). However, we lack scientific theories and tools to assess and interpret such an emergence of advanced intelligence in LLMs. Artificial intelligence (AI) has been extensively applied in various areas of fundamental science to accelerate scientific research.
Objective: Intrinsic image decomposition is a fundamental problem in computer vision and graphics, aiming to separate the texture (reflectance) and illumination components of the scene in an image. Deep learning-based intrinsic decomposition methods are limited by existing datasets and suffer from over-smoothed decomposition results and poor generalization to real data.
Method: We first design a graph-convolution-based module that explicitly accounts for non-local information in the image. To enable the trained network to handle more complex lighting conditions, we render a high-quality synthetic dataset. In addition, we introduce a neural-network-based albedo refinement module that improves the local smoothness of the obtained albedo images.
Results: Training different methods on the proposed dataset, compared with training on the previous synthetic dataset CGIntrinsics, reduces the mean WHDR (weighted human disagreement rate) on the IIW (intrinsic images in the wild) test set by 7.29% and improves the AP (average precision) on the SAW (shading annotations in the wild) test set by 2.74%. The proposed graph-convolutional network achieves good results on both IIW and SAW, and is visually significantly better than previous methods. Moreover, the intrinsic results obtained with our algorithm yield better outcomes on image editing tasks such as relighting, texture editing, and illumination editing.
Conclusion: The proposed dataset is of higher quality and benefits the training of neural-network-based intrinsic decomposition models. By explicitly incorporating a non-local prior, the proposed model obtains better intrinsic decompositions, which are further validated through a series of application tasks.
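The intrinsic model underlying this decomposition is that an image factors pixelwise into albedo (reflectance) and shading; a tiny sketch with made-up values:

```python
import numpy as np

# Intrinsic image model: image = albedo * shading, elementwise.
albedo = np.array([[0.8, 0.2],
                   [0.5, 0.5]])
shading = np.array([[1.0, 1.0],
                    [0.4, 0.9]])
image = albedo * shading

# Given a predicted albedo, the induced shading is image / albedo,
# which is why errors in one component leak into the other.
recovered_shading = image / albedo
print(np.allclose(recovered_shading, shading))  # True
```

This coupling is also why metrics like WHDR judge only relative reflectance orderings rather than absolute values, since the factorization has a global scale ambiguity.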
With advances in data science and materials science, reasonably accurate artificial intelligence models can now be built for materials property prediction. In this work, based on a high-throughput first-principles dataset of 170,714 inorganic crystalline compounds, we trained a machine learning model that accurately predicts the formation energy of inorganic compounds. Compared with similar studies, this work starts from an ultra-large dataset to build a high-precision, generalizable model of inorganic crystal formation energy that can extrapolate to a vast phase space; the DenseNet neural network model reaches R^2 = 0.982 and a mean absolute error (MAE) of 0.072 eV atom^-1. The improved accuracy stems from a series of novel feature descriptors that effectively extract information such as the electronegativity and local structure of atoms and their neighbors, precisely capturing interatomic interactions. This work provides an efficient, low-cost means of predicting formation energies for the search for new materials.
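The R^2 and MAE figures reported above are the standard regression metrics; a short sketch of how they are computed (the toy "formation energies" below are invented):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target (eV/atom here)."""
    return np.mean(np.abs(y_true - y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([-1.2, -0.5, 0.3, 0.8])   # toy formation energies, eV/atom
y_pred = np.array([-1.1, -0.6, 0.3, 0.7])
print(round(mae(y_true, y_pred), 3))  # 0.075
```

A perfect predictor gives R^2 = 1 and MAE = 0; predicting the mean of the targets gives R^2 = 0.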
The high-content image-based assay is commonly leveraged for identifying the phenotypic impact of genetic perturbations in biology. However, a persistent issue remains unsolved during experiments: interfering technical noise caused by systematic errors (e.g., temperature, reagent concentration, and well location) is always mixed up with the real biological signals, leading to misinterpretation of any conclusion drawn. Here, we report a mean teacher-based deep learning model (Deep Noise) that can disentangle biological signals from experimental noise. Specifically, we aimed to classify the phenotypic impact of 1108 different genetic perturbations screened from 125,510 fluorescent microscopy images, which are totally unrecognizable by the human eye. We validated our model by participating in the Recursion Cellular Image Classification Challenge, where Deep Noise achieved an extremely high classification score (accuracy: 99.596%), ranking 2nd among 866 participating groups. This promising result indicates the successful separation of biological and technical factors, which might help decrease the cost of treatment development and expedite the drug discovery process. The source code of Deep Noise is available at https://github.com/Scu-sen/Recursion-Cellular-Image-Classification-Challenge.
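In the mean teacher scheme named above, the teacher's weights are an exponential moving average (EMA) of the student's weights across training steps; a minimal sketch of that update (vectors and the alpha value are illustrative, not from Deep Noise):

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.99):
    """Mean teacher weight update: the teacher tracks an exponential
    moving average of the student, giving a smoothed, more stable model
    whose predictions serve as consistency targets."""
    return alpha * teacher_w + (1.0 - alpha) * student_w

teacher = np.array([0.0, 0.0])
student = np.array([1.0, 2.0])
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher)  # [0.1 0.2]
```

With alpha close to 1, the teacher changes slowly and averages out per-batch noise in the student, which is the point of using it as the target network.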
The combinatorial optimization problem (COP), which aims to find the optimal solution in a discrete space, is fundamental in various fields. Unfortunately, many COPs are NP-complete and require far more time to solve as the problem scale increases. As a result, researchers may prefer fast methods even if they are not exact, so approximation algorithms, heuristic algorithms, and machine learning approaches have been proposed. Some works have proposed chaotic simulated annealing (CSA) based on the Hopfield neural network, with good results. However, CSA is not something that current general-purpose processors can handle easily, and there is no specialized hardware for it. To perform CSA efficiently, we propose a software-hardware co-design. In software, we quantize the weights and outputs using appropriate bit widths, and then modify the calculations that are unsuitable for hardware implementation. In hardware, we design a specialized processing-in-memory architecture named COPPER based on the memristor. COPPER is capable of efficiently running the modified quantized CSA algorithm and supports pipelining for further acceleration. The results show that COPPER performs CSA remarkably well in terms of both speed and energy.
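The weight quantization step described above maps floating-point weights onto the fixed-precision levels a memristor array can represent; a generic uniform-quantization sketch (the bit width and values are illustrative, not COPPER's actual scheme):

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization: scale weights so the largest
    magnitude maps to the top signed integer level, round to the grid,
    and return the dequantized values plus the scale."""
    levels = 2 ** (bits - 1) - 1           # e.g. 127 for 8-bit signed
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale, scale

w = np.array([-0.50, 0.12, 0.25])
q, scale = quantize(w, bits=3)
print(q)
```

Lower bit widths shrink the hardware cost but coarsen the grid, so choosing "appropriate bit widths" is a trade-off between array precision and solution quality.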
Funding (DSRNet): the TCL Science and Technology Innovation Fund; the Youth Science and Technology Talent Promotion Project of Jiangsu Association for Science and Technology (JSTJ-2023-017); Shenzhen Municipal Science and Technology Innovation Council (JSGG20220831105002004); National Natural Science Foundation of China (62201468); Postdoctoral Research Foundation of China (2022M722599); the Fundamental Research Funds for the Central Universities (D5000210966); the Guangdong Basic and Applied Basic Research Foundation (2021A1515110079).
Funding (colonoscopy CADe study): the National Key R&D Program of China (2018YFC1313103); the National Natural Science Foundation of China (81670473 and 81873546); the "Shu Guang" Project of Shanghai Municipal Education Commission and Shanghai Education Development Foundation (19SG30); the Key Area Research and Development Program of Guangdong Province, China (2018B010111001).
Funding (cocktail party overview): supported by the Tencent and Shanghai Jiao Tong University Joint Project.
Funding (PhaseFIT): supported by funding from the National Institutes of Health (R01DK123219 and K01DK103947 to N.S.).
Funding (SinGRAV): supported by the International (Regional) Cooperation and Exchange Program of the National Natural Science Foundation of China (62161146002) and the Shenzhen Collaborative Innovation Program (CJGJZD2021048092601003).
Funding (3D Gaussian splatting survey): supported by the National Natural Science Foundation of China (62322210); the Beijing Municipal Natural Science Foundation for Distinguished Young Scholars (JQ21013); the Beijing Municipal Science and Technology Commission (Z231100005923031); and the 2023 Tencent AI Lab Rhino-Bird Focused Research Program.
Funding (LLM perspective): funded by the National Natural Science Foundation of China (62001205); the National Key R&D Program of China (2021YFF1200804); and the Shenzhen Science and Technology Innovation Committee (2022410129, KCXFZ20201221173400001, and SGDX2020110309280100).
文摘Large language models(LLMs)have made unprecedented progress,demonstrating human-like language proficiency and an extraordinary ability to encode complex knowledge.The emergence of high-level cognitive capabilities in LLMs,such as in-context learning and complex reasoning,suggests a path toward the realization of artificial general intelligence(AGI).However,we lack scientific theories and tools to assess and interpret such an emergence of the advanced intelligence of LLMs.Artificial intelligence(AI)has been extensively applied in various areas of fundamental science to accelerate scientific research.
Abstract: Objective: Intrinsic image decomposition is a fundamental problem in computer vision and graphics, aiming to separate the texture (reflectance) and illumination components of the scene in an image. Deep-learning-based decomposition methods are limited by existing datasets and suffer from over-smoothed results and poor generalization to real data. Method: We first design a graph-convolution-based module that explicitly accounts for non-local information in the image. To enable the trained network to handle more complex lighting conditions, we render a high-quality synthetic dataset. In addition, we introduce a neural-network-based albedo refinement module that improves the local smoothness of the predicted albedo images. Results: When different methods are trained on the proposed dataset, compared with training on the earlier synthetic dataset CGIntrinsics, the mean WHDR (weighted human disagreement rate) on the IIW (Intrinsic Images in the Wild) test set drops by 7.29%, and the AP (average precision) on the SAW (Shading Annotations in the Wild) test set rises by 2.74%. The proposed graph-convolutional network achieves strong results on both IIW and SAW and is visually clearly superior to previous methods. Moreover, the intrinsic components produced by our algorithm yield better results in image-editing tasks such as relighting, texture editing, and illumination editing. Conclusion: The proposed dataset is of higher quality and benefits the training of neural intrinsic decomposition models. By explicitly incorporating a non-local prior, the proposed model obtains better decompositions, which is further validated on a series of application tasks.
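The non-local aggregation that a graph-convolution module provides can be illustrated with a single symmetrically normalized layer. This is a standard GCN step sketched in NumPy, not the authors' exact module; the adjacency matrix and feature sizes are made up for the example.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: aggregate each node's neighbourhood
    (including itself) with symmetric degree normalisation, then project.
    Pixels that are distant in the image but connected in the graph can
    exchange information in a single layer — the non-local effect."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # D^{-1/2}
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU activation

# 4 pixels/nodes with 8-dim features, projected to 16 dims (illustrative sizes)
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
out = gcn_layer(adj, rng.normal(size=(4, 8)), rng.normal(size=(8, 16)))
print(out.shape)  # (4, 16)
```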
Funding: The authors acknowledge financial support from the Chinese Academy of Sciences (CAS-WX2021PY-0102, ZDBS-LY-SLH007, and XDB33020000).
Abstract: With advances in data science and materials science, reasonably accurate artificial intelligence models can now be built for materials property prediction. In this work, starting from a high-throughput first-principles dataset of 170,714 inorganic crystalline compounds, we train machine learning models that accurately predict the formation energy of inorganic compounds. Compared with related work, this study builds, from a very large dataset, a high-precision and generalizable model of inorganic crystal formation energy that extrapolates to a broad phase space; the DenseNet neural network model reaches R² = 0.982 and a mean absolute error (MAE) of 0.072 eV atom⁻¹. The accuracy gain stems from a series of new feature descriptors that effectively extract information such as the electronegativity and local structure of atoms and their neighbors, thereby accurately capturing interatomic interactions. This work provides an efficient, low-cost means of predicting formation energies for new-materials search.
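To make the descriptor idea concrete, here is a toy electronegativity-based local-environment feature. The element values and the inverse-distance weighting are illustrative assumptions for the sketch, not the descriptors actually used in the study.

```python
import numpy as np

# Pauling electronegativities for a few elements (illustrative subset)
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16, "O": 3.44, "Ti": 1.54}

def local_en_descriptor(center, neighbors, distances):
    """A toy local-environment descriptor: electronegativity differences
    between a centre atom and its neighbours, weighted by inverse distance.
    Real formation-energy models combine many such chemical features with
    structural ones before feeding them to the regressor."""
    chi_c = ELECTRONEGATIVITY[center]
    diffs = np.array([abs(ELECTRONEGATIVITY[n] - chi_c) for n in neighbors])
    weights = 1.0 / np.asarray(distances, dtype=float)
    return float((diffs * weights).sum() / weights.sum())

# Na surrounded by six Cl neighbours at 2.82 Å (rock-salt-like environment)
print(round(local_en_descriptor("Na", ["Cl"] * 6, [2.82] * 6), 2))  # 2.23
```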
Abstract: High-content image-based assays are commonly leveraged to identify the phenotypic impact of genetic perturbations in biology. However, a persistent issue remains unsolved: technical noise caused by systematic errors (e.g., temperature, reagent concentration, and well location) is mixed with the real biological signals, leading to misinterpretation of the conclusions drawn. Here, we report a mean-teacher-based deep learning model (Deep Noise) that can disentangle biological signals from experimental noise. Specifically, we classify the phenotypic impact of 1108 different genetic perturbations screened from 125,510 fluorescent microscopy images that are entirely unrecognizable to the human eye. We validated our model in the Recursion Cellular Image Classification Challenge, where Deep Noise achieved an extremely high classification accuracy of 99.596%, ranking 2nd among 866 participating groups. This promising result indicates successful separation of biological and technical factors, which could help decrease the cost of treatment development and expedite drug discovery. The source code of Deep Noise is available at https://github.com/Scu-sen/Recursion-Cellular-Image-Classification-Challenge.
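The mean-teacher mechanism at the heart of Deep Noise maintains a teacher model whose weights are an exponential moving average (EMA) of the student's, which smooths out batch-to-batch fluctuations. A minimal sketch of that update follows; the decay value and parameter layout are illustrative, not taken from the paper.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Mean-teacher weight update: the teacher's parameters track an
    exponential moving average of the student's after every training step,
    so the teacher's predictions are less sensitive to noisy batches."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}   # stand-in for a trained student's weights
for _ in range(100):
    teacher = ema_update(teacher, student, decay=0.9)
print(teacher["w"])  # converges toward the student's weights [1. 1. 1.]
```

In the full method, the teacher's predictions on augmented copies of an image provide a consistency target for the student, which is what encourages invariance to the nuisance (plate/well) factors.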
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61832020, 62032001, 92064006, and 62274036), the Beijing Academy of Artificial Intelligence (BAAI) of China, and the 111 Project of China (No. B18001).
Abstract: The combinatorial optimization problem (COP), which seeks the optimal solution in a discrete space, is fundamental in many fields. Unfortunately, many COPs are NP-complete and take far longer to solve as the problem scale increases. Researchers therefore often prefer fast methods even when they are inexact, so approximation algorithms, heuristic algorithms, and machine learning approaches have been proposed. Some works proposed chaotic simulated annealing (CSA) based on the Hopfield neural network and achieved good results. However, CSA is not something that current general-purpose processors can handle easily, and no specialized hardware exists for it. To perform CSA efficiently, we propose a software-hardware co-design. In software, we quantize the weights and outputs using appropriate bit widths and modify the calculations that are unsuitable for hardware implementation. In hardware, we design a specialized processing-in-memory architecture named COPPER based on the memristor. COPPER efficiently runs the modified quantized CSA algorithm and supports pipelining for further acceleration. The results show that COPPER performs CSA remarkably well in both speed and energy.
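The software-side quantization step can be illustrated with plain uniform symmetric quantization, the generic scheme for mapping real-valued weights onto a fixed-point grid; the bit widths and rounding details of the actual COPPER design may differ.

```python
import numpy as np

def quantize_uniform(x, bits=4):
    """Uniform symmetric quantisation to a given bit width: scale values
    onto a signed integer grid, round, clip, and rescale. Fixed-point
    integers like these are what a processing-in-memory array can hold
    directly in its memristor cells."""
    levels = 2 ** (bits - 1) - 1               # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(x)) / levels         # one scale for the tensor
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale, q.astype(np.int8)        # dequantised values, int codes

weights = np.array([-0.8, -0.1, 0.05, 0.4, 0.8])
deq, q = quantize_uniform(weights, bits=4)
print(q)  # [-7 -1  0  4  7]
```

Choosing the bit width trades reconstruction error against hardware cost: each extra bit halves the worst-case rounding error but doubles the number of conductance levels the memristor cells must resolve.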