Funding: Bao-Fu Wu, Jian Wan, and Ji-Lin Zhang were supported by the National Natural Science Foundation of China under Grant No. 62072146, the Key Research and Development Program of Zhejiang Province of China under Grant Nos. 2023C03194, 2021C03187, and 2023C01044, and the National Natural Science Foundation of China (Youth Fund) under Grant No. 62302133.
Abstract: The emergence of software-defined vehicles (SDVs), combined with autonomous driving technologies, has enabled a new era of vehicle computing (VC), where vehicles serve as a mobile computing platform. However, the interdisciplinary complexities of automotive systems and diverse technological requirements make developing applications for autonomous vehicles challenging. To simplify the development of applications running on SDVs, we propose a comprehensive suite of vehicle programming interfaces (VPIs). In this study, we rigorously explore the nuanced requirements for application development within the realm of VC, centering our analysis on the architectural intricacies of the Open Vehicular Data Analytics Platform (OpenVDAP). We then detail our creation of a comprehensive suite of standardized VPIs, spanning five critical categories: Hardware, Data, Computation, Service, and Management, to address these evolving programming requirements. To validate the design of VPIs, we conduct experiments using the indoor autonomous vehicle, Zebra, and develop the OpenVDAP prototype system. Compared with the industry-influential AUTOSAR interface, our VPIs demonstrate significant enhancements in programming efficiency, marking an important advancement in the field of SDV application development. We also show a case study and evaluate its performance. Our work highlights that VPIs significantly enhance the efficiency of developing applications on VC. They meet both current and future technological demands and propel the software-defined automotive industry toward a more interconnected and intelligent future.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62202140, 61832005, 62072216, 62372214, and 62101463, the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20220974, the Future Network Scientific Research Foundation Project FNSRFP-2021-ZD-7, the Natural Science Foundation of Sichuan Province of China under Grant No. 2022NSFSC0863, and the Sichuan Science and Technology Program under Grant Nos. 2023YFH0012 and 2023ZHCG0010.
Abstract: Edge computing enabled Intelligent Road Network (EC-IRN) provides powerful and convenient computing services for vehicles and roadside sensing devices. The continuous emergence of transportation applications has placed a huge burden on roadside units (RSUs) equipped with edge servers in the Intelligent Road Network (IRN). Collaborative task scheduling among RSUs is an effective way to solve this problem. However, it is challenging to achieve collaborative scheduling among different RSUs in a completely decentralized environment. In this paper, we first model the interactions involved in task scheduling among distributed RSUs as a Markov game. Given that multi-agent deep reinforcement learning (MADRL) is a promising approach to decision optimization in Markov games, we propose a collaborative task scheduling algorithm based on MADRL for EC-IRN, named CA-DTS, aiming to minimize the long-term average delay of tasks. To reduce the training costs caused by trial and error, CA-DTS uses a specially designed reward function and the distributed deployment and collective training architecture of counterfactual multi-agent policy gradient (COMA). To improve the stability of performance in large-scale environments, CA-DTS takes advantage of the action semantics network (ASN) to facilitate cooperation among multiple RSUs. The evaluation results from both the testbed and simulation demonstrate the effectiveness of our proposed algorithm. Compared with the baselines, CA-DTS converges about 35% faster and obtains an average task delay that is lower by approximately 9.4%, 9.8%, and 6.7% in scenarios with varying numbers of RSUs, service types, and task arrival rates, respectively.
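The abstract does not spell out CA-DTS's reward function or network design, so the following is only a minimal sketch of the generic counterfactual baseline used in COMA, on which the abstract says CA-DTS builds. For one agent, the advantage of the chosen action is the critic's value for the joint action minus the policy-weighted average of the critic's values when only that agent's action is varied. The function name, the three-action setup, and all numbers are hypothetical, and treating the critic value as a negative expected delay is an assumption made for illustration.

```python
def counterfactual_advantage(q_values, policy_probs, chosen_action):
    """COMA-style counterfactual advantage for a single agent.

    q_values[i]     : critic estimate of the joint-action value when this
                      agent takes action i and the other agents' actions
                      are held fixed.
    policy_probs[i] : probability this agent's policy assigns to action i.
    chosen_action   : index of the action the agent actually took.
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    # Counterfactual baseline: expected value under the agent's own policy,
    # marginalizing out only this agent's action.
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[chosen_action] - baseline


# Hypothetical example: one RSU chooses among three scheduling targets.
# Values are illustrative negative delays, not results from the paper.
q = [-4.0, -2.0, -6.0]
pi = [0.5, 0.25, 0.25]
advantages = [counterfactual_advantage(q, pi, a) for a in range(len(q))]
print(advantages)
```

A positive advantage means the chosen action did better than the agent's own average behaviour in that situation, which is the signal the policy gradient then reinforces.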
Abstract: Distributed Shared Memory (DSM) systems have gained popular acceptance by combining the scalability and low cost of distributed systems with the ease of use of a single address space. Many new hardware DSM and software DSM systems have been proposed in recent years. In general, benchmarking is widely used to demonstrate the performance advantages of new systems. However, the common method used to summarize the measured results is the arithmetic mean of ratios, which is incorrect in some cases. Furthermore, many published papers only list a large amount of data and do not summarize it effectively, which greatly confuses users. In fact, many users want a single number as a conclusion, which the old summarizing techniques do not provide. Therefore, a new data-summarizing technique based on confidence intervals is proposed in this paper. The new technique includes two data-summarizing methods: (1) the paired confidence interval method; (2) the unpaired confidence interval method. With this new technique, one can conclude, at a stated confidence level, that one system is better than the others. Four examples are shown to demonstrate the advantages of this new technique. Furthermore, with the help of the confidence level, it is proposed to standardize the benchmarks used for evaluating DSM systems so that convincing results can be obtained. In addition, the new summarizing technique is suitable not only for evaluating DSM systems but also for evaluating other systems, such as memory systems and communication systems.
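The paired confidence interval method named above can be illustrated with the standard textbook construction, which may differ in detail from the paper's exact procedure: take the per-benchmark difference between two systems, compute a Student-t interval for the mean difference, and call one system better only if the interval excludes zero. The sketch below assumes execution times where lower is better. The function names and the timing data are hypothetical, and the t quantile is passed in as a parameter because the Python standard library does not provide the t distribution.

```python
import math
import statistics


def paired_confidence_interval(times_a, times_b, t_critical):
    """Confidence interval for the mean per-benchmark difference (A - B).

    t_critical is the two-sided Student-t quantile for n - 1 degrees of
    freedom at the chosen confidence level.
    """
    if len(times_a) != len(times_b) or len(times_a) < 2:
        raise ValueError("need two equally long samples of at least 2 benchmarks")
    diffs = [a - b for a, b in zip(times_a, times_b)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    half_width = t_critical * std_err
    return mean_diff - half_width, mean_diff + half_width


def verdict(interval):
    """Interpret the interval for execution times (lower is better)."""
    low, high = interval
    if high < 0:
        return "A is faster"
    if low > 0:
        return "B is faster"
    return "no significant difference"


# Hypothetical execution times in seconds for five benchmarks.
system_a = [10.2, 8.1, 15.3, 7.7, 12.0]
system_b = [11.0, 8.9, 16.1, 8.6, 12.9]
T_95_DF4 = 2.776  # two-sided 95% t quantile for 4 degrees of freedom

interval = paired_confidence_interval(system_a, system_b, T_95_DF4)
print(interval, verdict(interval))
```

With these made-up numbers the whole interval lies below zero, so the single-sentence conclusion is that system A is faster at the 95% confidence level. An interval that straddles zero would instead mean the benchmarks do not separate the two systems.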
Abstract: The performance gap between software DSM systems and message passing platforms greatly limits the adoption of software DSM systems, even though great effort has been devoted to this area in the past decade. In this paper, we take up the challenge of finding where future design efforts should be focused. First, the components of the total system overhead of software DSM systems are analyzed in detail. Based on JIAJIA, a state-of-the-art software DSM system, we measure these components on the Dawning parallel system and draw five important conclusions that differ from some traditional viewpoints. (1) The performance of the JIAJIA software DSM system is acceptable. For four of eight applications, the parallel efficiency achieved by JIAJIA is about 80%, while for two others, 70% efficiency can be obtained. (2) 40.94% of interrupt service time is overlapped with waiting time. (3) Encoding and decoding diffs do not cost much time (<1%), so using hardware support to encode/decode diffs and send/receive messages is not worthwhile. (4) Great effort should be put into reducing the data miss penalty and optimizing synchronization operations, which occupy 11.75% and 13.65% of total execution time, respectively. (5) Communication hardware overhead occupies 66.76% of the whole communication time in the experimental environment, and communication software overhead does not take as much time as expected. Moreover, by studying the effect of CPU speed on system overhead, we find that the common speedup formula for distributed memory systems does not work for software DSM systems. Therefore, we design a new speedup formula specific to software DSM systems, and point out that when the CPU speed increases, the speedup can increase too even if the network speed is fixed, which is impossible in message passing systems. Finally, we argue that the JIAJIA system has the desired scalability.
Funding: Supported by the US National Science Foundation CAREER Grant No. CCF-0643521.
Abstract: Workflows are prevailing in scientific computation. Multicluster environments emerge and provide more resources, benefiting workflows but also challenging the traditional workflow scheduling heuristics. In a multicluster environment, each cluster has its own independent workload management system. Jobs are queued before being executed, and they experience different resource availability and wait times if dispatched to different clusters. However, existing scheduling heuristics neither consider the queue wait time nor balance the performance gain against the data movement cost. The proposed algorithm leverages advances in queue wait time prediction techniques and empirically studies whether the tunability of resource requirements helps scheduling. Extensive experiments with both real workload traces and a test bench show that the queue wait time aware algorithm improves workflow performance by 3 to 10 times in terms of average makespan, at a relatively low cost of data movement.
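The abstract does not give the algorithm itself, so the following is only a hedged illustration of the trade-off it describes: when choosing a cluster for a job, a predicted queue wait time and the data movement cost are weighed together with the execution time rather than looking at execution time alone. The greedy rule, the `ClusterEstimate` type, and all figures are assumptions made for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ClusterEstimate:
    """Hypothetical per-cluster estimates for one job, in seconds."""
    name: str
    predicted_wait_s: float  # predicted time the job spends in the queue
    run_time_s: float        # estimated execution time on this cluster
    transfer_s: float        # time to move the job's input data there


def completion_time(cluster):
    """Estimated time from submission to completion on the given cluster."""
    return cluster.predicted_wait_s + cluster.run_time_s + cluster.transfer_s


def pick_cluster(candidates):
    """Greedy choice: the cluster with the smallest estimated completion time."""
    if not candidates:
        raise ValueError("no candidate clusters")
    return min(candidates, key=completion_time)


# Illustrative numbers: the faster cluster has a long queue, while the
# slower one is nearly idle but needs the input data copied over.
candidates = [
    ClusterEstimate("fast-but-busy", predicted_wait_s=3600, run_time_s=600, transfer_s=0),
    ClusterEstimate("slow-but-idle", predicted_wait_s=60, run_time_s=900, transfer_s=120),
]
best = pick_cluster(candidates)
print(best.name, completion_time(best))
```

A heuristic that ignored queue wait time would pick the faster cluster here; including the predicted wait and transfer time reverses the choice, which is the effect the abstract attributes to wait-time-aware scheduling.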
Abstract: Previous descriptions of memory consistency models in shared-memory multiprocessor systems are mainly expressed as constraints on the ordering of memory access events and hence are hardware-centric. This paper presents a framework of memory consistency models which describes a memory consistency model at the behavior level. Based on the understanding that the behavior of an execution is determined by the execution order of conflicting accesses, a memory consistency model is defined as an interprocessor synchronization mechanism which orders the execution of operations from different processors. The synchronization order of an execution under a given consistency model is also defined. The synchronization order, together with the program order, determines the behavior of an execution. This paper also presents criteria for correct programs and correct implementations of consistency models. Regarding an implementation of a consistency model as a set of memory event ordering constraints, this paper provides a method to prove the correctness of consistency model implementations, and the correctness of the lock-based cache coherence protocol is proved with this method.
Abstract: Vehicular networks have attracted extensive attention in recent years for their promise in improving safety and enabling other value-added services. Most previous work focuses on designing the media access and physical layer protocols. Privacy issues in vehicular systems have not been well addressed. We argue that privacy is a user-specific concept, and a good privacy protection mechanism should allow users to select the level of privacy they wish to have. To address this requirement, we propose an adaptive anonymous authentication mechanism that can trade off the anonymity level against computational and communication overheads (resource usage). This mechanism, to our knowledge, is the first effort on adaptive anonymous authentication. Our protocol uses few resources. A high traffic volume of 2000 vehicles per hour consumes about 60 kbps of bandwidth, which is less than one percent of the bandwidth of DSRC (Dedicated Short Range Communications). By using adaptive anonymity, the protocol response time can be further improved by 2-4 times with less than 20% bandwidth overhead.