WO2014140697A1 - Method for data partitioning and assignment - Google Patents
Method for data partitioning and assignment
- Publication number
- WO2014140697A1 (international application PCT/IB2013/052098)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- assignment
- decision function
- data partitioning
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
Definitions
- The present invention relates to a method for data partitioning and assignment for optimal multi-processor processing, where the processors can be stand-alone parallel processors or cores within such stand-alone processors.
Background of the invention
- Data on a conventional hard disk drive consists of ones and zeros. These ones and zeros must be processed before the type of file they form can be interpreted. For example, if a group of ones and zeros forms a music file, the data must again be processed to convert it into sound.
- Data processing is used in a very broad range of fields. Security systems process the audiovisual feeds from surrounding devices to detect security breaches, and telephone systems use data processing to analyze connection parameters. While a single processing unit can perform data processing, more demanding tasks require several processing units to complete the work in a shorter time.
- State-of-the-art processing platforms are increasingly multi-core and heterogeneous because of the power efficiency and computational power that such architectures provide.
- State-of-the-art heterogeneous multi-core processing platforms typically include GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) along with other multi-core processors such as ARM (Advanced RISC Machines) processors, DSPs (Digital Signal Processors), CPUs (Central Processing Units) and CGRAs (Coarse Grain Reconfigurable Arrays).
- GPUs Graphics Processing Units
- FPGAs Field Programmable Gate Arrays
- ARM Advanced RISC Machines
- DSPs Digital Signal Processors
- CPUs Central Processing Units
- CGRAs Coarse Grain Reconfigurable Arrays
- OpenCL Open Computing Language, currently developed by the Khronos Group
- CUDA Compute Unified Device Architecture, owned and developed by NVIDIA Corporation
- OpenCL supports a wide array of multi-core processors including ARM, CPU, GPU, DSP and FPGA, whereas CUDA is dedicated to NVIDIA GPUs.
- Using massive multi-threading APIs, engineers can program FPGAs and GPUs to execute massively multi-threaded programs (64 - 1536 threads) and gain substantial performance increases over single-threaded programs or simple multi-threaded programs with a small number of threads (2 - 16 threads), as typically observed for CPUs, ARM processors or DSPs.
- Multi-threaded execution (with 2 - 16 threads) of generic algorithms and processing tasks on CPUs, ARM processors and DSPs can provide performance gains that roughly scale with the number of available cores.
- massive multithreading 64 - 1536 threads
- The GPUs and FPGAs present on state-of-the-art heterogeneous multi-core processing platforms are designed for massive multi-threaded execution; for algorithms that allow a high degree of parallelism, these processors are the best choice for execution.
- International patent application WO2011079942 discloses a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop; partitioning the loop into partitions that are executable and mappable on physical hardware with optimal instruction-level parallelism; optimizing the loop iterations and/or loop counter for ideal mapping on hardware; and chaining the loop partitions to generate a list representing the execution sequence of the partitions.
- WO2012142069 discloses elastic computing tools which analyze available devices and resources (e.g., cores, GPUs, FPGAs, etc.) and current run-time parameters, and then transparently select from numerous pre-analyzed implementation possibilities to optimize for performance, power, energy, size, or any combination of these goals.
- available devices and resources e.g., cores, GPUs, FPGAs, etc.
- The object of the invention is to provide a method for data partitioning and assignment for optimal multi-processor processing, where the processors can be stand-alone parallel processors or cores within such stand-alone processors.
- Another object of the invention is to provide a method for data partitioning and assignment in which the data partitions are assigned to processors according to constraints.
- A further object of the invention is to provide a method for data partitioning and assignment that can be used in heterogeneous systems.
- Figure 1 is a flowchart of the method for data partitioning and assignment.
- A method for data partitioning and assignment (100) comprises the following steps:
- The capabilities and constraints of the current ecosystem of processing units are analyzed (101). These include, but are not limited to, power consumption, thread branching capabilities, workload per thread, memory access patterns, thread synchronization requirements and processor functionality checks (to verify that a specific processor is still functional).
- The expectations of the process are also analyzed (102). These include, but are not limited to, maximum power consumption, maximum processing time and the quality of the outcomes.
- The capabilities, constraints and expectations are fed to a decision function (103).
- The decision function uses comparisons, mathematical operations or any other operations, alone or in combination, to calculate the optimum set of processing units, taking into account the expectations and constraints of each application.
- The decision function yields the set of processing units to which the data is to be fed.
- The data is then partitioned to match the set of processing units yielded by the decision function (104).
- The data partitions are distributed to the different processing units for processing (105).
- The processing units process the partitions and yield outcomes (106). Because each outcome corresponds to the partition fed to its processing unit, the outcomes are unified to form the non-partitioned data (107).
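Steps (101) to (107) can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the patented implementation: `ProcessingUnit`, `Expectations`, `decide`, `partition` and `run` are hypothetical names; each unit is modelled only by an assumed throughput and power draw; the decision function is taken to be a brute-force search for the lowest-power subset that meets both the time and power limits; and worker threads stand in for the real processing units.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple

@dataclass
class ProcessingUnit:
    name: str
    throughput: float          # assumed: work items processed per second
    power: float               # assumed: watts drawn while busy
    functional: bool = True    # outcome of the functionality check

@dataclass
class Expectations:
    max_power: float           # maximum total power consumption (W)
    max_time: float            # maximum processing time (s)

def analyze_units(units: Sequence[ProcessingUnit]) -> List[ProcessingUnit]:
    # Step 101 (simplified): drop units that failed the functionality check.
    return [u for u in units if u.functional]

def decide(units: Sequence[ProcessingUnit], expect: Expectations,
           n_items: int) -> Optional[List[ProcessingUnit]]:
    # Step 103: exhaustive search for the lowest-power subset meeting both limits.
    best: Optional[List[ProcessingUnit]] = None
    best_power = float("inf")
    for k in range(1, len(units) + 1):
        for subset in combinations(units, k):
            power = sum(u.power for u in subset)
            time = n_items / sum(u.throughput for u in subset)
            if power <= expect.max_power and time <= expect.max_time and power < best_power:
                best, best_power = list(subset), power
    return best

def partition(data: Sequence, units: Sequence[ProcessingUnit]) -> List[Sequence]:
    # Step 104: contiguous slices sized in proportion to each unit's throughput;
    # the last unit takes the remainder so every item is assigned exactly once.
    total = sum(u.throughput for u in units)
    parts, start = [], 0
    for i, u in enumerate(units):
        end = len(data) if i == len(units) - 1 else start + int(len(data) * u.throughput / total)
        parts.append(data[start:end])
        start = end
    return parts

def run(data: Sequence, units: Sequence[ProcessingUnit], expect: Expectations,
        kernel: Callable) -> Tuple[list, List[str]]:
    chosen = decide(analyze_units(units), expect, len(data))
    if chosen is None:
        raise RuntimeError("no set of processing units meets the expectations")
    parts = partition(data, chosen)
    # Steps 105-106: one worker thread stands in for each chosen processing unit.
    with ThreadPoolExecutor(max_workers=len(chosen)) as pool:
        outcomes = list(pool.map(lambda part: [kernel(x) for x in part], parts))
    # Step 107: pool.map keeps partition order, so concatenation restores the whole.
    return [y for out in outcomes for y in out], [u.name for u in chosen]

units = [ProcessingUnit("gpu", 1000.0, 150.0),
         ProcessingUnit("fpga", 400.0, 25.0),
         ProcessingUnit("dsp", 100.0, 5.0, functional=False)]
result, names = run(list(range(1000)), units, Expectations(max_power=200.0, max_time=0.8),
                    kernel=lambda x: 2 * x)
```

With these assumed figures the GPU alone needs 1.0 s, which exceeds the 0.8 s limit, so the search settles on the GPU and FPGA together, and the DSP is excluded by the functionality check. Exhaustive search is only practical for a handful of units; a larger system would need a cheaper decision rule.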
- The decision function may have a learning capability: it keeps an internal set of parameters, belonging to linear or non-linear decision and/or update function(s), which it updates after every batch of input values.
- This decision function is named the decision function with learning.
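One possible decision function with learning is sketched below, assuming that its internal parameters are per-unit throughput estimates, that the decision splits each batch in proportion to those estimates, and that the update is a linear exponential moving average of the throughput measured on the previous batch. The class name, `alpha` and the measured values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class LearningDecisionFunction:
    estimates: Dict[str, float]   # internal parameters: assumed per-unit throughput
    alpha: float = 0.5            # assumed learning rate of the update function

    def decide(self, n_items: int) -> Dict[str, int]:
        # Linear decision: split the batch in proportion to the current estimates.
        names = list(self.estimates)
        total = sum(self.estimates.values())
        shares = {n: int(n_items * self.estimates[n] / total) for n in names[:-1]}
        shares[names[-1]] = n_items - sum(shares.values())  # remainder to the last unit
        return shares

    def update(self, measured: Dict[str, float]) -> None:
        # Linear update after each batch: theta <- (1 - alpha) * theta + alpha * observed.
        for name, observed in measured.items():
            self.estimates[name] = (1 - self.alpha) * self.estimates[name] + self.alpha * observed

f = LearningDecisionFunction({"gpu": 1.0, "fpga": 1.0})
first = f.decide(100)                 # no information yet: even split
f.update({"gpu": 3.0, "fpga": 1.0})   # throughputs observed on that batch
second = f.decide(90)                 # the GPU now receives a larger share
```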
- The constraints of the process may remain constant throughout the process.
- In that case, running the decision function once and processing the data according to its outcome is sufficient.
- This decision function is named the off-line decision function.
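An off-line decision function amounts to taking the decision once, before any batch is processed, and reusing it. In this Python sketch `run_offline`, `pick_first` and the batch contents are hypothetical; the counter only demonstrates that the decision function runs a single time.

```python
from typing import Callable, List, Sequence, TypeVar

B = TypeVar("B")
R = TypeVar("R")

def run_offline(batches: Sequence[B],
                units: List[str],
                decide: Callable[[List[str]], List[str]],
                process: Callable[[List[str], B], R]) -> List[R]:
    # Constraints are assumed constant, so the decision is taken exactly once
    # and the same set of processing units is reused for every batch.
    chosen = decide(units)
    return [process(chosen, batch) for batch in batches]

decision_calls = 0

def pick_first(units: List[str]) -> List[str]:
    # Hypothetical decision rule: keep only the first-listed unit.
    global decision_calls
    decision_calls += 1
    return units[:1]

results = run_offline([[1, 2], [3]], ["gpu", "fpga"], pick_first,
                      lambda chosen, batch: (chosen[0], sum(batch)))
```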
- The constraints of the process may also change during the process. Such changes include, but are not limited to, the remaining battery power, the number of processing units of each type and changing time constraints. For example, if the system on which the method for data partitioning and assignment (100) runs is a stand-alone system, the remaining battery power can be taken into consideration. If the number of processing units in a system can change during the execution of the method for data partitioning and assignment (100), for example because a processor burns out, the decision function can recalculate the set of processing units to which the data is fed. To achieve this, the method for data partitioning and assignment (100) is re-executed from time to time. This decision function is named the on-line decision function.
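An on-line decision function can be sketched as a loop that re-reads the constraints before each batch and re-runs the decision only when they have changed. The `Constraints` fields, the 20 % battery threshold and the rule of idling the GPU on low battery are assumptions made for illustration; a real system might poll the constraints less often than once per batch.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional, TypeVar

B = TypeVar("B")
R = TypeVar("R")

@dataclass(frozen=True)
class Constraints:
    battery: float               # assumed: remaining battery charge, 0.0 to 1.0
    units: FrozenSet[str]        # units that currently pass the functionality check

def decide(c: Constraints) -> List[str]:
    # Hypothetical rule: below 20 % battery, leave the power-hungry GPU idle.
    usable = sorted(c.units)
    if c.battery < 0.2:
        usable = [u for u in usable if u != "gpu"]
    return usable

def run_online(batches: Iterable[B],
               read_constraints: Callable[[], Constraints],
               process: Callable[[List[str], B], R]) -> List[R]:
    results: List[R] = []
    last: Optional[Constraints] = None
    chosen: List[str] = []
    for batch in batches:
        current = read_constraints()      # re-read the constraints before each batch
        if current != last:               # re-run the decision only on a change
            chosen, last = decide(current), current
        results.append(process(chosen, batch))
    return results

both = frozenset({"gpu", "fpga"})
snapshots = iter([Constraints(0.9, both), Constraints(0.9, both),
                  Constraints(0.1, both), Constraints(0.1, frozenset({"fpga"}))])
log = run_online(range(4), lambda: next(snapshots), lambda chosen, b: (b, tuple(chosen)))
```

Here the decision is recomputed when the battery drops and again when the GPU disappears from the set of functional units.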
- The input data may be a color image in planar format consisting of red, green and blue color planes. Typically, the green plane is sampled twice as densely as the red and blue planes and contains more spatial detail.
- The green plane is therefore typically targeted for heavier spatial processing.
- Each color plane can be processed on a different processing unit.
- For example, the green plane can be processed on a GPU while the red and blue planes are processed on FPGAs, or vice versa.
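The plane-to-unit assignment above can be sketched as a routing rule that sends the plane with the most samples to the unit chosen for heavier processing. The function name, the unit labels and the toy plane sizes are hypothetical; swapping the two unit arguments gives the "vice versa" variant.

```python
from typing import Dict, List

Plane = List[List[float]]

def assign_planes(planes: Dict[str, Plane],
                  heavy_unit: str = "gpu",
                  light_unit: str = "fpga") -> Dict[str, str]:
    # The plane with the most samples (green, in the case described above)
    # is routed to the unit chosen for heavier spatial processing.
    sizes = {name: sum(len(row) for row in plane) for name, plane in planes.items()}
    densest = max(sizes, key=lambda name: sizes[name])
    return {name: heavy_unit if name == densest else light_unit for name in planes}

red = [[0.0, 0.0], [0.0, 0.0]]
green = [[0.0] * 4, [0.0] * 4]      # sampled twice as densely as red and blue
blue = [[0.0, 0.0], [0.0, 0.0]]
planes = {"R": red, "G": green, "B": blue}
routing = assign_planes(planes)
swapped = assign_planes(planes, heavy_unit="fpga", light_unit="gpu")  # "or vice versa"
```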
- The input data may be a color image in planar format consisting of a luminance (Y) plane and chrominance (Cb, Cr) planes. Due to the nature of the YCbCr color space, most of the spatial detail is present in the Y plane; based on this observation, the Cb and Cr planes are down-sampled in the spatial domain to save space. As a result, the Y plane is typically targeted for heavier spatial processing. In such a scenario, the luminance plane can be processed on a GPU and the chrominance (Cb, Cr) planes on FPGAs, or vice versa.
- Y Luminance
- Cb, Cr chrominance
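A minimal sketch of the YCbCr case follows. It assumes 4:2:0-style chroma subsampling, in which each 2x2 block of a chrominance plane is averaged into one sample; the source only says the chrominance planes are down-sampled, so this scheme, the function names and the unit labels are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Plane = List[List[float]]

def downsample_2x2(plane: Plane) -> Plane:
    # Assumed 4:2:0-style chroma subsampling: average each 2x2 block.
    # Even plane dimensions are assumed.
    return [[(plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
             for j in range(0, len(plane[0]), 2)]
            for i in range(0, len(plane), 2)]

def route_ycbcr(y: Plane, cb: Plane, cr: Plane,
                luma_unit: str = "gpu",
                chroma_unit: str = "fpga") -> Dict[str, Tuple[str, Plane]]:
    # Full-resolution luminance goes to the unit chosen for heavy spatial work;
    # the down-sampled chrominance planes go to the other unit.
    return {"Y": (luma_unit, y),
            "Cb": (chroma_unit, downsample_2x2(cb)),
            "Cr": (chroma_unit, downsample_2x2(cr))}

y = [[16.0, 32.0], [48.0, 64.0]]
cb = [[1.0, 3.0], [5.0, 7.0]]
cr = [[2.0, 2.0], [2.0, 2.0]]
jobs = route_ycbcr(y, cb, cr)
```

After subsampling, the Y plane holds four times as many samples as each chrominance plane, which is why it is the natural candidate for the heavier processing unit.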
- The input data may consist of separate frequency bands.
- Each frequency band can be processed on a different processor.
- For example, higher frequency bands can be processed on GPUs and lower frequency bands on FPGAs, or vice versa.
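The frequency-band case can be sketched as follows, assuming the input arrives as a single signal that is first split into two bands: a 2-tap moving average serves as the low band and the residual as the high band, so the two bands sum back to the original. The split, the routing table and the per-unit kernels are hypothetical, and threads again stand in for the GPU and FPGA.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Signal = List[float]

def split_bands(x: Signal) -> Tuple[Signal, Signal]:
    # Hypothetical two-band split: a 2-tap moving average is the low band and
    # the residual is the high band, so low[i] + high[i] == x[i].
    low = [(x[i] + x[i - 1]) / 2 if i > 0 else x[0] for i in range(len(x))]
    high = [xi - li for xi, li in zip(x, low)]
    return low, high

def process_bands(x: Signal,
                  routing: Dict[str, str],
                  unit_kernels: Dict[str, Callable[[Signal], Signal]]) -> Signal:
    low, high = split_bands(x)
    bands = {"low": low, "high": high}
    # Each band is submitted to the kernel of the unit it is routed to.
    with ThreadPoolExecutor(max_workers=len(bands)) as pool:
        futures = {name: pool.submit(unit_kernels[routing[name]], band)
                   for name, band in bands.items()}
        out = {name: fut.result() for name, fut in futures.items()}
    # Unify: the processed bands are summed back into a full-band signal.
    return [a + b for a, b in zip(out["low"], out["high"])]

routing = {"high": "gpu", "low": "fpga"}
identity = {"gpu": lambda band: band, "fpga": lambda band: band}
suppress_high = {"gpu": lambda band: [0.0] * len(band), "fpga": lambda band: band}
x = [1.0, 3.0, 5.0, 4.0]
restored = process_bands(x, routing, identity)       # unchanged bands rebuild x
smoothed = process_bands(x, routing, suppress_high)  # zeroing the high band leaves the low band
```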
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method for data partitioning and assignment (100) for optimal multi-processor processing that can be used in heterogeneous systems, where the processors can be stand-alone parallel processors or cores within such stand-alone processors, and where the data partitions are assigned according to constraints.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/IB2013/052098 WO2014140697A1 (fr) | 2013-03-15 | 2013-03-15 | Method for data partitioning and assignment |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/IB2013/052098 WO2014140697A1 (fr) | 2013-03-15 | 2013-03-15 | Method for data partitioning and assignment |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2014140697A1 (fr) | 2014-09-18 |
Family ID: 48289563
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/IB2013/052098 (Ceased) WO2014140697A1 (fr) | 2013-03-15 | 2013-03-15 | Method for data partitioning and assignment |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2014140697A1 (fr) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2011079942A1 (fr) | 2009-12-28 | 2011-07-07 | Hyperion Core, Inc. | Optimization of loops and data flow sections |
| WO2012142069A2 (fr) | 2011-04-11 | 2012-10-18 | University Of Florida Research Foundation, Inc. | Elastic computing |
Non-Patent Citations (3)
| Title |
|---|
| BINGSHENG HE ET AL: "Mars", PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT '08, 29 October 2008 (2008-10-29), New York, New York, USA, pages 260, XP055111303, ISBN: 978-1-60-558282-5, DOI: 10.1145/1454115.1454152 * |
| COLBY RANGER ET AL: "Evaluating MapReduce for Multi-core and Multiprocessor Systems", 2007 IEEE 13TH INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE, 1 January 2007 (2007-01-01), pages 13 - 24, XP055111222, ISBN: 978-1-42-440804-7, DOI: 10.1109/HPCA.2007.346181 * |
| YI SHAN ET AL: "FPMR", PROCEEDINGS OF THE 18TH ANNUAL ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD PROGRAMMABLE GATE ARRAYS, FPGA '10, 21 February 2010 (2010-02-21), New York, New York, USA, pages 93, XP055111216, ISBN: 978-1-60-558911-4, DOI: 10.1145/1723112.1723129 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107038059A (zh) * | 2016-02-03 | 2017-08-11 | Alibaba Group Holding Limited | Virtual machine deployment method and apparatus |
| EP3413193A4 (fr) * | 2016-02-03 | 2019-02-13 | Alibaba Group Holding Limited | Virtual machine deployment method and apparatus |
| US10740194B2 (en) | 2016-02-03 | 2020-08-11 | Alibaba Group Holding Limited | Virtual machine deployment method and apparatus |
| CN113688137A (zh) * | 2021-08-27 | 2021-11-23 | China Telecom Corporation Limited | Data partitioning method and apparatus, storage medium, and electronic device |
Similar Documents
| Publication | Title |
|---|---|
| Coelho et al. | A GPU deep learning metaheuristic based model for time series forecasting |
| Hou et al. | Auto-tuning strategies for parallelizing sparse matrix-vector (spmv) multiplication on multi-and many-core processors |
| Teodoro et al. | High-throughput analysis of large microscopy image datasets on CPU-GPU cluster platforms |
| US8799858B2 (en) | Efficient execution of human machine interface applications in a heterogeneous multiprocessor environment |
| Mushtaq et al. | Sparkga: A spark framework for cost effective, fast and accurate dna analysis at scale |
| Kurzak et al. | LU factorization with partial pivoting for a multicore system with accelerators |
| CN112434785B (zh) | Supercomputer-oriented performance evaluation method for distributed parallel deep neural networks |
| Gouveia et al. | Speeding up Rao-Blackwellized particle filter SLAM with a multithreaded architecture |
| Varghese et al. | Acceleration-as-a-service: Exploiting virtualised GPUs for a financial application |
| Ocaña et al. | Optimizing phylogenetic analysis using scihmm cloud-based scientific workflow |
| Prajapati et al. | Analytical study of parallel and distributed image processing |
| WO2014140697A1 (fr) | Method for data partitioning and assignment |
| Ren et al. | Exploration of alternative GPU implementations of the pair-HMMs forward algorithm |
| Andrade et al. | Efficient execution of microscopy image analysis on CPU, GPU, and MIC equipped cluster systems |
| CN102833200A (zh) | DPD adaptive method and device based on symmetric multiprocessors |
| Indragandhi et al. | An Application based Efficient Thread Level Parallelism Scheme on Heterogeneous Multicore Embedded System for Real Time Image Processing. |
| Singh et al. | Accelerating smith-waterman on heterogeneous cpu-gpu systems |
| Hartog et al. | Configuring a mapreduce framework for performance-heterogeneous clusters |
| Wang et al. | An efficient architecture for floating-point eigenvalue decomposition |
| Borhade et al. | Image Classification using Parallel CPU and GPU Computing |
| Liu et al. | Iris matching algorithm on many-core platforms |
| Milluzzi et al. | A multi-tiered optimization framework for heterogeneous computing |
| Fan et al. | Optimizing image sharpening algorithm on GPU |
| Prashanth et al. | Evaluating Deep Neural Network Performance on Edge Accelerators: A Roofline Model Adopted Benchmarking Approach |
| Moolchandani et al. | Performance prediction for multi-application concurrency on GPUs |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13720595; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 13720595; Country of ref document: EP; Kind code of ref document: A1 |