WO2014140697A1 - Method for data partitioning and assignment - Google Patents

Method for data partitioning and assignment

Info

Publication number
WO2014140697A1
WO2014140697A1 PCT/IB2013/052098 IB2013052098W WO2014140697A1 WO 2014140697 A1 WO2014140697 A1 WO 2014140697A1 IB 2013052098 W IB2013052098 W IB 2013052098W WO 2014140697 A1 WO2014140697 A1 WO 2014140697A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
assignment
decision function
data partitioning
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IB2013/052098
Other languages
English (en)
Inventor
Toygar Akgun
Serhat ÖZDEMİR
Süleyman Alpay ASLANGÜL
Mehmet Fatih KARAGÖZ
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aselsan Elektronik Sanayi ve Ticaret AS
Original Assignee
Aselsan Elektronik Sanayi ve Ticaret AS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-03-15
Filing date
2013-03-15
Publication date
2014-09-18
Application filed by Aselsan Elektronik Sanayi ve Ticaret AS
Priority to PCT/IB2013/052098
Publication of WO2014140697A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Definitions

  • The present invention relates to a method for data partitioning and assignment for optimal multi-processor processing where processors can be stand-alone parallel processors or cores within such stand-alone processors.
  • Background of the invention
  • Data in a conventional hard disk drive consists of ones and zeros. These ones and zeros are processed so that the type of file they form can be determined. Suppose one group of ones and zeros forms a music file. To convert these ones and zeros to sound, the data again has to be processed.
  • Data processing is used in a very broad range of fields. Security systems use data processing to analyze the audiovisual feeds from deployed devices in order to detect security breaches. Data processing is also used to analyze connection parameters in phone systems. While data processing can be performed by a single processing unit, harder tasks require more processing units to complete the required work in a shorter time.
  • State-of-the-art processing platforms are increasingly multi-core and heterogeneous due to the power efficiency and computation power advantages provided by such architectures.
  • State-of-the-art heterogeneous multi-core processing platforms typically include GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) along with other multi-core processors such as ARM (Advanced RISC Machines) processors, DSPs (Digital Signal Processors), CPUs (Central Processing Units) and CGRAs (Coarse Grain Reconfigurable Arrays).
  • OpenCL: Open Computing Language, currently developed by the Khronos Group
  • CUDA: Compute Unified Device Architecture, owned and developed by NVIDIA Corporation
  • OpenCL supports a wide array of multi-core processors including ARM, CPU, GPU, DSP and FPGA, whereas CUDA is dedicated to NVIDIA GPUs.
  • Using massive multi-threading APIs, engineers can program FPGAs and GPUs to execute massively multi-threaded (64 - 1536 threads) programs and gain substantial performance increases over single-threaded programs or simple multi-threaded programs with a small number of threads (2 - 16 threads) typically observed for CPUs, ARMs or DSPs.
  • Multi-threaded execution (with 2 - 16 threads) of generic algorithms and processing tasks on CPUs, ARM processors and DSPs can provide performance gains that roughly scale with the number of available cores.
  • GPUs and FPGAs currently present on state-of-the-art heterogeneous multi-core processing platforms are designed for massively multi-threaded execution, and for algorithms that allow a high level of parallelism these processors are the best choice for execution.
  • The international patent application numbered WO2011079942 discloses a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop; partitioning the loop into partitions executable and mappable on physical hardware with optimal instruction level parallelism; optimizing the loop iterations and/or loop counter for ideal mapping on hardware; and chaining the loop partitions, generating a list representing the execution sequence of the partitions.
  • WO2012142069 discloses elastic computing tools which analyze available devices and resources (e.g., cores, GPUs, FPGAs, etc.) and current run-time parameters, and then transparently select from numerous pre-analyzed implementation possibilities to optimize for performance, power, energy, size, or any combination of these goals.
  • An object of the invention is to provide a method for data partitioning and assignment for optimal multi-processor processing where processors can be stand-alone parallel processors or cores within such stand-alone processors.
  • Another object of the invention is to provide a method for data partitioning and assignment where the partitions of data are assigned to processors according to constraints.
  • Another object of the invention is to provide a method for data partitioning and assignment which can be utilized for heterogeneous systems.
  • Figure 1 is the flowchart of the method for data partitioning and assignment
  • A method for data partitioning and assignment (100) comprises the following steps:
  • The constraints and capabilities of the current ecosystem of processing units are analyzed (101). These constraints and capabilities include but are not limited to power consumption, thread branching capabilities, workloads per thread, memory access patterns, thread synchronization requirements, processor functionality checks (to see whether a specific processor is still functional), etc.
  • The expectations from the process are also analyzed (102). The expectations include but are not limited to maximum power consumption, maximum process time, the quality of the outcomes, etc.
  • The capabilities, constraints and expectations are fed to a decision function (103).
  • The decision function is a function which uses comparisons, mathematical operations or any other operations, alone or in combination, to calculate the optimum set of processing units, taking the expectations and constraints of each application into consideration.
  • The decision function yields the set of processing units to which the data will be fed.
  • The data is then partitioned to match the set of processing units yielded by the decision function (104).
  • The partitions of data are distributed to the different processing units in order to be processed (105).
  • The processing units then process the partitions and yield outcomes (106). As each outcome corresponds to one partition of the data fed to the processing units, the outcomes are unified in order to form the non-partitioned result (107).
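The steps (101) to (107) above can be sketched in code. The following is a minimal illustration, not the decision function of the application: the `Unit` and `Expectations` records, the greedy throughput-per-watt heuristic and the use of threads as stand-ins for real processing units are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical record of one processing unit's capabilities (step 101)."""
    name: str
    power_watts: float   # power drawn while busy
    throughput: float    # items processed per second
    functional: bool = True


@dataclass(frozen=True)
class Expectations:
    """Hypothetical expectations from the process (step 102)."""
    max_power_watts: float
    max_seconds: float


def decision_function(units, expectations, n_items):
    """Step 103 (assumed heuristic): admit functional units in order of
    throughput per watt while the power budget allows it."""
    chosen, power = [], 0.0
    candidates = sorted((u for u in units if u.functional),
                        key=lambda u: u.throughput / u.power_watts,
                        reverse=True)
    for u in candidates:
        if power + u.power_watts <= expectations.max_power_watts:
            chosen.append(u)
            power += u.power_watts
    if not chosen:
        raise ValueError("no processing unit fits the power budget")
    if n_items / sum(u.throughput for u in chosen) > expectations.max_seconds:
        raise ValueError("time expectation cannot be met")
    return chosen


def partition(data, units):
    """Step 104: contiguous slices sized in proportion to unit throughput."""
    total = sum(u.throughput for u in units)
    parts, start, acc = [], 0, 0.0
    for i, u in enumerate(units):
        acc += u.throughput
        last = i == len(units) - 1
        end = len(data) if last else round(len(data) * acc / total)
        parts.append(data[start:end])
        start = end
    return parts


def run(data, units, expectations, kernel):
    """Steps 103-107; worker threads stand in for the processing units."""
    chosen = decision_function(units, expectations, len(data))
    parts = partition(data, chosen)
    with ThreadPoolExecutor(max_workers=len(chosen)) as pool:
        # Steps 105-106: distribute the partitions and process them.
        outcomes = list(pool.map(lambda p: [kernel(x) for x in p], parts))
    # Step 107: unify the outcomes in partition order.
    return [y for part in outcomes for y in part]
```

Because the partitions are contiguous and `pool.map` returns results in submission order, concatenating the outcomes restores the original ordering of the data.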
  • In one variant, the decision function has learning capability. It keeps an internal set of parameters, pertaining to linear or non-linear decision and/or update function(s), which it updates upon every batch of input values.
  • This decision function is named the decision function with learning.
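One way to picture a decision function with learning is a throughput estimate per processing unit that is nudged toward the measured value after every batch. The sketch below assumes a linear update rule (an exponential moving average) and a hypothetical `LearningDecision` class; the application does not prescribe either.

```python
class LearningDecision:
    """Hypothetical decision function with learning.

    The internal parameters are per-unit throughput estimates, updated
    after every batch with a linear rule (exponential moving average).
    """

    def __init__(self, unit_names, initial_throughput=1.0, rate=0.5):
        self.estimate = {name: initial_throughput for name in unit_names}
        self.rate = rate  # how strongly a new measurement moves the estimate

    def update(self, measurements):
        """measurements maps unit name -> (items_processed, seconds)."""
        for name, (items, seconds) in measurements.items():
            observed = items / seconds
            self.estimate[name] += self.rate * (observed - self.estimate[name])

    def shares(self):
        """Fraction of the next batch to assign to each unit."""
        total = sum(self.estimate.values())
        return {name: t / total for name, t in self.estimate.items()}


decision = LearningDecision(["gpu", "fpga"], initial_throughput=100.0)
decision.update({"gpu": (900, 1.0), "fpga": (300, 1.0)})
next_shares = decision.shares()
```

With equal initial estimates the first batch is split evenly; after one update the split already leans toward the unit that measured faster.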
  • In another variant, the constraints of the process remain constant throughout the process.
  • In this case, running the decision function once and processing the data according to its outcome is sufficient.
  • This decision function is named the off-line decision function.
  • In a further variant, the constraints of the process are subject to change throughout the process. These changes include, but are not limited to, remaining battery power, the number of different types of processing units, changing time constraints, etc. For example, if the system on which the method for data partitioning and assignment (100) runs is a stand-alone system, the remaining battery power can be taken into consideration. If the number of processing units can change during the execution of the method for data partitioning and assignment (100) (because of failed processors, etc.), the decision function can recalculate the set of processing units to which the data will be fed. To achieve this, the method for data partitioning and assignment (100) is re-executed from time to time. This decision function is named the on-line decision function.
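The difference between the off-line and on-line decision functions can be illustrated as follows. This is a sketch under stated assumptions: `select_units`, `read_state` and the lowest-power-first rule are hypothetical, and re-running the decision before every batch is only one possible reading of "from time to time".

```python
def select_units(units, power_budget):
    """Hypothetical decision function: admit functional units, lowest power
    first, while the summed power stays within the budget."""
    chosen, used = [], 0.0
    for name, info in sorted(units.items(), key=lambda kv: kv[1]["watts"]):
        if info["functional"] and used + info["watts"] <= power_budget:
            chosen.append(name)
            used += info["watts"]
    return chosen


def process_offline(batches, read_state):
    """Off-line variant: the constraints are read once, the decision is reused."""
    units, budget = read_state()
    chosen = select_units(units, budget)
    return [(batch, chosen) for batch in batches]


def process_online(batches, read_state):
    """On-line variant: constraints are re-read and the decision re-run per batch."""
    plan = []
    for batch in batches:
        units, budget = read_state()
        plan.append((batch, select_units(units, budget)))
    return plan


# Simulated system states: the GPU fails before the second batch and the
# remaining battery power shrinks the budget before the third.
healthy = {"gpu": {"watts": 150.0, "functional": True},
           "fpga": {"watts": 30.0, "functional": True},
           "cpu": {"watts": 65.0, "functional": True}}
gpu_failed = {**healthy, "gpu": {"watts": 150.0, "functional": False}}
states = iter([(healthy, 250.0), (gpu_failed, 250.0), (gpu_failed, 50.0)])

plan = process_online(["batch0", "batch1", "batch2"], lambda: next(states))
```

In the off-line variant the same set of units would be used for all three batches, which is only safe when the constraints really do stay constant.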
  • The input data may be a color image in planar format consisting of red, green and blue color planes. Typically, the green color plane is sampled twice as densely as the red and blue color planes and contains more spatial detail.
  • As a result, the green color plane is typically targeted for heavier spatial processing.
  • Each of the color planes can be processed on a different processing unit.
  • For example, the green plane can be processed on a GPU while the red and blue planes are processed on FPGAs, or vice versa.
  • The input data may be a color image in planar format consisting of luminance (Y) and chrominance (Cb, Cr) planes. Due to the nature of the YCbCr color space, most of the spatial detail is present in the Y plane. Furthermore, based on this observation, the Cb and Cr planes are down-sampled in the spatial domain to save space. As a result, the Y plane is typically targeted for heavier spatial processing. In such a scenario, the luminance plane can be processed on a GPU and the chrominance (Cb, Cr) planes can be processed on FPGAs, or vice versa.
  • The input data may consist of separate frequency bands.
  • Each of the frequency bands can be processed on a different processor.
  • For example, higher frequency bands can be processed on GPUs and lower frequency bands on FPGAs, or vice versa.
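The plane-wise and band-wise examples above amount to assigning naturally separate partitions to processing units. A minimal sketch follows, assuming a greedy "heaviest partition first" rule, hypothetical throughput figures, and 4:2:0 chroma subsampling for the YCbCr case (the text only states that Cb and Cr are down-sampled).

```python
def assign_partitions(partition_sizes, unit_throughput):
    """Assign each partition (plane or band) to a processing unit.

    Greedy rule (an assumption, not the application's decision function):
    take the partitions heaviest first and give each one to the unit that
    would finish it earliest given the work already queued on that unit.
    """
    finish_time = {unit: 0.0 for unit in unit_throughput}
    assignment = {}
    ordered = sorted(partition_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        best = min(finish_time,
                   key=lambda u: finish_time[u] + size / unit_throughput[u])
        finish_time[best] += size / unit_throughput[best]
        assignment[name] = best
    return assignment


# A 1920x1080 YCbCr frame; with 4:2:0 subsampling each chroma plane has a
# quarter of the luminance samples.
width, height = 1920, 1080
planes = {"Y": width * height,
          "Cb": (width // 2) * (height // 2),
          "Cr": (width // 2) * (height // 2)}

# Hypothetical throughputs in samples per second.
throughput = {"gpu": 4_000_000.0, "fpga": 2_000_000.0}

plane_assignment = assign_partitions(planes, throughput)
```

With these figures the luminance plane lands on the GPU and both chrominance planes on the FPGA, matching the scenario described above; the same function applies unchanged to frequency bands by passing band sizes instead of plane sizes.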

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for data partitioning and assignment (100) for optimal multi-processor processing, which can be utilized in heterogeneous systems, where the processors can be stand-alone parallel processors or cores within such stand-alone processors, and where the partitions of data are assigned according to constraints.
PCT/IB2013/052098 2013-03-15 2013-03-15 Method for data partitioning and assignment Ceased WO2014140697A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2013/052098 WO2014140697A1 (fr) 2013-03-15 2013-03-15 Method for data partitioning and assignment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2013/052098 WO2014140697A1 (fr) 2013-03-15 2013-03-15 Method for data partitioning and assignment

Publications (1)

Publication Number Publication Date
WO2014140697A1 (fr) 2014-09-18

Family

ID=48289563

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/052098 Ceased WO2014140697A1 (fr) Method for data partitioning and assignment 2013-03-15 2013-03-15

Country Status (1)

Country Link
WO (1) WO2014140697A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011079942A1 (fr) 2009-12-28 2011-07-07 Hyperion Core, Inc. Optimisation of loops and data flow sections
WO2012142069A2 (fr) 2011-04-11 2012-10-18 University Of Florida Research Foundation, Inc. Elastic computing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BINGSHENG HE ET AL: "Mars", PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT '08, 29 October 2008 (2008-10-29), New York, New York, USA, pages 260, XP055111303, ISBN: 978-1-60-558282-5, DOI: 10.1145/1454115.1454152 *
COLBY RANGER ET AL: "Evaluating MapReduce for Multi-core and Multiprocessor Systems", 2007 IEEE 13TH INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE, 1 January 2007 (2007-01-01), pages 13 - 24, XP055111222, ISBN: 978-1-42-440804-7, DOI: 10.1109/HPCA.2007.346181 *
YI SHAN ET AL: "FPMR", PROCEEDINGS OF THE 18TH ANNUAL ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD PROGRAMMABLE GATE ARRAYS, FPGA '10, 21 February 2010 (2010-02-21), New York, New York, USA, pages 93, XP055111216, ISBN: 978-1-60-558911-4, DOI: 10.1145/1723112.1723129 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038059A (zh) * 2016-02-03 2017-08-11 Alibaba Group Holding Limited Virtual machine deployment method and apparatus
EP3413193A4 (fr) * 2016-02-03 2019-02-13 Alibaba Group Holding Limited Virtual machine deployment method and apparatus
US10740194B2 (en) 2016-02-03 2020-08-11 Alibaba Group Holding Limited Virtual machine deployment method and apparatus
CN113688137A (zh) * 2021-08-27 2021-11-23 China Telecom Corporation Limited Data partitioning method and apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 13720595; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 13720595; Country of ref document: EP; Kind code of ref document: A1)