WO2007105309A1 - Programme, dispositif et procede de generation de programme en parallele - Google Patents

Programme, dispositif et procede de generation de programme en parallele Download PDF

Info

Publication number
WO2007105309A1
WO2007105309A1 PCT/JP2006/305016 JP2006305016W WO2007105309A1 WO 2007105309 A1 WO2007105309 A1 WO 2007105309A1 JP 2006305016 W JP2006305016 W JP 2006305016W WO 2007105309 A1 WO2007105309 A1 WO 2007105309A1
Authority
WO
WIPO (PCT)
Prior art keywords
program
vertex
thread
vertices
dependence graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2006/305016
Other languages
English (en)
Japanese (ja)
Inventor
Makiko Ito
Hideo Miyake
Atsuhiro Suga
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to JP2008504960A priority Critical patent/JP5083204B2/ja
Priority to PCT/JP2006/305016 priority patent/WO2007105309A1/fr
Publication of WO2007105309A1 publication Critical patent/WO2007105309A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/45Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456Parallelism detection

Definitions

  • the present invention generally relates to a program generation method, apparatus, and program, and more particularly to a parallelized program generation method, apparatus, and program.
  • Patent Document 1 data dependency in a loop is analyzed, and an array is divided.
  • the loop processing is executed by a plurality of processors. This method is effective when there are many regular loops such as numerical calculations.
  • Patent Document 2 shows a method of replacing speculative thread execution by focusing on branching in a sequential program. This method parallelizes the program based on the flow of control. Therefore, it cannot be said that the potential parallelism of the program has been sufficiently extracted.
  • the cost of the callback at the time of prediction failure is large, so this method is not suitable for an application with a low branch prediction hit rate.
  • Patent Document 1 Japanese Patent No. 3028821
  • Patent Document 2 Japanese Patent No. 3641997
  • Patent Document 2 Jeanne Ferrante, Karl J. Ottenstein, Joe D. Warren, i, he Program D ependence Graph and Its Use in Optimizatio, ACM Transactions on Programming
  • Non-Patent Document 3 Susan Horwitz, Jan Prins, Thomas Reps, "On the adequacy of progra m dependence graphs for representing programs, Proceedings of the 15th Annual A
  • the present invention is directed to a large-scale software, a non-speculative multi-thread program that operates effectively on a multiprocessor by parallelizing sequential programs. It is an object to provide a method, an apparatus, and a program for generating
  • the parallelized program generation program has a sequential program as an input, has each sentence constituting the sequential program as a vertex, and a program dependence having a relation between the sentences as a side between the vertices.
  • a graph is generated, and a degenerate program dependency graph in which the number of vertices is reduced by fusing the vertices of the program dependency graph is generated, and a thread program corresponding to each vertex of the degenerate program dependency graph is generated.
  • Generate Thread ' ⁇ It is characterized by including code that causes the computer to execute each step of generating a thread control program that controls the start and synchronization of the program.
  • a parallelized program generation device includes a memory that stores a sequential program and a parallelized program generation program, and the sequential program stored in the memory by executing the parallelized program generation program stored in the memory.
  • An arithmetic processing unit that generates a parallelized program from the program, and the arithmetic processing unit executes the parallelized program generating program, thereby having each sentence constituting the sequential program as a vertex and a sentence and a sentence Generating a program dependence graph having a relationship between the vertices as edges between the vertices, and generating a degenerate program dependence graph in which the number of vertices is reduced by merging the vertices of the program dependence graph, Thread 'corresponding to each vertex of the degenerate program dependence graph A thread control program for controlling program activation and synchronization is generated.
  • the parallelized program generation method has a sequential program as an input, each sentence constituting the sequential program as a vertex, and a program dependence graph having a relation between the sentence and the sentence as an edge between the vertices. And generating a degenerate program dependency graph in which the number of vertices is reduced by fusing the vertices of the program dependency graph, and a thread program corresponding to each of the vertices of the degenerate program dependency graph And a step of generating a thread control program for controlling the activation and synchronization of the thread program.
  • a parallelized program is generated based on a program dependency graph that is a graph indicating a control dependency relationship rather than a control flow graph.
  • the parallelism of programs exceeding) can be extracted. Also, by reducing the scale of the graph by reducing the program dependence graph, it becomes possible to improve the efficiency and optimization of the subsequent parallel program generation process, and to achieve parallelization with a large granularity.
  • FIG. 1 is a diagram showing an outline of a parallelized program generation method according to the present invention.
  • Threads are diagrams showing an outline of the program generation method.
  • FIG. 3 is a diagram illustrating a thread “program generated by the program generation method” in FIG.
  • FIG. 4 is a flowchart showing a method for generating a thread control program.
  • FIG. 5 is a flowchart showing a method for determining an execution order relationship between vertices.
  • FIG. 6 is a flowchart showing a process (step S2 in FIG. 5) for reconfiguring the control flow below vertex V.
  • FIG. 7 is a flowchart showing a process for calculating the execution order relation of Regions.
  • FIG. 8 is a flowchart showing a process for obtaining inverse dependence and output dependence (step S4 in FIG. 7).
  • FIG. 11 is a flowchart showing an addition process of inverse dependence.
  • FIG. 12 is a flowchart showing an output-dependent addition process.
  • FIG. 13 is a flowchart showing a process for obtaining inverse dependence and output dependence (step S5 in FIG. 7).
  • FIG. 15 is a diagram schematically showing a spanning tree.
  • FIG. 17 is a diagram for explaining the addition of an inverse dependence edge by the process of FIG.
  • FIG. 19 is a flowchart showing a process for generating a thread control program below a vertex vp.
  • FIG. 20 (a) is a diagram showing a part of an input sequential program, and (b) is a diagram showing a corresponding degenerate program dependence graph.
  • FIG. 21 is a diagram showing a thread control program generated according to the first embodiment from the degenerate program dependence graph of FIG.
  • FIG. 22 is a schematic diagram showing the operation of the above thread control program together with the execution of a thread 'program.
  • FIG. 23 is a diagram showing a thread control program generated according to the second embodiment from the degenerate program dependence graph of FIG.
  • FIG. 24 is a schematic diagram showing the operation of the thread control program described above together with the execution of the thread 'program.
  • FIG. 25 is a diagram showing a thread control program generated according to the third embodiment from the degenerate program dependence graph of FIG. 20.
  • FIG. 26 is a schematic diagram showing the operation of the thread control program described above together with the execution of the thread 'program.
  • FIG. 27 is a diagram showing a configuration of an apparatus for executing a parallelized program generation method according to the present invention.
  • FIG. 1 is a diagram showing an outline of a parallelized program generation method according to the present invention.
  • step S 1 a program dependency graph (PDG) is generated from the sequential program.
  • step S2 a degenerate program dependency graph with a thread as a vertex is created by reducing the dependency until a processing amount suitable for execution by another processor element as a thread is reached.
  • step S3 a thread control program that non-speculatively controls thread activation and synchronization is generated from the generated degenerate program dependence graph.
  • step S4 a thread program corresponding to each vertex is generated from the reduced program dependence graph.
  • step S 1 in FIG. 1 First, the process of generating a program dependence graph from a sequential program (step S 1 in FIG. 1) will be described.
  • the program dependence graph is a graph in which a sentence of a program is a vertex and a relation between the sentences is represented by an edge.
  • the program dependence graphs described in Non-Patent Documents 1 to 3 are expressed by the following set of vertex set V and edge set E, and can be generated by analyzing the sequential program.
  • Initial definition represents the definition of the initial value at the start of the program.
  • Predicate Indicates if_then_else or while-loop condition determination.
  • Assignment statement represents an assignment statement of a program.
  • Last use represents a reference to a variable at the end of the program.
  • step S2 in Fig. 1 the process of creating the degenerate program dependence graph
  • the graph has a sentence or substitution expression as a vertex. If a sentence or assignment expression is used as vertices, the number of vertices in the graph will be thousands to tens of thousands in large-scale software. In general, it is known that the computational complexity of optimization problems using compiler graphs increases exponentially with the size of the graph. Therefore, for example, a graph with several tens of vertices for several procedures can be analyzed, but it can be said that it is difficult to optimize the entire software on a realistic scale.
  • a program dependency graph with a coarse granularity is created by degenerating the dependency relationship of the program dependency graph that reduces the number of vertices and sides of the program dependency graph and merging the vertices.
  • the scale of the graph is reduced to 1/10 to 1/100, enabling optimization of the program in a realistic time.
  • the degeneration of the dependency relationship is executed by obtaining a set of dependency relationships and vertices that can be degenerated in the following manner, deleting the dependency relationship, and merging the vertices into one vertex.
  • control structure of the program to be expressed is limited to i ⁇ , while statements, and assignment statements, and the control dependence subgraph of the program dependence graph (partial graph consisting only of vertices and control dependence edges) It is known that the control flow of a program can be reconfigured when the shape of the tree is a tree structure (Non-patent Document 1). Therefore, for the control statements that are not i ⁇ and while statements in the program, find a block of the program that has one entry and one exit. By reducing the dependency between the entire block and the inside of the block to one vertex, a reduced program dependency graph is created in a range where the control flow can be safely reconfigured.
  • the degree of coupling shall be calculated from the data-dependent edge and its size, the control-dependent edge, and the processing size. If vertices with a certain degree of connectivity or higher satisfy the contractible condition, the vertices are joined to reduce the dependency. Here, when the following two conditions are satisfied, reduction by combining vertices is possible.
  • degenerate program dependence graph in which the number of vertices is significantly reduced by “degeneration based on syntax rules” or “degeneration based on connectivity”.
  • the degenerate program dependence graph consists of the following elements.
  • Initial definition represents the definition of the initial value at the start of the program.
  • Predicate Indicates if-then-else or while-loop condition determination.
  • Sentence set represents a set of sentences constituting a program.
  • Last use represents a reference to a variable at the end of the program.
  • step S3 in FIG. 1 a process for generating a thread control program (step S3 in FIG. 1) and a process for generating a thread program (step S4 in FIG. 1) will be described.
  • the vertices of the reduced program dependence graph generated as described above are a subset of sentences of the input sequential program and have information on the flow of control between sentences. Therefore, one thread 'program is generated for one vertex, with the variable represented by one input edge of the data flow to one vertex of interest as input and the variable represented by the data flow output edge as output.
  • the body of the thread 'program is generated from the flow of control, and local variables necessary for the execution of the body are generated.
  • FIG. 2 is a diagram showing an overview of a thread 'program generation method.
  • FIG. 3 is a diagram showing a thread program generated by the thread program generation method of FIG.
  • step S1 of FIG. 2 the variable represented by the data flow input edge for the target vertex is changed.
  • a program part for receiving an input variable is generated.
  • the receiving portion 10 of the input variable shown in FIG. 3 is generated.
  • step S2 the necessary variables are searched.
  • step S3 a variable declaration is generated for the variable found by the search.
  • the variable declaration part 11 shown in FIG. 3 is generated.
  • step S4 a program text is generated based on the control flow information between the sentences at the target vertex. As a result, the program body 12 shown in FIG. 3 is generated.
  • step S5 the variable represented by the data flow output edge of the target vertex is output, and a program part for transmitting the output variable is generated. As a result, the transmission part 13 of the output variable shown in FIG. 3 is generated.
  • Non-Patent Document 1 generation of a thread control program will be described. Based on the technology described in Non-Patent Document 1, it is possible to safely reconstruct the control flow from the degenerated program dependence graph. Specifically, the execution order of child vertices is determined for each intermediate node (scheduling) for the control dependence subtree of the reduced program dependence graph. As a result, a program that calls the control structure displayed by each intermediate node and the thread 'program represented by the child vertex is generated. Thread 'When the program is called, the code that sends and receives input data and output data and waits is also generated. The generation of such a thread control program will be described in detail in the following embodiments.
  • the first embodiment relates to the realization of a parallelized program by asynchronous remote procedure call.
  • the thread 'program is a procedure for executing a sentence / sentence set represented by a vertex. Therefore, create a procedure that accepts input variables as procedure arguments and output variables as return values or addresses that store output variables as arguments.
  • the vertex procedure is called as an asynchronous remote call. Also, when calling the procedure that the subsequent vertex represents, the preceding procedure It shall wait for the call. Finally, to guarantee the execution result of the program (the value of the last used variable), wait for the procedure that outputs the last used variable.
  • FIG. 4 is a flowchart showing a method for generating a thread control program.
  • step S1 the execution order relation between vertices is calculated.
  • a program dependence graph is a graph that expresses only the dependency between vertices, and the execution order between vertices is not specified.
  • it is necessary to find an execution order that satisfies the dependency (control dependency, data dependency).
  • the reverse dependency and output dependency necessary to satisfy the dependency are obtained, and the execution order is determined.
  • step S2 variables and initial value assignment statements are generated.
  • One is a variable necessary for the calculation of the program. It generates a variable represented by the initial definition vertex and the final use vertex, and a variable represented by the data dependence edge of the program dependence graph.
  • the other is a variable necessary for program control, and generates a variable that identifies the remote procedure call at the apex that consists of each sentence / sentence set.
  • the initial value is “not executed”.
  • step S3 a thread control program below the vertex vp is generated. This will be explained in detail below.
  • step S4 an end value waiting is generated.
  • FIG. 5 is a flowchart illustrating a method for determining an execution order relationship between vertices.
  • the process in FIG. 5 corresponds to step S1 in FIG.
  • the input of the process shown in Fig. 5 is the reduced program dependence graph PDG, and the output is the reduced program dependence graph PDG and its control flow.
  • step S1 V is the entry vertex (program start point) of the degenerated program dependence graph PDG.
  • step S2 the control flow below vertex V is reconfigured. This completes the process.
  • FIG. 6 is a flowchart showing a process (step S 2 in FIG. 5) for reconfiguring the flow of control below the vertex V.
  • the input of the processing in Fig. 6 is the degenerate program dependence graph PDG and the top Point V.
  • Region (v, T) is a set of vertices u, and there is a true control dependency from vertex v to vertex u. Where V is a vertex set, E is edge set, v ⁇ T u shows a True control Yi Sonhen.
  • step S2 the execution order relation of Region (v, T) is calculated.
  • Region (v, F) is a set of vertices u, and there is a false control dependency from vertex V to vertex u. The process ends here.
  • FIG. 7 is a flowchart showing processing for calculating the execution order relationship of Regions. This processing corresponds to each of step S2 and step S4 in FIG.
  • the inputs of the processing in Fig. 7 are the degenerate program dependence graph PDG and V '(region of interest).
  • step S1 a loop that repeats the processing in steps S2 to S3 is started for each vertex V of the region of interest V '.
  • step S2 it is determined whether or not V is a predicate vertex (a vertex representing If-then-else or while-loop condition determination).
  • step S3 only if V is a predicate vertex.
  • step S3 the execution order relation below vertex V is calculated.
  • step S4 inverse dependence and output dependence are obtained.
  • the data dependence (inverse dependence, output dependence) due to the flow of control is extracted.
  • the inverse dependence and output dependence in the attention area are expressed from the data dependence relation exceeding the attention area (Region).
  • step S5 inverse dependence and output dependence are obtained.
  • the execution order in the region of interest (Region) is determined.
  • appropriate execution order constraints are determined for a set of vertices in the region where the execution order is not uniquely determined.
  • the execution order is determined by clarifying the reverse dependency relation and output dependency relation within the region based on the execution order constraints based on the obtained reverse dependence relation and output dependence relation. If the execution order is arbitrary, the reverse order and output dependency are obtained by assuming the execution order, and the execution order without contradiction is obtained. Repeat the trial until
  • step S6 scheduling is performed in step S6. That is, the execution order of the vertices is determined based on the execution order relationship obtained above. This can be reduced to the general problem of scheduling graphs with partial order relations. Therefore, well-known scheduling methods such as topological “sorting” and “list with weighted approximation of vertex execution time” scheduling can be applied.
  • FIG. 8 is a flowchart showing a process for obtaining inverse dependence and output dependence (step S4 in FIG. 7).
  • the input of the process in Fig. 8 is the degenerate program dependence graph PDG and V '(target R egion).
  • step S1 a variable reference exceeding the region of interest V 'is extracted and set to V.
  • step S2 a variable reference exceeding the region of interest V 'is extracted and set to V.
  • variable substitution exceeding the region of interest V ' is extracted as V.
  • step S4 output dependent edges are determined based on V and V.
  • FIG. 9 is a flowchart showing processing for extracting a variable reference that exceeds the region of interest.
  • step S1 in FIG. 8 corresponds to step S1 in FIG. 8, and the degenerated program dependence graph PDG and V ′
  • step S1 the vertex set V is emptied.
  • step S2 each flow in the region of interest V '
  • the flow-dependent edge includes a loop-independent flow-dependent edge and a loop carry-over flow-dependent edge.
  • u is the dependency source vertex of the flow-dependent edge e
  • V is the dependency destination vertex of the edge e.
  • step S4 If it is a loop carry-over flow dependent edge, it is determined in step S4 whether or not the condition that the dependency destination vertex V is included in the region of interest V 'is satisfied. If it is a loop-independent flow dependent edge, it is checked in step S5 whether or not the condition that the dependency source vertex u is not included in the attention area V ′ and the dependency destination vertex V is included in the attention area V ′ is satisfied. judge. Only in the case of this determination result power Syes, step S6 is executed. In step S6, the dependent vertex V is added to the vertex set V.
  • step S7 the vertex set V is returned as a value. The process ends here.
  • FIG. 10 is a flowchart showing processing for extracting variable substitution exceeding the region of interest.
  • the processing in FIG. 10 corresponds to step S2 in FIG. 8, and the degenerated program dependence graph PDG and V (target region) are input.
  • step S1 the vertex set V is emptied.
  • step S2 each flow in the region of interest V '
  • the flow-dependent edge includes a loop-independent flow-dependent edge and a loop carry-over flow-dependent edge.
  • u is the dependency source vertex of the flow-dependent edge e
  • V is the dependency destination vertex of the edge e.
  • step S4 If it is a loop carry-over flow dependent edge, it is determined in step S4 whether or not the condition that the dependency destination vertex V is included in the region of interest V 'is satisfied. If it is a loop-independent flow dependent edge, whether or not the condition that the dependence source vertex u is included in the attention area V ′ and the dependence destination vertex V is not included in the attention area V ′ is satisfied in step S5. Determine whether. Only in the case of any judgment result power Syes, step S6 is executed. In step S6, the dependent vertex V is added to the vertex set V.
  • step S7 the vertex set V is returned as a value. The process ends here.
  • FIG. 11 is a flowchart showing the inverse-dependent addition process.
  • the process in Fig. 11 corresponds to step S3 in Fig. 8, and the degenerate program dependence graph PDG, V (region of interest), and vertex set V are input.
  • step SI a loop that repeats the following processing for each vertex V of vertex set V is created.
  • step S2 a loop that repeats the following processing is started for each variable X used at vertex V.
  • step S3 a loop that repeats the subsequent processing is started for each vertex u of the region of interest V ′.
  • step S4 it is determined whether or not the vertex u defines the variable X. Only when the judgment result is yes, execute step S5. In step S5, add an inverse dependence edge from V to u. The process ends here.
  • FIG. 12 is a flowchart showing an output-dependent addition process.
  • the process in FIG. 12 corresponds to step S4 in FIG. 8, and the degenerate program dependence graph PDG, V (target region), and vertex set V are input.
  • step SI a loop that repeats the following processing is performed for each vertex u of the vertex set V.
  • step S3 a loop for repeating the subsequent processing is started for each vertex V of the region of interest V ′.
  • step S4 it is determined whether or not the vertex V defines a variable X.
  • Step S5 is executed only when the judgment result is Syes.
  • step S5 an output dependent edge from V to u is added. The process ends here.
  • FIG. 13 is a flowchart showing a process for obtaining inverse dependence and output dependence (step S5 in FIG. 7).
  • the input of the processing in Fig. 13 is a degenerate program dependence graph PDG and V '
  • step S 1 a spanning tree in the region of interest is obtained and set as S.
  • the spanning tree for variable x of vertex v is
  • FIG. 14 is a diagram for explaining a spanning tree.
  • variable X is defined at vertex V
  • two vertices vl and v2 are
  • variable X a spanning tree 21 is formed by vertices v, vl, and v2. Also, variable X is defined at vertex V, and two vertices v3 and v4 use variable X. in this case,
  • a spanning tree 22 is formed by vertices v, v3, and v4.
  • Figure 15 is a diagram schematically showing a spanning tree.
  • Spanning tree Span (v, X) and spanning tree Span (v, X) are shown as data dependence graphs in Figure 15.
  • step S2 a loop is started in which two arbitrary spanning trees whose execution order is undetermined are sequentially selected and the subsequent processing is repeated.
  • step S3 whether there are independent spanning trees Span (h, x) and Span (h, ⁇ ) for the same variable X with a cycle in the region of interest.
  • step S4 the original R (Region) is saved on the stack.
  • step S5 an output dependence edge h ⁇ h o is added to obtain transitive closure.
  • step S6 the order relation between spanning trees is
  • step S7 it is determined whether or not there is a closed circuit in R (Region). If not, the subsequent processing steps S8 to S11 are skipped. If yes, go to step S8.
  • step S8 it is determined whether the stack is empty. If it is empty, the error ends. If not, in step S9, take R's original from the stack.
  • step S10 h ⁇ h is output.
  • step si i the order relation between spanning trees is calculated.
  • the execution order for 0 1, x) is determined. Further, two arbitrary spanning trees whose execution order is undecided are sequentially selected and the same processing is repeated, and the process ends when the order relations between all spanning trees are determined.
  • FIG. 16 is a flowchart showing a process for calculating the order relationship between spanning trees.
  • the process in FIG. 16 corresponds to Step S6 and Step S11 in FIG.
  • the input of the processing in FIG. 16 is a degenerated program dependence graph PDG and V ′ (region of interest).
  • step S1 a loop that repeats the subsequent processing is started for each edge e (vertex v ⁇ vertex w) in the region of interest.
  • step S2 a loop is started that repeats the following processing for each variable X defined by vertex w and referenced by vertex V.
  • step S3 V ⁇ ⁇ u ⁇ v ⁇ Span (u, x) ⁇ and V ⁇ ⁇ u ⁇ w ⁇ Span (u, x a b
  • step S4 a loop for repeating the subsequent processing is started for each vertex V of V.
  • step S5 a loop that repeats the subsequent processing for each vertex V of V is started. More b b
  • step S6 Span (v
  • step S7 it is determined whether or not vc ⁇ vb is included in E (edge set).
  • step S8 is executed only when the judgment result is ye s.
  • step S8 add an inverse dependence edge of V ⁇ v and select cb
  • FIG. 17 is a diagram for explaining the addition of an inverse dependence edge by the process of FIG.
  • FIG. 17 shows a spanning tree Span (v, x) for the variable X of the vertex V and a spanning tree Span (w, x) for the variable X of the vertex w.
  • Spanning tree Span (v, x) for variable X containing vertex V as an element ie Span a
  • FIG. 18 is a flowchart showing a modification of the method for determining the execution order relationship between vertices.
  • the process shown in the flowchart of FIG. 18 may be used instead of the process shown in the flowchart of FIG.
  • a process of applying SSA static single assignment form
  • the degenerate program dependence graph may be converted into a static single assignment format.
  • the processing of step S7 shown in FIG. 7 processing for obtaining reverse dependence and output dependence and determining the execution order in the region of interest: the flowchart of FIG. 13) can be omitted.
  • FIG. 19 is a flowchart showing a process of generating a thread control program below the vertex vp.
  • the process in FIG. 19 corresponds to step S3 in FIG.
  • the inputs of the process shown in Fig. 19 are the degenerate program dependence graph PDG and vertex V.
  • vertex V is a target vertex.
  • vertex V vertices below vertex V (
  • Child vertices refers to vertices below vertex V in the subgraph (becomes a tree structure) when focusing on the control dependence edge of the degenerate program dependence graph.
  • vertex V vertex V
  • step S1 the data dependency of V (loop from the child vertex of vp to its own loop)
  • step S2 the type of V is determined. Depending on the type of vertex V, the following code is generated
  • step S3 As a result of the determination in step S2, if V is a sentence (or set of sentences) vertex, in step S3,
  • step S4 If the result of determination in step S2 is that V is the predicate vertex of the loop, step S4
  • step S5 create an end-of-execution queue below the child vertices of V
  • step S7 a thread control program of V or less is generated. That is, the program represented by child vertex V is generated as the body of the for / while statement loop generated above.
  • the process of step S6 corresponds to the process of the flowchart of FIG. 19, and the process has a nested structure.
  • step S8 the leading vertex in the own loop with respect to the loop carry forward flow-dependent input edge of V
  • step S2 If the result of step S2 is the predicate vertex of the V force statement,
  • step S10 Generate a statement.
  • step S10 a then clause is generated.
  • step S12 a thread control program of V or less is generated. That is, the program representing the child vertex V force S is generated as the text of the generated then clause.
  • the process of step S12 corresponds to the process of the flowchart of FIG. 19, and the process has a nested structure. [0111]
  • step S14 an else clause is generated.
  • step S16 a thread control program of V or less is generated. That is, the program represented by child vertex V is generated as the body of the else clause generated above.
  • the process of step S16 corresponds to the process of the flowchart of FIG. 19, and the process has a nested structure.
  • FIG. 20 is a diagram showing (a) the input sequential program portion and (b) the corresponding degenerate program-dependent dialog.
  • a program dependence graph is generated from the input sequential program shown in Fig. 20 (a), and the reduced program dependence graph shown in (b) is generated by combining the vertices and degenerating. Vertex V to V exist, and vertex V is a set of sentences due to degeneration.
  • FIG. 21 is a thread control program generated according to the first embodiment from the degenerate program dependence graph of FIG. First, there is a variable declaration. Declare the variables X, y, a, b, and p to be used, and set the variables vl, v3, v4, and v5 representing the procedure to "unexecuted" (variable declaration & setting 41).
  • the procedure call statement 47 calls the procedure v3
  • the procedure call statement 48 calls the procedure v4.
  • the procedure v3 is waited to acquire the value of the variable a, and the process returns to the beginning of the while statement. Use the acquired value of a to make a condition decision, and if the result is false, exit the loop. After that, it waits for the procedure v4 in the end wait statement 50 and acquires the value of the variable b. Finally, the procedure v5 is called by the procedure call statement 51 using the variables a and b, and the value of the variable X is acquired by the end waiting statement 52.
  • Fig. 22 is a schematic diagram showing the operation of the thread control program described above together with the execution of the thread 'program.
  • four processors processor 0 to processor 3, are used.
  • the thread control program is executed on the processor 0.
  • the procedure vl is not executed, so immediately, the procedure v3 is called first, and the thread 'program 61 is executed by the processor 1.
  • procedure v4 is called and thread 'program 62 is executed by processor 2.
  • procedure v3 is awaited to wait for the value of variable a used for condition determination. This corresponds to the procedure waiting statement 49 in FIG.
  • procedure v4 When the condition of the while statement is satisfied again, the procedure v4 called in the previous loop is waited for in the second loop of the while statement. This corresponds to the procedure wait statement 46 in FIG.
  • Procedure v3 is called and thread 'program 63 (program code identical to 61) is executed by processor 1. Subsequently, procedure v4 is called and thread 'program 64 (program code is the same as 62) is executed by processor 2. In addition, procedure v3 will be held again.
  • Thread receives the input variable represented by the flow-dependent edge to the target vertex, the code that receives the message instructing the start of execution, the statement / statement set represented by the vertex, It consists of code that sends an output variable represented by the flow-dependent edge from the vertex, and code that sends a message that indicates the completion of execution.
  • the thread control program sends the input variable and a message instructing the start of execution of the procedure to the code that controls the branch and norep expressed by the predicate vertex, and the vertex that represents the statement / sentence set, and outputs it. It consists of a variable and a code that receives a message that notifies the end.
  • Receiving output variables and termination notifications shall be made when required at subsequent vertices.
  • the variable and the execution completion message are received.
  • an identifier indicating the state of the apex is used, and it has one of three states of “not executed”, “running”, and “waiting”.
  • FIG. 23 is a thread control program generated according to the second embodiment from the degenerate program dependence graph of FIG. First, there is a variable declaration, and the variables X, y, a, b, and p to be used are declared, and the variables vl, v3, v4, and v5 representing the procedure are set to “unexecuted” (variable declaration & setting 71).
  • FIG. 24 is a schematic diagram showing the operation of the above thread control program together with the execution of the thread 'program.
  • four processors processor 0 to processor 3, are used.
  • the thread control program is executed on the processor 0.
  • an execution start message is transmitted to the procedure v3 together with the necessary variables X and a, and the thread program 91 is executed by the processor 1.
  • an execution start message is sent to the procedure v4 together with necessary variables y and b, and the thread ' ⁇ program 92 is executed by the processor 2.
  • variable b of procedure v4 is received, an execution start message is sent to procedure v5 together with the required variables a and b, and thread 'program 95 is executed by processor 3. The This corresponds to the waiting in the message reception text 70 of FIG. 23 and the communication to the procedure v5 in the message transmission text 81, respectively. Finally, the variable X of procedure v5 is received and the program is terminated.
  • Thread The program is executed as a thread, and transfers shared I / O variables using shared memory. Therefore, the shared variable that becomes the input / output is locked. Run the vertex part program, and finally unlock the shared variable and terminate the thread.
  • a thread procedure is generated. Generate code that locks I / O variables to use shared memory for I / O variable passing methods. Next, the variables used and defined by the vertex subprogram other than the input variables are obtained, and the variable declaration is generated. Output the partial program, and finally generate the code that unlocks the shared variable and the code that terminates the thread.
  • the thread control program calls the vertex procedure as a thread.
  • the preceding thread is waited.
  • the execution result of the program the value of the last used variable
  • an identifier indicating the execution state of a thread is used, and it has one of three states: “not executed”, “running”, or “waiting”.
  • FIG. 25 is a thread control program generated according to the third embodiment from the degenerate program dependence graph of FIG. There is a variable declaration first, and the variables X, y, a, b, and p to be used are declared, and the variables vl, v3, v4, and v5 that represent threads are set to “unexecuted” (variable declaration & setting 101 ).
  • the thread v3 is generated by the thread generation statement 107, and the thread v4 is generated by the thread generation statement 108. These threads correspond to vertex V and vertex V shown in Figure 20 (b).
  • the end wait statement 109 waits for the thread v3, acquires the value of the variable a, and returns to the beginning of the w hile statement. Use the acquired value of a to make a condition decision. If the result is false, exit the loop.
  • the end wait statement 50 waits for the thread v4 and acquires the value of the variable b.
  • the thread generation statement 51 generates a thread v5, and the end wait statement 52 acquires the value of the variable X.
  • FIG. 26 is a schematic diagram showing the operation of the above thread control program together with the execution of the thread 'program.
  • one processor 0 is used.
  • the thread control program is executed by the processor 0 and each thread 'program is executed.
  • the procedure vl is not executed, so immediately, the thread v3 is generated first, and the thread 'program 121 is executed.
  • a thread v4 is generated and a thread 'program 122 is executed.
  • These correspond to the thread v3 generation in the thread generation statement 107 and the thread v4 generation in the thread generation statement 108 in FIG.
  • the end of thread v3 which should acquire the value of variable a used for condition determination, is awaited. This corresponds to the wait statement 109 in FIG.
  • Thread v4 When the condition of the while statement is satisfied again, the thread v4 created in the previous loop is waited for in the second loop of the while statement. This corresponds to the wait statement 106 in FIG. Thread v3 is generated thread 'program 123 (program code Is the same as 121). Subsequently, thread v4 is generated and thread 'program 124 (the program code is the same as 122) is executed. Again, thread v3 is queued.
  • the thread v4 is waited for, the thread v5 is generated, and the thread 'program 125 is executed. This corresponds to the waiting in the waiting statement 110 in FIG. 25 and the thread v5 generation in the thread generating statement 111, respectively. Finally, wait for thread v5, get the result, and exit the program.
  • FIG. 27 is a diagram showing a configuration of an apparatus for executing the parallelized program generation method according to the present invention.
  • an apparatus for executing the parallelized program generation method according to the present invention is realized by a computer such as a personal computer or an engineering workstation.
  • 27 includes a computer 510, a display device 520 connected to the computer 510, a communication device 523, and an input device.
  • the input device includes a keyboard 521 and a mouse 522, for example.
  • the computer 510 includes a CPU 511, RAM 512, ROM513, a secondary storage device 514 such as a hard disk, a replaceable medium storage device 515, and an interface 516.
  • the keyboard 521 and the mouse 522 provide an interface with the user, and various commands for operating the computer 510, user responses to requested data, and the like are input.
  • the display device 520 displays the results processed by the computer 510, and displays various data to enable interaction with the user when operating the computer 510.
  • the communication device 523 is for performing communication with a remote place, and includes, for example, a modem or a network interface.
  • the parallelized program generation method according to the present invention is provided as a computer program executable by the computer 510.
  • This computer program is stored in the storage medium M that can be mounted on the replaceable medium storage device 515, and is loaded into the RAM 512 or the secondary storage device 514 via the storage medium M force replaceable medium storage device 515.
  • This computer program is stored in a remote storage medium (not shown) and is loaded into the RAM 512 or the secondary storage device 514 via the communication device 523 and the interface 516. .
  • the CPU 511 When there is a program execution instruction from the user via the keyboard 521 and / or the mouse 522, the CPU 511 loads the program from the storage medium M, the remote storage medium, or the secondary storage device 514 into the RAM 512.
  • the CPU 511 uses the free storage space of the RAM 512 as a work area, executes the program loaded in the RAM 512, and proceeds with the process while appropriately talking to the user.
  • the ROM 513 stores a control program for controlling the basic operation of the computer 510.
  • the computer 510 executes the parallelized program generation method as described in the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

La présente invention concerne un programme de génération de programme en parallèle qui se caractérise par l'inclusion de codes pour entraîner l'exécution par un ordinateur des étapes d'entrée d'un programme séquentiel avec chaque phase construisant le programme séquentiel comme partie supérieure ; par la génération d'un graphe de dépendance de programme avec des côtés définis entre les parties supérieures des phrases ; la génération d'un graphe de dépendance de programme dégénéré dans lequel les parties supérieures du graphe de dépendance de programme sont fusionnées dans le but de réduire le nombre de parties supérieures ; la génération d'un programme à filière correspondant à chacune des parties supérieures du graphe de dépendance de programme dégénéré; et la génération d'un programme de commande de filière afin de commander respectivement le démarrage et la synchronisation du programme à filière.
PCT/JP2006/305016 2006-03-14 2006-03-14 Programme, dispositif et procede de generation de programme en parallele Ceased WO2007105309A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2008504960A JP5083204B2 (ja) 2006-03-14 2006-03-14 並列化プログラム生成プログラム、並列化プログラム生成装置、及び並列化プログラム生成方法
PCT/JP2006/305016 WO2007105309A1 (fr) 2006-03-14 2006-03-14 Programme, dispositif et procede de generation de programme en parallele

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2006/305016 WO2007105309A1 (fr) 2006-03-14 2006-03-14 Programme, dispositif et procede de generation de programme en parallele

Publications (1)

Publication Number Publication Date
WO2007105309A1 true WO2007105309A1 (fr) 2007-09-20

Family

ID=38509160

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2006/305016 Ceased WO2007105309A1 (fr) 2006-03-14 2006-03-14 Programme, dispositif et procede de generation de programme en parallele

Country Status (2)

Country Link
JP (1) JP5083204B2 (fr)
WO (1) WO2007105309A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009151645A (ja) * 2007-12-21 2009-07-09 Mitsubishi Electric Corp 並列処理装置及びプログラム並列化装置
JP2009175882A (ja) * 2008-01-22 2009-08-06 Fujitsu Ltd 並列化プログラム生成方法、並列化プログラム生成プログラム、及び並列化プログラム生成装置
JP2009205586A (ja) * 2008-02-29 2009-09-10 Sony Computer Entertainment Inc コンパイラおよびプログラム分割方法
JP2010211731A (ja) * 2009-03-12 2010-09-24 Fujitsu Ltd 並列処理支援プログラム、並列処理支援装置および並列処理支援方法
JP2012508939A (ja) * 2008-11-24 2012-04-12 インテル コーポレイション シーケンシャル・プログラムを複数スレッドに分解し、スレッドを実行し、シーケンシャルな実行を再構成するシステム、方法および装置
JP2017506406A (ja) * 2014-02-20 2017-03-02 スティルウォーター スーパーコンピューティング,インク. アフィン従属による単一割当プログラムを実行するための実行エンジン
CN114238078A (zh) * 2021-11-23 2022-03-25 南京邮电大学 一种基于高阶函数的程序间依赖关系抽取方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11714615B2 (en) 2020-09-18 2023-08-01 International Business Machines Corporation Application migration using cost-aware code dependency graph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6898787B2 (en) * 2001-03-22 2005-05-24 Hewlett-Packard Development Company, L.P. Method and apparatus for ordered predicate phi in static single assignment form

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ITO Y. ET AL.: "Seigyo Izon no Kanwa o Koryo shita Heiretsusei Chushutsu Shuho", INFORMATION PROCESSING SOCIETY OF JAPAN KENKYU HOKOKU (2001-ARC-144), vol. 2001, no. 76, 27 July 2001 (2001-07-27), pages 87 - 92, XP003017744 *
KASAHARA H. ET AL.: "Heiretsu Shori Software", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRICAL ENGINEERS OF JAPAN, vol. 113-C, no. 11, 20 November 1993 (1993-11-20), pages 919 - 927, XP003017741 *
NAKANISHI T. ET AL.: "CDP2 Algorithm - Data Bunkatsu Graph Jo deno Togoteki Data Progra Bunkatsu Algorithm -", INFORMATION PROCESSING SOCIETY OF JAPAN KENKYU HOKOKU (95-HPC-57), vol. 95, no. 81, 25 August 1995 (1995-08-25), pages 139 - 144, XP003017742 *
UCHIDA S. ET AL.: "Izon Graph no Henkei niyoru Heiretsuka Shori no Seigosei", FIT2002 (FORUM ON INFORMATION TECHNOLOGY) IPPAN KOEN RONBUNSHU, vol. 1, 13 September 2002 (2002-09-13), pages 7 - 8, XP003017743 *
WEI Y. ET AL.: "Guard Tsuki PDG o Mochiita Meirei Level Heiretsu Architecture no Tameno Saitekika Compiler no Jitsugen", INFORMATION PROCESSING SOCIETY OF JAPAN DAI 58 KAI (HEISEI 11 NEN ZENKI) ZENKOKU TAIKAI KOEN RONBUNSHU, no. 1, 11 March 1999 (1999-03-11), pages 1-269 - 1-270, XP003017745 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009151645A (ja) * 2007-12-21 2009-07-09 Mitsubishi Electric Corp 並列処理装置及びプログラム並列化装置
JP2009175882A (ja) * 2008-01-22 2009-08-06 Fujitsu Ltd 並列化プログラム生成方法、並列化プログラム生成プログラム、及び並列化プログラム生成装置
JP2009205586A (ja) * 2008-02-29 2009-09-10 Sony Computer Entertainment Inc コンパイラおよびプログラム分割方法
US8645910B2 (en) 2008-02-29 2014-02-04 Sony Corporation Compiler capable of partitioning program and program partitioning method
JP2012508939A (ja) * 2008-11-24 2012-04-12 インテル コーポレイション シーケンシャル・プログラムを複数スレッドに分解し、スレッドを実行し、シーケンシャルな実行を再構成するシステム、方法および装置
JP2010211731A (ja) * 2009-03-12 2010-09-24 Fujitsu Ltd 並列処理支援プログラム、並列処理支援装置および並列処理支援方法
JP2017506406A (ja) * 2014-02-20 2017-03-02 スティルウォーター スーパーコンピューティング,インク. アフィン従属による単一割当プログラムを実行するための実行エンジン
CN114238078A (zh) * 2021-11-23 2022-03-25 南京邮电大学 一种基于高阶函数的程序间依赖关系抽取方法

Also Published As

Publication number Publication date
JP5083204B2 (ja) 2012-11-28
JPWO2007105309A1 (ja) 2009-07-30

Similar Documents

Publication Publication Date Title
JP4962564B2 (ja) 並列化プログラム生成方法、並列化プログラム生成装置、及び並列化プログラム生成プログラム
Low et al. Graphlab: A new framework for parallel machine learning
JP2002116916A (ja) プログラムの最適化方法及びこれを用いたコンパイラ
Duarte et al. Parallel variable neighbourhood search strategies for the cutwidth minimization problem
Griebler et al. High-level and productive stream parallelism for Dedup, Ferret, and Bzip2
JP2001166949A (ja) シンボリック実行を用いてソースコードをコンパイルするための方法及び装置
Jiang et al. FlatDD: a high-performance quantum circuit simulator using decision diagram and flat array
JP5083204B2 (ja) 並列化プログラム生成プログラム、並列化プログラム生成装置、及び並列化プログラム生成方法
Zhao et al. AutoGraph: Optimizing DNN computation graph for parallel GPU kernel execution
JP5381302B2 (ja) 並列化スケジューリング装置
WO2008041442A1 (fr) Procédé de création de programme par parallélisation, dispositif de création de programme par parallélisation, et programme de création de programme par parallélisation
Audet et al. Efficient use of parallelism in algorithmic parameter optimization applications
Tóth et al. Pattern candidate discovery and parallelization techniques
Yin et al. Trigger-centric loop mapping on CGRAs
CN118520910A (zh) 模型信息的处理方法、装置、电子设备和存储介质
Li et al. Using design space exploration for finding schedules with guaranteed reaction times of synchronous programs on multi-core architecture
Avis et al. A portable parallel implementation of the lrs vertex enumeration code
CN116795515A (zh) 循环任务的执行方法、设备、芯片和存储介质
JP5315703B2 (ja) 並列化プログラム生成方法、並列化プログラム生成プログラム、及び並列化プログラム生成装置
US10545965B2 (en) Data skew finding and analysis
Lenfers et al. Schedgehammer: Auto-tuning Compiler Optimizations beyond Numerical Parameters
Dance et al. Efficiently vectorized MCMC on modern accelerators
Araujo et al. A DVND local search implemented on a dataflow architecture for the minimum latency problem
Whitlock et al. Scalable collectives for distributed asynchronous many-task runtimes
Das et al. Automatic Task Parallelization of Dataflow Graphs in ML/DL Models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 06729050

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2008504960

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06729050

Country of ref document: EP

Kind code of ref document: A1