cOMPunity - The Community of OpenMP Users

OpenMP FAQs


 Q1: What is OpenMP?

 Q2: What does the MP in OpenMP stand for?

 Q3: Why a new standard?

 Q4: How is the OpenMP specification different from the X3H5 draft standard?

 Q5: How does OpenMP compare with ... ?

 Q6: What about nested parallelism?

 Q7: What about task parallelism?


 Q8: What if I just want loop-level parallelism?

 Q9: What does orphaning mean?

Q10: What languages does OpenMP work with?

Q11: Is support for other languages planned?

Q12: Is OpenMP scalable?

Q13: What about non-shared memory machines or networks of workstations?

Q14: Why should OpenMP succeed when PCF and X3H5 failed?

Q15: How will OpenMP be managed as a specification, over the long-term? Who owns it?

Q16: How do I get other questions answered?

Q1: What is OpenMP?
A1: OpenMP is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared memory parallelism in Fortran and C/C++ programs.
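
To make those three components concrete, here is a minimal, illustrative C sketch (it is not taken from the specification, and the -fopenmp compile flag mentioned in the comment is just one common convention; the exact flag varies by compiler):

    /* Minimal sketch: a loop parallelized with an OpenMP directive.
       Compile with an OpenMP-aware compiler (for example "cc -fopenmp";
       the flag name varies by vendor).  The thread count can be set
       with the OMP_NUM_THREADS environment variable. */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        double a[1000];
        int i;

        /* Compiler directive: split the loop iterations across threads. */
        #pragma omp parallel for
        for (i = 0; i < 1000; i++)
            a[i] = i * 2.0;

        /* Library routine: ask how many threads the runtime may use. */
        printf("max threads: %d\n", omp_get_max_threads());
        return 0;
    }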

Q2: What does the MP in OpenMP stand for?
A2: The MP in OpenMP stands for Multi Processing. We provide Open specifications for Multi Processing via collaborative work with interested parties from the hardware and software industry, government and academia.

Q3: Why a new standard?
A3: Shared-memory parallel programming directives have never been standardized in the industry. An earlier standardization effort, ANSI X3H5, was never formally adopted, so vendors have each provided their own set of directives. These directive sets are often very similar in syntax and semantics, but each uses a unique comment or pragma notation, so code written with one is not portable to another. OpenMP consolidates these directive sets into a single syntax and semantics, and finally delivers the long-awaited promise of single-source portability for shared-memory parallelism.

OpenMP also addresses the inability of previous shared-memory directive sets to deal with coarse-grain parallelism. In the past, limited support for coarse-grain work led developers to conclude that shared-memory parallel programming was inherently limited to fine-grain parallelism. This is not the case with OpenMP: orphaned directives provide the features necessary to represent coarse-grained algorithms.

Q4: How is the OpenMP specification different from the X3H5 draft standard?
A4: The ANSI/X3-authorized subcommittee X3H5 was chartered to develop an ANSI standard based on the work of the Parallel Computing Forum (PCF). PCF was an informal industry group that set out to standardize directives for DO-loop-oriented parallelism; it produced a draft standard but never completed the work. The OpenMP specification addresses the same problem. The difference is that the OpenMP Architecture Review Board completed the task, and the specification is gaining industry-wide support. The OpenMP specification is an agreement reached between industry vendors and users; it is not a formal standard.

Q5: How does OpenMP compare with ... ?
A5: MPI? Message-passing has become accepted as a portable style of parallel programming, but it has several significant weaknesses that limit its effectiveness and scalability. Message-passing in general is difficult to program and doesn't support incremental parallelization of an existing sequential program. Message-passing was initially defined for client/server applications running across a network, and so it includes costly semantics (message queuing and selection, and the assumption of wholly separate memories) that are often not required by tightly-coded scientific applications running on modern scalable systems with globally addressable, cache-coherent distributed memories.

HPF? HPF has never really gained wide acceptance among parallel application developers or hardware vendors. Some applications written in HPF perform well, but others find that limitations resulting from the HPF language itself or the compiler implementations lead to disappointing performance. HPF's focus on data parallelism has also limited its appeal.

Pthreads? Pthreads has never been targeted toward the technical/HPC market. This is reflected in its minimal Fortran support and its lack of support for data parallelism. Even for C applications, Pthreads requires programming at a lower level than most technical developers would prefer.

FORALL loops? FORALL loops are not rich or general enough to use as a complete parallel programming model. Their focus on loops, together with the rule that subroutines called from those loops cannot have side effects, effectively limits their scalability. FORALL loops are useful for providing information to automatic parallelizing compilers and preprocessors.

BSP or LINDA or SISAL or...? There are lots of parallel programming languages being researched or prototyped in the industry. These may be targeted towards a specific architecture, or focused on exploring one key requirement. If you have a question about how OpenMP compares with a specific language or model, we can help you figure this out.

Q6: What about nested parallelism?
A6: Nested parallelism is permitted by the OpenMP specification. Supporting nested parallelism effectively can be difficult, and we expect most vendors will start out by executing nested parallel constructs on a single thread. OpenMP encourages vendors to experiment with nested parallelism to help us and the users of OpenMP understand the best model and API to include in our specification. We will include the necessary functionality when we understand the issues better.
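
As a purely illustrative sketch (assuming a C compiler with OpenMP support), a nested region can be written as below; whether the inner region actually receives extra threads is implementation dependent, and many runtimes will simply serialize it:

    /* Illustrative nested parallelism; the inner region may legally be
       executed by a single thread. */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_nested(1);                        /* request nested parallelism */

        #pragma omp parallel num_threads(2)
        {
            int outer = omp_get_thread_num();

            #pragma omp parallel num_threads(2)   /* inner, possibly serialized */
            {
                printf("outer thread %d, inner thread %d\n",
                       outer, omp_get_thread_num());
            }
        }
        return 0;
    }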

Q7: What about task parallelism?
A7: Support for general task parallelism is not included in the OpenMP specification. OpenMP encourages vendors to experiment with task parallelism to help us and the users of OpenMP understand the best model and API to include in our specification. We will include the necessary functionality when we understand the issues better.

Q8: What if I just want loop-level parallelism?
A8: OpenMP fully supports loop-level parallelism. Loop-level parallelism is useful for applications that have lots of coarse, loop-level parallelism, especially those that will never be run on large numbers of processors or for which restructuring the source code is either impractical or disallowed. Typically, though, the amount of loop-level parallelism in an application is limited, and this in turn limits the scalability of the application.

OpenMP allows you to use loop-level parallelism as a way to start scaling your application across multiple processors, and then move to coarser-grain parallelism while preserving the value of your earlier investment. This incremental development strategy avoids the all-or-nothing risks involved in moving to message-passing or other parallel programming models.
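
As an illustrative sketch of that incremental path (the routine and array names here are made up), a program might start with independent loop-level regions and later merge them into one coarser parallel region in which threads are created only once:

    #include <omp.h>

    #define N 1000

    /* Stage 1: each hot loop gets its own parallel region. */
    void update_v1(double *x, double *y)
    {
        int i;

        #pragma omp parallel for
        for (i = 0; i < N; i++)
            x[i] = x[i] * 2.0;

        #pragma omp parallel for
        for (i = 0; i < N; i++)
            y[i] = y[i] + x[i];
    }

    /* Stage 2: one coarser region; the work-sharing "for" directives
       divide both loops among the same team of threads, and the implicit
       barrier after the first loop keeps the ordering correct. */
    void update_v2(double *x, double *y)
    {
        int i;

        #pragma omp parallel private(i)
        {
            #pragma omp for
            for (i = 0; i < N; i++)
                x[i] = x[i] * 2.0;

            #pragma omp for
            for (i = 0; i < N; i++)
                y[i] = y[i] + x[i];
        }
    }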

Q9: What does orphaning mean?
A9: In early shared-memory models, parallel directives were only permitted within the lexical extent of parallel regions. To the application programmer, this meant that the directives had to be defined in such a way that all information needed to parallelize a loop or subroutine had to be specified within the source for that loop or subroutine. If another subroutine was called, parallel information specific to that subroutine had to be specified at the call site, or the called subroutine had to be (manually) inlined. This simplified model was sufficient for the moderate, loop-level parallelism that dominated the use of these models, but never allowed good scalability for very large applications. For such large applications, programmers had to program outside the directive set to achieve good performance, resulting in programs that were non-standard and difficult to maintain.

Orphaning allows parallel directives to be specified outside the lexical extent of a parallel region. A subroutine can be written for use from any number of parallel regions, with the parallel directives it needs embedded in its own source, rather than replicated at every call site. This is a natural place to specify the parallelism, and it avoids the programming errors that result when the earlier style is used for complex applications. Orphaning is crucial to implementing coarse-grain parallel algorithms, and to the development of portable, parallel libraries.
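
A hedged C sketch of the idea (the routine names are invented for illustration): the work-sharing directive is "orphaned" inside the library routine and binds to whatever parallel region is active in the caller:

    #include <omp.h>

    #define N 1000

    /* Reusable routine: the "omp for" directive below is orphaned, i.e.
       it appears outside the lexical extent of any parallel region. */
    void scale(double *a, double s)
    {
        int i;

        #pragma omp for
        for (i = 0; i < N; i++)
            a[i] *= s;
    }

    void driver(double *a, double *b)
    {
        #pragma omp parallel       /* the parallel region lives in the caller */
        {
            scale(a, 2.0);         /* iterations shared among the team */
            scale(b, 0.5);
        }

        scale(a, 3.0);             /* called serially: runs on one thread */
    }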

Q10: What languages does OpenMP work with?
A10: OpenMP is designed for Fortran, C, and C++, supporting whichever of these languages the underlying compiler supports. The OpenMP specification does not introduce any constructs that require specific Fortran 90 or C++ features. OpenMP does require, however, that the compiler support at least one of Fortran 77, Fortran 90, ANSI C (1989), or ANSI C++.

Q11: Is support for other languages planned?
A11: The OpenMP ARB does not plan to introduce support for additional languages at this point.

Q12: Is OpenMP scalable?
A12: OpenMP can deliver scalability for applications using shared-memory parallel programming. Significant effort was spent to ensure that OpenMP can be used for scalable applications. Ultimately, however, scalability is a property of the application and the algorithms used. The parallel programming language can only support that scalability by providing constructs that simplify the specification of the parallelism and that can be implemented with low overhead by compiler vendors. OpenMP delivers exactly these kinds of constructs.
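
One example of such a construct, in an illustrative C sketch, is the reduction clause: the programmer states the intent in a single clause, and the implementation is free to use per-thread partial sums combined at the end rather than a lock on every iteration:

    #include <omp.h>

    /* Illustrative dot product; "reduction(+:sum)" gives each thread a
       private partial sum that the runtime combines with low overhead. */
    double dot(const double *x, const double *y, int n)
    {
        double sum = 0.0;
        int i;

        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++)
            sum += x[i] * y[i];

        return sum;
    }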

Q13: What about non-shared memory machines or networks of workstations?
A13: As much as it would be nice to think that a single programming model (OpenMP or MPI or HPF or whatever) might run well on all architectures, this is not the case today. OpenMP was designed to exploit certain characteristics of shared-memory architectures. The ability to directly access memory throughout the system (with minimum latency and no explicit address mapping) combined with very fast shared memory locks, makes shared-memory architectures best suited for supporting OpenMP.

Systems that don't fit the classic shared-memory architecture may provide hardware or software layers that present the appearance of a shared-memory system, but often at the cost of higher latencies or special limitations. For example, OpenMP could be implemented for a distributed memory system on top of MPI, in which case OpenMP's latencies would be greater than those of MPI (whereas typically the reverse is the case on a shared-memory system). The extent to which these latencies or limitations reduce application portability or performance will help dictate whether vendors choose to develop OpenMP implementations for distributed memory systems.

Q14: Why should OpenMP succeed when PCF and X3H5 failed?
A14: There are a variety of reasons OpenMP will succeed at being accepted as a standard, where earlier efforts failed.

Goals - The OpenMP definition was driven by a small number of experts working to an aggressive schedule, building primarily on current practice.

Technical - OpenMP includes better support for scalable, coarse-grain parallelism and more directives for managing private and shared data. OpenMP also includes query functions, environment variables, and conditional compilation support (a short illustrative sketch of these follows this list).

Timing - The need for a standard in this area is better accepted throughout the industry now, as vendors have begun to converge on system architectures that combine aspects of both shared-memory and distributed architectures of the past. The importance of a standard encouraging scalable, parallel application development that can exploit shared-memory hardware is recognized as being more important than ever. (Interest in PCF and X3H5 was partly derailed by the appearance of pure distributed memory MPP systems, whose proponents were arguing that shared-memory parallel programming was no longer interesting.)

Vendor support - The system vendors behind OpenMP collectively have delivered a very large share of the shared-memory parallel systems in use today.
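
As an illustrative C sketch of the query functions, environment variables, and conditional compilation mentioned under "Technical" above (not part of the specification text):

    #include <stdio.h>

    #ifdef _OPENMP            /* defined only when OpenMP compilation is enabled */
    #include <omp.h>
    #endif

    int main(void)
    {
    #ifdef _OPENMP
        /* Query function; its result is influenced by the OMP_NUM_THREADS
           environment variable, which the runtime reads at startup. */
        printf("OpenMP enabled, up to %d threads\n", omp_get_max_threads());
    #else
        printf("compiled without OpenMP; running serially\n");
    #endif
        return 0;
    }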

Q15: How will OpenMP be managed as a specification, over the long-term? Who owns it?
A15: The OpenMP specifications are owned and managed by the OpenMP Architecture Review Board (ARB).

Q16: How do I get other questions answered?
A16: You can send your questions to the OpenMP organization in the feedback section of the web site. We endeavor to answer all queries within one week, but sometimes it can take longer due to people's varying schedules, etc.