WL#4739: Physical Structure of Server
Affects: Server-9.x — Status: Assigned — Priority: Medium
In order to simplify working with the server code, both as a developer and when packaging distributions, we want a physical structure that makes the code easy to work with. We are also aiming for a long-term solution that will allow the server code base to grow significantly without becoming unmaintainable. To support this, we need a structure that supports:

- Clear and simple conventions for creating structure for the code
- Easy adding and removing of features, so that:

  - Features can be added late with a minimal risk of ripple effects into unrelated parts of the code. Such effects can be introduced by merges causing unintended code changes, as well as by logical dependencies that are not clear.
  - Features are easy to remove, should it be necessary for some reason (which might not be strictly technical).

- The creation of various distribution packages from the same source, such as:

  - Client development distributions for application programmers
  - Storage engine development distributions for storage engine writers
  - Plug-in development distributions for plug-in writers (who may or may not be storage engine developers)

- Working with the code using scripts to perform common tasks, like building special distributions, release testing, and packaging.

Working practice
================

There are some working practices that we need to support in this structure. These practices are central to how we work with the code, and not supporting them will introduce severe problems for developers.

- Bug fixes and features are introduced as a sequence of patches, where each patch is a change to one or more files.
- A single patch should not cause a build failure, and the server should still pass all tests. If a bug fix or feature requires several patches, each patch should still leave the server in a stable state in the sense that it should still build and still pass all tests.
- A patch should not require unwarranted changes in other packages. We should discourage practices that may require a developer to make changes in packages other than the one he or she is working on. Forcing a developer to make changes in unfamiliar code, however small the changes are, increases the risk of introducing bugs and may go against design principles originally intended for a component or package.
- A patch is normally targeted at a single package only: features affecting several packages should be split into separate patches, committed in the right order, and preferably pushed together (but this is not a requirement).

Notes
=====

- This worklog needs to be split up into several worklogs, at least:

  - one for the actual design (this one),
  - one for implementing the build frame (WL#4875),
  - one for fixing the current include file header mess (WL#4877).

Continuing work
===============

- In order not to stall the change of the structure for too long, it is necessary to set a bar for when the code should be changed. If that is not done, we will have to maintain two structures in parallel, which does not offer any improvement to the development practice and instead solidifies the current situation.
Open Issues
===========

- What names shall we use for the packages? We already have storage/; server/ and client/ (which already exists) have been suggested.

Resolved issues
===============

- Shall each package have a unique prefix for the files? Also consider the exported header files.

  The reason for having different prefixes for header files is to be able to separate header files with the same name in different packages when including them. However, by using the package directory name as a prefix, a header file prefix is not needed. It would be either:

      #include "pkg_table.h"

  or

      #include "pkg/table.h"

  The reason for using prefixes for source files would be that linkers have problems distinguishing between files with the same name, but some tests indicate that this is not the case on some common platforms (Linux and Solaris).

  In short, there seems to be no good reason to use file prefixes together with a package structure.

- Shall a dynamically loadable module be a separate package or not?

  There might be reasons why a loadable component consists of several packages, so we should not require that each loadable component is a package.

Decisions
=========

2009-02-26: We agreed on going for approach 2 when handling header files. The basis was later questioned and clarification of the document was asked for.

2009-05-27: It was agreed that we should not impose a structure on the packages from the build system, and that meta-data for a package should be represented separately (typically as a manifest or configuration file). Structure might still be mandated by coding styles and/or practical issues.

High-level structure
====================

We envision that the system consists of a number of *packages* that together make up the code of the system. In order to build the server and associated components, we have a *build frame* (or just *frame*) that is used to manage and, especially, build the system.
In order to support the easy addition and removal of features, we assume that each feature is contained in a separate package (see below) and that a minimum of changes (preferably none) shall be required to code outside this package to introduce the feature.

To support this convention, the build frame has to be independent of the number and type of packages that are available, and use generic methods for deciding what packages are to be included in the build. This in turn requires the packages to provide the necessary information so that the build frame can do its job.

Components
==========

A component consists of a set of header files and a set of associated C/C++ files. The component is the smallest unit of the physical design.

Typically, each component consists of a header file and a C/C++ file with a common base name, for example "parser.h" and "parser.cc". However, there are some cases where it makes sense to have multiple header files for a component and cases where it makes sense to have multiple source files.

- Several header files can be used to present multiple interfaces into a single component.
- Several source files could be mandated when the linker is file-based and will just map symbols on file level (loading/linking entire files, not individual functions).

In these cases, the files of each component shall have a common prefix distinguishable from other components.

=========== =================================================
Component   Files
=========== =================================================
rpl_filter  rpl_filter.h rpl_filter.cc
reg_main    reg_main_internal.h reg_main_public.h reg_main.cc
=========== =================================================

Packages
========

Packages are collections of components that serve a common purpose. This formulation is deliberately not exact, since what actually makes sense to turn into a package varies from case to case.
However, the following issues should be considered when deciding whether a candidate package makes sense as a package:

- Can the candidate package be released independently of the rest of the server? If not, i.e., if changes to this package are likely to require changes to other packages, then maybe it should not be a package.

  Releasing here does *not* mean distributing the code in isolation; it means releasing, e.g., a new version of the package for use with the rest of the server.

- Is the candidate package very small, e.g., a single component? In this case it might make sense to group several such candidate packages with a similar purpose into a single package.

  A typical example would be support for individual character sets, which does not make sense as one package per character set, but is sensible as a single package of "character set information".

Package naming and structure
----------------------------

Each package is represented as a directory. The basic assumption is that everything related to a package should be placed in the directory. This includes, but is not limited to: header files, source files, and unit tests.

Basic goals and assumptions are:

- Changes in the package internals should not inadvertently affect other packages that use the package.
- It shall be possible to support third-party solutions as packages in the package structure, and this shall not require re-organizing them to fit the package structure.

The package directories will be placed in a *subsystem directory* alongside the ``sql/`` directory. Apart from that, all package directories are placed at the same level. We are placing the packages in a new subsystem directory instead of re-using ``sql/`` to be able to easily distinguish between "unorganized" and "organized" code.
The following subsystem directories are proposed (some directories already exist and almost have the basic structure proposed):

========== ==============================================
Directory  Purpose
========== ==============================================
storage/   Storage engines
server/    Server modules
common/    Common utilities
========== ==============================================

There are some other directories that are being considered, such as ``mysys/``, and the above list will be extended as needed.

Package names shall consist of lowercase letters only, with underscores separating individual words in the package name. Note that the package name may not start with an underscore. This choice of name allows the package name to be used as a file name, as a C/C++ symbol, and as an identifier in other tools (such as Doxygen).

Examples: registry, query_model

File names
~~~~~~~~~~

The choice of and restrictions on file names are governed by the current coding style. The coding style takes into account operating system restrictions and restrictions imposed by tools such as the compiler, linker, and other processing tools. However, the physical structure itself does not impose any special requirements on the file names.

Package namespace
~~~~~~~~~~~~~~~~~

All symbols of a package shall be placed in a single namespace, and the namespace name shall be the same as the name of the package. Since package names as specified above are legal C/C++ symbol names, this will always be possible.

Package interfaces
------------------

For each package, there is a set of interfaces into the package. Each interface is represented physically as a header file, meaning that each package has one or more interfaces, but potentially also has header files that are not package interfaces.

The package owner shall be able to decide what header files are available for users of the package.
Initially we will not be able to do this for practical reasons, since it requires support from the build frame. Instead, we will assume that every header file in a package is available as a package interface.

Interface usage
~~~~~~~~~~~~~~~

In order to use an interface of a package, the header file is included using the form:

    #include "package_name/interface.h"

The include path is set up by the build system so that this is possible.

Note that it is an error to include a file from another package that is not a package interface. Ideally, the build frame will not allow this, but until that feature is implemented in the build frame, it will be possible to do so by mistake.

Header files of the same package are included using the form:

    #include "header.h"

Coding requirements
===================

This section outlines some basic rules that are meant to avoid common problems associated with developing for a package structure, as well as to allow tool support for checking and manipulating components and packages. Tool support is necessary to allow the system to grow, since manually resolving inconsistencies wastes effort unnecessarily.

The aim is to keep the rules to a bare minimum and specifically only consider issues that (potentially) cross package boundaries or that cause problems when maintaining or operating the build frame. Issues of what is "good coding style" are maintained separately and are not part of this worklog. This is done to restrict the scope of the worklog and to be able to close it.

Every header file should be self-sufficient
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For every header file "header.h", the following program shall compile without errors:

    #include "header.h"

The reason is that when using a header file "header.h", it should be sufficient to include "header.h" holding the functionality sought after.
If it is necessary to include any other files before "header.h" because there are definitions required by "header.h", we have two problems:

1. It is hard to find out what dependencies are needed, and it will eventually lead to the trial-and-error approach that we are now seeing.
2. If the dependencies change, the file might include more files than necessary.

Every header file should have an include guard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For a header file "header.h" in package "package", the include guard should have the name PACKAGE_HEADER_INCLUDED.

We choose to standardize the include guard so that we can use external include guards if the need should arise. We omit the extension from the name, since header files may have a number of different extensions and we do not want to standardize on any one of them.

Existing include guards that are not violating the standard will not be changed initially, but developers are encouraged to make the change if they are changing the header file.

Source and header files should only include definitions they need
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For header files, it is critical to use forward declarations when that suffices. The problems with including definitions that are not needed are twofold:

1. It introduces additional, unnecessary dependencies, since each included definition in turn contains references to whatever *it* needs. Note that the dependencies may not only be on header files: unintended symbols may also be pulled into the system.
2. It unnecessarily increases the compile time, since it requires opening *at least* one more file (but usually several). This problem is, however, secondary.

There shall be no convenience include files inside the server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Convenience include files are include files whose only purpose is to bundle other include files.
The reason why we want to avoid this *inside* the server is that it introduces unnecessary dependencies between packages (recall that dependencies between components are represented as #include directives). Should some include file be added to the convenience include file because *one* component needs it, *all* components that include this convenience include will be affected.

To avoid introducing unnecessary dependencies in this way we could:

1. Have a rule stating that a convenience include may only hold includes that are used by *all* components including this convenience include.

   This places an additional burden on developers wanting to add an include to the convenience include: they have to locate each user of the convenience include and decide whether it needs the new include. Since the includers of the convenience include are not easily visible in the file, it means searching all packages.

   Furthermore, with this approach it can be expected that over time the set of includes in the file will shrink and the purpose of having a convenience include will diminish.

2. Have a rule stating that convenience includes shall not be used, which requires all necessary include files to be mentioned.

   This is a minor problem from a development perspective and makes dependencies between components explicit, hence clear.

We chose the latter.

However, convenience include files that serve the purpose of maintaining interfaces *into* the server are accepted (for example, to make it easier to work with the client interface). For these files it is, however, critical that they are pure convenience includes and do not contain separate definitions.

No ``using`` directives (``using namespace``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Placing ``using`` directives at namespace level in header files will force any file that includes the header file to resolve symbols in a namespace it has no control over. This can lead to subtle and hard-to-find bugs, and such directives should therefore not be used.
Placing ``using`` directives at namespace level in source files will inject all symbols of one namespace (say ``pkga``) into another namespace (say ``pkgb``). If changes are made to ``pkga``, they may conflict with definitions in ``pkgb``, and since a developer has to ensure that the system builds for each patch, he or she would be forced to make changes in ``pkgb`` despite the fact that the change itself is localized to ``pkga``.

No ``using`` declarations before #include directives
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Placing a ``using`` declaration before including another file will place all the symbols of the included file in a namespace and should not be used.

Entities declared in a component shall be defined in the component
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An object or function declared in the header file of a component shall be defined in the same component (usually in an implementation file).

The reason for this rule is that it shall be easy to know which components need to be linked in order to use the component. If some definition is in another file, it will be hard to find and manage the right dependencies between components in the system.

No gratuitous link-time dependencies between components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Such dependencies can occur if a component, for example, declares an ``extern`` variable and does not include the proper header file. All dependencies shall be explicit in the sense that they shall be visible in the file as an ``#include`` directive. This will allow dependencies to be detected and tracked automatically.

#######################################
Appendix A. Definitions and discussions
#######################################

Packages
========

Packages are collections of components organized as a cohesive unit (that is, they serve a common purpose). Each package has one or more (exported) interfaces, which are represented by one or more header files.
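As an illustration only, a package interface header that follows the coding requirements might look like the sketch below. The package name ``registry``, the files ``table.h``, ``entry.h`` and ``table.cc``, and the ``Entry`` class are all hypothetical. The three files are shown in a single translation unit so that the sketch compiles on its own; in a real tree each part would live in its own file inside the package directory.

```cpp
#include <cassert>
#include <string>   // table.h includes this itself, so it is self-sufficient
#include <utility>

// ---- registry/table.h: hypothetical package interface header ----
#ifndef REGISTRY_TABLE_INCLUDED  // guard name: PACKAGE_HEADER_INCLUDED
#define REGISTRY_TABLE_INCLUDED

namespace registry {  // namespace name equals the package name

class Entry;  // forward declaration suffices: Entry is only used by reference

// Declared in this component and defined in the same component (table.cc).
std::string describe(const Entry &entry);

}  // namespace registry

#endif  // REGISTRY_TABLE_INCLUDED

// ---- registry/entry.h: hypothetical header defining Entry ----
#ifndef REGISTRY_ENTRY_INCLUDED
#define REGISTRY_ENTRY_INCLUDED

namespace registry {

class Entry {
 public:
  explicit Entry(std::string name) : m_name(std::move(name)) {
    assert(!m_name.empty());  // illustrative precondition only
  }
  const std::string &name() const { return m_name; }

 private:
  std::string m_name;
};

}  // namespace registry

#endif  // REGISTRY_ENTRY_INCLUDED

// ---- registry/table.cc: hypothetical implementation file ----
// In a real tree this file would start with:
//   #include "table.h"
//   #include "entry.h"
namespace registry {

std::string describe(const Entry &entry) {
  return "registry entry: " + entry.name();
}

}  // namespace registry
```

A user in another package would write ``#include "registry/table.h"`` and call ``registry::describe(...)`` with a fully qualified name, without any ``using`` directive.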
Defining files of a package
---------------------------

In order to define what files are part of a package, there are basically two options: either supply a file for each package that lists the files of that package, or put all the files of a package into a subdirectory.

The advantages of using the file system to define packages, by putting all the files of a package in a separate directory, suggest that this approach should be used.

Interfaces into a package
-------------------------

Each package has one or more interfaces, represented physically as one or more header files. The header files contain the objects and definitions necessary to interface with the package, so we have no restrictions on the structure (but do have some recommendations for how to structure the interfaces for maintainability).

Each interface is normally defined with some strategic objective, i.e., it is created for an intended set of users. We use *export target* to denote such a set of users of an interface. Library functions usually have only one export target, but many of our packages have several export targets, such as "client developers" who write application clients to the server, "storage engine developers" who are creating a storage engine for the server, and "plugin developers" who are writing a plug-in for the server.

For each export target, we should ensure that the header files holding the interfaces are defined in such a manner that only the parts needed by that export target are included when that header file is included. Gratuitous definitions are a problem since they might clash with names defined by the user, and they also introduce an unnecessary dependency on parts of the server that the user does not in reality depend on.
In short, interfaces into packages are represented as one or more header files, and we have two basic methods to identify the interface files: by naming convention (for example, placing the interface file in a separate directory) or by using one or more configuration files that explicitly list the interface files.

Using naming conventions
~~~~~~~~~~~~~~~~~~~~~~~~

For this discussion, we assume that the exported interface header files are put into the export/ directory. However, the same arguments apply to other schemes for using naming conventions. Note that each header file might correspond to a source file placed in the main package directory, like this:

    goobar/
      export/
        goo_interface.h
      goo_impl.cc
      . . .

The advantage of this approach is:

- Simplicity: normal file commands can be used to work with files. For example, copying all files needed by a plugin SDK could be as simple as:

      cp package/export/*.h /distro/include

The disadvantages are:

- Changing the status of a file from, e.g., internal to public requires moving the file, and not all version control systems support that well.
- Having multiple "export targets" (users of the interface) requires separate directories. For example, a package could export one interface for third-party users and one towards the rest of the server packages.

Configuration file
~~~~~~~~~~~~~~~~~~

We add extra configuration file(s) to the package to denote whether a header file is exported. For this approach, we have two alternatives:

a) Add a file parallel to the header file, e.g., the fact that "foo.h.export" exists could mean that the header file "foo.h" is an exported file.
b) Introduce a "manifest" file for each package, containing information about the files in the package.

The advantages of this approach are [incomplete list]:

- Changing the properties of a file (e.g., from "internal" to "exported") does not require any changes to the file itself.
- It allows header files to be marked with other properties, such as header files that are supposed to be exported to third-party developers.

The disadvantages are:

- Working with files is not trivial, e.g., copying all header files that go into the plugin SDK could be:

      cp `grep plugin-sdk package/manifest | cut -f1` /distro/include

Include file and path management
--------------------------------

In order to manage the include path and the include files, it is necessary to ensure that all the header files that are exported are available to every package in the system, and *only* those files.

To handle this, we basically have two approaches:

1. We have an include path containing the directory where the exported header files for each package are stored. This requires the header files to be placed in a dedicated "export/" directory inside the package: otherwise, all header files of a package will be exportable, which is not the intention. So, for example, the include path could be set to

       pkg_a/export;pkg_b/export;pkg_c/export

   Whenever a package is added or removed, the include path would have to be updated to match the actual packages available.

   The advantages of this approach are:

   - Simple model
   - No need to generate or copy files

   The disadvantages of this approach are:

   - If a package is added or removed, the include path has to be updated. Since every package depends on the include path, it might trigger a re-build.
   - If a header file with the same name is in multiple places, it will not be detected.
   - In most cases, the source control system will generate a conflict for the addition and removal of a directory to the include path in the build file (e.g., configure.ac or Makefile.am).

2. We have a dedicated include directory for, e.g., the server where exported headers are available, and let the manifest file contain information on what files are to be made available in the central include directory.
   This would mean that the path stays the same regardless of what packages are available.

   For this approach, we have two "sub-approaches" for making the header files available from the include directory:

   a) Copy the files to the dedicated include directory.
   b) Generate a header file holding only an #include directive referring to the correct header file.

   The advantages of this approach are:

   - There is no need to maintain an extensive include path to be able to compile a package (which might have dependencies on other packages).
   - Package maintenance is very easy. For example, adding a package does not require changing any include paths or anything at all in the build frame.
   - Conflicting header files will be detected during the build process (e.g., when copying header files to the include directory).

   The disadvantages are:

   - It requires more work in the build frame.
   - It requires a "staging" phase, where header files are made available in the dedicated include directory, either by copying or by generating files.
   - In the copy approach (2a), it is necessary to build a dependencies Makefile for the include directory, to trigger a copy whenever the original header file changes.
   - In the copy approach (2a), it is possible that a developer starts editing the wrong file, which will then be overwritten at some later point, and this will be hard to discover.
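The staging phase of sub-approach 2b can be sketched as follows. This is a minimal illustration under stated assumptions, not the build frame's design: the ``ExportedHeader`` record (standing in for one manifest entry), the function names, and the ``package/file`` naming inside the central include directory are all hypothetical. The sketch only computes which forwarding headers would be generated and what they would contain. It does not touch the file system, and it assumes the source path is written in a form the compiler can resolve.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One exported header, as a package manifest might list it (hypothetical).
struct ExportedHeader {
  std::string package;  // e.g. "registry"
  std::string file;     // e.g. "table.h"
  std::string source;   // path of the real header in the source tree
};

// Contents of a generated forwarding header: a single #include directive
// referring to the real header (sub-approach 2b).
std::string forwarding_header(const ExportedHeader &header) {
  assert(!header.source.empty());  // illustrative precondition only
  return "#include \"" + header.source + "\"\n";
}

// Builds the staging plan: a map from the name inside the central include
// directory ("package/file") to the generated file contents. Two exported
// headers that map to the same staged name are reported as a conflict.
std::map<std::string, std::string> plan_staging(
    const std::vector<ExportedHeader> &headers) {
  std::map<std::string, std::string> staged;
  for (const ExportedHeader &header : headers) {
    const std::string name = header.package + "/" + header.file;
    if (!staged.emplace(name, forwarding_header(header)).second) {
      throw std::runtime_error("conflicting exported header: " + name);
    }
  }
  return staged;
}
```

Because the plan is keyed on the staged name, adding or removing a package changes only the set of generated files, not the include path, and a duplicate name surfaces as an error during staging rather than silently at compile time.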
Implementation
==============

In order to implement the structure described in the high-level specification, we should approach it in well-contained steps that lead us to the goal. For example, since we need to develop a build frame to support this, we need an intermediate solution that does not cause problems for the final deployment of the build structure and that allows developers to work on creating packages without introducing problems for the build frame.

Stage 1: Create the directory structure
---------------------------------------

Introduce the package directories and move the existing packages into them. At this stage we will keep the existing autotools-based build system and make only the minimal changes necessary to have a fully functional system.

We assume that the original "unstructured" sql/ code is dependent on the packages, but that we have control over the dependencies between packages in the "structured" directories.

In order to add a package, it will be necessary to:

- Create a Makefile.am for the package
- Add a reference under "SUBDIRS" in the parent directory

Note that in this stage, all header files in a package will be available as package interface files, so care should be taken when including header files from other packages.

After this stage, developers will be able to create packages properly without affecting the following stages.

Stage 2: Evaluate and optionally change to use CMake
----------------------------------------------------

It has been discussed whether we should use CMake to build the server on all platforms and not just Windows, since it seems to be a portable alternative. However, concerns have been raised about the portability of CMake to the platforms that we need to support, so this alternative needs to be evaluated before implementation starts.
The goal is to have a build system that is as simple as the existing one, which also includes being able to handle the system for defining pluggable storage engines.

If the evaluation does not show problems with doing the switch, the replacement should be done in two steps, the first of which is just switching build system while otherwise maintaining the structure and build order of the old system. We do this step separately since it will require merging the build process on Windows with the existing autotools-based build frame while still maintaining the same functionality.

After this stage, we will have a single build frame for all platforms, but there will still be problems, such as package interfaces not being distinguished from other header files.

Stage 3: Streamline and consolidate build frame
-----------------------------------------------

At this stage, the build frame will be consolidated by ensuring that there is support for easily working with the code.
Copyright (c) 2000, 2018, Oracle Corporation and/or its affiliates. All rights reserved.