Jonathan Corbet | 75b0214 | 2008-09-30 15:15:56 -0600

3: EARLY-STAGE PLANNING

When contemplating a Linux kernel development project, it can be tempting
to jump right in and start coding. As with any significant project,
though, much of the groundwork for success is best laid before the first
line of code is written. Some time spent in early planning and
communication can save far more time later on.


3.1: SPECIFYING THE PROBLEM

Like any engineering project, a successful kernel enhancement starts with a
clear description of the problem to be solved. In some cases, this step is
easy: when a driver is needed for a specific piece of hardware, for
example. In others, though, it is tempting to confuse the real problem
with the proposed solution, and that can lead to difficulties.

Consider an example: some years ago, developers working with Linux audio
sought a way to run applications without dropouts or other artifacts caused
by excessive latency in the system. The solution they arrived at was a
kernel module intended to hook into the Linux Security Module (LSM)
framework; this module could be configured to give specific applications
access to the realtime scheduler. This module was implemented and sent to
the linux-kernel mailing list, where it immediately ran into problems.

To the audio developers, this security module was sufficient to solve their
immediate problem. To the wider kernel community, though, it was seen as a
misuse of the LSM framework (which is not intended to confer privileges
onto processes which they would not otherwise have) and a risk to system
stability. The community's preferred solutions involved realtime scheduling
access via the rlimit mechanism for the short term, and ongoing latency
reduction work in the long term.
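
For readers unfamiliar with the rlimit approach mentioned above, a minimal
application-side sketch follows. It is written in Python and assumes a Linux
system; the helper names rtprio_ceiling() and try_realtime() are hypothetical.
It reads the RLIMIT_RTPRIO soft limit, which caps the realtime priority an
unprivileged process may request, and then asks the kernel for SCHED_FIFO
scheduling, treating a refusal as an ordinary outcome rather than an error.

```python
import os
import resource


def rtprio_ceiling() -> int:
    """Highest SCHED_FIFO priority permitted by the RLIMIT_RTPRIO soft limit.

    A result of 0 means an unprivileged process may not use realtime
    scheduling at all. (Privileged processes are not bound by this limit.)
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_RTPRIO)
    fifo_max = os.sched_get_priority_max(os.SCHED_FIFO)
    if soft == resource.RLIM_INFINITY:
        return fifo_max
    return min(soft, fifo_max)


def try_realtime(priority: int) -> bool:
    """Try to move the calling process to SCHED_FIFO at `priority`.

    Returns False instead of raising if the priority is outside the
    SCHED_FIFO range or the kernel refuses the request (typically EPERM
    when `priority` exceeds RLIMIT_RTPRIO for an unprivileged process).
    """
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lo <= priority <= hi:
        return False
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False
    return True
```

A caller might compute prio = rtprio_ceiling() and, if it is at least 1,
call try_realtime(prio). The limit itself is normally configured outside the
application, so the program needs no special privileges; processes that do
hold such privileges are not bound by RLIMIT_RTPRIO at all.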

The audio community, however, could not see past the particular solution
they had implemented; they were unwilling to accept alternatives. The
resulting disagreement left those developers feeling disillusioned with the
entire kernel development process; one of them went back to an audio list
and posted this:

	There are a number of very good Linux kernel developers, but they
	tend to get outshouted by a large crowd of arrogant fools. Trying
	to communicate user requirements to these people is a waste of
	time. They are much too "intelligent" to listen to lesser mortals.

(http://lwn.net/Articles/131776/).

The reality of the situation was different; the kernel developers were far
more concerned about system stability, long-term maintenance, and finding
the right solution to the problem than about any specific module. The
moral of the story is to focus on the problem - not a specific solution -
and to discuss it with the development community before investing in the
creation of a body of code.

So, when contemplating a kernel development project, one should obtain
answers to a short set of questions:

 - What, exactly, is the problem which needs to be solved?

 - Who are the users affected by this problem? Which use cases should the
   solution address?

 - How does the kernel fall short in addressing that problem now?

Only then does it make sense to start considering possible solutions.


3.2: EARLY DISCUSSION

When planning a kernel development project, it makes great sense to hold
discussions with the community before launching into implementation. Early
communication can save time and trouble in a number of ways:

 - It may well be that the problem is addressed by the kernel in ways which
   you have not understood. The Linux kernel is large and has a number of
   features and capabilities which are not immediately obvious. Not all
   kernel capabilities are documented as well as one might like, and it is
   easy to miss things. Your author has seen the posting of a complete
   driver which duplicated an existing driver that the new author had been
   unaware of. Code which reinvents existing wheels is not only wasteful;
   it will also not be accepted into the mainline kernel.

 - There may be elements of the proposed solution which will not be
   acceptable for mainline merging. It is better to find out about
   problems like this before writing the code.

 - It's entirely possible that other developers have thought about the
   problem; they may have ideas for a better solution, and may be willing
   to help in the creation of that solution.

Years of experience with the kernel development community have taught a
clear lesson: kernel code which is designed and developed behind closed
doors invariably has problems which are only revealed when the code is
released into the community. Sometimes these problems are severe,
requiring months or years of effort before the code can be brought up to
the kernel community's standards. Some examples include:

 - The Devicescape network stack was designed and implemented for
   single-processor systems. It could not be merged into the mainline
   until it was made suitable for multiprocessor systems. Retrofitting
   locking and such into code is a difficult task; as a result, the merging
   of this code (now called mac80211) was delayed for over a year.

 - The Reiser4 filesystem included a number of capabilities which, in the
   core kernel developers' opinion, should have been implemented in the
   virtual filesystem layer instead. It also included features which could
   not easily be implemented without exposing the system to user-caused
   deadlocks. The late revelation of these problems - and refusal to
   address some of them - has caused Reiser4 to stay out of the mainline
   kernel.

 - The AppArmor security module made use of internal virtual filesystem
   data structures in ways which were considered to be unsafe and
   unreliable. This code has since been significantly reworked, but
   remains outside of the mainline.

In each of these cases, a great deal of pain and extra work could have been
avoided with some early discussion with the kernel developers.


3.3: WHO DO YOU TALK TO?

When developers decide to take their plans public, the next question will
be: where do we start? The answer is to find the right mailing list(s) and
the right maintainer. For mailing lists, the best approach is to look in
the MAINTAINERS file for a relevant place to post. If there is a suitable
subsystem list, posting there is often preferable to posting on
linux-kernel; you are more likely to reach developers with expertise in the
relevant subsystem and the environment may be more supportive.

Finding maintainers can be a bit harder. Again, the MAINTAINERS file is
the place to start. That file is not always up to date, though, and not
all subsystems are represented there. The person listed in the
MAINTAINERS file may, in fact, not be the person who is currently acting
in that role. So, when there is doubt about who to contact, a useful trick
is to use git (and "git log" in particular) to see who is currently active
within the subsystem of interest. Look at who is writing patches, and who,
if anybody, is attaching Signed-off-by lines to those patches. Those are
the people who will be best placed to help with a new development project.
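
The git-based check can likewise be sketched in Python, assuming git is
installed and a kernel repository is checked out locally. The sketch runs
"git log" restricted to one path, prefixes each commit with an author marker
line (the "X-Author:" marker is an invention of this example, not a git
convention), and tallies patch authors and Signed-off-by: names separately.
The names who_is_active() and tally() are hypothetical.

```python
import re
import subprocess
from collections import Counter

# Marker chosen for this sketch; it is not a git convention.
AUTHOR_TAG = "X-Author: "
AUTHOR_RE = re.compile("^" + re.escape(AUTHOR_TAG) + r"(.+)$", re.MULTILINE)
SOB_RE = re.compile(r"^[ \t]*Signed-off-by:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def tally(log_text: str) -> tuple[Counter, Counter]:
    """Count patch authors and Signed-off-by: names in marked git log output."""
    authors = Counter(m.group(1) for m in AUTHOR_RE.finditer(log_text))
    signers = Counter(m.group(1) for m in SOB_RE.finditer(log_text))
    return authors, signers


def who_is_active(repo: str, path: str, since: str = "1 year ago") -> tuple[Counter, Counter]:
    """Run git log over `path` in `repo` and tally recent activity."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}",
         f"--format={AUTHOR_TAG}%an <%ae>%n%B", "--", path],
        capture_output=True, text=True, check=True,
    )
    return tally(result.stdout)
```

For example, authors, signers = who_is_active("linux", "drivers/net/wireless")
followed by signers.most_common(5) lists the people signing off most often in
that directory. Names that appear frequently among signers but rarely among
authors often belong to the maintainers who collect and forward patches.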

If all else fails, talking to Andrew Morton can be an effective way to
track down a maintainer for a specific piece of code.


3.4: WHEN TO POST?

If possible, posting your plans during the early stages can only be
helpful. Describe the problem being solved and any plans that have been
made on how the implementation will be done. Any information you can
provide can help the development community provide useful input on the
project.

One discouraging thing which can happen at this stage is not a hostile
reaction, but, instead, little or no reaction at all. The sad truth of the
matter is (1) kernel developers tend to be busy, (2) there is no shortage
of people with grand plans and little code (or even prospect of code) to
back them up, and (3) nobody is obligated to review or comment on ideas
posted by others. If a request-for-comments posting yields little in the
way of comments, do not assume that it means there is no interest in the
project. Unfortunately, you also cannot assume that there are no problems
with your idea. The best thing to do in this situation is to proceed,
keeping the community informed as you go.


3.5: GETTING OFFICIAL BUY-IN

If your work is being done in a corporate environment - as most Linux
kernel work is - you must, obviously, have permission from suitably
empowered managers before you can post your company's plans or code to a
public mailing list. The posting of code which has not been cleared for
release under a GPL-compatible license can be especially problematic; the
sooner that a company's management and legal staff can agree on the posting
of a kernel development project, the better off everybody involved will be.

Some readers may be thinking at this point that their kernel work is
intended to support a product which does not yet have an officially
acknowledged existence. Revealing their employer's plans on a public
mailing list may not be a viable option. In cases like this, it is worth
considering whether the secrecy is really necessary; there is often no real
need to keep development plans behind closed doors.

That said, there are also cases where a company legitimately cannot
disclose its plans early in the development process. Companies with
experienced kernel developers may choose to proceed in an open-loop manner
on the assumption that they will be able to avoid serious integration
problems later. For companies without that sort of in-house expertise, the
best option is often to hire an outside developer to review the plans under
a non-disclosure agreement. The Linux Foundation operates an NDA program
designed to help with this sort of situation; more information can be found
at:

	http://www.linuxfoundation.org/en/NDA_program

This kind of review is often enough to avoid serious problems later on
without requiring public disclosure of the project.