8 Writing

This chapter provides guidelines and tips for writing various types of documents that report research. We focus specifically on research papers intended for publication in a journal or conference proceedings, but most of the tips also apply to master’s and PhD theses, as discussed in Section 8.6.

8.1 Paper Structure

Research papers tend to have a standardized structure that helps readers quickly find the information they are looking for. In general, a research paper consists of a title, abstract, introduction, main paper body (including the methods and results), related work, and conclusion.

The specific, detailed structure depends on the type of the paper and the research methods used. When in doubt about the structure of our paper, we can draw inspiration from high-quality papers in the given area that use a similar research method. For example, if we are conducting an unstructured interview study in human-computer interaction, we can use a high-quality HCI paper describing an unstructured interview study as a reference, even if its topic is different. For some research methods, dedicated reporting guidelines even exist, such as the Reporting guidelines for controlled experiments in software engineering (Jedlitschka & Pfahl, 2005), among others.

The section structure must be logical; the relationship of a subsection to its parent section should be “is-part-of”, “is-type-of”, “is-aspect-of”, or similar. When describing multiple entities, e.g., tools, methods, or experiments, we should order them consistently: chronologically, by specificity, by efficiency, as a data flow from inputs to results, etc.

As with reading, a paper is usually not written from start to finish. Starting with a hierarchical outline of section headings takes little time but provides a quick overview for self-critique and collaboration. We should reorder or change the section structure during writing whenever needed.

An overview of the standard parts of research papers follows.

8.1.1 Title

A title should be just long enough to uniquely identify the paper’s main contribution: as short as possible, but not shorter. As a rule of thumb, only one published paper in the world should have a given title. Naming a paper “Tree-Based Database Index” would be too general nowadays, as it suggests that this is the first paper ever to propose tree-based database indexes. On the other hand, the title “Design, Implementation, and Evaluation of Extremely Time- and Space-Efficient Indexes in Relational Databases Based on a Combination of AVL Trees and B-Trees” contains superfluous words.

8.1.2 Abstract

An abstract is a very compressed summary of the whole paper. Since a reader will often decide whether to read your paper or not based on the abstract, it should receive appropriate attention. In general, the abstract should contain:

Background (context): Important facts that are necessary to understand the research problem.
Problem statement or goal: What are we trying to solve or find out and why?
Approach and/or method: A very high-level overview of the designed approach (if any) and the research methods used.
Results: Specific (e.g., numeric) main results found by applying the method described above.
Conclusion: The implications that our solution has in a larger context; possibly also limitations or future work.

Ideally, each part should be approximately the same length. A common beginner’s mistake is an overly long background description, mentioning facts that almost all researchers in the area already know.

The present and past tenses should be used in the abstract, unless an actual future event is described. An abstract should not contain overly general statements; e.g., “we discuss the results and review related work” applies to almost all papers. An abstract is usually written as one paragraph, but some journals encourage or require structured abstracts containing named inline headings.

Koopman’s essay on how to write an abstract offers more specific hints.

8.1.3 Introduction

An introduction is a more verbose summary of the paper with few or no details about the method and specific results. Instead, special emphasis is placed on the motivation: What is the problem, and why is it worth solving? Note that the motivation cannot be relevant solely to the authors of the paper, such as “we are interested in this topic” or “our department has a spare box of Raspberry Pis in a storage room.”

According to Booth et al. (2016), an introduction should consist of three main parts: context, problem, and response. Context is information relevant to our main goal that the readers of the paper probably already know. It sets the scene for a problem, which names the shortcoming(s) of the current state of the art, such as missing knowledge or disadvantages of existing techniques. After reading the problem description, the motivation should be obvious to the reader: Do we gain a certain advantage by pursuing this research? Or will ignoring the problem have unintended consequences? A response is our solution to the problem, which is often an empirical study aiming to fill a knowledge gap, a design of an improved approach, or both.

A typical (significantly shortened) example of this three-part structure may look like this:

Smartphone users rely heavily on touch keyboards to enter textual input. (context) However, touch keyboards are significantly less accurate than physical ones. (problem) Foo et al. [1] measured 54% lower accuracy, while Bar et al. [2] measured as much as 67% lower accuracy. (problem evidence) Haptic feedback improves accuracy only marginally [3]. (problem of existing solutions) We present a 3D keyboard that temporarily rises above the display surface, providing a user experience similar to physical keyboards. It was successfully validated in a controlled experiment with 40 participants. (solution)

In the introduction, we cite relevant sources, but not all of them. We should focus only on the most significant references, i.e., those that directly give rise to, acknowledge, or attempt to solve our problem. Although we describe our solution briefly in the introduction, detailed and specific results are usually left for the later parts of the paper.

A bulleted list of contributions (e.g., the design of a new method X, a formal proof of algorithm Y, or a survey on topic Z) is sometimes appreciated by reviewers. Some reviewers also expect a paragraph resembling a table of contents (“In Section 2, we review related work. Section 3 describes …”), while others despise it.

When describing the problem we try to solve in the paper, we can illustrate it with a specific example, e.g.: “Suppose a programmer searches for all occurrences of the method xyz()…” Optionally, we can then keep returning to this example throughout the rest of the paper; such an example is called a running example.

8.1.4 Main Paper Body

The sections that follow the introduction differ widely between papers. For empirical studies, the generally accepted section structure is:

Method,
Results,
Discussion,
and Threats to Validity.

The Method section describes the design and procedure of the empirical study that was performed, in enough detail to minimize ambiguity. The Results section should clearly and objectively describe the results obtained using this method. In the Discussion section, we address more subjective or interpretative questions, such as: What caused the given results? What implications do the results have? Why are they important? In some cases, particularly if the discussion would be too short, we can merge it with the Results section, provided we distinguish between objective results and our interpretation. Finally, the Threats to Validity section discusses how our results might be invalid and what we did to mitigate these threats, where possible.

Papers describing artifacts (design science), i.e., new approaches, methods, prototypes, or systems, do not have any prescribed structure. The general rule of appropriate section ordering still holds though. We often start with an overall high-level design and then describe the individual smaller components, e.g., in the order of the data flow. The order in which the author designed and implemented the parts is generally irrelevant and should not be mentioned unless it is critical for understanding.

In any case, the section structure is not carved in stone. For example, a paper can contain multiple experiments, each described in a separate section with its own Method and Results subsections. If a paper also contains the design of a new approach, we can, for instance, describe a new algorithm in the section “Indexing algorithm” and then evaluate it in the section “Evaluation” with the subsections Method and Results.

Since implementing a prototype is time-consuming, many beginning researchers in computer science think that implementation details deserve a thorough description. On the contrary, we should refrain from including source code excerpts unless they represent the core idea. Even then, pseudocode is usually better. Mentioning specific classes, methods, and variables that represent implementation details should generally be avoided in the body of the paper. Does the reader really need to know that the result of calling getCoordinates() is assigned to currentCoordinates?

8.1.6 Conclusion

The Conclusion section is, again, a summary of the article, but this time it omits the motivation and focuses on the results instead. In the conclusion, we describe:

the main point and the most important results of the paper,
applications and implications: how the results can be used, even from a broader point of view,
the limitations of our research,
and possible future work.

8.2 General Guidelines

In general, we should write our paper so that the reader or reviewer:

will understand it,
and will be convinced the research is relevant and valid.

Therefore, the first question that we need to ask is: Who is our reader? The readers of research papers are usually researchers working in the same subfield of computer science (e.g., operating systems or human-computer interaction) who were, sadly, not looking over our shoulder when we actually did the research. There is thus no need to explain how Git works in a study of GitHub projects. On the other hand, a random software engineering researcher has no idea whether we sampled the repositories using simple random sampling and how exactly we even enumerated the list of all repositories (as GitHub itself does not provide a time-efficient feature for this).

While a paper seems like a monologue, it is in fact an imaginary conversation with a skeptical reader (Booth et al., 2016), whom we try to convince of the relevance and validity of our paper using arguments. When writing, we should imagine we are the reader (or better, the reviewer), anticipate relevant questions, and answer them at appropriate places. A good way to achieve this is to structure research arguments as proposed by Booth et al. (2016). An argument can contain a claim, reasons, evidence, the acknowledgment of objections with responses to them, and a warrant.

A claim is a statement that we believe to be true, and we try to persuade the reader of its truthfulness by providing logical reasons. A reason should be supported by evidence, such as the results of empirical research or a formal proof. We can then acknowledge the most relevant objections a reader might raise: Why might our reasons be invalid or irrelevant, or our evidence insufficient? We respond to such objections by providing further claims, reasons, and evidence. In rare cases, a warrant may be necessary to explain our way of thinking, i.e., why the given evidence supports the reason, which, in turn, supports the claim.

The following condensed example illustrates an argument:

Absent Readme files are an impassable obstacle for newcomers to open-source Java projects, (reason) as 97% of respondents would not attempt to build a project without a Readme. (evidence) An automated Readme generator could thus speed up the onboarding of newcomers to open-source projects. (claim) Although there are other obstacles to open-source involvement besides the failure to build the project from source (acknowledgment), this reason was by far the most frequent in the survey. (response)

In the example, a warrant could be that building a project from source is a necessary precondition for becoming a meaningful code contributor to a Java project – but readers probably already know that, so we did not mention it explicitly.

We can write great arguments, but if a reader does not understand what we write, our effort is futile.

Research papers are often limited in length, either strictly by a page count or loosely by the time readers are willing to spend reading them. In an overly long text, the main point can easily get lost. We should aim to write as concisely as possible while keeping the text unambiguous and understandable. Ideally, in a paper, “every sentence should be necessary” (Zobel, 2014). To achieve this, we should choose a central point (or a few points) of the paper: a message that we would like to deliver to our reader. Then, we identify the dependencies of this point, i.e., sentences logically leading to it, such as reasons, evidence, or other statements that support or explain it in some way. We repeat this transitively for the dependencies themselves. Finally, everything that is not a transitive dependency of the main point(s) should be deleted. Of course, this rule cannot be applied literally. However, if in a paper on the automated generation of user interfaces for Android, we read that Android Inc. was founded by Andy Rubin and colleagues in 2003, we know something is definitely wrong.
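The pruning procedure described above is essentially a reachability computation over a dependency graph. As a thought experiment only (a real paper is not literally a graph, and the rule cannot be applied mechanically), the following Python sketch assumes we have manually recorded, for each sentence, which sentences support or explain it. It then keeps only the sentences reachable from the main point(s). The sentence identifiers and the depends_on mapping are hypothetical.

```python
from collections import deque


def transitive_dependencies(main_points, depends_on):
    """Return every sentence reachable from the main points.

    depends_on maps a sentence ID to the IDs of sentences that support or
    explain it (reasons, evidence, definitions, ...). Hypothetical model.
    """
    keep = set(main_points)
    queue = deque(main_points)
    while queue:  # breadth-first traversal of the dependency graph
        sentence = queue.popleft()
        for dep in depends_on.get(sentence, []):
            if dep not in keep:  # visit each sentence only once
                keep.add(dep)
                queue.append(dep)
    return keep


def prune(sentences, main_points, depends_on):
    """Keep only sentences that (transitively) support a main point."""
    keep = transitive_dependencies(main_points, depends_on)
    return [s for s in sentences if s in keep]  # preserve original order


# Hypothetical paper on generating user interfaces for Android.
sentences = ["claim", "reason", "evidence", "definition", "android_history"]
depends_on = {
    "claim": ["reason"],
    "reason": ["evidence", "definition"],
}
print(prune(sentences, ["claim"], depends_on))
# ['claim', 'reason', 'evidence', 'definition']
```

The sentence about the history of Android Inc. is not reachable from the claim, so the sketch drops it, mirroring the example in the text.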

Some (especially inexperienced) researchers try to hide the fact that not much work was actually done by deliberately obfuscating the paper. We should write clearly and be honest about how the research was performed. Obfuscation will only make the review more difficult, not more positive.

In The Craft of Research, Booth et al. (2016) provide a few useful stylistic suggestions on how to write easy-to-understand texts:

Denote actions in the text by actual verbs; avoid complex subjects produced by nominalization. For example, “A participant clicked the button five times because the user interface of the calendar was unresponsive” is better than “Due to the unresponsiveness of the user interface of the calendar, a participant clicked the button five times.”
Start a sentence with simple concepts or terms already mentioned in the text; new or difficult concepts should be moved to the end. For instance, “Two patterns are combined into a multi-pattern. This multi-pattern is then processed using meta-parsification by the ultra-foobarificator” reads better than “A multi-pattern is produced from two patterns. Using meta-parsification, the ultra-foobarificator then processes this multi-pattern.”
Do not hesitate to use active voice (“We showed that…”), but use passive voice (“The result was produced…”) when it fits better semantically or to satisfy the two previous rules.

For more guidelines on how to write a great research paper, we recommend Simon Peyton Jones’s presentation “How to Write a Great Research Paper” (a shorter version is also available).

8.3 Writing Process

Writing a paper or a thesis is not a linear process. People rarely produce perfect writing on the first try. We should start by writing a rough draft, noting our ideas as they come, even as unfinished sentences. Then we can return to individual portions of the paper and modify, reorganize, or delete them as necessary. Thus, unless the paper has already been sent for review, we can conclude:

A research paper is not a blockchain; we can modify it!

During writing, we should read the paper regularly ourselves and ask whether it would make sense for our readers. Before submission, we need to read it completely from start to end to find consistency issues and fix stylistic and grammar errors.

8.4 Tables and Figures

Each table and figure needs an appropriate caption. The caption should be specific enough, e.g., “Correlation between age and the number of source code lines written per day” instead of “Age and lines”.

Columns of a table should be named. If it makes sense, rows can be named too. Horizontal lines in tables should be used as prescribed by the given template, while vertical lines are to be avoided completely (unless the template says otherwise).

To graphically represent data, many researchers choose inappropriate chart types. For instance, they overuse pie charts, which make it difficult to visually compare the quantities. The type of a chart needs to be chosen based on what we need to show to a reader, e.g., for a comparison of cyclical data with many periods over time, we can use a circular area chart. To select an appropriate visualization, we recommend the document Chart Suggestions – A Thought-Starter by Abela.

Each dimension in a plot (x- and y-axis, color, shape) must be labeled so that the reader knows what it represents. A truncated axis (one not starting at zero) or a logarithmic scale should be used only when really necessary, and even then, it has to be explicitly mentioned in the text describing the chart or in its caption. Visual elements should carry meaning; they should not be added merely for decoration, as in the case of 3D pie charts.

8.5 Citing Sources

In a research paper, we need to cite sources in a way that clearly distinguishes these three options:

our own idea (no reference),
a paraphrased idea: according to X [1], something is true,
and a quoted phrase: according to X [1], “something is true.”

Although BibTeX generates the reference list at the end of the paper automatically, we need to check that the entries are valid and complete. Some bibliography styles convert all letters of a title except the first one to lowercase. This can lead to errors such as “The java and jvm specification”, which can be fixed by enclosing the letters that must keep their case in braces in the *.bib file: The {J}ava and {JVM} Specification.
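To see why the braces help, the following Python sketch mimics, in simplified form, the title conversion that such styles perform (in BibTeX itself, this is done by the style’s call to the built-in change.case$ function). All characters except the first are lowercased, but characters inside braces are left untouched. It is only an approximation under stated assumptions: it ignores, for example, the special handling of text after a colon and of accented characters such as {\"o}. The function name sentence_case is hypothetical.

```python
def sentence_case(title: str) -> str:
    """Approximate how some bibliography styles lowercase a title.

    Simplified sketch: everything after the first character is lowercased
    unless it is enclosed in braces. The braces themselves are dropped,
    as they are in the typeset reference list.
    """
    result = []
    depth = 0  # current brace nesting level
    for i, ch in enumerate(title):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif i == 0 or depth > 0:
            result.append(ch)  # first character or brace-protected: keep
        else:
            result.append(ch.lower())
    return "".join(result)


print(sentence_case("The Java and JVM Specification"))
# The java and jvm specification
print(sentence_case("The {J}ava and {JVM} Specification"))
# The Java and JVM specification
```

Only the braced letters survive the conversion; “Specification” is still lowercased, which is the intended behavior of a sentence-case style.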

Some bibliography styles are designed for the natbib package and its author-year citations, which produce natural-sounding sentences, e.g., “Smith (2020) described…”. However, even when the publisher’s template prescribes numeric citations (and discourages using natbib), we often want to mention the authors’ names in sentences for fluency. Then we need to write the names manually according to these rules:

for one author, mention the surname only: Smith [1],
for two authors, connect the surnames with “and”: Smith and Novak [1],
and for three or more authors, use “et al.”: Smith et al. [1].
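The three rules are simple enough to express as a small helper, e.g., for checking manually written in-text names in a script. The following Python sketch is purely illustrative: the function name cite_authors, the assumption that the surnames are already available as a list, and the surname “Horvat” are hypothetical.

```python
def cite_authors(surnames: list[str], number: int) -> str:
    """Format author surnames before a numeric citation.

    One author: "Smith [1]"; two authors: "Smith and Novak [1]";
    three or more authors: "Smith et al. [1]".
    """
    if not surnames:
        raise ValueError("at least one author is required")
    if len(surnames) == 1:
        names = surnames[0]
    elif len(surnames) == 2:
        names = f"{surnames[0]} and {surnames[1]}"
    else:
        names = f"{surnames[0]} et al."  # only the first surname is kept
    return f"{names} [{number}]"


print(cite_authors(["Smith"], 1))                     # Smith [1]
print(cite_authors(["Smith", "Novak"], 1))            # Smith and Novak [1]
print(cite_authors(["Smith", "Novak", "Horvat"], 1))  # Smith et al. [1]
```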

8.6 Writing Theses

After reviewing the rules and guidelines common to many types of research documents, including both papers and theses, in this section we focus on the specifics of writing master’s and PhD theses.

8.6.1 Master’s Theses

At this point, one might expect a long list of elaborate tips on writing theses. Instead, there is only one tip:

A master’s thesis should ideally be a longer version of a research paper.

We should thus aim to write the master’s thesis following the same rules and tips, with a few minor differences, such as:

Related work must be placed after the Introduction, and it can (and should) consist of multiple chapters with appropriate names, i.e., not simply “Related Work”. It can be more verbose and detailed than in a paper, but it must not become “Unrelated Work”.
Technical details, such as architecture diagrams or measurement instruments (e.g., survey questions), can be described more thoroughly. Still, we should not go into a level of detail that is irrelevant to the reader.
Compared to a research paper, a thesis also aims to demonstrate that the student is capable of doing research. Given the extra space available, we can state more reasons, evidence, acknowledgments, responses, and warrants in our arguments.

Of course, this is an ideal situation that may or may not be attainable, depending on the specific topic of the thesis and the author’s ambitions.

8.6.2 PhD Theses

Usually, a PhD student publishes multiple papers during their studies. A PhD thesis is then often a collection of the student’s research papers on the same topic. For instance, each chapter can be based on one research paper: first a systematic literature review, then a survey of the problems, then a solution to the problem with a preliminary evaluation, and finally a full validation of the solution.

Besides stapling the papers together, we need to write the Introduction and Conclusion chapters that clearly describe the logical connection between the individual chapters or papers. Then, the content of the chapters should be re-read and edited, so that the thesis is readable from start to end and makes sense as a whole.

Exercises

Shorten the long title mentioned in Section 8.1.1 appropriately. You can list multiple variants, along with corresponding hypothetical assumptions that make a given variant the best choice.
Find one good abstract and one abstract that breaks multiple recommendations.
Mark the context, problem, and response in the Introduction chapter of your bachelor’s or master’s thesis.
Find a paper that includes source code snippets in its main text (not appendices). Is it well-justified in this case or only a sign of low quality?
How does a conclusion differ from the introduction? You can illustrate this on a hypothetical example.
Extract the complete section structure of a conference paper 8–12 pages long. Is it logical? How deeply is it nested?
In a paper that you like, find the three or four most relevant related works (as indicated by the authors). In which sections are they present? Are they referenced multiple times?
In an existing paper, find an argument consisting of a claim, reason, evidence, acknowledgment/response, and optionally a warrant.
Write a short text (2–4 sentences) that satisfies the first two stylistic rules from The Craft of Research mentioned at the end of Section 8.2. Rewrite the text so that it violates them.
What type of chart would you use to show:
1. the correlation between the monitor resolution and the average number of applications open simultaneously,
2. the average resolution of monitors from 1970 to 2025?
Which of the rules and tips in this chapter that you did not know before will help you improve your master’s thesis the most? Do you see any obstacles that might prevent you from applying some of them?