tracing/function-graph-tracer: provide documentation for the function graph tracer

Update documentation for the function graph tracer.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
index 758fb42..055bcd2 100644
--- a/Documentation/ftrace.txt
+++ b/Documentation/ftrace.txt
@@ -129,6 +129,10 @@
 
   set_ftrace_pid: Have the function tracer only trace a single thread.
 
+  set_graph_function: Select the function where the trace have to start
+		with the function graph tracer (See the section
+		"dynamic ftrace" for more details).
+
   available_filter_functions: This lists the functions that ftrace
 		has processed and can trace. These are the function
 		names that you can pass to "set_ftrace_filter" or
@@ -143,6 +147,12 @@
 
   function - function tracer that uses mcount to trace all functions.
 
+  function_graph_tracer - similar to the function tracer except that the
+		function tracer probes the functions on their entry whereas the
+		function graph tracer traces on both entry and exit of the
+		functions. It then provides the ability to draw a graph of
+		function calls like a primitive C code source.
+
   sched_switch - traces the context switches between tasks.
 
   irqsoff - traces the areas that disable interrupts and saves
@@ -1226,6 +1236,163 @@
 [...]
 
 
+function graph tracer
+---------------------------
+
+This tracer is similar to the function tracer except that it probes
+a function on its entry and its exit.
+This is done by setting a dynamically allocated stack of return addresses on each
+task_struct. Then the tracer overwrites the return address of each function traced
+to set a custom probe. Thus the original return address is stored on the stack of return
+address in the task_struct.
+
+Probing on both extremities of a function leads to special features such as
+
+_ measure of function's time execution
+_ having a reliable call stack to draw function calls graph
+
+This tracer is useful in several situations:
+
+_ you want to find the reason of a strange kernel behavior and need to see
+  what happens in detail on any areas (or specific ones).
+_ you are experiencing weird latencies but it's difficult to find its origin.
+_ you want to find quickly which path is taken by a specific function
+_ you just want to see what happens inside your kernel
+
+# tracer: function_graph
+#
+# CPU  DURATION                  FUNCTION CALLS
+# |     |   |                     |   |   |   |
+
+ 0)               |  sys_open() {
+ 0)               |    do_sys_open() {
+ 0)               |      getname() {
+ 0)               |        kmem_cache_alloc() {
+ 0)   1.382 us    |          __might_sleep();
+ 0)   2.478 us    |        }
+ 0)               |        strncpy_from_user() {
+ 0)               |          might_fault() {
+ 0)   1.389 us    |            __might_sleep();
+ 0)   2.553 us    |          }
+ 0)   3.807 us    |        }
+ 0)   7.876 us    |      }
+ 0)               |      alloc_fd() {
+ 0)   0.668 us    |        _spin_lock();
+ 0)   0.570 us    |        expand_files();
+ 0)   0.586 us    |        _spin_unlock();
+
+
+There are several columns that can be dynamically enabled/disabled.
+You can use every combination of options you want, depending on your needs.
+
+_ The cpu number on which the function executed is default enabled.
+  It is sometimes better to only trace one cpu (see tracing_cpu_mask file)
+  or you might sometimes see unordered function calls while cpu tracing switch.
+
+	hide: echo nofuncgraph-cpu > /debug/tracing/trace_options
+	show: echo funcgraph-cpu > /debug/tracing/trace_options
+
+_ The duration (function's time of execution) is displayed on the closing bracket
+  line of a function or on the same line than the current function in case of a leaf
+  one. It is default enabled.
+
+	hide: echo nofuncgraph-duration > /debug/tracing/trace_options
+	show: echo funcgraph-duration > /debug/tracing/trace_options
+
+_ The overhead field precedes the duration one in case of reached duration thresholds.
+
+	hide: echo nofuncgraph-overhead > /debug/tracing/trace_options
+	show: echo funcgraph-overhead > /debug/tracing/trace_options
+	depends on: funcgraph-duration
+
+  ie:
+
+  0)               |    up_write() {
+  0)   0.646 us    |      _spin_lock_irqsave();
+  0)   0.684 us    |      _spin_unlock_irqrestore();
+  0)   3.123 us    |    }
+  0)   0.548 us    |    fput();
+  0) + 58.628 us   |  }
+
+  [...]
+
+  0)               |      putname() {
+  0)               |        kmem_cache_free() {
+  0)   0.518 us    |          __phys_addr();
+  0)   1.757 us    |        }
+  0)   2.861 us    |      }
+  0) ! 115.305 us  |    }
+  0) ! 116.402 us  |  }
+
+  + means that the function exceeded 10 usecs.
+  ! means that the function exceeded 100 usecs.
+
+
+_ The task/pid field displays the thread cmdline and pid which executed the function.
+  It is default disabled.
+
+	hide: echo nofuncgraph-proc > /debug/tracing/trace_options
+	show: echo funcgraph-proc > /debug/tracing/trace_options
+
+  ie:
+
+  # tracer: function_graph
+  #
+  # CPU  TASK/PID        DURATION                  FUNCTION CALLS
+  # |    |    |           |   |                     |   |   |   |
+  0)    sh-4802     |               |                  d_free() {
+  0)    sh-4802     |               |                    call_rcu() {
+  0)    sh-4802     |               |                      __call_rcu() {
+  0)    sh-4802     |   0.616 us    |                        rcu_process_gp_end();
+  0)    sh-4802     |   0.586 us    |                        check_for_new_grace_period();
+  0)    sh-4802     |   2.899 us    |                      }
+  0)    sh-4802     |   4.040 us    |                    }
+  0)    sh-4802     |   5.151 us    |                  }
+  0)    sh-4802     | + 49.370 us   |                }
+
+
+_ The absolute time field is an absolute timestamp given by the clock since
+  it started. A snapshot of this time is given on each entry/exit of functions
+
+	hide: echo nofuncgraph-abstime > /debug/tracing/trace_options
+	show: echo funcgraph-abstime > /debug/tracing/trace_options
+
+  ie:
+
+  #
+  #      TIME       CPU  DURATION                  FUNCTION CALLS
+  #       |         |     |   |                     |   |   |   |
+  360.774522 |   1)   0.541 us    |                                          }
+  360.774522 |   1)   4.663 us    |                                        }
+  360.774523 |   1)   0.541 us    |                                        __wake_up_bit();
+  360.774524 |   1)   6.796 us    |                                      }
+  360.774524 |   1)   7.952 us    |                                    }
+  360.774525 |   1)   9.063 us    |                                  }
+  360.774525 |   1)   0.615 us    |                                  journal_mark_dirty();
+  360.774527 |   1)   0.578 us    |                                  __brelse();
+  360.774528 |   1)               |                                  reiserfs_prepare_for_journal() {
+  360.774528 |   1)               |                                    unlock_buffer() {
+  360.774529 |   1)               |                                      wake_up_bit() {
+  360.774529 |   1)               |                                        bit_waitqueue() {
+  360.774530 |   1)   0.594 us    |                                          __phys_addr();
+
+
+You can put some comments on specific functions by using ftrace_printk()
+For example, if you want to put a comment inside the __might_sleep() function,
+you just have to include <linux/ftrace.h> and call ftrace_printk() inside __might_sleep()
+
+ftrace_printk("I'm a comment!\n")
+
+will produce:
+
+ 1)               |             __might_sleep() {
+ 1)               |                /* I'm a comment! */
+ 1)   1.449 us    |             }
+
+
+You might find other useful features for this tracer on the "dynamic ftrace"
+section such as tracing only specific functions or tasks.
+
 dynamic ftrace
 --------------
 
@@ -1427,6 +1594,65 @@
 
 We can see that there's no more lock or preempt tracing.
 
+
+* Dynamic ftrace with the function graph tracer *
+
+
+Although what has been explained above concerns both the function tracer and
+the function_graph_tracer, the following concerns only the latter.
+
+If you want to trace only one function and all of its childs, you just have
+to echo its name on set_graph_function:
+
+echo __do_fault > set_graph_function
+
+will produce the following:
+
+ 0)               |  __do_fault() {
+ 0)               |    filemap_fault() {
+ 0)               |      find_lock_page() {
+ 0)   0.804 us    |        find_get_page();
+ 0)               |        __might_sleep() {
+ 0)   1.329 us    |        }
+ 0)   3.904 us    |      }
+ 0)   4.979 us    |    }
+ 0)   0.653 us    |    _spin_lock();
+ 0)   0.578 us    |    page_add_file_rmap();
+ 0)   0.525 us    |    native_set_pte_at();
+ 0)   0.585 us    |    _spin_unlock();
+ 0)               |    unlock_page() {
+ 0)   0.541 us    |      page_waitqueue();
+ 0)   0.639 us    |      __wake_up_bit();
+ 0)   2.786 us    |    }
+ 0) + 14.237 us   |  }
+ 0)               |  __do_fault() {
+ 0)               |    filemap_fault() {
+ 0)               |      find_lock_page() {
+ 0)   0.698 us    |        find_get_page();
+ 0)               |        __might_sleep() {
+ 0)   1.412 us    |        }
+ 0)   3.950 us    |      }
+ 0)   5.098 us    |    }
+ 0)   0.631 us    |    _spin_lock();
+ 0)   0.571 us    |    page_add_file_rmap();
+ 0)   0.526 us    |    native_set_pte_at();
+ 0)   0.586 us    |    _spin_unlock();
+ 0)               |    unlock_page() {
+ 0)   0.533 us    |      page_waitqueue();
+ 0)   0.638 us    |      __wake_up_bit();
+ 0)   2.793 us    |    }
+ 0) + 14.012 us   |  }
+
+You can also select several functions:
+
+echo sys_open > set_graph_function
+echo sys_close >> set_graph_function
+
+Now if you want to go back to trace all functions
+
+echo > set_graph_function
+
+
 trace_pipe
 ----------