From 158b602165fa6ccf88bbcabbddf7787d72236375 Mon Sep 17 00:00:00 2001
From: hetong007
Date: Fri, 16 Oct 2015 01:33:22 +0000
Subject: [PATCH 1/2] temp doc modification

---
 R-package/R/ndarray.R                 |   4 +
 R-package/vignettes/mxnetTutorial.Rmd | 288 ++++++++++++++++++++++++++
 doc/conf.py                           |   3 +-
 3 files changed, 294 insertions(+), 1 deletion(-)
 create mode 100644 R-package/vignettes/mxnetTutorial.Rmd

diff --git a/R-package/R/ndarray.R b/R-package/R/ndarray.R
index b5537a298593..fa79120f0aab 100644
--- a/R-package/R/ndarray.R
+++ b/R-package/R/ndarray.R
@@ -95,6 +95,10 @@ mx.nd.copyto <- function(src, ctx) {
 #'
 #' @return An \code{mx.ndarray}
 #'
+#' @rdname mx.nd.array
+#'
+#' @return An Rcpp_MXNDArray object
+#'
 #' @examples
 #' mat = mx.nd.array(x)
 #' mat = 1 - mat + (2 * mat)/(mat + 0.5)
diff --git a/R-package/vignettes/mxnetTutorial.Rmd b/R-package/vignettes/mxnetTutorial.Rmd
new file mode 100644
index 000000000000..039399d9d793
--- /dev/null
+++ b/R-package/vignettes/mxnetTutorial.Rmd
@@ -0,0 +1,288 @@
MXNet R Overview Tutorial
============================

This vignette gives a general overview of MXNet's R package. MXNet provides a
mixed flavor of elements that lets you bake flexible and efficient
applications. There are three main concepts:

* [NDArray](#ndarray-vectorized-tensor-computations-on-cpus-and-gpus)
  offers matrix and tensor computations on both CPU and GPU, with automatic
  parallelization
* [Symbol](#symbol-and-automatic-differentiation) makes defining a neural
  network extremely easy, and provides automatic differentiation.
* [KVStore](#distributed-key-value-store) eases data synchronization between
  multiple GPUs and multiple machines.

## NDArray: Vectorized tensor computations on CPUs and GPUs

`NDArray` is the basic vectorized operation unit in MXNet for matrix and tensor computations.
Users can perform the usual calculations as on an R array, but with two additional features:

1.
**multiple devices**: all operations can be run on various devices including
CPUs and GPUs
2. **automatic parallelization**: all operations are automatically executed in
   parallel with each other

### Creation and Initialization

Let's create an `NDArray` on either GPU or CPU:

```r
require(mxnet)
a = mx.nd.zeros(c(2, 3))            # create a 2-by-3 matrix on cpu
b = mx.nd.zeros(c(2, 3), mx.gpu())  # create a 2-by-3 matrix on gpu 0
c = mx.nd.zeros(c(2, 3), mx.gpu(2)) # create a 2-by-3 matrix on gpu 2
c$dim()
```

We can also initialize an `NDArray` object in various ways:

```r
a = mx.nd.ones(c(4, 4))
b = mx.rnorm(c(4, 5))
c = mx.nd.array(1:5)
```

To check the numbers in an `NDArray`, we can simply run

```r
a = mx.nd.ones(c(2, 3))
b = as.array(a)
class(b)
b
```

### Basic Operations

#### Element-wise operations

You can perform element-wise operations on `NDArray` objects:

```r
a = mx.nd.ones(c(2, 3)) * 2
b = mx.nd.ones(c(2, 3)) / 8
as.array(a)
as.array(b)
c = a + b
as.array(c)
d = c / a - 5
as.array(d)
```

If two `NDArray`s sit on different devices, we need to explicitly move them
onto the same one. For instance:

```r
a = mx.nd.ones(c(2, 3)) * 2
b = mx.nd.ones(c(2, 3), mx.gpu()) / 8
c = mx.nd.copyto(a, mx.gpu()) * b
as.array(c)
```

#### Load and Save

You can save an `NDArray` object to disk with `mx.nd.save`:

```r
a = mx.nd.ones(c(2, 3))
mx.nd.save(a, 'temp.ndarray')
```

You can also load it back easily:

```r
a = mx.nd.load('temp.ndarray')
as.array(a[[1]])
```

If you want to keep data on a distributed file system such as S3 or HDFS,
you can directly save to and load from it. For example:

```r
mx.nd.save(a, 's3://mybucket/mydata.bin')
mx.nd.save(a, 'hdfs:///users/myname/mydata.bin')
```

### Automatic Parallelization

`NDArray` can automatically execute operations in parallel.
This is desirable when we
use multiple resources such as CPUs, GPU cards, and CPU-to-GPU memory bandwidth.

For example, if we write `a = a + 1` followed by `b = b + 1`, and `a` is on the CPU while
`b` is on the GPU, then we want to execute them in parallel to improve
efficiency. Furthermore, data copies between CPU and GPU are also expensive, so we
hope to run them in parallel with other computations as well.

However, finding by eye which code can be executed in parallel is hard. In the
following example, `a = a + 1` and `c = c * 3` can be executed in parallel, but `a = a + 1` and
`b = b * 3` must be executed sequentially.

```r
a = mx.nd.ones(c(2, 3))
b = a
c = mx.nd.copyto(a, mx.cpu())
a = a + 1
b = b * 3
c = c * 3
```

Luckily, MXNet can automatically resolve the dependencies and
execute operations in parallel with correctness guaranteed. In other words, we
can write a program as if it were single-threaded, and MXNet will
automatically dispatch it onto multiple devices, such as multiple GPU cards or multiple
machines.

This is achieved by lazy evaluation. Any operation we write down is issued to an
internal engine, and control then returns to R. For example, if we run `a = a + 1`, it
returns immediately after pushing the plus operator to the engine. This
asynchrony allows us to push more operators to the engine, so it can determine
the read and write dependencies and find the best way to execute the operators in
parallel.

The actual computations are finished when we copy the results somewhere
else, such as with `as.array(a)` or `mx.nd.save(a, 'temp.dat')`. Therefore, to
write highly parallelized code, we only need to postpone asking for
the results.

## Symbol and Automatic Differentiation

With the computational unit `NDArray`, we need a way to construct neural networks. MXNet provides a symbolic interface, named Symbol, to do so. Symbol combines both flexibility and efficiency.
### Basic Composition of Symbols

The following code creates a two-layer perceptron network:

```r
require(mxnet)
net = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
net = mx.symbol.Activation(data=net, name='relu1', act_type="relu")
net = mx.symbol.FullyConnected(data=net, name='fc2', num_hidden=64)
net = mx.symbol.Softmax(data=net, name='out')
class(net)
```

Each symbol takes a (unique) string name. *Variable* often defines the inputs,
or free variables. Other symbols take a symbol as their input (*data*),
and may accept other hyper-parameters such as the number of hidden neurons (*num_hidden*)
or the activation type (*act_type*).

A symbol can simply be viewed as a function taking several arguments, whose
names are automatically generated and can be retrieved with

```r
arguments(net)
```

As can be seen, these arguments are the parameters needed by each symbol:

- *data* : input data needed by the variable *data*
- *fc1_weight* and *fc1_bias* : the weight and bias for the first fully connected layer *fc1*
- *fc2_weight* and *fc2_bias* : the weight and bias for the second fully connected layer *fc2*
- *out_label* : the label needed by the loss

We can also specify the automatically generated names explicitly:

```r
net = mx.symbol.Variable('data')
w = mx.symbol.Variable('myweight')
net = mx.symbol.FullyConnected(data=net, weight=w, name='fc1', num_hidden=128)
arguments(net)
```

### More Complicated Composition

MXNet provides well-optimized symbols (see
[src/operator](https://github.com/dmlc/mxnet/tree/master/src/operator)) for
commonly used layers in deep learning. We can also easily define new operators
in R. The following example first performs an element-wise add between two
symbols, then feeds the result to the fully connected operator.
```r
lhs = mx.symbol.Variable('data1')
rhs = mx.symbol.Variable('data2')
net = mx.symbol.FullyConnected(data=lhs + rhs, name='fc1', num_hidden=128)
arguments(net)
```

We can also construct a symbol in a more flexible way than the single
forward composition we used above.

```r
net = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
net2 = mx.symbol.Variable('data2')
net2 = mx.symbol.FullyConnected(data=net2, name='net2', num_hidden=128)
composed_net = net(data=net2, name='compose')
arguments(composed_net)
```

In the above example, *net* is used as a function applied to the existing symbol
*net2*: the resulting *composed_net* replaces the original argument *data* with
*net2*.

### Argument Shape Inference

Now that we know how to define a symbol, we can infer the shapes of all of its
arguments given the input data shape.

```r
net = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=10)
```

Shape inference can be used as an early debugging mechanism to detect
shape inconsistencies.

### Bind the Symbols and Run

Now we can bind the free variables of the symbol and perform the forward and backward passes.
The bind function will create an ```Executor``` that can be used to carry out the real computations.

For neural nets, a more commonly used pattern is ```simple_bind```, which creates
all the argument arrays for you. Then you can call forward, and backward (if gradients
are needed), to get the gradients.

```r
# define computation graphs
A = mx.symbol.Variable('A')
B = mx.symbol.Variable('B')
C = A * B

texec = mx.simple.bind(C)
texec.forward()
texec.backward()
```

The [model API](../../R-package/R/model.R) is a thin wrapper around the symbolic executors to support neural net training.
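As a hedged sketch of how that wrapper is used: the call below mirrors the MNIST tutorial added in the second commit of this patch series; `net` is the two-layer perceptron symbol defined above, while `train.x`, `train.y` and `test` are placeholders for your own data, so the code is not run here.

```r
# Train a feed-forward network through the model API.
# train.x / train.y / test are placeholder data objects, not defined here.
model = mx.model.FeedForward.create(net, X=train.x, y=train.y,
                                    ctx=mx.cpu(), num.round=10,
                                    array.batch.size=100,
                                    learning.rate=0.07, momentum=0.9,
                                    initializer=mx.init.uniform(0.07))
# Prediction then goes through the generic predict():
preds = predict(model, test)
```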
You are also highly encouraged to read [Symbolic Configuration and Execution in Pictures](symbol_in_pictures.md),
which provides a detailed explanation of the concepts in pictures.

### How Efficient is the Symbolic API?

In short, it is designed to be very efficient in both memory and runtime.

The major reason for introducing the Symbolic API is to bring the efficient C++
operations of powerful toolkits such as cxxnet and Caffe together with the
flexible dynamic NDArray operations. All the memory and computation resources are
allocated statically during Bind, to maximize the runtime performance and memory
utilization.

The coarse-grained operators are equivalent to cxxnet layers, which are
extremely efficient. We also provide fine-grained operators for more flexible
composition. Because we also do more in-place memory allocation, mxnet can be
***more memory efficient*** than cxxnet, while achieving the same runtime, with
greater flexibility.

diff --git a/doc/conf.py b/doc/conf.py
index 2551a291a3e2..413f3f661c56 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -42,6 +42,7 @@
 MarkdownParser.github_doc_root = github_doc_root
 source_parsers = {
     '.md': MarkdownParser,
+    '.Rmd': MarkdownParser,
 }
 os.environ['MXNET_BUILD_DOC'] = '1'
 # Version information.
@@ -71,7 +72,7 @@
 # The suffix(es) of source filenames.
 # You can specify multiple suffix as a list of string:
 # source_suffix = ['.rst', '.md']
-source_suffix = ['.rst', '.md']
+source_suffix = ['.rst', '.md', '.Rmd']

 # The encoding of source files.
 #source_encoding = 'utf-8-sig'

From b2ab9e8b30f0b9e3b6a4645a7b7f96e95dace061 Mon Sep 17 00:00:00 2001
From: hetong007
Date: Sat, 17 Oct 2015 15:46:33 -0700
Subject: [PATCH 2/2] add two tutorials

---
 R-package/vignettes/mnistCompetition.Rmd      | 113 +++++
 ...orial.Rmd => ndarrayAndSymbolTutorial.Rmd} | 138 +++---
 doc/R-package/Makefile                        |   2 +
 doc/R-package/index.md                        |   2 +
 doc/R-package/mnistCompetition.md             | 209 ++++++++
 doc/R-package/ndarrayAndSymbolTutorial.md     | 454 ++++++++++++++++++
 6 files changed, 848 insertions(+), 70 deletions(-)
 create mode 100644 R-package/vignettes/mnistCompetition.Rmd
 rename R-package/vignettes/{mxnetTutorial.Rmd => ndarrayAndSymbolTutorial.Rmd} (75%)
 create mode 100644 doc/R-package/mnistCompetition.md
 create mode 100644 doc/R-package/ndarrayAndSymbolTutorial.md

diff --git a/R-package/vignettes/mnistCompetition.Rmd b/R-package/vignettes/mnistCompetition.Rmd
new file mode 100644
index 000000000000..b749bc9cb4e0
--- /dev/null
+++ b/R-package/vignettes/mnistCompetition.Rmd
@@ -0,0 +1,113 @@
---
title: "Handwritten Digits Classification Competition"
author: "Tong He"
date: "October 17, 2015"
output: html_document
---

[MNIST](http://yann.lecun.com/exdb/mnist/) is a handwritten digits image data set created by Yann LeCun. Every digit is represented by a 28x28 image. It has become a standard data set for testing classifiers on simple image input. Neural networks are no doubt strong models for image classification. There is a [long-term hosted competition](https://www.kaggle.com/c/digit-recognizer) on Kaggle using this data set. We will present the basic usage of `mxnet` to compete in this challenge.

## Data Loading

First, let us download the data from [here](https://www.kaggle.com/c/digit-recognizer/data) and put it under the `data/` folder in your working directory.

Then we can read the files into R and convert them to matrices.
```{r, eval=FALSE}
train <- read.csv('data/train.csv', header=TRUE)
test <- read.csv('data/test.csv', header=TRUE)
train <- data.matrix(train)
test <- data.matrix(test)

train.x <- train[,-1]
train.y <- train[,1]
```

Here every image is represented as a single row in train/test. The greyscale of each image falls in the range [0, 255]. We can linearly transform it into [0, 1] by

```{r, eval = FALSE}
train.x <- train.x/255
test <- test/255
```

On the label side, we see that the number of images of each digit is fairly even:

```{r, eval=FALSE}
table(train.y)
```

## Network Configuration

Now we have the data. The next step is to configure the structure of our network.

```{r}
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
fc2 <- mx.symbol.FullyConnected(act1, name = "fc2", num_hidden = 64)
act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=10)
softmax <- mx.symbol.Softmax(fc3, name = "sm")
```

1. In `mxnet`, we use its own data type `symbol` to configure the network. `data <- mx.symbol.Variable("data")` uses `data` to represent the input data, i.e. the input layer.
2. Then we set the first hidden layer with `fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)`. This layer takes `data` as its input, and we specify its name and its number of hidden neurons.
3. The activation is set by `act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")`. The activation function takes the output of the first hidden layer `fc1`.
4. The second hidden layer takes the result of `act1` as its input, with its name set to "fc2" and its number of hidden neurons set to 64.
5. The second activation is almost the same as `act1`, except for a different input source and name.
6. Here comes the output layer.
Since there are only 10 digits, we set the number of neurons to 10.
7. Finally we set the activation to softmax to get a probabilistic prediction.

## Training

We are almost ready for the training process. Before we start the computation, let's decide which device to use.

```{r}
devices <- lapply(1:2, function(i) {
  mx.cpu(i)
})
```

Here we assign two threads of our CPU to `mxnet`. After all this preparation, you can run the following command to train the neural network!

```{r}
set.seed(0)
model <- mx.model.FeedForward.create(softmax, X=train.x, y=train.y,
                                     ctx=devices, num.round=10, array.batch.size=100,
                                     learning.rate=0.07, momentum=0.9,
                                     initializer=mx.init.uniform(0.07),
                                     epoch.end.callback=mx.callback.log.train.metric(100))
```

## Prediction and Submission

To make a prediction, we can simply write

```{r}
preds <- predict(model, test)
dim(preds)
```

It is a matrix with 28000 rows and 10 columns, containing the classification probabilities from the output layer. To extract the most probable label for each row, we can use `max.col` in R:

```{r}
pred.label <- max.col(preds) - 1
table(pred.label)
```

With a little extra effort to match the csv format, we have our submission for the competition!

```{r}
submission <- data.frame(ImageId=1:nrow(test), Label=pred.label)
write.csv(submission, file='submission.csv', row.names=FALSE, quote=FALSE)
```

diff --git a/R-package/vignettes/mxnetTutorial.Rmd b/R-package/vignettes/ndarrayAndSymbolTutorial.Rmd
similarity index 75%
rename from R-package/vignettes/mxnetTutorial.Rmd
rename to R-package/vignettes/ndarrayAndSymbolTutorial.Rmd
index 039399d9d793..2b608066b753 100644
--- a/R-package/vignettes/mxnetTutorial.Rmd
+++ b/R-package/vignettes/ndarrayAndSymbolTutorial.Rmd
@@ -1,4 +1,4 @@
-MXNet R Overview Tutorial
+MXNet R Tutorial on NDArray and Symbol
 ============================
 
 This vignette gives a general overview of MXNet's R package.
MXNet contains a @@ -27,27 +27,27 @@ CPU and GPU Let's create `NDArray` on either GPU or CPU -```r +```{r} require(mxnet) -a = mx.nd.zeros(c(2, 3)) # create a 2-by-3 matrix on cpu -b = mx.nd.zeros(c(2, 3), mx.gpu()) # create a 2-by-3 matrix on gpu 0 -c = mx.nd.zeros(c(2, 3), mx.gpu(2)) # create a 2-by-3 matrix on gpu 0 +a <- mx.nd.zeros(c(2, 3)) # create a 2-by-3 matrix on cpu +b <- mx.nd.zeros(c(2, 3), mx.gpu()) # create a 2-by-3 matrix on gpu 0 +c <- mx.nd.zeros(c(2, 3), mx.gpu(2)) # create a 2-by-3 matrix on gpu 0 c$dim() ``` We can also initialize an `NDArray` object in various ways: -```r -a = mx.nd.ones(c(4, 4)) -b = mx.rnorm(c(4, 5)) -c = mx.nd.array(1:5) +```{r} +a <- mx.nd.ones(c(4, 4)) +b <- mx.rnorm(c(4, 5)) +c <- mx.nd.array(1:5) ``` To check the numbers in an `NDArray`, we can simply run -```r -a = mx.nd.ones(c(2, 3)) -b = as.array(a) +```{r} +a <- mx.nd.ones(c(2, 3)) +b <- as.array(a) class(b) b ``` @@ -58,24 +58,24 @@ b You can perform elemental-wise operations on `NDArray` objects: -```r -a = mx.nd.ones(c(2, 3)) * 2 -b = mx.nd.ones(c(2, 4)) / 8 +```{r} +a <- mx.nd.ones(c(2, 3)) * 2 +b <- mx.nd.ones(c(2, 4)) / 8 as.array(a) as.array(b) -c = a + b +c <- a + b as.array(c) -d = c / a - 5 +d <- c / a - 5 as.array(d) ``` If two `NDArray`s sit on different divices, we need to explicitly move them into the same one. 
For instance: -```r -a = mx.nd.ones(c(2, 3)) * 2 -b = mx.nd.ones(c(2, 3), mx.gpu()) / 8 -c = mx.nd.copyto(a, mx.gpu()) * b +```{r} +a <- mx.nd.ones(c(2, 3)) * 2 +b <- mx.nd.ones(c(2, 3), mx.gpu()) / 8 +c <- mx.nd.copyto(a, mx.gpu()) * b as.array(c) ``` @@ -83,22 +83,22 @@ as.array(c) You can save an `NDArray` object to your disk with `mx.nd.save`: -```r -a = mx.nd.ones(c(2, 3)) +```{r} +a <- mx.nd.ones(c(2, 3)) mx.nd.save(a, 'temp.ndarray') ``` You can also load it back easily: -```r -a = mx.nd.load('temp.ndarray') +```{r} +a <- mx.nd.load('temp.ndarray') as.array(a[[1]]) ``` In case you want to save data to the distributed file system such as S3 and HDFS, we can directly save to and load from them. For example: -```r +```{r,eval=FALSE} mx.nd.save(a, 's3://mybucket/mydata.bin') mx.nd.save(a, 'hdfs///users/myname/mydata.bin') ``` @@ -108,22 +108,22 @@ mx.nd.save(a, 'hdfs///users/myname/mydata.bin') `NDArray` can automatically execute operations in parallel. It is desirable when we use multiple resources such as CPU, GPU cards, and CPU-to-GPU memory bandwidth. -For example, if we write `a = a + 1` followed by `b = b + 1`, and `a` is on CPU while +For example, if we write `a <- a + 1` followed by `b <- b + 1`, and `a` is on CPU while `b` is on GPU, then want to execute them in parallel to improve the efficiency. Furthermore, data copy between CPU and GPU are also expensive, we hope to run it parallel with other computations as well. However, finding the codes can be executed in parallel by eye is hard. In the -following example, `a = a + 1` and `c = c * 3` can be executed in parallel, but `a = a + 1` and -`b = b * 3` should be in sequential. - -```r -a = mx.nd.ones(c(2,3)) -b = a -c = mx.nd.copyto(a, mx.cpu()) -a = a + 1 -b = b * 3 -c = c * 3 +following example, `a <- a + 1` and `c <- c * 3` can be executed in parallel, but `a <- a + 1` and +`b <- b * 3` should be in sequential. 
+ +```{r} +a <- mx.nd.ones(c(2,3)) +b <- a +c <- mx.nd.copyto(a, mx.cpu()) +a <- a + 1 +b <- b * 3 +c <- c * 3 ``` Luckily, MXNet can automatically resolve the dependencies and @@ -133,7 +133,7 @@ automatically dispatch it into multi-devices, such as multi GPU cards or multi machines. It is achieved by lazy evaluation. Any operation we write down is issued into a -internal engine, and then returned. For example, if we run `a = a + 1`, it +internal engine, and then returned. For example, if we run `a <- a + 1`, it returns immediately after pushing the plus operator to the engine. This asynchronous allows us to push more operators to the engine, so it can determine the read and write dependency and find a best way to execute them in @@ -152,13 +152,13 @@ WIth the computational unit `NDArray`, we need a way to construct neural network The following codes create a two layer perceptrons network: -```r +```{r} require(mxnet) -net = mx.symbol.Variable('data') -net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128) -net = mx.symbol.Activation(data=net, name='relu1', act_type="relu") -net = mx.symbol.FullyConnected(data=net, name='fc2', num_hidden=64) -net = mx.symbol.Softmax(data=net, name='out') +net <- mx.symbol.Variable('data') +net <- mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128) +net <- mx.symbol.Activation(data=net, name='relu1', act_type="relu") +net <- mx.symbol.FullyConnected(data=net, name='fc2', num_hidden=64) +net <- mx.symbol.Softmax(data=net, name='out') class(net) ``` @@ -170,7 +170,7 @@ or the activation type (*act_type*). 
The symbol can be simply viewed as a function taking several arguments, whose names are automatically generated and can be get by -```r +```{r} arguments(net) ``` @@ -183,10 +183,10 @@ As can be seen, these arguments are the parameters need by each symbol: We can also specify the automatic generated names explicitly: -```r -net = mx.symbol.Variable('data') -w = mx.symbol.Variable('myweight') -net = sym.FullyConnected(data=data, weight=w, name='fc1', num_hidden=128) +```{r} +net <- mx.symbol.Variable('data') +w <- mx.symbol.Variable('myweight') +net <- sym.FullyConnected(data=data, weight=w, name='fc1', num_hidden=128) arguments(net) ``` @@ -198,22 +198,22 @@ commonly used layers in deep learning. We can also easily define new operators in python. The following example first performs an elementwise add between two symbols, then feed them to the fully connected operator. -```r -lhs = mx.symbol.Variable('data1') -rhs = mx.symbol.Variable('data2') -net = mx.symbol.FullyConnected(data=lhs + rhs, name='fc1', num_hidden=128) +```{r} +lhs <- mx.symbol.Variable('data1') +rhs <- mx.symbol.Variable('data2') +net <- mx.symbol.FullyConnected(data=lhs + rhs, name='fc1', num_hidden=128) arguments(net) ``` We can also construct symbol in a more flexible way rather than the single forward composition we addressed before. 
-```r -net = mx.symbol.Variable('data') -net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128) -net2 = mx.symbol.Variable('data2') -net2 = mx.symbol.FullyConnected(data=net2, name='net2', num_hidden=128) -composed_net = net(data=net2, name='compose') +```{r} +net <- mx.symbol.Variable('data') +net <- mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128) +net2 <- mx.symbol.Variable('data2') +net2 <- mx.symbol.FullyConnected(data=net2, name='net2', num_hidden=128) +composed_net <- net(data=net2, name='compose') arguments(composed_net) ``` @@ -226,9 +226,9 @@ In the above example, *net* is used a function to apply to an existing symbol Now we have known how to define the symbol. Next we can inference the shapes of all the arguments it needed by given the input data shape. -```r -net = mx.symbol.Variable('data') -net = mx.symbol.FullyConnected(data=ent, name='fc1', num_hidden=10) +```{r} +net <- mx.symbol.Variable('data') +net <- mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=10) ``` The shape inference can be used as an earlier debugging mechanism to detect @@ -243,19 +243,17 @@ For neural nets, a more commonly used pattern is ```simple_bind```, which will c all the arguments arrays for you. Then you can call forward, and backward(if gradient is needed) to get the gradient. -```r -# Todo: refine code -# define computation graphs -A = mx.symbol.Variable('A') -B = mx.symbol.Variable('B') -C = A * B +```{r, eval=FALSE} +A <- mx.symbol.Variable('A') +B <- mx.symbol.Variable('B') +C <- A * B -texec = mx.simple.bind(C) +texec <- mx.simple.bind(C) texec.forward() texec.backward() ``` -The [model API](../../python/mxnet/model.py) is a thin wrapper around the symbolic executors to support neural net training. +The [model API](../../R-package/R/model.R) is a thin wrapper around the symbolic executors to support neural net training. 
You are also highly encouraged to read [Symbolic Configuration and Execution in Pictures](symbol_in_pictures.md), which provides a detailed explanation of concepts in pictures. diff --git a/doc/R-package/Makefile b/doc/R-package/Makefile index a59a3fde4220..7ca47d63776d 100644 --- a/doc/R-package/Makefile +++ b/doc/R-package/Makefile @@ -3,6 +3,8 @@ PKGROOT=../../R-package # ADD The Markdown to be built here classifyRealImageWithPretrainedModel.md: +mnistCompetition.Rmd: +ndarrayAndSymbolTutorial.Rmd: # General Rules for build rmarkdowns, need knitr %.md: $(PKGROOT)/vignettes/%.Rmd diff --git a/doc/R-package/index.md b/doc/R-package/index.md index d8e381f8f442..20d5e70f1ac3 100644 --- a/doc/R-package/index.md +++ b/doc/R-package/index.md @@ -10,6 +10,8 @@ The MXNet R packages brings flexible and efficient GPU computing and deep learni Tutorials --------- * [Classify Realworld Images with Pretrained Model](classifyRealImageWithPretrainedModel.md) +* [Handwritten Digits Classification Competition](mnistCompetition.md) +* [Tutorial on NDArray and Symbol](ndarrayAndSymbolTutorial.md) Installation ------------ diff --git a/doc/R-package/mnistCompetition.md b/doc/R-package/mnistCompetition.md new file mode 100644 index 000000000000..dd806dfe777b --- /dev/null +++ b/doc/R-package/mnistCompetition.md @@ -0,0 +1,209 @@ +--- +title: "Handwritten Digits Classification Competition" +author: "Tong He" +date: "October 17, 2015" +output: html_document +--- + +[MNIST](http://yann.lecun.com/exdb/mnist/) is a handwritten digits image data set created by Yann LeCun. Every digit is represented by a 28x28 image. It has become a standard data set to test classifiers on simple image input. Neural network is no doubt a strong model for image classification tasks. There's a [long-term hosted competition](https://www.kaggle.com/c/digit-recognizer) on Kaggle using this data set. We will present the basic usage of `mxnet` to compete in this challenge. 
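Before diving in, it helps to see how a flattened 784-pixel row maps back to a 28x28 image. The following is a base-R sketch with synthetic pixel values (the real rows come from the competition csv loaded below); `matrix(..., byrow=TRUE)` mirrors the row-by-row pixel layout of the data set.

```r
# A synthetic 784-element "image" row with greyscale values in [0, 255]
row <- seq(0, 255, length.out = 28 * 28)
# Reshape it into a 28x28 matrix, filling pixels row by row
img <- matrix(row, nrow = 28, ncol = 28, byrow = TRUE)
dim(img)
# image() can display it; flipping the rows gives the usual orientation:
# image(t(img[28:1, ]), col = grey.colors(255))
```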
+ +## Data Loading + +First, let us download the data from [here](https://www.kaggle.com/c/digit-recognizer/data), and put them under the `data/` folder in your working directory. + +Then we can read them in R and convert to matrices. + + +```r +train <- read.csv('data/train.csv', header=TRUE) +test <- read.csv('data/test.csv', header=TRUE) +train <- data.matrix(train) +test <- data.matrix(test) + +train.x <- train[,-1] +train.y <- train[,1] +``` + +Here every image is represented as a single row in train/test. The greyscale of each image falls in the range [0, 255], we can linearly transform it into [0,1] by + + +```r +train.x <- train.x/255 +test <- test/255 +``` + +In the label part, we see the number of each digit is fairly even: + + +```r +table(train.y) +``` + +## Network Configuration + +Now we have the data. The next step is to configure the structure of our network. + + +```r +data <- mx.symbol.Variable("data") +``` + +``` +## Error in eval(expr, envir, enclos): could not find function "mx.symbol.Variable" +``` + +```r +fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128) +``` + +``` +## Error in eval(expr, envir, enclos): could not find function "mx.symbol.FullyConnected" +``` + +```r +act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu") +``` + +``` +## Error in eval(expr, envir, enclos): could not find function "mx.symbol.Activation" +``` + +```r +fc2 <- mx.symbol.FullyConnected(act1, name = "fc2", num_hidden = 64) +``` + +``` +## Error in eval(expr, envir, enclos): could not find function "mx.symbol.FullyConnected" +``` + +```r +act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu") +``` + +``` +## Error in eval(expr, envir, enclos): could not find function "mx.symbol.Activation" +``` + +```r +fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=10) +``` + +``` +## Error in eval(expr, envir, enclos): could not find function "mx.symbol.FullyConnected" +``` + +```r +softmax <- mx.symbol.Softmax(fc3, name = "sm") 
+``` + +``` +## Error in eval(expr, envir, enclos): could not find function "mx.symbol.Softmax" +``` + +1. In `mxnet`, we use its own data type `symbol` to configure the network. `data <- mx.symbol.Variable("data")` use `data` to represent the input data, i.e. the input layer. +2. Then we set the first hidden layer by `fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)`. This layer has `data` as the input, its name and the number of hidden neurons. +3. The activation is set by `act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")`. The activation function takes the output from the first hidden layer `fc1`. +4. The second hidden layer takes the result from `act1` as the input, with its name as "fc2" and the number of hidden neurons as 64. +5. the second activation is almost the same as `act1`, except we have a different input source and name. +6. Here comes the output layer. Since there's only 10 digits, we set the number of neurons to 10. +7. Finally we set the activation to softmax to get a probabilistic prediction. + +## Training + +We are almost ready for the training process. Before we start the computation, let's decide what device should we use. + + +```r +devices <- lapply(1:2, function(i) { + mx.cpu(i) +}) +``` + +``` +## Error in FUN(1:2[[1L]], ...): could not find function "mx.cpu" +``` + +Here we assign two threads of our CPU to `mxnet`. After all these preparation, you can run the following command to train the neural network! 
+ + +```r +set.seed(0) +model <- mx.model.FeedForward.create(softmax, X=train.x, y=train.y, + ctx=devices, num.round=10, array.batch.size=100, + learning.rate=0.07, momentum=0.9, + initializer=mx.init.uniform(0.07), + epoch.end.callback=mx.callback.log.train.metric(100)) +``` + +``` +## Error in eval(expr, envir, enclos): could not find function "mx.model.FeedForward.create" +``` + +## Prediction and Submission + +To make prediction, we can simply write + + +```r +preds <- predict(model, test) +``` + +``` +## Error in predict(model, test): object 'model' not found +``` + +```r +dim(preds) +``` + +``` +## Error in eval(expr, envir, enclos): object 'preds' not found +``` + +It is a matrix with 28000 rows and 10 cols, containing the desired classification probabilities from the output layer. To extract the maximum label for each row, we can use the `max.col` in R: + + +```r +pred.label <- max.col(preds) - 1 +``` + +``` +## Error in as.matrix(m): object 'preds' not found +``` + +```r +table(pred.label) +``` + +``` +## Error in table(pred.label): object 'pred.label' not found +``` + +With a little extra effort in the csv format, we can have our submission to the competition! + + +```r +submission <- data.frame(ImageId=1:nrow(test), Label=pred.label) +``` + +``` +## Error in nrow(test): object 'test' not found +``` + +```r +write.csv(submission, file='submission.csv', row.names=FALSE, quote=FALSE) +``` + +``` +## Error in is.data.frame(x): object 'submission' not found +``` + + + + + + + + + + diff --git a/doc/R-package/ndarrayAndSymbolTutorial.md b/doc/R-package/ndarrayAndSymbolTutorial.md new file mode 100644 index 000000000000..b5572c5e3d9d --- /dev/null +++ b/doc/R-package/ndarrayAndSymbolTutorial.md @@ -0,0 +1,454 @@ +MXNet R Tutorial on NDArray and Symbol +============================ + +This vignette gives a general overview of MXNet's R package. MXNet contains a +mixed flavor of elements to bake flexible and efficient +applications. 
There are mainly three concepts:

* [NDArray](#ndarray-numpy-style-tensor-computations-on-cpus-and-gpus)
  offers matrix and tensor computations on both CPU and GPU, with automatic
  parallelization.
* [Symbol](#symbol-and-automatic-differentiation) makes defining a neural
  network extremely easy, and provides automatic differentiation.
* [KVStore](#distributed-key-value-store) eases data synchronization between
  multiple GPUs and multiple machines.

## NDArray: Vectorized tensor computations on CPUs and GPUs

`NDArray` is the basic vectorized operation unit in MXNet for matrix and tensor computations.
Users can perform the usual calculations as on an R array, but with two additional features:

1. **multiple devices**: all operations can be run on various devices, including
   CPU and GPU
2. **automatic parallelization**: all operations are automatically executed in
   parallel with each other

### Creation and Initialization

Let's create an `NDArray` on either the GPU or the CPU:


```r
require(mxnet)
```

```
## Loading required package: mxnet
## Loading required package: methods
```

```r
a <- mx.nd.zeros(c(2, 3))             # create a 2-by-3 matrix on cpu
b <- mx.nd.zeros(c(2, 3), mx.gpu())   # create a 2-by-3 matrix on gpu 0
```

```
## Error in eval(expr, envir, enclos): [15:41:37] src/storage/storage.cc:43: Please compile with CUDA enabled
```

```r
c <- mx.nd.zeros(c(2, 3), mx.gpu(2))  # create a 2-by-3 matrix on gpu 2
```

```
## Error in eval(expr, envir, enclos): [15:41:37] src/storage/storage.cc:43: Please compile with CUDA enabled
```

```r
c$dim()
```

We can also initialize an `NDArray` object in various ways:


```r
a <- mx.nd.ones(c(4, 4))
b <- mx.rnorm(c(4, 5))
c <- mx.nd.array(1:5)
```

To check the numbers in an `NDArray`, we can simply run:


```r
a <- mx.nd.ones(c(2, 3))
b <- as.array(a)
class(b)
```

```
## [1] "matrix"
```

```r
b
```

```
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
```

### Basic Operations

#### Element-wise operations

You can perform element-wise operations on `NDArray` objects:


```r
a <- mx.nd.ones(c(2, 3)) * 2
b <- mx.nd.ones(c(2, 3)) / 8
as.array(a)
```

```
##      [,1] [,2] [,3]
## [1,]    2    2    2
## [2,]    2    2    2
```

```r
as.array(b)
```

```
##       [,1]  [,2]  [,3]
## [1,] 0.125 0.125 0.125
## [2,] 0.125 0.125 0.125
```

```r
c <- a + b
as.array(c)
```

```
##       [,1]  [,2]  [,3]
## [1,] 2.125 2.125 2.125
## [2,] 2.125 2.125 2.125
```

```r
d <- c / a - 5
as.array(d)
```

```
##         [,1]    [,2]    [,3]
## [1,] -3.9375 -3.9375 -3.9375
## [2,] -3.9375 -3.9375 -3.9375
```

If two `NDArray`s sit on different devices, we need to explicitly move them
onto the same device.
For instance:


```r
a <- mx.nd.ones(c(2, 3)) * 2
b <- mx.nd.ones(c(2, 3), mx.gpu()) / 8
```

```
## Error in eval(expr, envir, enclos): [15:41:37] src/storage/storage.cc:43: Please compile with CUDA enabled
```

```r
c <- mx.nd.copyto(a, mx.gpu()) * b
```

```
## Error in eval(expr, envir, enclos): [15:41:37] src/storage/storage.cc:43: Please compile with CUDA enabled
```

```r
as.array(c)
```

#### Load and Save

You can save a list of `NDArray` objects to your disk with `mx.nd.save`:


```r
a <- mx.nd.ones(c(2, 3))
mx.nd.save(list(a), 'temp.ndarray')
```

You can also load it back easily:


```r
a <- mx.nd.load('temp.ndarray')
as.array(a[[1]])
```

If you want to save data to a distributed file system such as S3 or HDFS,
you can directly save to and load from it. For example:


```r
mx.nd.save(list(a), 's3://mybucket/mydata.bin')
mx.nd.save(list(a), 'hdfs:///users/myname/mydata.bin')
```

### Automatic Parallelization

`NDArray` can automatically execute operations in parallel. This is desirable when we
use multiple resources such as CPUs, GPU cards, and CPU-to-GPU memory bandwidth.

For example, if we write `a <- a + 1` followed by `b <- b + 1`, where `a` is on the CPU and
`b` is on the GPU, we want them to be executed in parallel to improve
efficiency. Furthermore, data copies between CPU and GPU are expensive, so we
want to run them in parallel with other computations as well.

However, finding by eye which code can be executed in parallel is hard.
In the
following example, `a <- a + 1` and `c <- c * 3` can be executed in parallel, but `a <- a + 1` and
`b <- b * 3` must be executed sequentially, since `b <- a` makes both names refer to the same `NDArray`.


```r
a <- mx.nd.ones(c(2,3))
b <- a
c <- mx.nd.copyto(a, mx.cpu())
a <- a + 1
b <- b * 3
c <- c * 3
```

Luckily, MXNet can automatically resolve the dependencies and
execute operations in parallel with correctness guaranteed. In other words, we
can write a program as if there were only a single thread, and MXNet will
automatically dispatch it to multiple devices, such as multiple GPU cards or multiple
machines.

This is achieved by lazy evaluation. Any operation we write down is issued to an
internal engine, and then returns. For example, if we run `a <- a + 1`, it
returns immediately after pushing the plus operator to the engine. This
asynchrony allows us to push more operators to the engine, so it can determine
the read and write dependencies and find the best way to execute the operations in
parallel.

The actual computations are finished when we copy the results to some
other place, such as `as.array(a)` or `mx.nd.save(list(a), 'temp.dat')`. Therefore, to
write highly parallelized code, we only need to postpone asking for
the results.

## Symbol and Automatic Differentiation

With the computational unit `NDArray`, we need a way to construct neural networks. MXNet provides a symbolic interface named Symbol to do so. Symbol combines both flexibility and efficiency.
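To see the difference in style before composing full networks, here is a small sketch (a hedged illustration assuming the `mxnet` package is installed; it reuses only functions shown elsewhere in this tutorial): an `NDArray` expression is executed by the engine right away, while a `Symbol` expression only declares a computation that is bound to data and run later.


```r
require(mxnet)

# Imperative style: the multiplication is issued to the engine immediately,
# and as.array() fetches the concrete result.
a <- mx.nd.ones(c(2, 3)) * 2
as.array(a)

# Declarative style: these lines only build a computation graph.
# Nothing is computed until the symbol is bound to actual data.
A <- mx.symbol.Variable('A')
B <- mx.symbol.Variable('B')
C <- A * B
arguments(C)   # the free variables that must be provided at bind time
```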

### Basic Composition of Symbols

The following code creates a two-layer perceptron network:


```r
require(mxnet)
net <- mx.symbol.Variable('data')
net <- mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
net <- mx.symbol.Activation(data=net, name='relu1', act_type="relu")
net <- mx.symbol.FullyConnected(data=net, name='fc2', num_hidden=64)
net <- mx.symbol.Softmax(data=net, name='out')
class(net)
```

```
## [1] "Rcpp_MXSymbol"
## attr(,"package")
## [1] "mxnet"
```

Each symbol takes a (unique) string as its name. *Variable* often defines the inputs,
or free variables. Other symbols take a symbol as their input (*data*),
and may accept other hyper-parameters such as the number of hidden neurons (*num_hidden*)
or the activation type (*act_type*).

A symbol can simply be viewed as a function taking several arguments, whose
names are automatically generated and can be retrieved with


```r
arguments(net)
```

```
## [1] "data"       "fc1_weight" "fc1_bias"   "fc2_weight" "fc2_bias"
## [6] "out_label"
```

As can be seen, these arguments are the parameters needed by each symbol:

- *data* : input data needed by the variable *data*
- *fc1_weight* and *fc1_bias* : the weight and bias for the first fully connected layer *fc1*
- *fc2_weight* and *fc2_bias* : the weight and bias for the second fully connected layer *fc2*
- *out_label* : the label needed by the loss

We can also specify the automatically generated names explicitly:


```r
net <- mx.symbol.Variable('data')
w <- mx.symbol.Variable('myweight')
net <- mx.symbol.FullyConnected(data=net, weight=w, name='fc1', num_hidden=128)
arguments(net)
```

```
## [1] "data"     "myweight" "fc1_bias"
```

### More Complicated Composition

MXNet provides well-optimized symbols (see
[src/operator](https://github.com/dmlc/mxnet/tree/master/src/operator)) for
commonly used layers in deep learning.
We can also easily define new operators. The following example first performs an
element-wise add between two symbols, then feeds the result to the fully
connected operator:


```r
lhs <- mx.symbol.Variable('data1')
rhs <- mx.symbol.Variable('data2')
net <- mx.symbol.FullyConnected(data=lhs + rhs, name='fc1', num_hidden=128)
arguments(net)
```

```
## [1] "data1"      "data2"      "fc1_weight" "fc1_bias"
```

We can also construct a symbol in a more flexible way than the single
forward composition shown above.


```r
net <- mx.symbol.Variable('data')
net <- mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=128)
net2 <- mx.symbol.Variable('data2')
net2 <- mx.symbol.FullyConnected(data=net2, name='net2', num_hidden=128)
composed_net <- mx.apply(net, data=net2, name='compose')
arguments(composed_net)
```

In the above example, *net* is used as a function applied to the existing symbol
*net2*: in the resulting *composed_net*, the original argument *data* is replaced
by *net2*.

### Argument Shape Inference

Now we know how to define a symbol. Next, we can infer the shapes of all the
arguments it needs, given the shape of the input data.


```r
net <- mx.symbol.Variable('data')
net <- mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=10)
mx.symbol.infer.shape(net, data=c(4, 10))  # infer shapes for a batch of 10 inputs of length 4
```

Shape inference can be used as an early debugging mechanism to detect
shape inconsistencies.

### Bind the Symbols and Run

Now we can bind the free variables of the symbol and perform forward and backward computation.
The bind function will create an ```Executor``` that can be used to carry out the real computations.

For neural nets, a more commonly used pattern is ```simple_bind```, which will create
all the argument arrays for you.
Then you can call `forward`, and `backward` (if the gradient is needed).


```r
A <- mx.symbol.Variable('A')
B <- mx.symbol.Variable('B')
C <- A * B

texec <- mx.simple.bind(C, A=c(2, 3), B=c(2, 3), ctx=mx.cpu(), grad.req="write")
mx.exec.forward(texec)
mx.exec.backward(texec)
```

The [model API](../../R-package/R/model.R) is a thin wrapper around the symbolic executors to support neural net training.

You are also highly encouraged to read [Symbolic Configuration and Execution in Pictures](symbol_in_pictures.md),
which provides a detailed explanation of the concepts in pictures.

### How Efficient is the Symbolic API

In short, it is designed to be very efficient in both memory and runtime.

The major reason for us to introduce the Symbolic API is to bring the efficient C++
operations in powerful toolkits such as cxxnet and caffe together with the
flexible dynamic NDArray operations. All the memory and computation resources are
allocated statically during bind, to maximize the runtime performance and memory
utilization.

The coarse-grained operators are equivalent to cxxnet layers, which are
extremely efficient. We also provide fine-grained operators for more flexible
composition. Because mxnet also does more in-place memory allocation, it can be
***more memory efficient*** than cxxnet while achieving the same runtime, with
greater flexibility.