Home » SAS – Merge Dataset

SAS – Merge Dataset

by Online Tutorials Library

SAS – Merge Dataset

Joining | Combining

In the last topic, we have learned multiple variables sorting in SAS, and saw that we could sort data values on the basis of multiple variables simultaneously. Now, we are going to learn how we can merge data sets in SAS Programming language. You will find many examples, for great understanding.

So let’s start…….

What is the SAS Merge?

Merge in SAS is a process which combines observations from two or more SAS datasets. The base of merging is, the merging datasets must have a common variable with common data values (or observations) inside them.

Use the following steps to merge datasets:

  • Create a new dataset for merging.
  • We use the By statement to denote the name of a common variable which is used for matching prerequisites to merge.
  • Use MERGE statement before the name of the dataset.
  • The merging datasets must have at least one common variable.

For Example:

SAS Merge Dataset

As you can see above, the ID is a common variable among both datasets.

Use the following syntax to merge dataset first and second:

Output:

ID    A   B   C 101    1   2    6 102    3   4   5 103    5   6   7 104    3    7   9 

In the above example, both data sets have been merged. Both datasets contain different data, and the variable ID contains the same data as it is a unique identifier (no duplicate).

Let’s understand through an example:

When we merge two data sets using the merge statement, and we already know that each observation of a common variable has a match in the other dataset, then we can merge very straight-forward. Here is an example.

Often, in the same case, different data is stored in two or more separate data sets. For example, we want to store data of an employee, for that we created two datasets, one is essential, and another is additional. The essential dataset contains the very basic information of employee such as employment ID, name, mobile, email, adhar etc, and the additional dataset contains information which is rarely used.

Data essential

Dataset additional

Now, create new dataset “combined” to merge essential and additional datasets.

Now, execute this code in SAS studio.

SAS Merge Dataset

Output:

SAS Merge Dataset

When Matching is Not Perfect

What happens if the variable is common, but its data values are not common between datasets?

For Example:

SAS Merge Dataset

In data set mother, second observation (ID=201) does not match the second observation (ID =401) of data set child. A match-merge will be like this:

Result

ID   A   B   C 101   1   2     0    both data sets are contributing to this observation 201   3   4     .     only data set first contributed to this observation           301   5   6     1    both data sets contributed to this observation 

Let’s understand through an example:

Here we are using the example mentioned above, but changing some of its data values . Taking two data sets; one is essential, and another is additional. Now, we have changed two data values of variable ID in the additional dataset, let’s see what will be the result.

Data essential; input employeeid name$ age mobile email$ adhar$; datalines; 101 Vikas 25 9374747373 [email protected] 234390876344 102 Nikita 23 9086746353 [email protected] 128743526709 103 Ginni 27 9845637238 [email protected] 980706509823 104 tutor 35 9893235487 [email protected] 123214567876 105 Gaurav 26 9893235445 [email protected] 123454326789 106 Ayush 24 9893235432 [email protected] 234565432123 107 Harshita 23 9893232345 [email protected] 908756897645 108 Preeti 27 9993235487 [email protected] 987605432123 109 Mayur 28 9893235432 [email protected] 123213487654 110 Rajesh 29 9893235409 [email protected] 456578987623 ; run;   data additional; input employeeid address$ father$ mother$; datalines; 101 bhopal narendra amrita 111 kolkata mohan anita 103 timarnee ramkumar vaishnavi 123 meeruth arjun megha 105 delhi kailash manorama 106 noida ramakant sunita 107 gorakhpur shreelal shikha 108 seehor shekher sheela 109 indore krishna tulsi 110 dewas susheel janki ; run; Data combined; merge essential additional; By employeeid; run; proc print data=combined; run; 

Execute this code in SAS Studio

SAS Merge Dataset

Output:

SAS Merge Dataset

In the above code you can see, variable ID is common in both data sets essential and additional, but two data values (highlighted in images, one by green and another by red) of variable ID changed in dataset “additional.”

When we execute this code in SAS Studio, it will produce output according to the data values mentioned in the data set “essential” (consider the output image), but if you check it in the log window, you can see the error message.

ERROR: BY variables are not properly sorted on data set WORK.ADDITIONAL.  employeeid=111 name=  age=. mobile=. email=. adhar=  address=kolkata   father=mohan mother=anita FIRST.employeeid=1 LAST.employeeid=1  _ERROR_=1 _N_=6 

IN= Variables

What to do, if you want to put only the merge output in the data set? Means only those observations in which both input datasets are contributing.

In other words, when you don’t want to print observations that are not contributing then use IN=Variables.

SAS has already installed special temporary variables for you, which are called “IN = variables” so that you can do more.

Now, what you have to do:

1. Use extra variables with merge statement for both merging datasets

Syntax

2. In the data step, use “IN= variables” appropriately.

Let’s understand through an example:

SAS Merge Dataset

SAS Merge Dataset

We can see in the output; data set other kept only data values of matched IDs.

  • If we want to keep only contributing observations then, give value 1 to the variable a and b.
  • If we want to keep only the observations that are not contributing then give 0 to the variable a and b.

In the above example, the “IN= variables” A and B taking values like this:

SAS Merge Dataset

If you want to keep not only matching data values, but also to track different data sets of non-matching data values, you can create three data sets in this way:

Summary

So, this is all about merge data sets. In this section we have learned, what is SAS merge Datasets, how to merge two or more datasets and what happens when there is no match in Data Sets.

Hope, you understood the topic clearly. But, if you still have any problem, please ask in our contact section.


You may also like