How to merge data sets using an Excel or Calc spreadsheet

The scenario

Let us imagine two related data sets in Excel or LibreOffice’s Calc. One contains a list of names and corresponding phone numbers, and the other is a list of names and email addresses. The two data sets have a unique id in common (like a Social Security number). We want to merge the two data sets together into a single table. This is a common problem with an easy solution.

Here is an example of two related data sets.

More about this table

There is a list of names and phone numbers on the left side of this table. On the right side is a list of names and email addresses. Our goal is to merge the two lists into one.

The first column shows row numbers. The second column shows ID numbers. The third column shows telephone numbers. The fourth column shows names. The fifth and sixth columns are purposely left blank and represent the separation between the two data sets. The seventh column contains ID numbers. The eighth column contains names. The ninth column contains email addresses.

  A B C D E F G H
1 ID Tel Number Name     ID Name Email
2 01 555-1313 Dennis Osorio     06 Jesse Pinkman
3 02 555-4242 Carrie Mathison     09 Tom Haverford
4 03 555-2323 Saul Berenson     05 Walter White
5 04 555-5353 King Joffrey     01 Dennis Osorio dposorio@fake.huh
6 05 555-3636 Walter White     10 Andy Dwyer
7 06 555-6464 Jesse Pinkman     07 Hank Schrader
8 07 555-4747 Hank Schrader     02 Carrie Mathison
9 08 555-7373 Saul Goodman     08 Saul Goodman
10 09 555-5858 Tom Haverford     02 Saul Berenson
11 10 555-8484 Andy Dwyer     04 King Joffrey

The solution

Vlookup is used by both Excel and LibreOffice’s Calc.

Vlookup: step-by-step

More about this table

There is a list of names and phone numbers on the left side of this table. On the right side is a list of names and email addresses. Our goal is to merge the two lists into one.

The first column shows row numbers. The second column shows ID numbers. The third column shows telephone numbers. The fourth column shows names. The fifth column contains a vlookup formula. The sixth column is purposely left blank and represents the separation between the two data sets. The seventh column contains ID numbers. The eighth column contains names. The ninth column contains email addresses.

  A B C D E F G H
1 ID Tel Number Name Email Lookup   ID Name Email
2 01 555-1313 Dennis Osorio =vlookup(a2,$f$2:$h$11,3,0)   06 Jesse Pinkman
3 02 555-4242 Carrie Mathison =vlookup(a3,$f$2:$h$11,3,0)   09 Tom Haverford
4 03 555-2323 Saul Berenson =vlookup(a4,$f$2:$h$11,3,0)   05 Walter White
5 04 555-5353 King Joffrey =vlookup(a5,$f$2:$h$11,3,0)   01 Dennis Osorio dposorio@fake.huh
6 05 555-3636 Walter White =vlookup(a6,$f$2:$h$11,3,0)   10 Andy Dwyer
7 06 555-6464 Jesse Pinkman =vlookup(a7,$f$2:$h$11,3,0)   07 Hank Schrader
8 07 555-4747 Hank Schrader =vlookup(a8,$f$2:$h$11,3,0)   02 Carrie Mathison
9 08 555-7373 Saul Goodman =vlookup(a9,$f$2:$h$11,3,0)   08 Saul Goodman
10 09 555-5858 Tom Haverford =vlookup(a10,$f$2:$h$11,3,0)   02 Saul Berenson
11 10 555-8484 Andy Dwyer =vlookup(a11,$f$2:$h$11,3,0)   04 King Joffrey


Parameter-1 should refer to a cell. In our example, that cell is A2. The formula will try to find the contents of A2 elsewhere – like within another data set.

Parameter-2 should refer to a range of cells that represent the “somewhere else” in question. In our example, the range is F2:H11. So far, the formula in the example looks like =vlookup(a2,f2:h11,parameter-3,parameter-4) which indicates that the cell A2 is the value which will be looked up in the range F2:H11.

Parameter-3 is a number that represents [X number of columns] from the initial match. It is contains the value that will be returned upon successful completion of this formula. So far, the formula in the example looks like =vlookup(a2,f2:h11,3,parameter-4) which indicates that the cell A2 is the value which will be looked up in the range F2:H11. And, when a match is found, the value from the cell 2 columns to the right will be returned to me.

  • 1 would be the same column where the match is found
  • 2 would be one column to the right
  • 3 is two columns to the right.

Parameter-4 only accepts a few possibilities: 0, 1, false, or true. This refers to whether fuzzy matches are acceptable, which would depend on (for starters) whether the first column in the cell reference specified in parameter-2 was sorted in ascending order. In the example, the formula is now =vlookup(a2,f2:h11,3,0). I entered 0 or false for parameter-4 because I want an exact match.

What about the dollar signs?

In the example above, when entering certain cell references, I also included dollar signs ($). It is to control shifting when copying and pasting formulas. If you do not know what I mean, then prepare to unlock The Mystery of Dollar Signs in your Spreadsheet, or How Do Cells Find Each Other?

Paste Special

If I were to delete columns f:h, the Email Lookup column would change, and I would get a bunch of #N/A values. I would have broken the reference – the formula needs to be able to refer to the other cells.

To correct for this, what I would do is

  • Insert a new column next to “email lookup.”
  • Highlight and copy “email lookup.”
  • Select the newly created column.
  • Select “paste as values” – if you don’t know what this means, find the edit pull-down menu, and select “paste special.” There is a choice there for paste values – meaning it will paste only the text (or number). No formatting, no formulas – just the values.

I Need Feedback

Did this tutorial work? Was it exactly what you needed? Was it confusing? Did I leave something out?

Spread the love! Ask questions, leave a comment, or simply share this with someone you know would really benefit. What other spreadsheet tutorials would you like to see?

If you find yourself in a pinch, you should know that I am available to help with spreadsheet projects.

40 thoughts on “How to merge data sets using an Excel or Calc spreadsheet

  1. Tina: It sounds like you have two datasets. One dataset has two columns (like ‘First Name’ and ‘Last Name’). The other has the information in one column (like ‘name’). And, you want to merge the two datasets. The first step would be to combine the two columns of the first dataset into one column, so that it matches up with the other dataset.

  2. Leroy: yes, that’s exactly how it works. You can simply mouse your way to another sheet for the vlookup. Notice how if your sheet name has a space in it, it is enclosed within apostrophes/single quotes. I have found it easy to break these references from one worksheet to another, so I try to minimize it whenever I can.

  3. I want to no use of vlookup in data entry for correction on data
    one real sheet and another throughout that new sheet which willl be made
    so how will vlook up used on new sheet with there old refernece…………

  4. Brilliant. Exactly what I was looking for. Thank you so much. Incindentally, I believe that to paste values in Calc you have to select “text” in the paste special box. A marginally shorter way is to click the little tab next to the paste icon and select “unformatted text” and press enter.

    • Yes! As you’re typing in the formula =vlookup… at certain points (usually after the 1st comma), try clicking on another worksheet and selecting a range. It’ll work beautifully, you’ll end up with something that will look like: =vlookup(D1,Sheet2!A1:C11,2,0) — in this example, the “Sheet2!” indicates the name of a different worksheet.

  5. Great tip! Thanks.

    I’m wondering if their is a method to handle many-to-one relationships. For example if Dennis has 2 phone numbers, can they each be merged?

    • Hi Mark:

      Thanks for the question. Are you talking about something like:

      1) Some contact info that includes a column for “phone”
      2) Another contact list that also includes a column for “phone”

      And the problem is that for an individual (say Dennis) is in both lists, has entries for “phone” in each, but the entries are different? And, you don’t want to get rid of either of those entries, you want to keep both?

      If so, I can see a few ways of handling this. Let me know if I’ve got the problem right, so I don’t reply with a solution to a different problem all together.

  6. Hi Dennis,

    Good idea to clarify. My challenge is less about an additional column or data conflict between thetwo sides being merged. It’s more about if there are additional rows/instances on one side of the merge than the other. So in your original sample screenshot if instead of: Dennis, Amy, Layla in column C, rows 2,3,4 it actually read: Dennis, Dennis, Dennis. The additional instances may represent your alternate numbers.

    I looked around a little and found this link, which includes conditional logic. I have not tried this yet though.


  7. It sounds like it is a question of cleaning up the data.

    If you have multiple entries for the same individual, then the problem is around having duplicates.

    I think the way to handle this is to prep each list for an eventual merge, and that you will spare yourself heartache if you do this BEFORE anything gets merged.

    Take the following with a grain of salt, because I have not seen the file and there is probably a lot I do not know, but this is one way I might handle it.

    Using the original example as a basis, let us imagine that column C has three instances of “Dennis”.

    Before merging anything, I would simply sort by name (column C). I would create a new temporary column (let’s say column D), and enter a formula like:

    If there are duplicates, the cells in column D would say “duplicate”. If not, the cells would look blank.

    When I found duplicates, I’d stare at them, and figure out a plan.

    If the duplicate rows are identical, I can simply delete one. If they differ, I would wonder if one contained obsolete data and I would try to figure a way to get rid of what I do not need.

    If names are identical, but the other columns are not, and I want to keep all the data for some reason, (for example, I have phone entries, they are all correct, it is just that one is a home land line, other is a work land line, cell, work cell, etc…) I would take the most complete entry and start adding columns for phone 1, phone 2, phone 3, etc… and then do some copying and pasting.

    Either way, the goal is to have one row per individual (or whatever) for each list. Once that is accomplished, I would then proceed to the merge portion.

    I hope this is helpful and relevant to what you are facing.

  8. This is really useful but I can’t quite fit it to my problem. I’ve got two columns of data one (column B) is a list of 500 errors over 3 months, each month runs on straight after the next. The other (column A) is the number of times that error has occurred per month. I want to pull into a separate worksheet the total number of instances per error message.

    Is there a way I can lookup up the error message text and get it to return the sum of all the matching numerical entries?

    • Hi Ti,

      The 3 month issue in the same column is kind of throwing me. Do you mean something like:

      Column A: # of instances
      Column B: error type
      Column C: June
      Column D: July
      Column E: August

      Not sure yet, but I think the answer will end up being related to SUMIF() or COUNTIF().

  9. Thanks for getting back to me – the table looks like this

    Col A: # instances
    Col B: type of error

    I’ve removed the month value element, Col B has some errors types repeated, each with their own # value in Col A.

    So error type ‘x’ appears 3 times in Col B with a value for each, error ‘y’ appears twice and error ‘z’ only once and so on.

    I guess I want to pull into a worksheet, one column with a list of all the unique error types and one column with the combined total of all the # of instances for each error type occurence.

    Hope that makes sense! Thanks, Ti

    • Ok, this doesn’t sound so bad.

      First, I’d get rid of duplicates from column B: make sure column B is selected -> Data -> Filter -> Advanced Filter (copy to new location, unique records only). This will copy unique values from column B into a new column (let’s say column F)

      Second, next to column F (let’s say column G): for each value, enter something like =sumif(column-B-range,f1,column-A-range).

      I hope this makes sense!

  10. Hi,
    I am analysing household survey data in Excel and there isa need to merge six datasets. The structue looks as follows:
    Each row contains data on one, unique household.
    Column 1: household ID (like 001-03-11-04, which is actually a combination of codes for household, village, district and region)
    Column 2: First name of household head
    Column 2: Age of household head
    Column 3: Gender of household head (1-male; 2-female)
    Column 4: Membership in… (1-yes; 2-no)

    A total of 44 columns.

    The other dataset has informaton for only some of the households and all the household members are listed in rows, some columns are also different from the other dataset:
    Column 1: Household ID (the same as in the first dataset)
    Column 2: Household member ID (like 001-03-11-04-01, here the last two digits are codes for household head, spouse, son/daughter etc.)
    Column 3: Household member age
    Column 4: Education (for each household member, coded)

    I want to merge the two datasets and use PivotTable to analyse, for example membership in the organizaton by age groups and gender. This is easy if the data is merged.
    Any suggestion is appreciated. Thanks.

  11. Hi Sash, This doesn’t sound so bad. As I’m sure you already know, merging is made possible because you have a common column in both datasets (Household ID).

    I don’t have a lot of advice, but let me say this:

    — personally, I love making copies of things. I would keep the datasets you have, and make “working” copies of each. I’d delete unnecessary columns from each dataset, so that the end result contained only the columns I wanted.

    — if you have two datasets that are being merged, be deliberate about which dataset is appended to which dataset. Should you start merging using the dataset with more records? Or the one with more columns? Will the end result contain “everything” it can even if there are a lot of blank values?

    — I would probably spend time making sure each dataset is as “clean” and consistent with itself before merging it to another dataset.

    — I also love keeping a “log” of what I’m doing & why in a separate text file (like NotePad).

    — If you’re good about keeping copies at various stages and good about taking notes, you are more free to make mistakes. If you do, take a deep breathe, try to figure out what happened, accept it, and start over.

    Best of luck!

  12. Hi Dposorio,

    Thanks for responding. Yes, I do make a lot of copies and what I actually want is…getting the job done without making too many copies 🙂
    I am indebted to two actions I take a lot while analysing data in Excel: copy-paste and filtering.

    Have a good day!

  13. Hi Dposorio,

    This how to is great. I am starting to understand the VLOOKUP function but I hope you can help me out with a more complicated way of using it (I think).

    I have 3 different workbooks. Lets keep using names for this example. Workbook 1 has ID, Names and Phone Numbers. The second workbook has ID, Names, Age, Department. The 3rd workbook has ID, Names, Email, Home Address, and Emergency Contact. Workbook 1 is the main workbook as workbook 2 or 3 may be missing some employees (or not, but this makes it impossible to simply sort each workbook and copy/paste the information to the first/main spreadsheet). Can VLOOKUP be used to merge all this information into one? If so, how and if not I hope you know of a way. Thanks for any response.

  14. Hi Miguel C,

    I’m glad you found this post useful. What you propose is fairly straightforward. And, yes, VLOOKUP can be used to merge all the info into one. The main reason that you will be able to do so is because of the duplicated (triplicated?) ID columns – one in each sheet.

    The trickiest question you have is how and where to merge the information to. Personally, I would create a new sheet. I’d call it summary. It would contain the summarized info from the 3 other sheets thanks to VLOOKUP.

    Next, I’d take the ID column from sheet 1 and copy it into column A in the “summary” sheet. After all, sheet 1 is the “main” worksheet. This ID column can be used as the basis for all VLOOKUPS to come.

    To get the VLOOKUP to work from one sheet to another, you’ll type out the formula, and when you get to the 2nd parameter (the table array), just select the section in the other sheet with your mouse.

    The biggest headache, will be cleaning things up. You’ll end up with a bunch of values of zero. These result from doing a VLOOKUP on something that is blank. The VLOOKUP may have worked & found a match, it’s just that what it’s bringing back to you is an empty cell and that’s getting translated to a zero value.

    I would probably go through sheets 1-3 before starting any vlookup, and finding empty cells (this can be done using the pull down menu data -> filter -> auto-filter). Once they are found, you can substitute empty cells with a value like “BLANK”. This would be a lot easier to clean up than dealing with empty cells.

    I hope you find this useful & not overwhelming. Happy merging!

  15. Hi Dposorio,

    The Vlookup function works great if I use the 1 (1 would be the column where the match is found) as the value from the cell it found. But if I use a 2 instead of 1, I get “#REF!” as a result instead of the numbers in the 1st column to the right of the matched column. I tried to useExcel help but it just added more fog factor.

    Any suggestions?

  16. Hi Dposorio,

    Wonderful information you are providing here. My question is somewhat related in that it involves merging data. I have a list of people and their emails. The company has changed names so the email’s domain changed too. Example is now It’s a challenge to do a text to columns because names are of differing lengths. And even if this was possible, how would I incorporate the new domain into the column?
    Thanks in advance for any suggestions,

  17. Hi Al,

    Typically you get #REF! when you are “looking up” a column that is outside of the table array.

    Imagine that I enter

    In the example, I have columns g-i. That’s 3 columns for my table array. If I had entered a 4 instead of 3 (as the 3rd parameter), Excel would return a value of #REF!

    Check that – is that the problem you’re facing?

    — Denns

  18. Hi Roy,

    I’m not entirely sure, but I suspect that the answer might be the use of Contr0l+H (or Find & Replace). You can quickly replace every instance of with There will be options around whether the “find” represents the entire cell contents or not. You can perform this whole Find & Replace operation based on a selection of a column or some cells or on no selection at all (that last one would mean the entire table) – you can even do this across multiple sheets in one Excel file.

    Also, with text-to-columns, one option for creating columns from text is around setting a fixed width. I never use that. I always do it based on a separator – a comma or a semi-colon. You can do more than one pass. One of those passes could be based on the @ symbol. — But, maybe you don’t even need to worry about that – maybe the find & replace thing is all you need.

    — Dennis

  19. Hi Rachel — I think you should be fine. And it doesn’t matter whether you’re using Calc or Excel – either way it’s the same series of steps. VLOOKUP is indeed the answer, you’d just have to be a bit careful and deliberate when carrying this out.

    Let me illustrate:

    When performing the merge, which dataset is being merged to which other dataset? You could take the bigger dataset (750 records) and merge the smaller dataset on top of it. The end result would be an “expanded” dataset with 750 records in it but now it would have more columns. And some of those new columns would even contain data.

    You’d have a very different result if you started with the smaller dataset. Similarly, you’d do things differently if you wanted to merge everything into one dataset without losing any data.

    It’s all solved using VLOOKUP, it’s just the beginning steps that vary.

    It sounds like you want the 3rd dataset to consist of only “the perfect intersection” of the other two datasets. If so, I’d copy and paste the smaller dataset (the 250 record one) and use vlookup to bring in the data from the other dataset. Afterwards, you’d delete any records containing #N/A values. The best case scenario would be the 3rd dataset having 250 records.

  20. I have two datasets I would like to merge. Each dataset has 14 columns with the only overlapping column being the names. But the first dataset has 750 records while the second dataset has only 340 (most of which are the same from the first dataset, but not all). I want to combine the first and second dataset into a third dataset that will consist of the names that the first and second dataset have in common along with the 26 columns of information (13 from the first dataset, 13 from the second, if you don’t count the name column from each). Do I use vlookup for this or is it too complicated?

    I am using Calc. Thank you!

  21. Hello,

    Your post would be extremely useful for me if my data would just merge!

    I have 2 sets of data, 1 is a list of over 60,000 entries, whereas the other list is just over 30,000. Both sets of data have a unique ID but the smaller list displays the fields that I want to look at. I am trying to locate the unique ID’s that are available from the larger list to match the ID’s from the smaller list. My question is: do both sets of data need the same amount of entries to merge, or can I merge only the ID’s found in the larger list to the smaller and exclude the rest where there is no match? Is there a trick to this that I am missing?

    Thanks for your help, this has been the easiest post I have read on the subject!

    • Hi Tori,
      I’m glad this has been a helpful post! Reading over your question, I want to make three points.

      1. It doesn’t matter whether you start with the larger list or the smaller one – either way will work. When in doubt, start with the larger dataset. Just in case. It’s more about losing data, throwing data away, etc… than it is about whether the match will work or not.

      2. Regarding tricks – one of my favorites involves using data->filter->advanced filter to get all unique ids copied in one column with no duplicates.

      3. The other trick I just learned.
      Basically, it just returns empty values instead of n/a values. The vlookup() formulas should be exactly the same. They are abreviated above. So, a final version looks more like this: =IF(ISNA(VLOOKUP($A6,Sheet1!$A$1:$B$6,2,0)),””,VLOOKUP($A6,Sheet1!$A$1:$B$6,2,0))

      I hope I didn’t just overwhelm or confuse you. Good luck & feel free to reply with any questions!

      — Dennis

  22. Hello, your knowledge is great.
    I have the following Issue:

    Column A Column B
    01245 06/07/2011
    01245 07/09/2011
    01245 10/12/2011
    24781 06/07/2011
    24781 07/09/2011
    24781 04/02/2012

    I want to see the following:
    Column T Column U Column V Column W
    01245 06/07/2011 07/09/2011 10/12/2011
    24781 06/07/2011 07/09/2011 04/02/2012

    Is this possible? Further More Perfect would be

    Column T Column U
    01245 10/12/2011 (Just get the last date of the list)
    24781 04/02/2012

    Any tricks for this?

    Thank You

  23. Hi
    I’m trying to combine multiple spreadsheets into one. Each spreadsheet has a column with last name and first name. I’d like to combine the information on both sheets into one, having excel match the first names and last names in both spreadsheets, so that the data doesn’t get mixed up, and it follows the correct person’s name. Any information you can provide me would be greatly appreciated.

  24. Bless You!!! Your explanation for using the dollar sign was what I was looking for. I was trying to figure out how to keep the value static to continue searching the whole array. Thank you so much!

  25. I want to create a spreadsheet to track orders. Heres my problem:

    I want to create like a 3 column entry form. One for the name of the customer, one for the type of product and one for the size. Is there any formula I can use to track the number of products i have entered with their corresponding sizes? I know about data validation and have already created drop down menus.To give an example I have 3 products, type A, type b, and type c. All have 5 sizes. Is there any one to enter all 3 products in one sheet and then have excel spit out the total for each product and size? I no I could create different sheets for different products but I dont want to enter the customer repeatedly. One sheet for the customer and what they ordered and a way to track it. Im having a hard time with this and I dont know all the formulas out there. Thanks


  26. I have an excel file that has 912266312249,66312250/23451985
    in a single cell i need to separate these files to separate cells and some files have +91 22 264 23801,26428804 same way i have to separate them.. and add the missing data from the previous field to the second one like +91 22 missing in the 2nd field how do i do this using excel or any script or is their any easy way to do this

    • Hi Vignesh — this seems a bit laborious. I think some serious, well-thought out “find & replace” in a programmer’s text editor would work wonders. Personally, I use a program called NotePad++ which is free and fantastic. I can use symbols such as t to indicate a tab-spacing or r to indicate a new row. Later, I can open things as a tab-separated file in Excel or OpenOffice. I think this might be a better solution than trying to do this in Excel. — Dennis

Leave a Reply

Your email address will not be published. Required fields are marked *