US Surname Distribution Analysis - The Aggregated Distribution File

When you upload a Census National Index Extract file, you will be given the opportunity to download the Aggregated Distribution file which is created from the National Index Extract file.

Alternatively, if you are a Windows user, you can download a program that will convert a National Index Extract file into an Aggregated Distribution File on your own PC.

The Aggregated Distribution file can be uploaded instead of the National Index Extract file if you wish to create distribution maps in a subsequent session. Because the Aggregated Distribution file is considerably smaller than the National Index Extract file, it is much quicker to upload.

The format of the Aggregated Distribution File downloaded from this facility when you upload a National Index File is 'Tab delimited text', which uses TAB characters to separate the three fields in each record, as follows:-

WICKES (ROBERTSON)<TAB>VA<TAB>1
WICKIR<TAB>MI<TAB>6
WICKS<TAB>CT<TAB>17
WICKS<TAB>NY<TAB>381
WICKS<TAB>OH<TAB>3
WICKS<TAB>PA<TAB>2
WICKS<TAB>SC<TAB>44
WICKS<TAB>SD<TAB>12
WICKS<TAB>VA<TAB>1

Note that the TAB character is non-printing, and is represented by <TAB> in the above example. The 'Tab delimited text' format is convenient for loading into a spreadsheet for editing or further analysis.

The three fields in each record are the Surname, the abbreviation of the State, and the number associated with the Surname in that State. When the Aggregated Distribution File is downloaded from this site, or created with the usdistag program, which can also be downloaded from this site, this number is the count of individuals with the Surname in the State. However, it could be some other figure, such as a population density (e.g. 0.000013), provided you calculate that figure and create an Aggregated Distribution File of your own.
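As an illustration of supplying a density instead of a count, the following Python sketch builds one tab-delimited record from a count and a state population. The function name density_record and the population figure are hypothetical, chosen only for this example; the six-decimal formatting is an assumption that mirrors the 0.000013 style mentioned above.

```python
def density_record(surname, state, count, state_population):
    """Build one tab-delimited record whose number is count / population."""
    density = count / state_population
    # Fixed-point notation with six decimals mirrors the 0.000013 style
    # mentioned in the text; the precision is an arbitrary choice.
    return f"{surname}\t{state}\t{density:.6f}"


# The population below is a made-up placeholder, not a census figure.
print(density_record("WICKS", "NY", 381, 5_000_000))
```

Each record produced this way has the same three fields as a downloaded file, so it can be saved into an Aggregated Distribution File of your own.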

In addition to the data, the file can also contain an optional title, which identifies the data in this set. The title appears on the first line of the file, thus:-

Title=WYKES - 1880 Census Place
WICKES (ROBERTSON)<TAB>VA<TAB>1
WICKIR<TAB>MI<TAB>6
WICKS<TAB>CT<TAB>17
WICKS<TAB>NY<TAB>381
WICKS<TAB>OH<TAB>3
WICKS<TAB>PA<TAB>2
WICKS<TAB>SC<TAB>44
WICKS<TAB>SD<TAB>12
WICKS<TAB>VA<TAB>1

The title must appear on the first line of the file and it must be preceded by 'Title=' which identifies the record as the title record.
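The title rule can be sketched in Python as follows. This is only an illustration of the rule as stated, not the code used by this site; the function name split_title is hypothetical, and treating 'Title=' as case-sensitive is an assumption.

```python
def split_title(lines):
    """Return (title, data_lines).

    The title is None when the first line does not start with 'Title='.
    """
    prefix = "Title="  # assumed to be case-sensitive
    if lines and lines[0].startswith(prefix):
        return lines[0][len(prefix):].strip(), lines[1:]
    return None, lines


title, data = split_title(["Title=WYKES - 1880 Census Place", "WICKS\tNY\t381"])
print(title)
print(data)
```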

Creating Your Own Aggregated Distribution File

If you want to create distribution maps from a source of data other than the LDS 1880 US Census CDs, you can do so, but you will need to create your own Aggregated Distribution file.

As noted above, the file is a plain text file. You may use either TAB characters or commas (,) to separate the three fields in each record, as follows:-

WICKES (ROBERTSON)<TAB>VA<TAB>1
WICKIR<TAB>MI<TAB>6
WICKS<TAB>CT<TAB>17
WICKS<TAB>NY<TAB>381
WICKS<TAB>OH<TAB>3
WICKS<TAB>PA<TAB>2
WICKS<TAB>SC<TAB>44
WICKS<TAB>SD<TAB>12
WICKS<TAB>VA<TAB>1

WICKES (ROBERTSON),VA,1
WICKIR,MI,6
WICKS,CT,17
WICKS,NY,381
WICKS,OH,3
WICKS,PA,2
WICKS,SC,44
WICKS,SD,12
WICKS,VA,1

The three fields in each record are:

Surname
State Abbreviation
The number associated with the surname in the state.

The number can be a count of the number of people in the state with the surname, or any other meaningful measurement, such as density or frequency of the surname.
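If you want to check a file of your own before uploading it, a record in either layout can be split into its three fields along the lines of this Python sketch. The function name split_fields is hypothetical, and the sketch assumes that a surname never contains a TAB or a comma.

```python
def split_fields(line):
    """Split one record into [surname, state, number] (all strings)."""
    line = line.rstrip("\r\n")
    # Prefer TAB when present; otherwise fall back to a comma.
    delimiter = "\t" if "\t" in line else ","
    fields = [field.strip() for field in line.split(delimiter)]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, found {len(fields)}: {line!r}")
    return fields


print(split_fields("WICKES (ROBERTSON),VA,1"))
print(split_fields("WICKS\tNY\t381"))
```

A record with too few or too many fields raises a ValueError, which points to the offending line.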

If your data are based on pre-1889 sources, then you will probably have state data for Dakota Territory, rather than the post-1889 states of North and South Dakota. In this case you should use the code DT for Dakota Territory, rather than the codes SD and ND. Take care to use either DT or the state codes ND and SD, but never both in the same file. If you include DT together with either ND or SD, the data for North Dakota and South Dakota will not be plotted and will appear as Other in the Legend.
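The Dakota rule is easy to check before uploading. A minimal Python sketch, using the hypothetical function name dakota_conflict, might look like this:

```python
def dakota_conflict(state_codes):
    """Return True when DT is mixed with ND or SD in the same data set."""
    codes = set(state_codes)
    return "DT" in codes and bool(codes & {"ND", "SD"})


print(dakota_conflict(["NY", "DT", "SD"]))  # DT mixed with SD
print(dakota_conflict(["NY", "ND", "SD"]))  # post-1889 codes only
```

The first call reports a conflict and the second does not.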

The easiest way to create your own Aggregated file is to use a spreadsheet. Your worksheet should contain 3 columns, corresponding to the 3 fields in each aggregated record. When you have entered all your data, save the file as a Text (Tab Delimited) file or as a CSV file. This file can then be uploaded for analysis, using the form on the Aggregated Distribution File Upload page. Some spreadsheets will enclose fields in double quotes ("). That's OK - the upload process strips any double quotes from the file being uploaded. If you are entering a title, it should be placed in cell A1, according to the rules specified above.
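If your data are held in a program rather than a spreadsheet, the same file can be produced with a few lines of Python. This is a minimal sketch using the standard csv module; the function names build_aggregated_text and strip_quotes are hypothetical, and strip_quotes only imitates the quote removal described above rather than reproducing the site's own upload code.

```python
import csv
import io


def build_aggregated_text(rows, title=None):
    """Return tab-delimited file content, with an optional title line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    if title is not None:
        # The title record goes first, like cell A1 in a worksheet.
        writer.writerow([f"Title={title}"])
    for surname, state, number in rows:
        writer.writerow([surname, state, number])
    return buffer.getvalue()


def strip_quotes(line):
    """Remove every double quote, as the upload process is said to do."""
    return line.replace('"', "")


text = build_aggregated_text(
    [("WICKS", "NY", 381), ("WICKS", "OH", 3)],
    title="WYKES - 1880 Census Place",
)
print(text, end="")
print(strip_quotes('"WICKS","NY",381'))
```

The returned text can be written to a file with any name you choose and then uploaded.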

The table below shows the state abbreviations and the full names of the States.

State Code	State	State Code	State
AK	Alaska	MT	Montana
AL	Alabama	NC	North Carolina
AR	Arkansas	ND	North Dakota (post-1889)
AZ	Arizona	NE	Nebraska
CA	California	NH	New Hampshire
CO	Colorado	NJ	New Jersey
CT	Connecticut	NM	New Mexico
DC	District of Columbia	NV	Nevada
DE	Delaware	NY	New York
DT	Dakota Territory (pre-1889)	NYC	New York City
FL	Florida	OH	Ohio
GA	Georgia	OK	Oklahoma
HI	Hawaii	OR	Oregon
IA	Iowa	PA	Pennsylvania
ID	Idaho	RI	Rhode Island
IL	Illinois	SC	South Carolina
IN	Indiana	SD	South Dakota (post-1889)
KS	Kansas	TN	Tennessee
KY	Kentucky	TX	Texas
LA	Louisiana	UT	Utah
MA	Massachusetts	VA	Virginia
MD	Maryland	VT	Vermont
ME	Maine	WA	Washington
MI	Michigan	WI	Wisconsin
MN	Minnesota	WV	West Virginia
MO	Missouri	WY	Wyoming
MS	Mississippi
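
The codes in the table can be used to check your own file for unrecognised state abbreviations before uploading. The Python sketch below is only an illustration; the names VALID_CODES and unknown_codes are hypothetical, and treating codes as upper-case only is an assumption.

```python
# Every code listed in the table above, including DT and NYC.
VALID_CODES = set(
    "AK AL AR AZ CA CO CT DC DE DT FL GA HI IA ID IL IN KS KY LA MA MD ME "
    "MI MN MO MS MT NC ND NE NH NJ NM NV NY NYC OH OK OR PA RI SC SD TN TX "
    "UT VA VT WA WI WV WY".split()
)


def unknown_codes(state_codes):
    """Return the sorted codes that do not appear in the table."""
    return sorted(set(state_codes) - VALID_CODES)


print(unknown_codes(["NY", "VA", "XX", "Ny"]))
```

Any code returned by unknown_codes should be corrected in your file before it is uploaded.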