Identifying individuals with a Unique Reference

Posted: 15 Nov 2012 2:50PM GMT
Classification: Query
Surnames: Barry
Does anyone know how to generate a unique ID reference for an individual, or what systems are already in existence?

I want to combine the 1901 and 1911 census extracts relating to my family name of Barry, so ideally each person should have a unique ID or primary identifier. There are approx. 9500 people in each census. I have experimented with one made up of [Surname] + [Given Name] + [DOB] + [Marital Status] + [Place of Birth], which works quite well, but there are still several instances of the same sequence being generated because of the limited and repetitive use of the same group of given names.
The information is held in Excel spreadsheets, which allow a good range of sorting and filtering arrangements and macros, so it won't be difficult to do what I want if each person can have a unique identifier.
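
A minimal sketch of the five-field key described above, written in Python rather than Excel purely for illustration. The field names, the "|" separator and the sample records are all made up for the example; the point is only to show how the key is built and how rows that share a key can be listed for manual checking.

```python
from collections import Counter

# Hypothetical column names for the five fields named in the post.
KEY_FIELDS = ("surname", "given_name", "birth_year", "marital_status", "birthplace")

def composite_key(person):
    """Join the five fields into one normalised key string."""
    return "|".join(str(person[f]).strip().upper() for f in KEY_FIELDS)

def find_collisions(people):
    """Return each key shared by more than one record, with its count."""
    counts = Counter(composite_key(p) for p in people)
    return {key: n for key, n in counts.items() if n > 1}

# Invented sample rows, not real census data.
people = [
    {"surname": "Barry", "given_name": "John", "birth_year": 1870,
     "marital_status": "Married", "birthplace": "Cork"},
    {"surname": "Barry", "given_name": "John ", "birth_year": 1870,
     "marital_status": "married", "birthplace": "Cork"},
    {"surname": "Barry", "given_name": "Mary", "birth_year": 1872,
     "marital_status": "Married", "birthplace": "Cork"},
]
collisions = find_collisions(people)
```

Any key that appears in `collisions` marks a group of rows the five fields cannot tell apart, so those are the ones needing a further field or a manual decision.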

Within one census's data it's OK, as you can use a simple numbering system for each individual. However, that doesn't help when the other census data is added, or when family groups are extracted by filtering on head of family and these people are then subtracted from the whole census.

Any suggestions please?

Re: Identifying individuals with a Unique Reference

Posted: 16 Nov 2012 1:45PM GMT
Classification: Query
By hand or using a genealogy software program? If a program, please state which one you're using.

Re: Identifying individuals with a Unique Reference

Posted: 16 Nov 2012 4:26PM GMT
Classification: Query
Surnames: Barry
Hello,

I'm looking for a repeatable system. I have FTM (Family Tree Maker) 2012, but the data is held in MS Excel 2003/07/10.

I would like to be able to apply the system with maybe a macro, or by simply copying and dragging down the rows in Excel. It could be Visual Basic for Applications.

The problem is that even with five elements the results aren't unique, but at least it would give consistent results across both censuses, even if a few manual interventions are required. It may not be obvious, but once the heads of families have been filtered, the dependants that are then extracted from the online census database are correctly referenced into a family, yet the connection to the full list of people is not made. So unless you look at each instance manually, with or without the identifier, there is still a lot of time and effort required.

BR
MikeB

Re: Identifying individuals with a Unique Reference

Posted: 16 Nov 2012 8:22PM GMT
Classification: Query
There are several methods of numbering. If you'll do a web search for "Genealogical numbering systems" you'll find them.

Re: Identifying individuals with a Unique Reference

Posted: 16 Nov 2012 10:58PM GMT
Classification: Query
Surnames: Barry
Hello again,
I've read about these various numbering systems after searching, but feel none will do what I need, as I'm not dealing with one family but 2000+, some of whom may be related. On the whole they will be in both censuses, plus there are new families formed by the older children who feature in the 1901 census, and some people will, for other reasons, not feature in the 1911 families.
The next problem I can see is that the data exists in MS Excel, which is a very feature-rich program but not a relational database as MS Access is. Applying a numbering scheme to one family group would be easy by manual means, but there are about 1900 groups in each census, many of which will be the same people. If the first family was numbered from 1, the second from 1000 and the third from 2000, then the numbers are going to get really large. But the point is that you want the same head of family in both censuses to have the same identifier, as he is the same person being recorded 10 years later, so if there are more children then the system needs to be able to recognise that a particular set of parents has a larger family.
There are many instances where the head of a family is a widow or widower, so the missing partner is not recorded; also siblings living together with the eldest being the head, and single people with no partner or dependants, plus a number of variations with in-laws and grandchildren being present. The combined censuses in some instances give three generations of a family, and the husband's or wife's family may be present as well, which would constitute the beginnings of a small family tree. Others only have minimal entries and may have to stand alone until further information arrives from the 1921 census or extracts from the civil registers.
From my findings so far, for both censuses, of the 9500 people listed, the extracted families (about 1900) account for about 8000 people, so there are about 1000 singletons who will have to be linked to a family in some way.
The 1911 census is more helpful, requesting the years married, children born and children living at the time of the census, which now puts a framework around each family. When I listed the 1901 census I also made an entry for missing partners of widows and widowers, and also for those who were listed as married but whose partner wasn't at home on the day of the census. I intend to use this approach with the final version of the 1911 families, plus placeholders for all children born to the family. At this point I will be reaching a limit if the whole thing can't be presented online in Ancestry, so that the search engine can help fill in the missing individuals from the civil registration, other family trees and other online resources.
It would be interesting to know how many Barry family trees exist with people who are listed in either or both censuses, and, more intriguing, those families who emigrated from, say, 1800 and who have a continuous heritage which remained in Ireland. Of course this will never be a comprehensive piece of work, as there are too many unknowns, but it could well become a significant reference for any Barry genealogist, future and present. I have a web site which will be populated with all this information for interested people to have free access to, at www.mikebarry.eu. If it could be organised in such a way that individual contributors could donate their GEDCOM and identify which family they belong to in the census, then we would have enhanced the knowledge base of our family. Having a facility to add information on subsequent generations and additions to existing families would be good too.
One idea I have is to step back in time and generate an 1891 census from the known families and singletons. Admittedly it wouldn't be a complete statement of those around in that year, but it would be a significant step forward; extracting births, deaths and marriages from the civil registration system, and emigration documents, may enhance the accuracy.
It may be possible to go back further once a cohesive body of data is available in a format which would allow an SQL query to generate the results. This in a way answers one question, on what type of storage program is needed, i.e. a relational database, but setting one up for 2000 families and avoiding manual entry of the data, other than the structure, tables and relationships, is going to be a challenge. The unique identifier per individual would have to be available to make this possible, I feel, and would have to include references to the parents and siblings somehow. There's still the problem of generating a GEDCOM from Excel or Access to put online without having to manually enter all the data.
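
A rough sketch of how the relational layout might look, using Python's built-in sqlite3 module only because it needs no setup; MS Access would follow the same idea. The table names, column names and sample rows are hypothetical. The assumption is that each person gets one plain integer ID, and each appearance in a census is stored as a separate row pointing back at that person, so the same ID serves both 1901 and 1911.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE person (
    person_id  INTEGER PRIMARY KEY,   -- the unique identifier
    surname    TEXT NOT NULL,
    given_name TEXT NOT NULL
);
CREATE TABLE census_entry (
    entry_id         INTEGER PRIMARY KEY,
    person_id        INTEGER NOT NULL REFERENCES person(person_id),
    census_year      INTEGER NOT NULL,
    household_id     INTEGER NOT NULL,  -- family group within that census
    relation_to_head TEXT
);
""")

# Invented rows for illustration only.
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Barry", "John"), (2, "Barry", "Mary")])
conn.executemany(
    "INSERT INTO census_entry (person_id, census_year, household_id, relation_to_head) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1901, 10, "Head"), (1, 1911, 57, "Head"), (2, 1911, 57, "Wife")],
)

# People who appear in both censuses.
rows = conn.execute("""
    SELECT p.person_id, p.given_name
    FROM person p
    JOIN census_entry e ON e.person_id = p.person_id
    GROUP BY p.person_id
    HAVING COUNT(DISTINCT e.census_year) = 2
""").fetchall()
```

Deciding that a 1901 row and a 1911 row belong to the same `person_id` is still the hard part; the schema only gives the answer somewhere to live once it has been decided.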

Food for thought?! Any ideas from those with relational database skills on how to import/export the data, and on the identifier problem?
BR
MikeABarry

Re: Identifying individuals with a Unique Reference

Posted: 16 Nov 2012 11:56PM GMT
Classification: Query
I don't think you can do it and have Uniqueness and Repeatability.

Uniqueness can be achieved easily by adding a sequential number to the end of your current algorithm. This number can be either N=N+1 for all individuals or just when a duplicate is created.
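
A minimal sketch of the "only when a duplicate is created" variant, in Python for illustration. The function name and the "-2", "-3" suffix format are assumptions, not an established scheme.

```python
from collections import defaultdict

def assign_unique_ids(keys):
    """Append -2, -3, ... to the second and later occurrences of a key."""
    seen = defaultdict(int)
    ids = []
    for key in keys:
        seen[key] += 1
        # The first occurrence keeps the bare key; later ones get a counter.
        ids.append(key if seen[key] == 1 else f"{key}-{seen[key]}")
    return ids

ids = assign_unique_ids(["BARRY|JOHN|1870", "BARRY|MARY|1872", "BARRY|JOHN|1870"])
```

The suffix depends entirely on the order in which the rows are processed, so the result is unique but only repeatable if the rows are always sorted the same way.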

Repeatability is harder since:
1) People can be reported with different names census to census but be the same person.
2) Marital status can change, or DOB can be off by a year, day or month.
3) The first John Doe found in the 1900 census could be the second John Doe in the 1910 census.
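
Given those three problems, an exact key cannot line people up across censuses, but a tolerant comparison can at least shortlist likely pairs for a person to check. The sketch below is one possible approach in Python, not a recommendation of a specific method: it uses the standard library's difflib for name similarity, and the thresholds (0.8 similarity, 2 years) and field names are arbitrary assumptions.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Similarity between two names, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_candidate(p_old, p_new, min_name=0.8, max_year_gap=2):
    """True if two records are close enough to be worth a manual look."""
    same_surname = p_old["surname"].lower() == p_new["surname"].lower()
    close_name = name_similarity(p_old["given_name"], p_new["given_name"]) >= min_name
    close_year = abs(p_old["birth_year"] - p_new["birth_year"]) <= max_year_gap
    # Marital status is deliberately ignored because it can change in 10 years.
    return same_surname and close_name and close_year

# Invented records for illustration.
a = {"surname": "Barry", "given_name": "Catherine", "birth_year": 1875}
b = {"surname": "Barry", "given_name": "Katherine", "birth_year": 1876}
c = {"surname": "Barry", "given_name": "John", "birth_year": 1875}
```

Here `is_candidate(a, b)` is true and `is_candidate(a, c)` is false. With so few given names in use, many pairs will still pass, so this narrows the manual work rather than replacing it.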

Personally I do not think that you can automate this process, since if you could, Ancestry.com probably would have done it already and charged you to see what they did. Remember, families do not always report together in the same place. Men at sea or working as labourers in fields 100s or 1000s of miles away will not be matched.

Re: Identifying individuals with a Unique Reference

Posted: 17 Nov 2012 9:49AM GMT
Classification: Query
Hello KJ?
I'm confident that it may be possible to get a system in place within one census, as ultimately the online census database is providing all the data. I agree that quality of data and accuracy from one census to the other is a big issue, and manual comparison by a human may be the only workable system.
I tried my five-point system with the 1901 census. I had also added indexing of 1-9500 to the full census and 1-1900 to the refined heads of families, as I was very anxious not to lose my original work. I do keep reference copies of raw collected data and refined data so I can start again if my experiments don't work. I persisted with the manual comparison and it was very time-consuming. There was an unexpected problem with the way Excel sorts alphanumeric data, so although the majority did line up there were many exceptions which took more time, as did the repeated entries where the system wasn't able to discriminate sufficiently well.
I also did a significant amount of what I call data cleaning, such as eliminating spelling mistakes and standardising the way data was presented. Taking religion as an example, there were so many variations of Roman Catholic, RC, Catholic, Church of Rome etc., plus spelling errors. The same goes for place names, occupations and marital status. I also added a DOB worked out from the declared age, and an age for 1911. The order of the data has to be changed, as the initial wildcard search returns the data in a different order to the families data, and this is also different between censuses.
This cleaning was required again when I had downloaded the dependants from the refined head-of-family list, some 8000 people in total, particularly on the five search categories for the identifier, otherwise it wouldn't work at all.
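
The cleaning steps above can be sketched roughly as follows, in Python for illustration. The lookup table contains only the four variants named in the post; a real one would need every spelling found in the data. The function names are made up for the example.

```python
# Variants mapped to one standard form. Only the spellings mentioned
# in the post are listed; misspellings would have to be added by hand.
RELIGION_VARIANTS = {
    "roman catholic": "Roman Catholic",
    "rc": "Roman Catholic",
    "catholic": "Roman Catholic",
    "church of rome": "Roman Catholic",
}

def normalise_religion(raw):
    """Map a raw entry to its standard form, or return it trimmed if unknown."""
    key = " ".join(raw.replace(".", "").lower().split())
    return RELIGION_VARIANTS.get(key, raw.strip())

def estimated_birth_year(census_year, declared_age):
    """Approximate year of birth from the age declared in the census."""
    return census_year - declared_age
```

For example, `normalise_religion("R.C.")` gives "Roman Catholic" and `estimated_birth_year(1901, 31)` gives 1870. The birth year is only an estimate: the true year may be one earlier depending on whether the birthday fell before or after census night, and declared ages are not always accurate.
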
One problem I have had is insufficient skill with Excel, and this is an area where I'm improving; making a macro to do the simplest thing saves so much time. For instance, once I have the refined heads-of-family list, extra lines have to be added between each entry to accept the dependants. I initially added these extra lines manually before collecting the data, but now I have a macro which will add 10 free lines after each entry. The more I can do like this, the quicker I'll get to the end result.
I think I'd like to see a referencing identifier added which is flexible enough to add people later and has an inherent meaning, ideally one which can be added in an automated way. One other problem is that each family group is not just made up of Father, Mother, Child 1-N; there are numerous others: parents-in-law, grandchildren, cousins, etc. Ideally each generation will have a standard form, but the census doesn't supply names of deceased parents. A numbering system to cope with all these permutations might be possible, based on one of the standard systems for each family group plus a sequential family ID. But we are getting more and more labour-intensive to achieve the goal, and it's beginning to look like a relational database implemented in a spreadsheet.
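
One way the "sequential family ID plus a number within the family" idea could look, sketched in Python. The "F0012-03" format, the function names and the sample households are all assumptions for illustration; this is not one of the standard genealogical numbering systems.

```python
def make_person_id(family_no, member_no):
    """Build an ID such as 'F0012-03': family 12, third person listed."""
    return f"F{family_no:04d}-{member_no:02d}"

def number_households(households):
    """Give every person an ID from their household's position and their
    own position within it. `households` is a list of lists of names."""
    ids = {}
    for family_no, members in enumerate(households, start=1):
        for member_no, name in enumerate(members, start=1):
            ids[make_person_id(family_no, member_no)] = name
    return ids

# Invented households for illustration.
ids = number_households([["John", "Mary"], ["Ellen"]])
```

An ID of this kind has an inherent meaning and leaves room to add members later, but it is tied to one census's household list, so it does not by itself tell you that a person in a 1901 household is the same person in a 1911 one.
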
There's still the issue of exporting to GEDCOM. I'm thinking out loud that I'll have to progress the 1911 family groups until I can compare them to the 1901 list of families. Maybe there's a total of something like 2500 groups when the two censuses are combined, as there must be a large overlap in the two sets of data. Then I would start manually building a family tree for every family, in one large family tree, which is possible in FTM 2012, being careful to keep the final arrangement for identifiers, which might be a combination of a straightforward numerical list plus one of the standard methods for family members. I'd like it to be future-proof, as future generations will come along, i.e. the 1921 census, and there must already be instances within the data on hand which should be included within older groups and not added as a new group.
I'm now more than 50% through the collection of dependants in the 1911 census, so soon I'll have the benefit of experience when trying to combine the families in due course. I've just been made redundant, so I could well have the time in the new year. I'm not quite old enough to retire, but if that's the way it turns out then I know I'll have plenty to keep me occupied, and I'll be off to Ireland to do some research too.

BR
MikeB