02 August 2013

Third Parties, Aggregators, and Bigger is Better Databases


The "United States, Public Records Index" was just updated on FamilySearch. With over 70 million names, it is bound to draw attention and, like any source, it should be used responsibly and as a potential clue, not as a fact.

But how this information was obtained is a little sketchy.

This is from the description on FamilySearch:

"This collection is an index of names, birthdates, addresses, phone numbers, and possible relatives of people who resided in the five boroughs of New York City between 1970 and 2010. These records were generated from telephone directories, driver licenses, property tax assessments, credit applications, voter registration lists and other records available to the public." Despite that description, there are entries for people outside the New York State area.

I always thought indexes led you to another record--the original record. That's one way in which Dictionary.com defines an index, although I see that Dictionary.com also defines index as "a sequential arrangement of material." I'm not certain that a database of this type qualifies as an index in this sense either, as users never "see" the entire "index" like they would a phone book. Users query the database and see only what matches their search terms.

This is the "source" information as listed on the FamilySearch site:

"United States, Public Records Index." Index. FamilySearch. http://FamilySearch.org : accessed 2013. From a third party aggregator of publicly available information.

"Third party aggregator?" Sounds like data harvesting to me--which isn't bad, but "third party" sounds a little vague. "Publicly available information" seems a little broad as well--that covers a lot of ground and is somewhat nondescript. Sounds like something a private investigator or bill collector might use to track someone down.

Because the database's origins are this vague, it is difficult to evaluate how reliable it is.

Some say "any data is better than no data at all."

Sometimes I wonder.