I mostly agree, but I foresee some issues. I've done NLP work myself and had access to very private information because of it. The team we worked with were all highly educated and well paid, so we were aware of the implications of the data we were handling. At one point, in one of our randomly selected data sets, we found data from a public figure and removed it to avoid any possibility of a conflict of interest.
My concern is that a lot of annotation work is now being done by low-wage, low-education workers, and because of the increasing demand for annotated data sets, this group is growing. Also, because this work is increasingly outsourced, there is less direct control over who is doing it, so the chance that a bad apple slips through is bigger. That's what scares me going forward.
Educated or not, you can't 'unhear' something. What if the conversation is political or financial and has major implications for someone you know? Would you still be able to remain professional and not act on the information?
I feel like devices like this should at least give the user the opportunity to play back whatever is sent, for manual evaluation. For example, it could send you an email at the end of the month listing the recordings it would like to use.
Maybe; I'm just making an observation. My biggest issue is the lack of control over outsourced workers compared to internal workers.
In my situation it was very clear that breaching the agreements would not just lead to immediate termination of the contract, but also to possible legal action and a permanent record, greatly reducing my chances of getting a similar job, the kind of job I had specifically trained for. For uneducated workers, this is often less of a deterrent, especially off-shore.
I also don't think low-wage, low-education workers have fewer morals, but as the workforce gets bigger, the chance of bad apples also gets bigger. I wrote exactly that in my previous comment.
The problem isn't a single-value metric of morals, whatever that may be; the problem is that one group is bored and not invested in the project, while the other has far more exciting things to do than wonder who they might be listening to.
I'd wager "low wage, low education" workers are just as likely to act immorally, simply in different ways.
The "high education well paid" team is more likely to do something with far reaching harmful impact whereas the "low wage, low education" worker doesn't have the power to do anything at that scale.