The evolution of personnel evaluations at State is reflected in the dossier of Frances Elizabeth Willis, the first woman to make a career of the Foreign Service.
BY NICHOLAS J. WILLIS
As most Foreign Service Journal readers know, Employee Evaluation Reports for all U.S. Foreign Service personnel are signed by the rating officer, the rated employee, a reviewing officer and the panel chairperson. Further, the EER program was vetted and fine-tuned by the Government Accountability Office in 2010 and again in 2013.
But as the career of Frances Elizabeth Willis, the third woman to join the Foreign Service—and my aunt—illustrates, the process wasn’t always so transparent and objective. Frances Willis entered the Service in 1927, serving for 37 years until reaching the mandatory retirement age of 65 in 1964.
Her personnel evaluations started in 1927 with grades and comments from instructors in the Foreign Service School (now the Foreign Service Institute), and ended in 1955 when she was evaluated for the last time, one month before she was promoted to Career Minister. Data for Lucile Atcherson, the first woman to enter the Foreign Service, has also become available for 1925 and 1926, so this article covers those two additional years, as well.
As H. L. Calkin documents in his 1978 book, Women in the Department of State, these female pioneers were actively discouraged from entering or staying in the Service. Just six women were accepted between 1922 and 1941, and only two stayed. Frances Willis was the first of these, and the evaluations in her dossier illustrate the gender-biased procedures used to hold her back professionally. More positively, they also remind us of the extent to which the State Department personnel evaluation system has evolved since then.
While many sources in the Foreign Service and Department of State generated personnel evaluations during this period, one element of the system remained constant: the Annual Efficiency Report submitted to the department by the employee’s onsite supervisor. Eventually the AER evolved into the EER, the paramount metric in the current system, but between 1925 and 1946 it was only one of many inputs considered. It became significantly more important after World War II.
Personnel evaluations generated during this period changed frequently and took various forms: narrative, numerical and even multiple-choice. Here is a chronology of their evolution.
The Foreign Service School instructed all newly commissioned Foreign Service officers, rated “FSO (unclassified),” in the elements of consular work: passports, visas, accounts, indexes, invoices, etc. The instructors administered a written exam in each area, and then numerically rated the student on mental keenness, practical judgment, effectiveness and general attitude. They then attached comments to their rating, and the chief instructor ranked members of each class. This process was not unlike public school grading in the first half of the 20th century.
Gender bias reared its ugly head here, at the very beginning of Frances Willis’ career. Her passports instructor stated: “Miss Willis showed excellent judgment and other qualities, which in a man would have called for a higher rating.”
Four personnel evaluation reports appeared during this period: the Annual Efficiency Report, a special Inspection Report, an Efficiency Rating and a department-generated report. The AER was a one-page, narrative report generated by the onsite supervisor, describing the officer’s duties and performance. It usually ended with a comment about the officer’s suitability for retention, reassignment, promotion, etc. But occasionally it consisted of a simple, handwritten note, especially if the supervisor was a political appointee.
The American Foreign Service Inspection Report was the only structured format of the four. And despite its title, it was only used to evaluate consular officers. A multiple-topic, multiple-page, narrative document, it was written by a consul general who had been detailed as a Foreign Service inspector to evaluate a consular employee at post. Sections of the document addressed personality, mode of living, contacts, cooperation, standing and professional attributes, the last taking up two pages. It ended with the inspector’s opinion about the officer’s suitability to continue in the consular service or transfer into the diplomatic service.
The third format, the annual Efficiency Rating, was also generated by the department’s Consular Service. But it could hardly be called a report, since it consisted of short, unsigned comments—sometimes just one word each—about the officer’s performance in assigned duties: passports, shipping, notarials, etc., as viewed from the department. Following the comments were a date and rating—average, high average, good, very good and none.
In my aunt’s case, these comments appeared at random places in her dossier, including handwritten notations on the bottom and back of her 1928 Foreign Service School record. But perhaps it didn’t make any difference where the reports were filed, because none of these pioneering women were expected to last very long. Fortunately, these haphazard evaluations faded away in the early 1930s, though they were later resurrected in a more structured format. None of these reports indicated whether the officer had read them.
Little is known about the fourth State Department-generated report, in this case about Lucile Atcherson, the first woman to join the Foreign Service. Although Marilyn Greenwald, Lucile’s granddaughter, obtained Atcherson’s dossier and quoted narrative evaluations of her performance as a third secretary in the first chapter of a biography of Lucile’s daughter, she did not cite the actual documents. One quotation appears to have resembled the AER, while the department-generated quotation seems to have been an early version of a Rating Sheet—the next stage in the evolution of personnel evaluations.
The State Department adopted a two-year promotion process, using a selection board consisting of officers senior to the rated officer. After the board convened, it updated a Rating Sheet, the essential document for promotion, and permanently filed it in the officer’s dossier. This consisted of a one-paragraph summary of material in the dossier for the current, two-year period under review.
The material consisted of memos, letters, notes, newspaper articles and evaluation reports from the department and the onsite supervisor. Apparently, the one-paragraph summary was generated by a member of the current board, but no description of the process was included in the dossier. Then the summarizer (more accurately, a redactor) would assign a rating of excellent, very good, satisfactory or unsatisfactory and add the entry to previous paragraphs in the Rating Sheet. These paragraphs were dated, but neither signed nor initialed.
This process made the board’s job much easier, because everything they needed to know about the candidate was summarized in the new paragraph. They could immediately tell whether the candidate was improving or slipping. But it also opened the door for mischief if the anonymous redactor were biased or had a personal ax to grind. In Frances Willis’ case, for example, a 1932 entry in her dossier reported that as a third secretary in Stockholm, she had assumed chargé duties, the first time a woman had done so. That report was indeed filed—but inside Frances’ 1928 Foreign Service School record, where it was overlooked by the redactor, who also ignored two press articles reporting the event that had been properly filed. As a result, that event was omitted from the first (1933) entry in her Rating Sheet, which assigned her an unimpressive “Satisfactory” rating.
Onsite, supervisor-generated AERs continued, virtually unchanged, during this period. However, the department-generated Efficiency Ratings returned as a short paragraph, covering both consular and diplomatic tasks, now with a rating (E [Excellent], VG [Very Good], S [Satisfactory], U [Unsatisfactory]) added, just like the Rating Sheet. They were not signed, but some bore reviewer initials. And in my aunt’s case, they often seemed to influence Rating Sheet deliberations more than AERs.
In response to abuses by Rating Sheet redactors, Under Secretary of State Joseph Grew and Dean Acheson, his successor in the position (the equivalent of today’s Deputy Secretary of State), mandated major changes in the evaluation process starting in 1946. These included abolishing the redactor’s rating on the Rating Sheet and transferring it to the Annual Efficiency Report, so that the onsite supervisor became responsible for rating the officer, just as is currently done. They also elevated the AER to major—not paramount, but major—importance in the promotion process.
While no directive is available defining these changes, consider the following events: Under Secretary Grew had appointed Frances Willis as his assistant in 1945 and given her an excellent review, immediately promoting her to Class III. Acheson subsequently gave her an outstanding review. Then they discovered that the redactor, in a special 1946 Rating Sheet review triggered by that promotion, gave her a mere “Very Good” rating, stating that Class III was high enough for her “because of her sex.” Frances’ next AER included—for the first time—a rating at the end of it, and the subsequent Rating Sheet entry had none.
The department-generated Efficiency Ratings—now signed—continued, along with a periodic Inspector’s Efficiency Report, which added 16 questions to be answered by the inspected officer (e.g., “What is your ultimate goal in the Foreign Service?” “Are you in debt?”).
New forms started to appear, as well. One was a seven-page Position Description with a two-page, 1,100-word set of instructions requiring a detailed description of duties, including the officer’s estimate—within a 5-percent margin of error—quantifying such factors as time taken, successes and adverse consequences for each duty.
A one-page, department-generated, Annual End-User Summary Report also appeared, with a narrative evaluation similar to that in the AER. It ended with a new numerical rating system, consisting of six boxes and the requirement to check one box. The boxes ranged from a low of 1 to a high of 6, with 6 defined as “... superior in every respect, denoting the highest degree of resourcefulness and initiative, with no recognizable room for possible improvement”—a high bar, indeed.
The AER, which had remained nearly unchanged for 16 years (apart from minor revisions in 1933 and 1943), was massively revised and expanded in 1949. Not only did it grow from one to six pages, but it now had four parts. Part II alone listed 13 factors to be graded in three categories: superior, satisfactory and not up to standard. These included versatility in knowledges (sic) and skills, such as accuracy, productivity, trustworthiness and reliability.
Part II concluded with narrative comments from the reviewer, similar to the old AER. Part III was a short section grading language skills, and Part IV listed activities to be graded, including political, economic, consular, etc., much like the Foreign Service cones that showed up at this time.
Part I was the oddest section of all, consisting of 31 groups of statements descriptive of FSO performance. The reviewing officer was required to underline the most descriptive and cross out the least descriptive. Here are typical examples:
A) He will probably not go much further in the Service.
B) He demands a high degree of efficiency from those associated with him.
C) He is not active in seeking desirable contacts.
D) He is imaginative.
E) He is probably one of our future Career Ministers.
A) He has a good sense of humor.
B) He is adaptable.
C) He shows little taste in his clothes.
D) He is inclined to be pompous.
This evaluation process is flawed on three counts. First, the statements in each group are binary, either good or bad, and usually very good or very bad, which significantly skews the scoring. Second, the statements have very little correlation with each other: What does a sense of humor have to do with taste in clothes? Third, weighing answers between groups is entirely subjective: Is “probably one of our future Career Ministers” twice as good as “adaptable”? Four times as good? The saving grace was that narrative comments were allowed to remain.
In my aunt’s case, this new format appeared when she was assigned as first secretary/consul in the political section at the London embassy and had received excellent reviews. While she received an anonymous 89 score (out of 100, or a Very Good rating) on Parts I–IV, the Part II narrative section, written by Minister-Counselor Julius Holmes and approved by Ambassador L.W. Douglas, recommended her for appointment as a chief of mission.
One explanation for this odd addition is that while the number of State Department employees had stayed relatively constant (1,000-2,000) between 1900 and 1940, it rapidly rose to more than 10,000 during World War II, and topped 16,000 by 1950. With too much time on their hands, these bureaucrats created tasks to keep themselves busy, including massively revising and expanding existing reports and procedures and issuing new requirements.
Fortunately, order was restored in 1952 when the AER was totally revised and renamed “Efficiency Report.” It now had six parts: Parts I–V numerically graded the employee, using the new six-point grading system, on duties performed, personal qualities, factor analysis (30 factors about knowledge, performance and personality traits) and language, followed by a single, overall rating number.
Part VI, Summary Comments and Recommendations, covered 15 topics such as attitude, executive ability, physical fitness, adverse factors, etc., followed by summary comments, all in narrative form. In addition, there were boxes asking if a review panel was used and if the report was discussed with the officer under review—obviously desirable, but not mandatory steps in the evaluation. This format continued up to 1955, when Frances Willis received her final evaluation, one month before she was appointed Career Minister. She was promoted to Career Ambassador in 1962.
Frances Willis never complained about discrimination during her Foreign Service career. In fact, she said just the opposite in a 1951 speech to the National Council of Women in Finland: “I can say to you with complete honesty that since the day when I entered the Foreign Service I have been given equal treatment with the men in the Service. I have heard it said, of course, that there is discrimination against women who wish to enter the Service. All I can say is that my personal experience does not bear that out.”
It is likely that my aunt never reviewed her dossier, which contained many negative, gender-biased comments over the first 20 years of her career. Even if she had done so, she would probably have simply ignored them and pressed on.
In important ways, personnel evaluation is always a work in progress. For example, numerical grading systems have their own flaws. In the 1970s such a system was abused by some U.S. Air Force reviewing officers who, wanting to ensure that the best and brightest members of their organization were promoted, gave them the maximum 4.0 grade in every factor in the review. It didn’t take long for word of that tactic to get out, which greatly increased the number of perfect reviews submitted and ended the utility of that system. And, of course, a similar tactic has captured academic grading to an even greater degree, with grades higher than 4.0 being routinely awarded.
The fallback position is to revert to the ponderous—but much harder to abuse—narrative-type evaluation, which the Foreign Service has embraced.