fbpx

Unleashing the Power of ChatGPT: A Deep Dive into IELTS Task 1 Essay Evaluation with GPT-4-Vision

In the world of education, technology has become an indispensable ally for both students and teachers. It has not only transformed the way we learn but has also streamlined various academic tasks. One such revolutionary tool that has taken the educational realm by storm is ChatGPT. Its ability to assist with everyday tasks, including essay evaluation, has made it a valuable asset for students and educators alike.

In this blog post, we embark on a journey to explore the accuracy and effectiveness of the GPT-4-Vision model of ChatGPT in the context of assessing the accuracy of Writing Task 1 essays. As we delve deeper, we will dissect four common question types, putting ChatGPT to the test. Join us in this exciting exploration of the potential of AI in education.

The two plans show an industrial area as it is now and its redevelopment plan for the future.

Overall, the site is expected to experience a host of significant alterations, with the most prominent being the development of accommodation, greenery and transportation amenities at the expense of workplaces. This, as a result, will completely transform this industrial place into a commercial area.

At present, the industrial complex is surrounded by Toe Lane, Chimney Road, Pie Walk, and Swallow Street, but workers can only access the place via the entrance and reception located on Pie Walk in the north. Once inside, they can park their vehicles in the parking lot situated in the east of the site, just below which is a factory. In the south of the map is an office building that also houses a canteen, while in the east lies a factory. Finally, there is a yard in the middle of the site.

Following redevelopment, only the surrounding roads will remain unchanged. However, the parking and storage spaces will be removed, making way for 3-bedroom and 4-bedroom houses. The entrance, reception and yard will be replaced by three 2-bedroom houses and communal gardens, bordered by two private roads. Lastly, the office building and factory will be turned into parking garages and an apartment with its own parking area, respectively.

* Bold text indicates intentional errors

Source: http://zim.vn/

Response from Chat GPT

The description provided mostly aligns with the information depicted in the maps, but there are a few discrepancies:

  1. The description mentions that the entrance and reception are located on Pie Walk in the north, which is not clearly specified in the current map. The entrance and reception are indeed on the northern side of the industrial site, but the map does not explicitly state that this is Pie Walk. – Vision correctly identified an intentional mistake.
  2. The parking lot is described as situated in the east of the site, which is incorrect according to the current map; it is actually in the western part, adjacent to Toe Lane. – Vision correctly identified an intentional mistake.
  3. The office building is mentioned as being in the south, but it is more accurately placed in the central part of the map, with the canteen at the southern end of the office building. – Vision correctly improved the essay’s accuracy.
  4. The factory is incorrectly stated as being in the east when it is actually on the eastern side. – Vision hallucinate.
  5. The description of the redevelopment plan is generally correct. However, it specifies that the entrance, reception, and yard will be replaced by three 2-bedroom houses and communal gardens, which is a slight simplification. The map shows two 2-bedroom houses in place of the entrance and reception area, with the third 2-bedroom house slightly to the east. – Vision correctly improves the essay.
  6. The office building and factory being turned into parking garages and an apartment respectively are not detailed in the redevelopment map. Instead, the redevelopment plan shows a 4-bedroom house where the parking lot used to be, and the factory’s location will become an apartment building with a parking area. – Vision correctly improved the essay’s accuracy.
  7. The description says the site will transform into a commercial area, but the redevelopment plan indicates it will be a residential area with houses and apartments, not commercial. – Vision correctly identified an intentional mistake.
  8. Lastly, the description does not mention the orientation change in the redevelopment map; the north is oriented differently in the two maps. – Vision incorrectly describe the map.

Overall, the description captures the essence of the transformation from an industrial site to a residential area, but it contains several inaccuracies regarding the layout and positioning of certain elements.

Key points:

  • Vision does have spatial awareness, which allow it to understand the map and identify most of the mistakes
  • It’s not flawless and sometimes hallucinate.

Process

The diagram shows the process by which pineapples are grown and processed into different products.

Overall, this process involves both natural and man-made elements and can be divided into two main stages: the farming of pineapples and the processing of the crop, which eventually turns into a beverage, a canned food, or an export.

First, pineapples are grown in the field with their crowns spaced 26 centimeters apart, and in a climate where the temperature is 23 or 30 degrees Celsius. After 3 months, ethylene is sprayed on the plants, after which they continue to grow for an additional 8 months until they are ready for harvesting. At this stage, the pineapples should be 30 centimeters high and 2 kilograms in weight.

The process continues with the harvested pineapples being washed prior to being graded according to their sizes. Once sorted, the smallest and medium-sized pineapples have their tops removed and their rinds peeled. While the former pineapples are then put into an extractor to produce juice, the latter are cut into slices or smaller chunks, both of which are subsequently bottled. In contrast, the largest pineapples are simply coated in wax and placed in crates before being shipped abroad.

* Bold text indicates intentional errors

Source: http://zim.vn/

ChatGPT Response

The description provided contains several inaccuracies when compared to the diagram:

  1. The timeline for growing pineapples is incorrect. The diagram shows a growth period of 7 months before ethylene is applied, followed by an additional 5 months of growth, not 3 months before and 8 months after ethylene application as described. – Vision correctly identified an intentional mistake.
  2. The spacing between the pineapple crowns is correctly noted as 26 centimeters.
  3. The temperature range for growing pineapples is correctly stated as being between 23 to 30 degrees Celsius. – Vision correctly describe the chart and accidentally identified an intentional mistake.
  4. The description of the pineapple’s size at harvest is accurate — they should be 30 centimeters high and weigh 2 kilograms.
  5. The post-harvest process is misdescribed. The diagram does not show a washing or grading process. Instead, it depicts that pineapples are taken to an unspecified processing step (The diagram actually does show there’s a washing and grading process), which leads to three different outcomes:

– Some pineapples are sent to an extractor where their juice is extracted.

– Others are cut into slices or chunks, which are then canned.

– The remaining pineapples are coated in wax and packed into crates for export.

  1. The size classification of the pineapples (smallest and medium-sized for juicing and canning, largest for waxing and export) is not depicted or mentioned in the diagram. (The diagram actually does show there’s a size classification process)
  2. The diagram does not explicitly mention that the pineapples are graded according to size, nor does it show a differentiation in processing based on size. (The diagram actually does show the pineapples are graded according to sizes)

In summary, while the description captures the general stages of pineapple growth and processing, it inaccurately details the timeline and specific steps involved in the process as per the diagram. The diagram illustrates a clear sequence from growing to processing pineapples for juice, canning, and export, without indicating size grading or a distinction in processing based on pineapple size.

Key points:

  • Vision does has the ability to understand the process, the meaning of symbols, numbers and identify errors
  • Some limitations: cannot tell the differences between sizes (or at least small differences), and will sometimes hallucinate and make mistakes

Line graph (approximate data)

The charts show how many passengers traveled by train between 2000 and 2009, along with the punctuality rate of trains.

Overall, there was an downward fluctuation in the number of rail passengers during the given period. Additionally, except for the period between 2004 and 2006, more trains ran punctually. Although the punctuality target remained unchanged, it was either achieved or exceeded only from 2002 to 2005 and again during the last two years of the period in question.

At the beginning of the period, 30 billion people traveled by train, after which it had risen steadily, reaching a peak of roughly 30 billion by 2005, despite a slight drop in 2003. Between 2005 and 2008, a gradual decline of about 9 billion passengers was witnessed, with a subsequent drop to approximately 45 billion in the final year.

During the whole period, the punctuality goal was consistently set at 95%. The percentage of trains being punctual, meanwhile, increased considerably from 92% initially to 96% 4 years later, followed by a substantial rise to the former level in 2005. Thereafter, the figure soared to a high of 97% in 2008 before leveling off toward the end of the time frame.

Source: http://zim.vn/

ChatGPT Response

The description provided does not accurately reflect the data shown in the charts:

  1. The number of passengers does not show a consistent downward fluctuation but rather remains relatively stable with some variation throughout the period. The description’s mention of “a peak of roughly 30 billion by 2005” is incorrect since the graph indicates that the number of passengers never reaches this figure; it fluctuates slightly above and below the 40 billion mark. – Chat GPT correctly identified an intentional mistake.
  2. The description suggests a “gradual decline of about 9 billion passengers” between 2005 and 2008, which is not depicted in the graph. The number of passengers does not show such a dramatic decline; instead, there is only a slight variation around the 40 billion mark throughout the years. (There’s indeed a decline of around 9 billion passengers)
  3. The description inaccurately states the number of passengers in the final year as “approximately 45 billion.” According to the graph, the number remains around 40 billion, similar to the rest of the period. (The number is indeed around 45 billion)
  4. Regarding punctuality, the description is partly correct in noting that the punctuality target is consistently at 95%. However, the description of the punctuality rates’ changes over time is not entirely accurate. There is no “substantial rise to the former level in 2005” as the percentage of trains running on time actually decreases significantly below the target around 2005 and 2006. – Chat GPT correctly identified an intentional mistake
  5. The description incorrectly suggests a “figure soared to a high of 97% in 2008,” when in fact, the graph shows a sharp decline in punctuality starting in 2005, reaching its lowest in 2006, and then recovering to slightly above the 95% target in 2009. – Chat GPT correctly identified an intentional mistake.
  6. The punctuality of trains does not increase consistently at the beginning but rather starts above the target, dips below it in the middle of the period, and then ends slightly above the target. – Chat GPT correctly identified an intentional mistake.

In summary, the provided description does not accurately reflect the trends shown in the charts. The number of train passengers remains around 40 billion without major declines or rises, and the percentage of trains running on time shows a significant dip in the middle of the period before recovering towards the end, not a steady increase or high peak as described.

Key points:

  • Vision does seem to be able to understand about the trends of the graph
  • However, Vision struggle and failed at correctly identify “approximate” data from line graph, if this number is explicit, the result may be different.

Bar chart (explicit data)

The bar chart illustrates how many visitors of three types visited a particular museum from 1997 to 2012. Overall, although there were some minor fluctuations, the total number of visitors to this museum increased, with those below 15 years old being the only category with a upward trend. It is also noteworthy that adult visitors were the overwhelming majority throughout the years, whereas the opposite was true in the case of their special exhibition counterparts.

Regarding those of adult age visiting the museum, there were 200,100 in the first year. Over the following decade, their attendance experienced a significant rise to reach a peak of 400,500, but by 2012 had declined minimally to 82,300. Special exhibition visitors followed a similar upward trend, albeit at a much lower rate, growing gradually from 28,000 initially to end the period at 380,000.

In contrast, 28,000 under-15-year-old visitors went to the museum at the beginning of the timeframe, followed by a steady rise to 82,300 in the final year.

Turning to the total attendance at the museum, it started at 448,100 visitors before climbing to a high of 525,200 in 2007. By the end of the period, the figure had fallen, reaching 504,300 visitors.

* Bold text indicates intentional errors

Chat GPT response

Based on the data from the bar chart image you provided, here is the information in a tabular format:

Year Adult Visitors Visitors Aged Under 15 Special Exhibition Visitors Total
1997 300,100 120,000 28,000 448,100
2002 302,200 110,000 29,200 441,200
2007 400,500 87,000 37,700 525,200
2012 380,000 82,300 42,000 504,300

Now let’s evaluate the accuracy of the description against this table:

  1. The total number of visitors to the museum indeed increased from 1997 to 2007, showing a slight decline by 2012, which supports the statement in the description.
  2. The description states an upward trend for visitors under 15 years old, but according to the data, their numbers actually decreased from 120,000 in 1997 to 82,300 in 2012. – Chat GPT correctly identified an intentional mistake.
  3. Adult visitors were the largest group every year, which is consistent with the description.
  4. The description inaccurately reports the number of adult visitors in the first year as 200,100, while the chart shows 300,100. – Chat GPT correctly identified an intentional mistake.
  5. Adult visitor numbers peaked at 400,500 in 2007, not over the following decade as the description suggests, and the number in 2012 was 380,000, not 82,300. – Chat GPT correctly identified an intentional mistake.
  6. Special exhibition visitors increased from 28,000 in 1997 to 42,000 in 2012, not to 380,000 as mentioned in the description. – Chat GPT correctly identified an intentional mistake.
  7. The total attendance figures provided in the description for 1997 and 2007 are correct, but the total for 2012 should be 504,300, not 525,200. – Chat correctly describe the data but this is not a mistake.

The description contains several inaccuracies when compared to the actual data from the chart.

Key points:

  • With explicit number on the chart, Vision does has the ability to tell the difference and identify numerical mistakes
  • Vision still misses some intentional mistakes, but to be fair, the essay does have lots of intentional errors and more careful prompting may help alleviate this.

 

Bài viết liên quan

Phản hồi

Learner

  • 10 bài chấm/ ngày

  • Tốc độ trả bài chậm

  • Có thể không truy cập được ở giờ cao điểm

Plus

199K

119K/th

  • Không giới hạn bài chấm

  • Tốc độ trả bài nhanh hơn

  • Truy cập 24/7

  • Hoàn tiền 30 ngày

    Bạn được đảm bảo trong 30 ngày đầu tiên được phép hoàn tiền bất kỳ lúc nào với bất kỳ lý do nào.

Best for Teacher

Premium

249K

149K/th

  • Gói Plus

  • Hỗ trợ kĩ thuật

  • Xuất file Word/Google Docs kèm comments: Link Demo

    - Bài chấm sẽ dc xuất kèm comments gợi ý vocab
    - File Word có thể dc up lên Google Docs và các comments sẽ dc giữ nguyên
    - Các comments có thể dc chỉnh sửa theo ý muốn của gv
    - File Word cá nhân hóa & White label

  • Hoàn tiền 30 ngày

    Bạn được đảm bảo trong 30 ngày đầu tiên được phép hoàn tiền bất kỳ lúc nào với bất kỳ lý do nào.

VIP

499K

299K/th

Learner

  • 10 bài chấm/ ngày

  • Tốc độ trả bài chậm

  • Có thể không truy cập được ở giờ cao điểm

Plus

199K

159K/th

  • Không giới hạn bài chấm

  • Tốc độ trả bài nhanh hơn

  • Truy cập 24/7

  • Hoàn tiền 30 ngày

    Bạn được đảm bảo trong 30 ngày đầu tiên được phép hoàn tiền bất kỳ lúc nào với bất kỳ lý do nào.

Best for Teacher

Premium

249K

199K/th

  • Gói Plus

  • Hỗ trợ kĩ thuật

  • Xuất file Word/Google Docs kèm comments: Link Demo

    - Bài chấm sẽ dc xuất kèm comments gợi ý vocab
    - File Word có thể dc up lên Google Docs và các comments sẽ dc giữ nguyên
    - Các comments có thể dc chỉnh sửa theo ý muốn của gv
    - File Word cá nhân hóa & White label

  • Hoàn tiền 30 ngày

    Bạn được đảm bảo trong 30 ngày đầu tiên được phép hoàn tiền bất kỳ lúc nào với bất kỳ lý do nào.

VIP

499K

399K/th

Learner

  • 10 bài chấm/ ngày

  • Tốc độ trả bài chậm

  • Có thể không truy cập được ở giờ cao điểm

Plus

199K/th

  • Không giới hạn bài chấm

  • Tốc độ trả bài nhanh hơn

  • Truy cập 24/7

Best for Teacher

Premium

249K/th

  • Gói Plus

  • Hỗ trợ kĩ thuật

  • Xuất file Word/Google Docs kèm comments: Link Demo

    - Bài chấm sẽ dc xuất kèm comments gợi ý vocab
    - File Word có thể dc up lên Google Docs và các comments sẽ dc giữ nguyên
    - Các comments có thể dc chỉnh sửa theo ý muốn của gv
    - File Word cá nhân hóa & White label

VIP

499K/th

  • Everthing in Premium

  • Hand Writing Image Recognition

  • Better Accuracy with GPT-4

  • Early Access to New features

    - Speaking Feedback

  • Customization

    We help with minor customizations to get it working just right.

  • Support Development of New Features

    • Speaking Practice
    • Classroom Management (e.g., Google Class Room)
    • Reading Practice
    • Listening Practice