The explosive growth of web archives presents a significant challenge: manually curating specialized document collections from such a large volume of data. Existing approaches rely on supervised techniques, but recent advances in Large Language Models (LLMs) offer new possibilities for automating collection creation, since LLMs perform well on a variety of tasks even without fine-tuning. This paper investigates whether prompt design alone can achieve results comparable to those of fine-tuned models. We explore different prompting techniques for collecting specialized documents from web archives such as UNT.edu, Michigan.gov, and Texas.gov, and analyze LLM performance under various prompt configurations. Our findings highlight the significant impact of incorporating task descriptions within prompts. Additionally, including the document type as justification for the search scope leads to better results. These findings suggest that well-crafted prompts can unlock the potential of LLMs for specialized tasks, potentially reducing reliance on resource-intensive fine-tuning, and pave the way for automating specialized collection creation using LLMs and prompt engineering.
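As a rough illustration of what "prompt configurations" could mean here, the following minimal sketch builds prompt variants that toggle a task description and a document-type justification. It is not the paper's actual prompt set: the names `PromptConfig`, `build_prompt`, and `all_configs`, the wording of every prompt part, and the example document type are all hypothetical assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PromptConfig:
    """Hypothetical switches for the two prompt elements named in the abstract."""
    include_task_description: bool
    include_document_type: bool


# Assumed wording; the abstract does not give the actual task description.
TASK_DESCRIPTION = (
    "You are helping a librarian build a specialized collection "
    "from a web archive."
)


def build_prompt(document_text: str, config: PromptConfig,
                 document_type: str = "technical report") -> str:
    """Assemble one prompt string from the parts enabled in `config`."""
    parts = []
    if config.include_task_description:
        parts.append(TASK_DESCRIPTION)
    if config.include_document_type:
        # Document type stated as the justification for the search scope.
        parts.append(
            f"Only documents of this type are in scope: {document_type}."
        )
    parts.append(f"Document:\n{document_text}")
    parts.append(
        "Should this document be included in the collection? Answer yes or no."
    )
    return "\n\n".join(parts)


def all_configs() -> list[PromptConfig]:
    """Enumerate every on/off combination of the two prompt elements."""
    return [PromptConfig(t, d) for t, d in product([False, True], repeat=2)]


if __name__ == "__main__":
    for cfg in all_configs():
        print(cfg)
        print(build_prompt("Annual water quality summary ...", cfg))
        print("-" * 40)
```

Each resulting prompt would then be sent to an LLM and the yes/no answers compared against labeled documents; that evaluation step is omitted because the abstract does not specify the model or metrics used.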
https://dl.acm.org/doi/10.1145/3677389.3702591
| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |