Show simple item record

dc.contributor.author: Taylor, Stacey Dianne
dc.date.accessioned: 2024-06-07T18:00:17Z
dc.date.available: 2024-06-07T18:00:17Z
dc.date.issued: 2024-06-07
dc.identifier.uri: http://hdl.handle.net/10222/84277
dc.description.abstract: Information is fundamental to decision-making. Yet data in the financial domain is sparse, even though it seems abundant in this era of big data. The work presented in this thesis addresses that scarcity through seven projects that investigate the creation of synthetic financial data, both quantitative and textual. In the first two projects, we examine methods for generating synthetic financial statement data, as well as the effects of synthetic data on a downstream classification task. The next four projects evaluate how well ChatGPT generates textual financial data for the notes to the financial statements and for selected parts of financial reports, as well as how it adapts its responses to the identified knowledge of its end users, ranging from a non-financial user to a financially sophisticated user. The remaining project, on authorship attribution, is of the utmost importance, particularly since company authorship attribution has not yet been studied, to the best of our knowledge. We have author profiles and a good understanding of identified authors such as William Shakespeare, Mary Shelley, or George Washington, but we do not yet have that depth of understanding and identifiability for corporate communication. This attribution task is non-trivial because lengthy corporate communication is often written collaboratively by many authors, many (or all) of whom are never identified, with contributions from non-writing authors who, for example, vet, review, or sign off on the text. This plethora of unidentified authors means that we have to treat the text as the work of a single "figurehead" author, with the understanding that many (likely) unidentified authors (writing and not) have contributed to the work. In our experiments, the Common N-Gram Distance algorithm provided the best and most consistent results, achieving between 95% and 100% accuracy for character n-grams and 100% accuracy for word n-grams.
Tools like ChatGPT can be exploited to commit fraud. Given the potential for significant harm to the capital markets, tools that can quickly detect fraudulent corporate communication will be needed. Our research contributes to that effort. [en_US]
dc.language.iso: en [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Generative AI [en_US]
dc.subject: Accounting [en_US]
dc.subject: Finance [en_US]
dc.title: Augmentation of Financial Datasets and Evaluating Financial Text Generated By A.I. [en_US]
dc.date.defence: 2024-05-02
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Doctor of Philosophy [en_US]
dc.contributor.external-examiner: Dr. Howard Hamilton [en_US]
dc.contributor.thesis-reader: Dr. Evangelos Milios [en_US]
dc.contributor.thesis-reader: Dr. Malcolm Heywood [en_US]
dc.contributor.thesis-reader: Dr. Vladimir Lucic [en_US]
dc.contributor.thesis-supervisor: Dr. Vlado Keselj [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Yes [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]