Coding a domain specific text generator

2023-01-24 01:11 Q&A author:

A friend of mine is in the real estate business, and after being shown the art of writing copy for real estate ads, I realized that it is very formulaic. This is especially true when advertising online, as there are predefined fields you fill in.

Naturally, I thought about creating a generator that pretty much automates writing the ads. I don't expect it to generate outstanding or even very good copy, just that it can put together words and sentences like a human would.

I have a skeleton/template that defines an ad, and I've also put together a set of phrases and words that can be randomly selected, but I am interested in the more general aspects of coding such a generator. Any suggestions, tips or literature that I can read to better understand this little project?

Using metadata about the listing would be one way.

Say for a given house, you have these attributes:

(type: bungalow, sq feet: <= 1400) You could use the phrase "cozy cottage".

bedrooms: obvious, and the same thing goes for bathrooms. Assume you use words like "large", "medium", etc.

garage spots: if > 2 then "Can park many vehicles", etc.

You could go even further with this: given the lat/lon for the address, there are web services that can tell you the number of parks nearby, the crime rate in the neighborhood, etc.
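The attribute rules above can be sketched as a small rule table in Python. This is only a minimal illustration: the `Listing` class, its field names, and the `describe` function are hypothetical names chosen here, and the thresholds are the ones mentioned in the examples.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical fields; a real listing record will look different.
    house_type: str
    sq_feet: int
    bedrooms: int
    bathrooms: int
    garage_spots: int


def describe(listing):
    """Turn listing attributes into a list of candidate phrases."""
    phrases = []
    # Small bungalows get the "cozy cottage" phrase.
    if listing.house_type == "bungalow" and listing.sq_feet <= 1400:
        phrases.append("cozy cottage")
    # Bedrooms and bathrooms are stated directly from the counts.
    phrases.append(f"{listing.bedrooms} bedrooms")
    phrases.append(f"{listing.bathrooms} bathrooms")
    # More than two garage spots triggers the parking phrase.
    if listing.garage_spots > 2:
        phrases.append("can park many vehicles")
    return phrases


print(describe(Listing("bungalow", 1200, 3, 2, 3)))
```

Each rule is independent, so new attributes (for example, data looked up from the address) can be added as extra `if` blocks without touching the existing ones.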

Rick

I'd say there are three basic approaches you could take to a problem like this, depending on how flexible you want the system to be and on how much work you want to put into it. The simplest is to treat it as a report generation problem, along the lines of Rick's suggestion. That's probably the way I'd go to produce a first draft of a listing. The results would be pure boilerplate, but each listing could be quickly punched up by the copywriter.
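As a rough sketch of the report-generation approach, the following fills a fixed skeleton from the listing fields and picks the opening and closing phrases at random. The template text, the phrase lists, and the `draft_ad` function are all made up for illustration; only `string.Template` and `random` from the Python standard library are assumed.

```python
import random
from string import Template

# Hypothetical skeleton; the placeholders mirror the predefined ad fields.
AD_TEMPLATE = Template(
    "$opening $bedrooms-bedroom $house_type with $bathrooms bathrooms. $closing"
)

# Hypothetical phrase pools to choose from at random.
OPENINGS = ["Charming", "Well-kept", "Inviting"]
CLOSINGS = [
    "Call today to arrange a viewing.",
    "Book a showing before it is gone.",
]


def draft_ad(fields, rng=None):
    """Fill the skeleton with listing fields plus randomly chosen phrases."""
    rng = rng or random.Random()
    return AD_TEMPLATE.substitute(
        opening=rng.choice(OPENINGS),
        closing=rng.choice(CLOSINGS),
        **fields,
    )


fields = {"bedrooms": 3, "bathrooms": 2, "house_type": "bungalow"}
print(draft_ad(fields, random.Random(0)))
```

Passing a seeded `random.Random` makes a draft reproducible, which helps when a copywriter wants to regenerate the same boilerplate before punching it up.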

If you wanted to get fancy, though, you could come at it as a natural language generation problem. You'd start with some kind of knowledge representation describing the meaning of the listing and a set of rules (finite state transducers, say) for mapping meanings to linguistic forms. There's a sizable academic literature on that kind of stuff, though it's kind of out of fashion these days. Places to start might be Blackburn & Bos's book or the NLTK suite (especially some of the projects in the contrib package).

The third way of doing it would be to treat it as a translation problem, essentially "translating" database entries into ad copy. You'd start with a large collection of listings and the corresponding human-written ads and construct a statistical model of the relationship between the two. Moses/GIZA++ is a general-purpose toolkit for building and applying such models.
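To make the idea of mapping meanings to linguistic forms concrete, here is a toy sketch in plain Python. It uses a tiny hand-written phrase-structure grammar rather than finite state transducers or NLTK, and the meaning schema, `LEXICON`, `GRAMMAR`, and function names are all assumptions made for this example.

```python
# Lexical rules: map (feature, value) pairs in the meaning to surface words.
LEXICON = {
    ("size", "small"): "cozy",
    ("size", "large"): "spacious",
    ("type", "bungalow"): "bungalow",
    ("type", "condo"): "condo",
}

NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}

# Syntactic rules: how a sentence is built from smaller constituents.
GRAMMAR = {
    "S": ["NP", "VP"],
    "NP": ["DET", "ADJ", "N"],
    "VP": ["V", "NUM", "UNIT"],
}

# Terminal rules: each picks a word based on the meaning representation.
TERMINALS = {
    "DET": lambda m: "this",
    "ADJ": lambda m: LEXICON[("size", m["size"])],
    "N": lambda m: LEXICON[("type", m["type"])],
    "V": lambda m: "has",
    "NUM": lambda m: NUMBER_WORDS.get(m["bedrooms"], str(m["bedrooms"])),
    # Number agreement: singular for one bedroom, plural otherwise.
    "UNIT": lambda m: "bedroom" if m["bedrooms"] == 1 else "bedrooms",
}


def realize(symbol, meaning):
    """Recursively expand a grammar symbol into a list of words."""
    if symbol in GRAMMAR:
        words = []
        for child in GRAMMAR[symbol]:
            words.extend(realize(child, meaning))
        return words
    return [TERMINALS[symbol](meaning)]


def generate_sentence(meaning):
    text = " ".join(realize("S", meaning))
    return text[0].upper() + text[1:] + "."


print(generate_sentence({"type": "bungalow", "size": "small", "bedrooms": 3}))
```

The point of the separation is that the meaning (a small bungalow with three bedrooms) is stored independently of the wording, so the same facts could be realized by a different grammar or lexicon.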
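For a feel of what such a statistical model learns, here is a toy version of IBM Model 1, the simplest of the word-alignment models that GIZA++ trains. It is a bare sketch that omits the NULL word and everything else a real system needs, and the three-pair corpus and the function names are invented for illustration.

```python
from collections import defaultdict

# Invented training data: (listing attributes, words of the human-written ad).
PAIRS = [
    (["bungalow", "small"], ["cozy", "cottage"]),
    (["bungalow", "large"], ["spacious", "cottage"]),
    (["condo", "small"], ["cozy", "apartment"]),
]


def train_model1(pairs, iterations=10):
    """Estimate t[(ad_word, attribute)] = P(ad_word | attribute) with EM."""
    ad_vocab = {word for _, ad in pairs for word in ad}
    # Start from a uniform distribution over ad words.
    t = defaultdict(lambda: 1.0 / len(ad_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each ad word's count over the attributes it could
        # have come from, in proportion to the current probabilities.
        for attrs, ad in pairs:
            for word in ad:
                norm = sum(t[(word, a)] for a in attrs)
                for a in attrs:
                    frac = t[(word, a)] / norm
                    count[(word, a)] += frac
                    total[a] += frac
        # M-step: renormalize the expected counts per attribute.
        t = defaultdict(
            float, {(w, a): c / total[a] for (w, a), c in count.items()}
        )
    return t


def best_word(t, attr):
    """Return the ad word with the highest probability for an attribute."""
    candidates = {w: p for (w, a), p in t.items() if a == attr}
    return max(candidates, key=candidates.get)


t = train_model1(PAIRS)
for attr in ["small", "large", "bungalow"]:
    print(attr, "->", best_word(t, attr))
```

Even on this tiny corpus, "cozy" gravitates to "small" because the two appear together in more than one pair. A real model would need a large collection of listing/ad pairs, as described above.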
