Pig enables data workers to write complex data transformations without knowing Java. Pig's simple SQL-like scripting language is called Pig Latin, and it appeals to developers already familiar with scripting languages and SQL.
Through the User Defined Functions (UDF) facility, Pig can invoke code written in many languages, such as JRuby, Jython and Java. The result is that you can use Pig as a component to build larger and more complex applications that tackle real business problems.

Once you have the file, you will need to unzip it into a directory. We will be uploading two CSV files: truckeventtextpartition.csv and drivers.csv.

Notice that truckevents does not have a schema, because we did not define one when loading the data into the relation truckevents. Modify line 1 of your script and add an AS clause to define a schema for the truck events data.

Next, define the truckeventssubset relation, which is a collection of 100 entries (arbitrarily selected) from the truckevents relation. Notice that truckeventssubset has the same schema as truckevents, because truckeventssubset is a subset of the truckevents relation. Add a DUMP command for truckeventssubset to your Pig script, then save and execute it again. The output should be 100 entries from the contents of truckeventtextpartition.csv (not necessarily the same ones on every run, because the entries are arbitrarily chosen).

Define a new relation, specificcolumns, which will contain only the driverId, eventTime and eventType fields from the relation truckeventssubset. Use a STORE command to output the specificcolumns relation to a folder named outputspecificcolumns. Again, this requires a MapReduce job (just like the DUMP command), so you will need to wait a minute for the job to complete.

Then define a new relation named drivers, join truckevents and drivers by driverId, and describe the schema of the new relation joindata.

Create a new Pig script named Pig-Sort from the mariadev home directory. Then enter commands that group the truckevents relation by driverId for the eventType values that are not Normal.
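The page no longer shows the commands these steps refer to, so here is a minimal Pig Latin sketch of what they might look like. The column lists in both AS clauses are assumptions (only driverId, eventTime and eventType are named in the text), and the relative file paths assume the CSV files were uploaded to your HDFS home directory. Check both against your own data before running it.

```pig
-- Load the truck events with an explicit schema.
-- ASSUMPTION: the column order and types below are illustrative; verify them against the CSV.
truckevents = LOAD 'truckeventtextpartition.csv' USING PigStorage(',')
    AS (driverId:int, truckId:int, eventTime:chararray, eventType:chararray,
        longitude:double, latitude:double, eventKey:chararray,
        correlationId:long, driverName:chararray, routeId:long,
        routeName:chararray, eventDate:chararray);
DESCRIBE truckevents;

-- Take 100 arbitrarily selected entries; the subset inherits the schema.
truckeventssubset = LIMIT truckevents 100;
DESCRIBE truckeventssubset;
DUMP truckeventssubset;

-- Keep only three columns and write them to a folder.
specificcolumns = FOREACH truckeventssubset GENERATE driverId, eventTime, eventType;
STORE specificcolumns INTO 'outputspecificcolumns' USING PigStorage(',');

-- Load the drivers and join them to the events on driverId.
-- ASSUMPTION: the drivers column list is illustrative as well.
drivers = LOAD 'drivers.csv' USING PigStorage(',')
    AS (driverId:int, name:chararray, ssn:chararray,
        location:chararray, certified:chararray, wage_plan:chararray);
joindata = JOIN truckevents BY driverId, drivers BY driverId;
DESCRIBE joindata;
```

DESCRIBE only prints a schema and returns immediately, whereas DUMP and STORE each trigger a job, which is why those steps take longer.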
Notice that the data for eventType values that are not Normal is grouped together for each driverId.
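The filter-and-group step described above could be sketched as follows. This is an illustration rather than the original script: the relation names filteredevents and groupedevents are hypothetical, and the schema is the same assumed column layout, of which only driverId and eventType matter here.

```pig
-- ASSUMPTION: column order and types are illustrative; verify them against the CSV.
truckevents = LOAD 'truckeventtextpartition.csv' USING PigStorage(',')
    AS (driverId:int, truckId:int, eventTime:chararray, eventType:chararray,
        longitude:double, latitude:double, eventKey:chararray,
        correlationId:long, driverName:chararray, routeId:long,
        routeName:chararray, eventDate:chararray);

-- Keep only the events whose type is something other than Normal.
filteredevents = FILTER truckevents BY eventType != 'Normal';

-- Collect the remaining events into one bag per driver.
groupedevents = GROUP filteredevents BY driverId;
DESCRIBE groupedevents;
DUMP groupedevents;
```

Each output tuple holds a driverId (exposed as the field named group) and a bag of that driver's non-Normal events.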